Well, it is if you are a financial analyst and you believe this new study.
Just a few months ago, an MIT paper by Andrew W. Lo and Jillian Ross examined generative AI and financial advice through a case study. It certainly made our industry sit up and take notice – “An LLM (Large Language Model) can role-play a financial advisor convincingly and often accurately for a client,” they wrote.
Even more concerning, the authors contended that AI can be trained to synthesise a personality that clients will find engaging, with the caveat that “even the largest language model currently appears to lack the sense of responsibility and ethics required by law from a human financial advisor.”
And now, it’s time for analysts to be alarmed. A brand new study from the University of Chicago Booth School of Business, by researchers Alex G. Kim, Maximilian Muhn, and Valeri V. Nikolaev, reveals that large language models (LLMs), specifically GPT-4, can outperform human financial analysts in predicting earnings changes from financial statements.
The report authors used GPT-4 Turbo for their research and anonymized the data as much as possible; they found that the AI’s accuracy was “remarkably higher than that achieved by the analysts.”
“Finally, we explore the economic usefulness of GPT’s forecasts by analyzing their value in predicting stock price movements,” they said in their study. “We find that the long-short strategy based on GPT forecasts outperforms the market and generates significant alphas and Sharpe ratios. For example, alpha in the Fama-French three-factor model exceeds 12% per year.”
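The alpha figure the authors cite comes from a factor regression: regress the strategy’s returns on the three Fama-French factors, and the intercept is the alpha. A minimal sketch of that calculation, using entirely synthetic monthly returns (the factor series, the loadings, and the roughly 1%-per-month alpha baked in are made-up illustration values, not the study’s data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # ten years of synthetic monthly observations

# Hypothetical factor returns: market excess, size (SMB), value (HML)
mkt = rng.normal(0.006, 0.04, n)
smb = rng.normal(0.002, 0.02, n)
hml = rng.normal(0.002, 0.02, n)

# Synthetic long-short strategy returns with ~1% monthly alpha built in
strat = 0.01 + 0.8 * mkt + 0.2 * smb - 0.1 * hml + rng.normal(0, 0.01, n)

# OLS regression of strategy returns on the factors; the intercept
# (first coefficient) is the monthly three-factor alpha
X = np.column_stack([np.ones(n), mkt, smb, hml])
beta, *_ = np.linalg.lstsq(X, strat, rcond=None)
monthly_alpha = beta[0]
annual_alpha = 12 * monthly_alpha
sharpe = strat.mean() / strat.std(ddof=1) * np.sqrt(12)

print(f"annualized alpha: {annual_alpha:.1%}, Sharpe ratio: {sharpe:.2f}")
```

Annualizing the intercept (multiplying by 12) recovers roughly the 12% alpha this synthetic data was constructed with; the study reports its own alpha on real GPT-forecast portfolios the same way.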
KEY FINDINGS
- Superior Accuracy:
- The study shows that GPT-4, even without narrative or industry-specific information, surpasses human analysts in predicting the direction of future earnings. When utilizing a chain-of-thought (CoT) prompt, which mimics the step-by-step reasoning of human analysts, GPT-4 achieves an impressive 60.35% accuracy, compared to the 52.71% accuracy of analysts.
- Advantage in Complexity:
- The LLM demonstrates a notable advantage in situations where human analysts struggle, such as with firms exhibiting high earnings volatility or those recording losses. This suggests that GPT-4 can effectively handle complex financial scenarios, providing reliable insights where human predictions falter.
- Comparison with Specialized Models:
- The LLM performs on par with state-of-the-art machine learning models, such as artificial neural networks (ANNs) trained specifically for earnings prediction, further validating its potential in financial analysis.
- Complementary to Human Analysis:
- Despite outperforming in certain areas, the study also finds that human analysts and GPT-4 forecasts can complement each other. When used together, they provide incremental insights, enhancing the overall predictive accuracy.
- Economic Implications:
- Trading strategies based on GPT-4’s predictions yield higher Sharpe ratios and alphas compared to traditional models, indicating superior risk-adjusted returns. This positions LLMs as valuable tools in investment decision-making.
But what does that mean for us?
The findings of this study suggest that we could see a significant shift in the role of financial advisors in our lifetimes, making it important that we learn how to use the software to improve our offerings, rather than be replaced by it. Although studies show that our clients prefer advice from real people, that may not remain so forever.
And in fact, certain groups already trust AI more than financial professionals (yes, that’s you, white males). That said, many may also want verification from a financial adviser after consulting AI. “What we learned, though, was most people who are consulting these resources are verifying what they hear with a financial advisor,” Kevin Keller, CEO of the CFP Board, told CNBC.
- Enhanced Decision-Making: Integrating LLMs like GPT-4 into the financial advisory process can improve the accuracy of earnings forecasts and investment strategies, providing a competitive edge.
- Focus on Value-Added Services: With LLMs handling complex numerical analyses, financial advisors can focus more on strategic advisory services, client relationship management, and personalized financial planning.
How did they test financial analysis accuracy?
The researchers employed a structured methodology to evaluate the effectiveness of Large Language Models (LLMs), specifically GPT-4, in performing financial statement analysis and predicting future earnings compared to human analysts and traditional machine learning models.
Here's a summary of their approach:
- Research Design:
- The researchers provided standardized and anonymized financial statements (balance sheets and income statements) to GPT-4 without any narrative or industry-specific information.
- The model was tasked with predicting whether a company's earnings would increase or decrease in the following period.
- Prompt Engineering:
- Two types of prompts were used: a simple prompt instructing the LLM to analyze the financial statements and a more detailed Chain-of-Thought (CoT) prompt mimicking the steps a human analyst would take (identifying trends, computing financial ratios, and synthesizing information to predict future earnings).
- Data Handling:
- The financial statements were anonymized and standardized, omitting company names and replacing years with labels (e.g., t, t-1).
- The sample included 150,678 firm-year observations from 15,401 distinct firms (1968-2021) and 39,533 observations from 3,152 firms with analyst following (1983-2021).
- Prediction Models and Benchmarks:
- The performance of GPT-4 was compared against human analysts, logistic regression models, and Artificial Neural Networks (ANNs).
- Analyst forecasts were taken as the median of individual forecasts issued in the month following the release of financial statements.
- Performance Metrics:
- Accuracy and F1-score were used to evaluate prediction quality.
- Results showed that a simple GPT-4 prompt achieved 52% accuracy, while the CoT prompt achieved 60% accuracy, outperforming both human analysts and specialized machine learning models.
- Robustness Checks:
- The researchers tested the model's predictions on the entire Compustat universe and used different subsets to ensure robustness.
- An out-of-sample test using data from 2022-2023 was conducted to rule out potential look-ahead bias.
- Incremental Informativeness:
- The study examined whether the model's predictions provided additional insights beyond those of human analysts and ANNs, finding that GPT-4's forecasts were complementary and added value, especially when human analysts were prone to biases or inefficiencies.
- Sources of Predictive Ability:
- The researchers investigated if the model's performance was due to its memory or genuine analytical capability. They concluded that the model generated useful narrative insights from financial ratios and trends, contributing to its predictive success.
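To make the evaluation steps above concrete, here is a minimal sketch of the kind of chain-of-thought prompt and the accuracy/F1 scoring described in the methodology. The prompt wording, the helper functions, and the toy labels are illustrative assumptions, not the authors’ exact materials:

```python
# Illustrative CoT-style prompt template: trends, then ratios, then a
# direction call on next-period earnings (statements are anonymized,
# with years labeled t, t-1, ...)
COT_PROMPT = (
    "You are a financial analyst. The balance sheet and income statement "
    "below are anonymized; years are labeled t, t-1, and so on.\n"
    "Step 1: Identify notable trends in the financial statements.\n"
    "Step 2: Compute key financial ratios (margins, liquidity, leverage).\n"
    "Step 3: Synthesize the above and predict whether earnings in period "
    "t+1 will increase or decrease.\n\n{statements}"
)

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the realized direction."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive="increase"):
    """F1-score for the 'increase' class: harmonic mean of precision/recall."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Toy realized directions vs. model predictions, just to exercise the metrics
y_true = ["increase", "decrease", "increase", "increase", "decrease"]
y_pred = ["increase", "increase", "increase", "decrease", "decrease"]
print(f"accuracy={accuracy(y_true, y_pred):.2f}, f1={f1(y_true, y_pred):.2f}")
```

In the study, the same two metrics were computed over the model’s binary increase/decrease calls across the full firm-year sample, which is how the 52% (simple prompt) and 60% (CoT prompt) figures arise.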
In conclusion, the methodology rigorously tested GPT-4's capability in financial statement analysis through anonymization, standardized data handling, prompt engineering, robust performance evaluation, and comparison with both human and machine benchmarks. The results showed that, at least in this study, GPT-4 has the potential to outperform traditional financial analysts and specialized machine learning models in predicting future earnings.
It is still early days, and the tech is rapidly evolving, but as LLMs demonstrate their prowess in financial statement analysis, the financial advisory industry could stand on the brink of transformation.