1 Introduction
Equity research analysts at financial institutions play a pivotal role in capital markets. Large institutions and small investors alike lack the time and resources to analyze thousands of companies and meet with their management. Such activity would similarly overwhelm even the largest public companies. Analysts provide a necessary conduit between companies’ management and investors. In addition, regulators have a strong interest in the efficient flow of information from companies to ensure functional, liquid markets; analysts significantly contribute to this process (Bradshaw, 2011).
One important function of analysts is forecasting companies’ future earnings and other financial quantities, e.g. revenue and cash flow. Investors use these quantities to understand the financial health of companies and the value of their stock. There are hundreds of academic studies which have analyzed these forecasts and their accuracy. Analysts’ estimates of upcoming earnings have been shown to be more accurate than those produced by a range of time-series models of actual reported earnings (Brown and Rozeff, 1978; Fried and Givoly, 1982). Brown et al. (1987) finds that this is due to both a timing and an information advantage. Accordingly, there is a strong relationship between analysts’ revisions to their estimates and changes in stock prices (Michaely and Womack, 2005).
Despite the value in these estimates, the academic finance and behavioral economics communities have found significant empirical evidence of various types of analyst biases and systematic errors. Much of this literature focuses on overoptimism in analysts’ earnings forecasts (De Bondt and Thaler, 1990; Hong and Kubik, 2003; Elliott et al., 2010). Common explanations for this optimism include incentives for analysts to maintain good relations with companies’ management (Francis and Philbrick, 1993; Richardson et al., 1999) and possible conflicts of interest with companies who are also investment banking clients (Michaely and Womack, 1999; O’Brien et al., 2005). Other literature, however, has highlighted that analysts are actually pessimistic or conservative in their estimates at shorter horizons and overly optimistic at longer-term horizons (Richardson et al., 1999; Brown, 1997; Eames and Glover, 2003; Elliott et al., 2010). This literature explains this asymmetry similarly in terms of analysts’ incentives to maintain good relations with companies’ management: pessimistic projections for an upcoming quarter will result in a company beating expectations and a subsequent positive stock price reaction, whereas unrealistically optimistic projections will ultimately result in a company missing expectations and a negative stock price reaction.
Many papers have investigated more specific behavioral biases (De Bondt and Thaler, 1990) and asymmetries (Easterwood and Nutt, 1999) in analysts’ earnings estimates. Elliott et al. (2010) finds that analysts systematically underweight new information and underreact to their own previous revisions. Easterwood and Nutt (1999) finds that analysts underreact to negative information and overreact to positive information. Ciccone (2005) shows that forecast errors are different for profit-making companies than for loss-making companies. Raedy et al. (2006) provides evidence that analysts are less likely to make revisions in the opposite direction of previous revisions, as they incur a greater reputational cost for acknowledging they “overshot” in previous estimates. There is also evidence analysts engage in “herding behavior” (O’Neill et al., 2011), i.e. they are unlikely to deviate too much from other analysts. Hong et al. (2000) finds that analysts are more likely to engage in herding in their revisions when their outstanding estimates deviate significantly from other analysts.
Furthermore, there is evidence that biases and systematic errors are stronger in some analysts than others. Michaely and Womack (1999) finds that analysts with more experience have lower forecast error. Hong et al. (2000) finds that experienced analysts are also more likely to revise their estimates earlier and that their estimates show more dispersion, indicating stronger conviction and willingness to issue bold forecasts that go against the trend. Mikhail et al. (1999) finds that earnings estimate accuracy is negatively correlated with analyst turnover. Finally, recent research has highlighted evidence of racial, gender and political biases towards out-of-group company CEOs (Jannati et al., 2019).
Many investors use the consensus earnings estimate, or the simple average of estimates from analysts at major institutions, as a reliable forecast of a company’s upcoming earnings. However, research has shown that inversely weighting these individual analysts’ estimates based on their historical forecasting errors results in a more accurate “adjusted” consensus estimate (Jha and Mozes, 2001; Michaely et al., 2018). While this is consistent with the finding that some analysts are more accurate than others, only considering the total forecasting error does not disentangle the systematic bias component of forecasting error from the unsystematic (or unpredictable) general error, and does not account for different types of biases and asymmetries in forecast errors. We hypothesize that a model which accounts for the systematic and asymmetric components of analyst estimate error will produce more robust adjusted consensus earnings estimates.
1.1 Primary contribution
We propose a Bayesian latent variable model for analysts’ earnings estimate forecasting error and show how it can be used to infer a robust adjusted consensus earnings estimate. In our model, we assume there are latent subgroups of analysts such that analysts within each group demonstrate similar systematic forecasting errors. We use historical analyst estimates of company earnings and actual reported earnings to learn the parameters of this model. We then describe a procedure for inverse inference to generate a robust consensus estimate of future earnings from individual analysts’ estimates. We believe robust earnings estimates benefit both investors, who require accurate forecasts of company financials, and public companies, whose stock prices may be undervalued as a result of some analysts’ incentive biases, conflicts of interest or for discriminatory reasons.
In the following sections, we describe the proposed Bayesian latent variable model and inverse inference procedure to generate robust company earnings estimates. We compare the resulting robust estimates to actual reported company earnings. We find that this approach produces estimates which are more accurate than the consensus estimate and other adjusted consensus baselines.
2 Proposed Model for Robust Earnings Estimates
The specific financial quantity whose forecast error we focus on modeling is Earnings per Share (EPS), as this is the most widely considered quantity when assessing the value of a public company. EPS is the ratio of a company’s net income during a particular reporting period, less preferred dividends, to the number of outstanding common shares of that company’s stock.
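As a small illustration with hypothetical figures (not data from the paper), EPS for a reporting period can be computed as:

```python
def earnings_per_share(net_income: float, preferred_dividends: float,
                       shares_outstanding: float) -> float:
    """EPS = (net income - preferred dividends) / common shares outstanding."""
    return (net_income - preferred_dividends) / shares_outstanding

# Hypothetical company: $10M net income, $1M preferred dividends, 4.5M shares.
print(earnings_per_share(10_000_000, 1_000_000, 4_500_000))  # 2.0
```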
We assume that there exists a latent subgrouping of analysts such that, within each group, we observe similar EPS forecasting errors. We hypothesize that analysts’ estimates of the change in EPS (the difference between the EPS forecast for the next period and the company’s reported EPS from the previous period) are normally distributed around a linear function of the actual resulting change in EPS, with some heteroscedastic variance. Both the variance and the parameters of the linear function are conditioned on the latent subgroup an analyst belongs to. In this setting, the distribution of the forwards model, or the observed forecasting error process, can be written in closed form and parameter learning can be carried out with a gradient-based method.

2.1 The Forwards Model
For a particular reporting period, we use y_j to represent the actual change in reported EPS from the previous period, where j is an index over the set of companies, C, and s_j is an indicator of whether the change in EPS is positive or negative.

Now, for each analyst indexed by i in the set of analysts, A, we draw a categorical variable z_i conditioned on the parameters θ that determines which one of the K latent subgroups analyst i belongs to. Finally, we draw each analyst’s estimate E_{ij} conditioned on the subgroup parameters (α_{z_i}, β_{z_i}, σ_{z_i, s_j}), which interact with the true change in EPS y_j and the sign indicator s_j. We show the model in plate notation in Figure 1 and give the explicit steps in the process below:

For all i ∈ A, draw z_i as

z_i ∼ Categorical(θ).

For all i ∈ A and j ∈ C, draw E_{ij} as

E_{ij} ∼ N(α_{z_i} + β_{z_i} y_j, σ²_{z_i, s_j}).    (1)
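To make the generative process concrete, the following sketch simulates it with illustrative parameter values and notation of our own choosing (the paper’s Figure 1 and fitted parameters are not reproduced here):

```python
import random

random.seed(0)

K = 3                          # number of latent analyst subgroups (illustrative)
theta = [0.5, 0.3, 0.2]        # subgroup mixing proportions
alpha = [0.0, 0.1, -0.05]      # per-subgroup intercepts (systematic bias)
beta = [1.0, 0.8, 1.2]         # per-subgroup slopes on the true change in EPS
sigma = [[0.10, 0.05],         # sigma[k][s]: std for negative (s=0) / positive (s=1)
         [0.20, 0.15],         #   changes in EPS -- the heteroscedastic component
         [0.25, 0.10]]

def simulate_estimates(y, n_analysts=5):
    """Draw each analyst's subgroup, then an estimate for every company."""
    z = random.choices(range(K), weights=theta, k=n_analysts)
    estimates = []
    for zi in z:
        row = []
        for yj in y:
            s = 1 if yj >= 0 else 0                  # sign indicator s_j
            mean = alpha[zi] + beta[zi] * yj         # linear systematic component
            row.append(random.gauss(mean, sigma[zi][s]))
        estimates.append(row)
    return z, estimates

z, E = simulate_estimates([0.25, -0.10])             # two companies' true changes
print(len(E), len(E[0]))  # 5 2
```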
2.2 Parameter Learning
To learn the parameters Θ of the model from observed analyst estimates E and actual reported EPS data y, we want to maximize the likelihood, p(E | y; Θ), which we can write as follows:

log p(E | y; Θ) = ELBO(q, Θ) + KL( q(z) ‖ p(z | E, y; Θ) ),  where  ELBO(q, Θ) = E_{q(z)}[ log p(E, z | y; Θ) − log q(z) ].

Here, ELBO is the Evidence Lower Bound, which is commonly used when carrying out Variational Inference with Graphical Models (Blei et al., 2017). In the vernacular of Variational Inference, our approximating distribution, q, of the true posterior is the posterior distribution of z given E and y, p(z | E, y; Θ), which makes the bound tight.
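Because each z_i is discrete with K possible values, the bound can be checked numerically. The sketch below, under the Gaussian forward model of Section 2.1 (simplified to a single homoscedastic variance per group for brevity, with toy parameters of our own), verifies that the ELBO equals the log marginal likelihood when q is the exact posterior:

```python
import math

def log_normal_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_joint(k, estimates, y, theta, alpha, beta, sigma):
    """log p(E_i, z_i = k | y) for one analyst's estimates across companies."""
    return math.log(theta[k]) + sum(
        log_normal_pdf(e, alpha[k] + beta[k] * yj, sigma[k])
        for e, yj in zip(estimates, y))

def log_marginal(estimates, y, theta, alpha, beta, sigma):
    """log p(E_i | y): marginalize z_i with a stable log-sum-exp."""
    terms = [log_joint(k, estimates, y, theta, alpha, beta, sigma)
             for k in range(len(theta))]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def elbo(estimates, y, theta, alpha, beta, sigma, q):
    """E_q[log p(E_i, z_i | y) - log q(z_i)] for one analyst."""
    return sum(qk * (log_joint(k, estimates, y, theta, alpha, beta, sigma) - math.log(qk))
               for k, qk in enumerate(q) if qk > 0.0)

# Toy parameters (illustrative, not fitted values from the paper).
theta, alpha, beta, sigma = [0.6, 0.4], [0.0, 0.2], [1.0, 0.8], [0.1, 0.2]
y, E_i = [0.3, -0.1], [0.31, -0.12]

# Exact posterior responsibilities q(z_i = k | E_i, y).
terms = [log_joint(k, E_i, y, theta, alpha, beta, sigma) for k in range(2)]
m = max(terms)
w = [math.exp(t - m) for t in terms]
q = [wi / sum(w) for wi in w]

# With q equal to the true posterior, the KL term vanishes and the bound is tight.
gap = log_marginal(E_i, y, theta, alpha, beta, sigma) - elbo(E_i, y, theta, alpha, beta, sigma, q)
print(abs(gap) < 1e-9)  # True
```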
2.3 Optimization
For parameter learning we use the popular first-order optimizer Adam (Kingma and Ba, 2015) with a learning rate of . We do not perform any minibatching, and we stop optimization as soon as performance on the validation set begins to degrade.
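For reference, the Adam update rule we rely on can be sketched in a few lines (a toy one-dimensional implementation of our own, not the paper’s training code):

```python
import math

def adam_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """Minimize a 1-D objective with the Adam update rule (Kingma and Ba, 2015)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3); the minimum is at x = 3.
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(abs(x_star - 3.0) < 0.05)  # True
```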
The ELBO as defined in the previous section is explicitly optimized in the following form:
2.4 Identification
For purposes of identification and to ensure convergence to a good local minimum, we fix the parameters corresponding to the first latent group as α_1 = 0 and β_1 = 1. Thus, the semantic interpretation of this latent group is that estimates from this group are accurate and unbiased.
2.5 Parameter Initialization
For all subgroups k, we initialize α_k and β_k using the coefficients of ridge regression estimates obtained by regressing the estimates E on the corresponding set of changes in actual EPS y. We further set the initial variance of each latent group to . We observe that in practice using these initial values leads to better convergence.

3 Inverse Inference for Generating Robust Estimates
At test time, we want to infer a robust estimate of the change in EPS from the analysts’ estimates E and the learned model parameters Θ. For a company j, this is equivalent to inferring p(y_j | E; Θ).
In our formulation, inference at test time is harder than parameter learning, primarily because the posterior over the latent variables is intractable. We can, however, express the conditional distribution of each variable in closed form, which allows us to overcome this challenge with Gibbs sampling, a Markov chain Monte Carlo technique that carries out inference by sampling from the conditional distributions. Sampling from the full conditionals is easy for all of the variables except the changes in EPS actuals, y. Proposition 1 gives the posterior distribution of y_j given the analyst estimates, E, and model parameters, Θ, in closed form to allow sampling.

Proposition 1
Under the DAG model assumptions in Figure 1, the posterior distribution of y_j conditioned on its Markov blanket, or the set of variables such that y_j is conditionally independent of all other variables in the model, is given as

y_j | E, z, Θ ∼ N(m_j, v_j),

where

v_j = ( 1/σ_0² + Σ_{i ∈ A} β_{z_i}² / σ²_{z_i, s_j} )⁻¹

and

m_j = v_j ( μ_0/σ_0² + Σ_{i ∈ A} β_{z_i} (E_{ij} − α_{z_i}) / σ²_{z_i, s_j} ),

with μ_0 and σ_0² the mean and variance of the weak conjugate prior on y_j.
Proof Sketch.
The proof of the proposition involves adding a weak conjugate prior, N(μ_0, σ_0²), on y_j. Rewriting the model in matrix form, we get

E_j = α_z + β_z y_j + ε_j,   ε_j ∼ N(0, Σ_j),   Σ_j = diag(σ²_{z_1, s_j}, …, σ²_{z_|A|, s_j}),    (2)

where E_j, α_z and β_z stack the analysts’ estimates and subgroup parameters into vectors. Now, from Equation 2 and the result in Cepeda and Gamerman (2000, 2005) pertaining to Bayesian linear regressions under heteroscedasticity, we arrive at the posterior.
Algorithm 1 provides the steps in the Gibbs sampling procedure for inverse inference of y_j by sampling from the full posterior conditionals.
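Since Algorithm 1 itself is not reproduced here, the following is a sketch of what one such Gibbs sweep could look like for a single company, under the forward model of Section 2.1, with a weak N(0, 1) prior on the change in EPS and illustrative notation and parameters of our own:

```python
import math, random

def gibbs_infer_y(E, theta, alpha, beta, sigma, prior_mu=0.0, prior_var=1.0,
                  sweeps=500, burn_in=100, seed=1):
    """Posterior-mean estimate of one company's change in EPS from estimates E.

    Alternates (a) sampling each analyst's subgroup z_i given the current y,
    and (b) sampling y from its Gaussian full conditional given z: a
    precision-weighted combination of the prior and the de-biased estimates.
    sigma[k][s] is the std of subgroup k for negative (s=0) / positive (s=1) y.
    """
    rng = random.Random(seed)
    K, n = len(theta), len(E)
    y = sum(E) / n                                   # initialize at the consensus
    z = [0] * n
    samples = []
    for sweep in range(sweeps):
        s = 1 if y >= 0 else 0                       # current sign indicator
        # (a) Subgroup memberships from their categorical full conditionals.
        for i in range(n):
            logw = [math.log(theta[k]) - math.log(sigma[k][s])
                    - (E[i] - alpha[k] - beta[k] * y) ** 2 / (2 * sigma[k][s] ** 2)
                    for k in range(K)]
            m = max(logw)
            z[i] = rng.choices(range(K), weights=[math.exp(l - m) for l in logw])[0]
        # (b) y from its Gaussian full conditional (cf. Proposition 1).
        prec = 1.0 / prior_var + sum(beta[z[i]] ** 2 / sigma[z[i]][s] ** 2
                                     for i in range(n))
        mean = (prior_mu / prior_var
                + sum(beta[z[i]] * (E[i] - alpha[z[i]]) / sigma[z[i]][s] ** 2
                      for i in range(n))) / prec
        y = rng.gauss(mean, math.sqrt(1.0 / prec))
        if sweep >= burn_in:
            samples.append(y)
    return sum(samples) / len(samples)

# Two subgroups: unbiased analysts and systematically optimistic ones (+0.3).
theta, alpha, beta = [0.7, 0.3], [0.0, 0.3], [1.0, 1.0]
sigma = [[0.05, 0.05], [0.05, 0.05]]
estimate = gibbs_infer_y([0.21, 0.19, 0.50, 0.20], theta, alpha, beta, sigma)
print(abs(estimate - 0.20) < 0.05)  # True
```

Note how the optimistic analyst’s estimate of 0.50 is effectively de-biased rather than discarded, pulling the inferred change in EPS back toward 0.20.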
We evaluate this procedure using historical EPS estimates and actuals in the next section.
4 Experiments
We evaluate the ability of our model and inverse inference procedure to generate robust consensus estimates by first learning parameters from historical EPS estimates and actuals and then carrying out inverse inference to predict changes in EPS actuals from the test-data estimates. We compare the estimates from our approach, which we refer to as Latent Bayesian Adjustment (LBA), to the simple consensus estimates and other reference baselines, which we describe below.
4.1 Reference Baselines
We consider the following reference baselines to benchmark our approach:
No Adjustment (NA): The estimate of y_j is the simple consensus estimate, or the average of all of the analysts’ estimates E_{ij}.
Weighted Adjustment (WA): Instead of naively averaging over the estimates, we perform a weighted average in which the weight given to an analyst’s estimate is inversely proportional to the analyst’s historical forecast error.
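For example, with weights inversely proportional to each analyst’s historical mean absolute error (a sketch; the exact weighting scheme used in the cited work may differ):

```python
def weighted_consensus(estimates, historical_errors, floor=1e-6):
    """Weighted average with weights proportional to 1 / historical error."""
    weights = [1.0 / max(err, floor) for err in historical_errors]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Three analysts; the third has historically been twice as inaccurate.
print(round(weighted_consensus([1.0, 1.1, 2.0], [0.1, 0.1, 0.2]), 4))  # 1.24
```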
Regression Adjustment (RA): We regress the set of true values, y, against the corresponding estimates, E. At test time, we apply the learnt regression function to the test estimates to get adjusted estimates for y_j; the final estimate is the average of the adjusted estimates. We consider two different regression functions: a parametric ridge regression (RA-Ridge) and a nonparametric regression consisting of a random forest of decision trees (RA-Ensemble).
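A minimal version of the ridge variant, with a closed-form one-dimensional fit of our own rather than a library call, could look like:

```python
def fit_ridge_1d(x, y, lam=1.0):
    """Closed-form ridge regression y ~ a + b*x with penalty lam on the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)            # shrunken slope
    a = my - b * mx                  # intercept from the centered fit
    return a, b

def regression_adjusted_consensus(train_estimates, train_actuals, test_estimates):
    """Map each test estimate through the learnt regression, then average."""
    a, b = fit_ridge_1d(train_estimates, train_actuals, lam=0.1)
    adjusted = [a + b * e for e in test_estimates]
    return sum(adjusted) / len(adjusted)

# Training data with a systematic +0.2 optimism in the estimates.
adjusted = regression_adjusted_consensus(
    [1.0, 2.0, 3.0, 4.0], [0.8, 1.8, 2.8, 3.8], [2.0, 2.2])
```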
Bayesian Regression Adjustment (BA): Instead of regressing the actuals on the estimates, we first learn a regression of the estimates E on the actuals y with a linear link function. At test time, we condition on the observed estimates and place a weak conjugate prior on y_j. The final adjusted estimate of y_j is then recovered as the expectation of y_j under the posterior conditioned on the estimates and the learnt regression parameters.
Note that Bayesian Regression Adjustment is equivalent to our model when the number of latent groups K = 1.
4.2 Dataset
We use the Thomson Reuters Institutional Brokers’ Estimate System (I/B/E/S) dataset, which records the earnings forecasts of different analysts for different companies and upcoming periods across multiple time horizons. For our experiments we look at a smaller subset of the I/B/E/S data consisting of the top 200 companies followed by the most analysts over a 19-year period from January 2000 to January 2019. We consider forecasts at the horizons of the next Fiscal Year (FY1) and the Second Fiscal Year (FY2). Some analysts make multiple revisions during this period; we only consider a revision if it was recorded at least 6 months (or 12 months) before the forecast period end date for the next Fiscal Year (Second Fiscal Year). We use the data from January 2000 to January 2012 for training, data from January 2012 to January 2014 for validation and data from January 2014 to January 2019 for testing.
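The chronological split can be expressed as follows; the first-of-month cutoff days are an assumption (the exact day-of-month boundaries are not stated above), and the function signature is hypothetical rather than the actual I/B/E/S schema:

```python
from datetime import date

# Assumed boundaries on the first of each month (exact days are not given).
START, TRAIN_END = date(2000, 1, 1), date(2012, 1, 1)
VAL_END, TEST_END = date(2014, 1, 1), date(2019, 1, 1)

def assign_split(record_date: date) -> str:
    """Assign an estimate record to a chronological train/validation/test split."""
    if record_date < START or record_date >= TEST_END:
        return "excluded"
    if record_date < TRAIN_END:
        return "train"
    if record_date < VAL_END:
        return "validation"
    return "test"

print(assign_split(date(2013, 6, 30)))  # validation
```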
5 Results
We consider the difference between the actual reported change in EPS and the forecasted change using our method and each of the reference baselines. We report the micro-averaged Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Coefficient of Determination (R²) across all companies for Fiscal Year 1 in Table 1 and Fiscal Year 2 in Table 2. We also report 95% confidence intervals, which we generate by bootstrapping the inferred results for the test data points 1000 times. For completeness, we also report the RMSE and MAE values macro-averaged over the individual companies.
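The reported metrics and percentile-bootstrap confidence intervals can be sketched as follows (our own implementation, not the paper’s evaluation code):

```python
import math, random

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def r2(actual, pred):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def bootstrap_ci(actual, pred, metric, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample (actual, pred) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(actual, pred))
    stats = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(len(pairs))] for _ in pairs]
        a, p = zip(*sample)
        stats.append(metric(a, p))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```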
From the results in Tables 1 and 2, it is evident that analysts are reasonably accurate in their predictions of future earnings, as evidenced by the low RMSE of the unadjusted consensus estimates (NA). However, as expected, analysts tend to err more as the forecast horizon increases, as is evident from the higher FY2 errors. Although Weighted Averaging (WA) reduces errors in the FY2 consensus estimates, this benefit is not significant given the large confidence intervals around the results. Interestingly, we observed that Regression Adjustment (RA) reduced the consensus error by a large margin on the training dataset but performed worse on the test set. This was true for both the parametric ridge regression and the nonparametric random forest regression, suggesting that these models have a strong tendency to overfit. Furthermore, amongst all the proposed baselines, Bayesian Adjustment (BA) has the highest errors (we do not report the R² for BA for this reason). We hypothesize that this is because Bayesian Adjustment does not allow for the flexibility of discovering analysts who are unbiased. In contrast, our proposed Latent Bayesian Adjustment (LBA) reduces forecast error across all reported metrics for both FY1 and FY2, and the reductions are significant in each case, demonstrating its effectiveness as an improved consensus model.

6 Discussion and Future Work
Biases and systematic errors in earnings forecasts can negatively impact both investors and public companies. Accurate, unbiased consensus earnings are important to investors to understand the financial health of companies and the value of their stock so they can make well-informed investment decisions. Similarly, if analysts’ estimates are affected by behavioral, incentive-based or discriminatory biases, this may result in companies’ stocks being undervalued. We proposed a Bayesian latent variable model and inverse inference procedure that we demonstrated produces estimates which are more robust than consensus estimates as well as other adjusted baselines.
There are a number of possible directions to pursue to further improve the model. Research has shown that analysts whose buy and sell recommendations are more profitable also produce more accurate estimates (Loh and Mian, 2006). Adding analysts’ recommendations to the model might result in more robust identification of latent subgroups and more accurate estimates. Additionally, while our model incorporates the asymmetry in systematic errors for profit-making and loss-making companies, it does not incorporate other specific biases and asymmetries that have been identified, such as effects for different types of companies, investment banking relationships and discriminatory out-of-group effects. Additional data identifying some of these attributes for individual analysts and companies could further improve the model and make it more robust to these types of forecasting errors. Furthermore, the model we proposed is linear. Given the observed asymmetries in analysts’ systematic forecasting errors, a nonlinear model might further improve the estimation procedure.
While the focus of this paper is generating robust consensus earnings estimates, we note that the proposed model is applicable to any other problem where we have a quantity that is measured by multiple instruments or individuals, which may be subject to machine error or human subjectivity. There are many other close applications in finance and economics, such as GDP and unemployment forecasting, where this model may prove to be more robust than existing approaches. It might also prove useful in more distant applications like elections forecasting or combining sensor readings.
Disclaimer
This paper was prepared for information purposes by the AI Research Group of JPMorgan Chase & Co and its affiliates (“J.P. Morgan”), and is not a product of the Research Department of J.P. Morgan. J.P. Morgan makes no explicit or implied representation and warranty and accepts no liability, for the completeness, accuracy or reliability of information, or the legal, compliance, financial, tax or accounting effects of matters contained herein. This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.
References
 Variational inference: a review for statisticians. Journal of the American Statistical Association 112 (518), pp. 859–877. Cited by: §2.2.
 Analysts’ forecasts: what do we know after decades of work?. SSRN Electronic Journal. External Links: Document Cited by: §1.
 Security analyst superiority relative to univariate time-series models in forecasting quarterly earnings. Journal of Accounting and Economics 9 (1), pp. 61–87. Cited by: §1.
 The superiority of analyst forecasts as measures of expectations: evidence from earnings. Journal of Finance 33 (1), pp. 1–16. Cited by: §1.
 Analyst forecasting errors: additional evidence. Financial Analysts Journal 53 (6), pp. 81–88. Cited by: §1.

 Bayesian modeling of variance heterogeneity in normal regression models. Brazilian Journal of Probability and Statistics 14 (1), pp. 207–221. Cited by: Proposition 1.
 Bayesian methodology for modeling parameters in the two parameter exponential family. Revista Estadística 57 (168–169), pp. 93–105. Cited by: Proposition 1.
 Trends in analyst earnings forecast properties. International Review of Financial Analysis 14 (1), pp. 1–22. Cited by: §1.
 Do security analysts overreact?. The American Economic Review 80 (2), pp. 52–57. Cited by: §1, §1.
 Earnings predictability and the direction of analysts’ earnings forecast errors. The Accounting Review 78 (3), pp. 707–724. Cited by: §1.
 Inefficiency in analysts’ earnings forecasts: systematic misreaction or systematic optimism?. The Journal of Finance 54 (4), pp. 1777–1797. Cited by: §1.
 Evidence from archival data on the relation between security analysts’ forecast errors and prior forecast revisions. Contemporary Accounting Research 12 (2), pp. 919–938. Cited by: §1, §1.
 Analysts’ decisions as products of a multitask environment. The Journal of Accounting Research 31 (2), pp. 216–230. Cited by: §1.
 Financial analysts’ forecasts of earnings: a better surrogate for market expectations. Journal of Accounting and Economics 4 (2), pp. 85–107. Cited by: §1.
 Security analysts’ career concerns and herding of earnings forecasts. RAND Journal of Economics 31 (1), pp. 121–144. Cited by: §1, §1.
 Analyzing the analysts: career concerns and biased earnings forecasts. The Journal of Finance 58 (1), pp. 313–351. Cited by: §1.
 Ingroup bias in financial markets. SSRN Electronic Journal. External Links: Document Cited by: §1.
 Creating and profiting from more accurate earnings estimates with starmine professional. StarMine white paper. Cited by: §1.
 Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations. Cited by: §2.3.
 Do accurate earnings forecasts facilitate superior investment recommendations?. Journal of Financial Economics 80 (2), pp. 455–483. Cited by: §6.
 Lured by the consensus: the implications of treating all analysts as equal. SSRN Electronic Journal. External Links: Document Cited by: §1.
 Conflict of interest and the credibility of underwriter analyst recommendations. The Review of Financial Studies 12 (4), pp. 653–686. Cited by: §1, §1.
 Market efficiency and biases in brokerage recommendations. In Advances in Behavioral Finance, Vol. 2, R. H. Thaler (Ed.), pp. 389–419. Cited by: §1.
 Does forecast accuracy matter to security analysts?. The Accounting Review 74, pp. 185–200. Cited by: §1.
 Analyst impartiality and investment banking relationships. The Journal of Accounting Research 43 (4), pp. 623–650. Cited by: §1.
 How does prior information affect analyst forecast herding?. Academy of Accounting and Financial Studies Journal 15, pp. 105–128. Cited by: §1.
 Horizondependent underreaction in financial analysts’ earnings forecasts. Contemporary Accounting Research 23 (1), pp. 291–322. Cited by: §1.
 Tracking analysts’ forecasts over the annual earnings horizon: are analysts’ forecasts optimistic or pessimistic?. SSRN Electronic Journal. External Links: Document Cited by: §1.