Most people take decisions in an uncertain environment without resorting to formal statistical analysis (Tversky and Kahneman, 1974). I refer to these decisions as judgmental decisions. Statistical decision theory uses data to prescribe optimal choices under a set of assumptions (Wald, 1950), but has no explicit role for judgmental decisions. This paper is concerned with the following questions: Is a given judgmental decision optimal in the light of empirical evidence? If not, how can it be improved?
The answer to the first question is obtained by testing whether, for a given loss function, the first derivative evaluated at the judgmental decision is equal to zero. The answer to the second question is derived from the closest boundary of the confidence interval. The decision rule incorporating judgment is admissible and does not perform worse than the judgmental decision with a probability equal to the confidence level. The implication is that abandoning a judgmental decision to follow a statistical procedure always carries the risk of choosing an action worse than the original judgmental decision. This may happen with a probability bounded above by the confidence level.
For concreteness, consider an investor who is about to take a judgmental decision, say, to hold all her assets in cash. She asks an econometrician for advice on whether she should invest some of her money in a stock market index. The best prediction of the econometrician depends on an estimated parameter, which is affected by estimation risk. For a given utility function provided by the investor, the econometrician can construct a loss function, which gives the loss experienced by the investor if a given decision is taken under a given value of the true parameter. Suppose the econometrician is able to recover the distribution of the gradient of this loss function around the true, but unknown, parameter. It is possible to test whether the investor’s decision is optimal by testing the null hypothesis that the gradient evaluated at the judgmental decision is equal to zero. If the null hypothesis is not rejected, the econometrician cannot recommend any deviation from the judgmental decision. If the null hypothesis is rejected, statistical evidence suggests that marginal deviations from the judgmental decision decrease the loss function.
Consider the confidence level used to implement the hypothesis test. The investor faces the decision problem depicted in figure 1 and has two possible choices. She can hold on to her judgmental decision, incurring the associated loss. Alternatively, she can follow the econometrician’s advice, which is equivalent to accepting a bet. In this case, she does not know whether she is facing the upper node or the lower node of the decision tree. The upper node is the unfavorable scenario, in which the null hypothesis is true, so that any deviation from the judgmental decision results in a higher loss. The lower node is the favorable scenario, in which one correctly rejects the null hypothesis that the judgmental decision is optimal, so that a marginal move away from it results in a lower loss. The dashed line connecting the two nodes represents true uncertainty for the decision maker, in the sense that it is not possible to attach any probability to being in one node or the other. The decision maker can choose the confidence level, which puts an upper bound on the probability that the null is wrongly rejected when it is true. Notice that the confidence level is also a lower bound on the probability of correctly rejecting the null when it is false.
In case of rejection, the investor faces a new, but identical decision problem, except that the judgmental decision is replaced by a marginally different action (the direction of the move depends on the sign of the empirical gradient). This new action will be rejected if it also falls in the rejection region. Iterating this argument forward, the preferred decision of the investor is the action which lies at the boundary of the confidence interval, the point where the null hypothesis that the decision is optimal can no longer be rejected. This decision is characterized by the fact that it will produce a loss higher than the original judgmental decision with probability at most equal to the confidence level.
The contribution of this paper lies at the intersection between statistics and decision theory. Statistical decision theory emerged as a discipline in the 1950s with the works of Wald (1950) and Savage (1954). Recent contributions in decision theory focus on modeling behavior when beliefs cannot be quantified by a unique Bayesian prior (Gilboa and Marinacci, 2013) and on models of heuristics describing how people arrive at judgmental decisions (Gennaioli and Shleifer, 2010). This paper, however, is not concerned with the axiomatic foundations of decision theory, but rather with how data can be used to help decision makers improve their judgmental decisions. It falls within Clive Granger’s tradition that ‘to obtain any kind of best value for a point forecast, one requires a criterion against which various alternatives can be judged’ (Granger and Newbold, 1986, p. 121; see also Granger and Machina, 2006). Recent contributions within this tradition are Patton and Timmermann (2012) and Elliott and Timmermann (2016). Other contributions include Chamberlain (2000) and Geweke and Whiteman (2006), who deal with forecasting using Bayesian statistical decision theory, and Manski (2013 and the references therein), who uses statistical decision theory in the presence of ambiguity for partial identification of treatment response.
The paper is structured as follows. Section 2 sets up the decision environment and introduces the concept of judgment in frequentist statistics. Judgment is defined as a pair formed by a judgmental decision and a confidence level. Judgment is used to set up the hypothesis to test whether the judgmental decision is optimal. Two key results of this section are that the decision rule incorporating judgment is admissible, and that it is either the judgmental decision itself or is at the boundary of the confidence interval of the sample gradient of the loss function.
Section 3 discusses the choice of the confidence level. As illustrated in figure 1, the confidence level puts an upper bound on the probability that the statistical decision rule performs worse than the judgmental decision. The confidence level can therefore be interpreted as the willingness of the decision maker to take statistical risk and is referred to as the coefficient of statistical risk aversion. This concept is closely linked to the idea of ambiguity aversion. The section also discusses how the confidence level can be elicited with a simple experiment involving urns à la Ellsberg.
Section 4 uses an asset allocation problem as a working example to illustrate the empirical performance of various decision rules. Section 5 concludes.
2 Statistical Decision Rules with Judgment
This section introduces the concept of judgment and shows how hypothesis testing can be used to arrive at optimal decisions. For concreteness, I solve a simple asset allocation problem, but the example can be easily generalized.
Consider an investor holding cash, yielding zero nominal returns. The objective is to minimize a loss function, by deciding what fraction to invest in a stock market index, yielding the uncertain return . The decision environment is formally defined as follows.
Definition 2.1 (Decision Environment).
Let $\Phi(\cdot)$ denote the cdf of the standard normal distribution. The decision environment is defined by:
1. $X \sim \Phi(x - \theta)$, where $\theta$ is unknown.
2. One sample realization $x$ is observed. (I denote random variables with upper case letters ($X$) and their realizations with lower case letters ($x$).)
3. $a$ denotes the action of the decision maker.
4. The decision maker minimizes the loss function $L(\theta, a)$.
Remark: General case. This decision environment can be generalized to cover any continuously differentiable and strictly convex loss function, at the cost of more cumbersome notation. The intuition is the following. Since the main object of interest is the first derivative of the loss function evaluated at the judgmental decision $\tilde a$ and at the maximum likelihood estimator $\hat\theta$, an approximation of the first order conditions around the population parameter $\theta$ gives:
$$\nabla_a L(\hat\theta, \tilde a) \approx \nabla_a L(\theta, \tilde a) + \nabla_{\theta a} L(\theta, \tilde a)\,(\hat\theta - \theta).$$
The statistical properties of the gradient can therefore be deduced from the statistical properties of $\hat\theta$. The strict convexity of the loss function guarantees that there is a one to one mapping between $\theta$ and the gradient (although not linear as in the decision environment above).
Consider the following standard definition of a decision rule (Wald, 1950):
Definition 2.2 (Decision Rule).
$\delta(X)$ is a decision rule, such that if $x$ is the sample realization, $\delta(x)$ is the action that will be taken.
Classical statistics as developed by Neyman and Fisher has no explicit role for epistemic uncertainty (as defined by Marinacci, 2015), as it was motivated by the desire for objectivity. Non sample information is, nevertheless, implicitly introduced in various forms, in particular in the choice of the confidence level and the choice of the hypothesis to be tested.
I introduce the following definition of judgment.
Definition 2.3 (Judgment).
Judgment is the pair $(\tilde a, \alpha)$, where $\tilde a$ is the judgmental decision and $\alpha$ is the confidence level.
Judgment is routinely used in hypothesis testing, for instance when testing whether a regression coefficient is statistically different from zero (with zero in this case playing the role of the judgmental decision), for a given confidence level (usually 1%, 5% or 10%). I say nothing about how the judgmental decision is formed. This question is explored by Tversky and Kahneman (1974) and subsequent research. The choice of the confidence level is discussed in section 3. For the purpose of this paper, judgment is a primitive to the decision problem, like the loss function.
2.2 Hypothesis Testing
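The decision maker can test whether the judgmental decision is optimal by testing if the gradient of the loss function, evaluated at the judgmental decision, is equal to zero. A test statistic for the gradient can be obtained by replacing the unknown parameter with its maximum likelihood estimator.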
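As a minimal sketch of this test, assume a single draw `x` from a normal distribution with unit variance and unknown mean, and a quadratic loss `-a*theta + 0.5*a**2`. The loss is an assumption made here for illustration: it is one strictly convex loss whose gradient in the action, `a - theta`, is linear in the parameter. The function names are hypothetical. The sketch shows the plain two-sided version of the test, without the conditioning on the sign of the sample gradient discussed next.

```python
from statistics import NormalDist


def sample_gradient(a_tilde: float, x: float) -> float:
    """Gradient of the assumed loss -a*theta + 0.5*a**2 at a_tilde,
    with the unknown theta replaced by its ML estimate x."""
    return a_tilde - x


def optimality_p_value(a_tilde: float, x: float) -> float:
    """Two-sided p-value for H0: the population gradient at a_tilde is zero.
    Under H0 the sample gradient is standard normal (unit variance assumed)."""
    z = abs(sample_gradient(a_tilde, x))
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Judgmental decision: hold everything in cash (a_tilde = 0).
p_small_gap = optimality_p_value(a_tilde=0.0, x=0.5)  # weak evidence against a_tilde
p_large_gap = optimality_p_value(a_tilde=0.0, x=3.0)  # strong evidence against a_tilde
print(p_small_gap, p_large_gap)
```

A small p-value indicates that the data do not support the judgmental decision as optimal under these assumptions.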
The novel insight of this paper stems from the realization that the hypothesis to be tested should be conditional on the sample realization. Having observed a negative, say, sample gradient, one can conclude that actions higher than the judgmental decision decrease the empirical loss function. The decision maker is interested, however, in the population value of the loss function. If the population gradient is positive, higher values of the action would increase the loss function, rather than decrease it. Analogous, but opposite considerations hold if the sample gradient is positive. The null hypothesis to be tested is therefore that the population gradient has opposite sign relative to the sample gradient. For a discussion of the importance of conditioning in statistics, see chapter 10 of Lehmann and Romano (2005) or section 1.6.3 of Berger (1985) and the references therein.
To formalize, partition the sample space according to the sign taken by the sample gradient as follows:
Two cases are possible:
i) , implying that the null hypothesis to be tested is:
ii) , implying that the null hypothesis to be tested is:
In an hypothesis testing decision problem, only two actions are possible: The null hypothesis is either accepted or rejected. Let and , and consider again the two cases, conditional on the sample realization . Given the judgment define the test functions associated with the hypotheses (3)-(4):
The following theorem derives the decision compatible with judgment:
Proof — See Appendix.
The decision rule (7) depends not only on the random variable , but also on the sample realization . To understand the intuition, consider the case i) and the associated null hypothesis . The null hypothesis is a statement about the population gradient evaluated at the judgmental decision . It says that marginally higher values of do not decrease the loss function. If it is not rejected at the given confidence level , the chosen action must be . Rejection of the null hypothesis, on the other hand, implies accepting the alternative, which states that marginally higher values of decrease the loss function. Denote the new action marginally away from with , for and sufficiently small. Notice that is not random and it is possible to test whether it is optimal, by testing again whether additional marginal moves from increase the loss function. This reasoning holds for all null hypotheses for any . The first null hypothesis which is not rejected is , where .
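The iteration described above can be sketched as a rule that keeps the judgmental decision when it lies inside a confidence interval around the sample realization and otherwise moves to the closest boundary. This is only an illustration under stated assumptions: unit variance, a quadratic loss whose optimum is the unknown mean, and an ordinary two-sided critical value standing in for the conditional test functions of the text, which are not reproduced here. The function name is hypothetical.

```python
from statistics import NormalDist


def decision_with_judgment(x: float, a_tilde: float, alpha: float) -> float:
    """Keep a_tilde if it is not rejected, else move to the closest
    boundary of the interval [x - c, x + c]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 0.0:
        return a_tilde  # interval is the whole real line: never reject
    if alpha == 1.0:
        return x  # interval collapses to the maximum likelihood decision
    # Assumed two-sided critical value of the standard normal distribution.
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower, upper = x - c, x + c
    # Clamping a_tilde to the interval returns a_tilde itself when it is
    # inside, and the nearest boundary when it is outside.
    return min(max(a_tilde, lower), upper)


kept = decision_with_judgment(x=1.0, a_tilde=0.0, alpha=0.05)
moved = decision_with_judgment(x=3.0, a_tilde=0.0, alpha=0.05)
print(kept, moved)
```

In the first call the judgmental decision is retained; in the second the chosen action is the lower boundary of the interval, the first point at which optimality can no longer be rejected.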
The next theorem shows that this decision cannot be improved.
Theorem 2.2 (Admissibility).
The decision of Theorem 2.1 is admissible.
Proof — See Appendix.
The admissibility result is a direct consequence of the Karlin-Rubin theorem applied to the test functions (5)-(6). It follows from the fact that the randomness of the decision rule (7) stems from the indicator functions determining the sign of the gradient and from the (conditional) test functions. The actions to be taken in case of rejection or non-rejection are not random.
3 Choosing the Confidence Level
The confidence level determines the willingness of the decision maker to take statistical risk and therefore I equivalently refer to it as the coefficient of statistical risk aversion. The intuition follows from the decision tree of figure 1. A decision maker facing a statistical decision problem is about to take the judgmental decision. The econometrician suggests a statistical decision rule, which by its random nature may perform worse than the judgmental decision. The choice of the confidence level puts an upper bound on the probability that the statistical decision rule may perform worse than the judgmental decision.
This intuition is formalized by the following theorem.
Theorem 3.1 (Economic interpretation of the confidence level).
Proof — See Appendix.
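The bound can be checked numerically with a small Monte Carlo experiment. The sketch below is not the proof in the Appendix: it assumes unit variance, the quadratic loss `-a*theta + 0.5*a**2`, and a two-sided boundary rule as a stand-in for decision rule (7). All function names, parameter values and the seed are illustrative choices.

```python
import random
from statistics import NormalDist


def loss(theta: float, a: float) -> float:
    """Assumed quadratic loss, minimized at a = theta."""
    return -a * theta + 0.5 * a * a


def decide(x: float, a_tilde: float, c: float) -> float:
    """Keep a_tilde inside [x - c, x + c], else move to the closest boundary."""
    return min(max(a_tilde, x - c), x + c)


def underperformance_frequency(theta, a_tilde, alpha, draws=100_000, seed=0):
    """Share of samples in which the rule yields a strictly higher loss
    than the judgmental decision a_tilde."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # assumed critical value
    benchmark = loss(theta, a_tilde)
    worse = 0
    for _ in range(draws):
        x = rng.gauss(theta, 1.0)
        if loss(theta, decide(x, a_tilde, c)) > benchmark:
            worse += 1
    return worse / draws


alpha = 0.10
# Unfavorable scenario: the judgmental decision is already optimal.
freq_null = underperformance_frequency(theta=0.0, a_tilde=0.0, alpha=alpha)
# Favorable scenario: the judgmental decision is far from the optimum.
freq_alt = underperformance_frequency(theta=1.5, a_tilde=0.0, alpha=alpha)
print(freq_null, freq_alt)
```

Under these assumptions the frequency of underperformance is close to the confidence level when the judgmental decision is optimal and well below it otherwise.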
An extremely statistical risk averse decision maker chooses a confidence level equal to 0. A zero confidence level results in a degenerate confidence interval which coincides with the entire real line. As a consequence, it is never possible to reject the null hypothesis that the judgmental decision is optimal. At the other extreme, a statistical risk loving decision maker chooses a confidence level equal to 1. In that case the confidence interval degenerates into a single point, which coincides with the maximum likelihood decision. The null hypothesis that the judgmental decision is optimal is then always rejected and the decision maker is fully exposed to the possibility that the statistical decision rule will perform worse than the judgmental decision. An intermediate case of statistical risk aversion is represented by the subjective classical estimator of Manganelli (2009), which fixes the confidence level at an intermediate value.
The degree of statistical risk aversion can be elicited with an experiment à la Ellsberg (1961) where the decision maker has to choose among different pairs of urns. Accepting the advice of an econometrician is like accepting a bet with Nature where the probabilities of the payoff are only partially specified.
Consider two urns with 100 balls each. Urn 1 contains only white and black balls, Urn 2 contains white and red balls. If the black ball is extracted, the respondent loses €100. If the red ball is extracted, the respondent wins an amount in euros which gives an increase in utility equivalent to the reduction in utility produced by the loss of €100. If the white ball is extracted, nothing happens. The respondent can choose among the composition of the urns described in table 1. By accepting one of the bets from 1 to 99, she can control the upper bound probability of losing in case balls are drawn from Urn 1. By choosing this upper bound probability, she automatically chooses the lower bound probability of winning in case the ball is drawn from Urn 2.
|Urn 1||Urn 2|
Note: The decision maker can choose one of the bets listed in the table. She will face Urn 1 or Urn 2 with unknown probability. If a white ball is extracted, nothing happens. If a black ball is extracted, the decision maker loses €100. If a red ball is extracted, she wins a utility equivalent euro amount. Urn 1 and Urn 2 correspond, respectively, to the upper and lower nodes of the decision tree in figure 1. The decision maker can partially choose the composition of the urns. For instance, by choosing bet 2, she knows that Urn 1 does not contain more than 2 black balls and Urn 2 contains at least 2 red balls.
To understand the link with the statistical decision problem, consider again figure 1. Urn 1 corresponds to the node in the upper part of the decision tree, and Urn 2 to the node in the lower part. The worst case scenario is when the null hypothesis is true, as in this case deviations from the judgmental decision increase the loss. However, even in this case, according to the decision rule (7) there is still the possibility that the null hypothesis is not rejected, in which case the chosen action is the judgmental decision itself. The choice of the confidence level controls the probability of wrongly rejecting the null. When the null hypothesis is true, it is like having the ball extracted from Urn 1, and choosing the confidence level is like choosing the maximum number of black balls contained in Urn 1. The favorable scenario is when the conditional null hypothesis is false. In this case, rejection of the null leads to the choice of a better action, in the sense that it produces a lower loss. When the null hypothesis is false, it is like having the ball extracted from Urn 2. The probability of correctly rejecting the null depends on the power of the test, but is in any case at least as large as the chosen confidence level. Choosing the confidence level is like choosing the minimum number of red balls contained in Urn 2.
In real world situations, one does not know whether the null hypothesis is true or not, which represents genuine uncertainty and is indicated by the dashed line in figure 1. This is like telling the participants in the experiment that it is unknown from which urn the ball will be extracted. An extremely statistical risk averse player would always choose not to participate in the bet and retain the judgmental decision, a choice corresponding to bet 0 in the table. A statistical risk loving player would choose bet 100. In general, players with higher degrees of statistical risk aversion would choose bets with lower numbers.
4 An Asset Allocation Example
This section implements the decision with judgment, solving a standard portfolio allocation problem.
The empirical implementation of the mean-variance asset allocation model introduced by Markowitz (1952) has puzzled economists for a long time. Despite its theoretical success, it is well-known that plug-in and Bayesian estimators of the portfolio weights produce volatile asset allocations which usually perform poorly out of sample, due to estimation errors (Jobson and Korkie 1981, Brandt 2007). This paper takes a different perspective on this problem. The decision with judgment provides an asset allocation which does not perform worse than any given judgmental allocation with a probability equal to the confidence level.
To implement the statistical decision rules, I take a monthly series of closing prices for the EuroStoxx50 index, from January 1999 until December 2015. EuroStoxx50 covers the 50 leading Blue-chip stocks for the Eurozone. The data is taken from Bloomberg. The closing prices are converted into period log returns. Table 2 reports summary statistics.
Note: Summary statistics of the monthly returns of the EuroStoxx50 index from January 1999 to December 2015.
The exercise consists of forecasting the next period optimal investment in the Eurostoxx50 index of a person who holds €100 cash. I take the first 7 years of data as pre-sample observations, to estimate the optimal investment for January 2006. The estimation window then expands by one observation at a time, the new allocation is estimated, and the whole exercise is repeated until the end of the sample.
To directly apply the decision with judgment as discussed in section 2, which assumes the variance to be known, I transform the data as follows. I first divide the return series of each window by the full sample standard deviation, and next multiply them by the square root of the number of observations in the estimation sample. The transformed series is defined, for each estimation window and each observation within that window, from the original time series of log returns, the full sample standard deviation and the size of the estimation sample:
I ‘help’ the estimates by providing the full sample standard deviation, so that the only parameter to be estimated is the mean return. Under the assumption that the full sample standard deviation is the population value, by the central limit theorem the mean of the transformed series is normally distributed with variance equal to one and unknown mean. I can therefore implement the decision rule with judgment, using this single observation for each period.
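The transformation can be sketched as follows, reading the verbal description literally: each return in a window is divided by the full sample standard deviation and multiplied by the square root of the window size, and the mean of the transformed window is the single observation for that period. The function name is hypothetical, the choice of the sample (rather than population) standard deviation is an assumption, and the returns below are synthetic numbers generated for illustration, not EuroStoxx50 data.

```python
import math
import random
from statistics import fmean, stdev


def standardized_observations(returns, first_window):
    """One standardized observation per expanding estimation window.

    The mean of sqrt(n) * r / sigma over a window of n returns equals
    sqrt(n) * mean(r) / sigma, which has unit variance if sigma is the
    true standard deviation of the returns.
    """
    sigma = stdev(returns)  # full sample standard deviation, treated as known
    observations = []
    # The window grows by one observation at a time; the last window stops
    # one period before the end so that a next period exists to invest in.
    for n in range(first_window, len(returns)):
        window = returns[:n]
        observations.append(math.sqrt(n) * fmean(window) / sigma)
    return observations


# Synthetic monthly log returns: 17 years of data, 7 years of pre-sample.
rng = random.Random(1)
returns = [rng.gauss(0.002, 0.05) for _ in range(204)]
obs = standardized_observations(returns, first_window=84)
print(len(obs), obs[0])
```

With 204 monthly observations and an 84-month pre-sample, this yields 120 observations, one for each month of the ten-year evaluation period.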
Figure 2 reports the portfolio values associated with different decision rules. For comparison, I have also included a Bayesian decision with a standard normal prior. Suppose the starting value of the portfolio in January 2006 is €100. By the end of the sample, after 10 years, an investor using the maximum likelihood decision rule would have lost one quarter of the value of her portfolio. The situation is slightly better with the Bayesian decision rule, as it delivers a loss of around 12%. The decision with judgment at the 1% confidence level does not lose anything because it never predicts deviating from the judgmental allocation of holding all the money in cash.
The point of this discussion is not to evaluate whether one decision rule is better than the other, as the decision rules differ only with respect to the choice of the confidence level and the prior, which are both subjective choices. The purpose is rather to illustrate the implications of choosing different confidence levels. By choosing the maximum likelihood and Bayesian estimators, the investor has no control over the statistical risk she is going to bear. The decision with judgment, instead, allows the investor to choose a constant probability of underperforming the judgmental allocation: She can be confident that the resulting asset allocation is not worse than the judgmental allocation with the chosen probability. The case of the EuroStoxx50, however, represents only one possible draw, which turned out to be particularly adverse to the maximum likelihood and Bayesian estimators. Had the resulting allocation implied positive returns by the end of the sample, maximum likelihood and Bayesian estimators would have outperformed the decisions with judgment. There is no free lunch: Decision rules with higher statistical risk aversion produce allocations with greater protection against underperformance relative to the judgmental allocation, but also have lower upside potential. In statistical jargon, lower confidence levels protect the decision maker from Type I errors, but imply higher probabilities of Type II errors.
5 Conclusion
Judgment plays an important role not just for individuals, but also in policy institutions. Most policy decisions are shaped by the judgment of policy makers. When advising a policy maker, the econometrician can test whether the preferred judgmental decision is supported by data. If not, the decision incorporating judgment is always at the closest boundary of the confidence interval. The probability of obtaining higher losses than those implied by the judgmental decision is bounded by the given confidence level.
The confidence level reflects the attitude of the decision maker towards statistical uncertainty. I have referred to it as the coefficient of statistical risk aversion. It can be elicited with experiments involving urns à la Ellsberg. Decision makers characterized by an extreme form of statistical risk aversion (a confidence level equal to 0) always follow their own judgmental decision and ignore the advice of the econometrician. At the other extreme, statistical risk loving decision makers (with a confidence level equal to 1) ignore their judgment and always follow the econometrician’s advice, which in this special case corresponds to the maximum likelihood decision. Policy makers are likely characterized by high, but not extreme, coefficients of statistical risk aversion. The framework provided in this paper to measure it may have profound policy implications.
Appendix — Proofs
Proof of Theorem 2.1 — Consider only the case i) . The other case can be proven in a similar way. If , the null hypothesis is not rejected at the given confidence level . is therefore retained as the chosen action.
If , the null hypothesis is rejected. Rejection of the null implies acceptance of the alternative hypothesis , that is marginal moves away from by a sufficiently small amount decrease the loss function.
Consider now the family of null hypotheses for . Define also the family of rejection regions . Clearly, for any , that is the null hypothesis is not rejected at the confidence level for any .
Denote with the chosen action and suppose that . If , this implies that , that is is rejected. Therefore this decision cannot be optimal.
If , continuity implies that it exists such that the null was rejected at the given confidence level , even though , which implies a contradiction.
The chosen action must therefore be .
Proof of Theorem 2.2 — The risk function of a generic decision is:
and the expectations are taken with respect to the truncated normal distribution.
Consider equation (10). The case for equation (11) can be proven similarly. I prove that the decision is admissible with respect to the truncated normal distribution. This implies that for all . Since the same holds for , these two results together imply that and therefore that is admissible.
To prove that is admissible with respect to the truncated normal distribution, I verify that the conditions of theorem 4 of Karlin and Rubin (1956) hold. First, note that the truncated normal distribution belongs to the exponential family and therefore possesses a monotone likelihood ratio (see section 1 of Karlin and Rubin, 1956). Second, conditional on observing , the decision rule of theorem 2.1 foresees two actions: either the null hypothesis (3) is accepted or rejected. Denote these actions with and , respectively. Define the corresponding losses from the original loss function:
and note that
This function is linear in and therefore changes sign only once as a function of , specifically at the finite value . Since is a monotone procedure, the conditions of theorem 4 of Karlin and Rubin (1956) are satisfied and is admissible.
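To make the single sign change concrete, the following worked expression assumes the quadratic loss $L(\theta, a) = -a\theta + 0.5a^2$, which is an illustrative assumption rather than a formula taken from the text, and uses the hypothetical labels $a_0$ and $a_1$ for the actions taken under non-rejection and rejection:

```latex
\begin{aligned}
L(\theta, a_1) - L(\theta, a_0)
  &= -(a_1 - a_0)\,\theta + \tfrac{1}{2}\left(a_1^2 - a_0^2\right) \\
  &= (a_1 - a_0)\left(\frac{a_0 + a_1}{2} - \theta\right).
\end{aligned}
```

Under this assumption the difference is linear in $\theta$ and, for $a_1 \neq a_0$, vanishes only at the midpoint $\theta = (a_0 + a_1)/2$, so it changes sign exactly once.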
Proof of Theorem 3.1 — Partitioning the sample space with respect to the gradient :
Consider again only the case i) , as the other one is similar.
Let us find out first the values of for which . This is equivalent to finding out when the function is positive, which it is for and . Therefore:
and note also that .
Suppose first that . Substituting the decision rule and rearranging the terms, the term (12) is equal to:
because , and the term (13) is equal to:
where the inequality follows from the fact that the case currently analyzed is .
Combining all these results gives:
Berger, J. O. (1985), Statistical Decision Theory and Bayesian Analysis (2nd ed.), New York: Springer-Verlag.
Brandt, M.W. (2009), Portfolio Choice Problems, in Y. Ait-Sahalia and L. P. Hansen (eds.), Handbook of Financial Econometrics, North Holland.
Chamberlain, G., (2000), Econometrics and decision theory, Journal of Econometrics, 95 (2), 255-283.
Elliott, G. and A. Timmermann (2016), Economic Forecasting, Princeton University Press.
Ellsberg, D. (1961), Risk, Ambiguity, and the Savage Axioms, The Quarterly Journal of Economics, 75 (4), 643–669.
Gennaioli, N. and A. Shleifer (2010), What Comes to Mind, The Quarterly Journal of Economics, 125 (4), 1399–1433.
Gilboa, I. and M. Marinacci (2013), Ambiguity and the Bayesian Paradigm, in Advances in Economics and Econometrics: Theory and Applications, Tenth World Congress of the Econometric Society, D. Acemoglu, M. Arellano and E. Dekel (Eds.), New York, Cambridge University Press.
Geweke, J. and C. Whiteman (2006), Bayesian Forecasting, in Handbook of Economic Forecasting, Volume I, edited by G. Elliott, C. W. J. Granger and A. Timmermann, Elsevier.
Granger, C.W.J. and M.J. Machina (2006), Forecasting and decision theory, in G. Elliott, C. Granger and A. Timmermann (eds.), Handbook of Economic Forecasting, vol.1, 81-98, Elsevier.
Granger, C.W.J. and P. Newbold (1986), Forecasting Economic Time Series, Academic Press.
Jobson, J.D. and B. Korkie (1981), Estimation for Markowitz Efficient Portfolios, Journal of the American Statistical Association, 75, 544-554.
Karlin, S. and H. Rubin (1956), The Theory of Decision Procedures for Distributions with Monotone Likelihood Ratio, The Annals of Mathematical Statistics, 27, 272-299.
Lehmann, E.L. and J.P. Romano (2005), Testing Statistical Hypotheses, Springer.
Manganelli, S. (2009), Forecasting with Judgment, Journal of Business and Economic Statistics, 27 (4), 553-563.
Manski, C.F. (2013), Public policy in an uncertain world: analysis and decisions, Harvard University Press.
Markowitz, H.M. (1952), Portfolio Selection, Journal of Finance, 7 (1), 77-91.
Marinacci, M. (2015), Model Uncertainty, Journal of the European Economic Association, 13, 998-1076.
Savage, L.J. (1954), The Foundations of Statistics, New York, John Wiley & Sons.
Patton, A.J. and A. Timmermann (2012), Forecast Rationality Tests Based on Multi-Horizon Bounds, Journal of Business and Economic Statistics, 30 (1), 1-17.
Tversky, A. and D. Kahneman (1974), Judgment under Uncertainty: Heuristics and Biases, Science, 185 (4157), 1124-1131.
Wald, A. (1950), Statistical Decision Functions, New York, John Wiley & Sons.