1.1 Mean vs. median in classical statistics
The two most prominent measures for the location of a distribution are the mean and the median. Both of them have a clear and accessible interpretation. While they coincide for symmetric distributions, they can differ considerably for asymmetric ones. From an estimation point of view, the difference between the two measures is even more pronounced: The population mean is sensitive with respect to the underlying distribution, and, for symmetric distributions, it is a more efficient location estimator than the median for light-tailed distributions (Koenker and Bassett, 1978). On the other hand, the median is esteemed for its robustness against outliers, and, again for symmetric distributions, it turns out to be a more efficient location estimator than the mean for heavy-tailed distributions (ibidem). Indeed, one can show that the maximum likelihood estimator for location coincides with the sample mean if the underlying distribution is normal, whereas it amounts to the sample median in the case of a Laplace distribution (Keynes, 1911). Moreover, the median of a distribution always exists, while the existence of the mean requires a benign tail behaviour of the distribution.
The field of robust statistics was started off by the seminal contributions of Tukey (1960) and Huber (1964). Hampel (1971) was the first to formalise the notion of robustness and to link it to a continuity property of the estimator. Besides this qualitative definition of robustness, Hampel also introduced the breakdown point of an estimator as a quantitative measure of robustness. In finite samples (Donoho and Huber, 1983), it roughly amounts to the proportion of data that can be changed without corrupting the estimator. As a consequence, the median is robust with a breakdown point of 1/2, whereas the mean achieves a breakdown point of 0 rendering it non-robust. Since the early days of robust statistics, the field has developed an incredibly rich strand of literature. For a thorough introduction we refer the reader to the excellent textbook Huber and Ronchetti (2009).
It is well known that both the mean and the median can be expressed as $M$-estimators using the squared loss or the absolute loss, respectively. The most prominent compromise between the mean and the median in the form of an $M$-estimator is given by the famous Huber loss (Huber, 1964, p. 79). Historically older alternatives of such a compromise are the $\alpha$-trimmed mean and the $\alpha$-Winsorized mean, belonging to the class of $L$-estimates. On a population level, the $\alpha$-trimmed mean, $\alpha \in (0, 1/2)$, is the average of all $p$-quantiles for $p \in (\alpha, 1-\alpha)$ (see Section 2 for precise definitions). In a finite sample, it amounts to removing the smallest and the largest $\alpha$-fraction of all observations and then computing the mean with the remaining $(1-2\alpha)$-fraction of observations. The $\alpha$-Winsorized mean instead calculates the mean over all observations, with the smallest (largest) $\alpha$-fraction set to be the empirical $\alpha$-quantile ($(1-\alpha)$-quantile). As such, the $\alpha$-trimmed mean and the $\alpha$-Winsorized mean constitute two natural interpolations between the mean ($\alpha \to 0$) and the median ($\alpha \to 1/2$). They are robust, with a breakdown point of $\alpha$ (Hampel, 1971).
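The finite-sample description of the trimmed and the Winsorized mean can be illustrated with a minimal Python sketch. The function names, the toy sample and the convention of cutting exactly floor(n * alpha) observations on each side are assumptions made for illustration only; other empirical quantile conventions lead to slightly different values.

```python
import math
from statistics import mean, median


def trimmed_mean(data, alpha):
    """Drop the floor(n * alpha) smallest and largest points, then average the rest."""
    xs = sorted(data)
    k = math.floor(len(xs) * alpha)  # assumes 0 <= alpha < 1/2
    return mean(xs[k:len(xs) - k])


def winsorized_mean(data, alpha):
    """Set the floor(n * alpha) smallest/largest points to the nearest retained value."""
    xs = sorted(data)
    k = math.floor(len(xs) * alpha)  # assumes 0 <= alpha < 1/2
    lower, upper = xs[k], xs[len(xs) - k - 1]
    return mean([min(max(x, lower), upper) for x in xs])


# Hypothetical sample with one gross outlier.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]

print(mean(sample))                  # pulled far away by the single outlier
print(median(sample))                # unaffected by the outlier
print(trimmed_mean(sample, 0.1))     # outlier removed before averaging
print(winsorized_mean(sample, 0.1))  # outlier replaced by the largest retained value
```

In this toy sample a single corrupted observation moves the mean arbitrarily far, whereas the median and both interpolating estimators stay close to the bulk of the data.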
1.2 Expected Shortfall vs. Value at Risk in risk management
In the field of quantitative risk management, the last one or two decades have seen a lively debate about which monetary risk measure (Artzner et al., 1999) is best in (regulatory) practice. The debate mainly focused on the dichotomy between Value at Risk ($\mathrm{VaR}_\alpha$) on the one hand and Expected Shortfall ($\mathrm{ES}_\alpha$) on the other hand at some level $\alpha \in (0,1)$ (see Section 2 for definitions). Interestingly, and in line with the debate in classical statistics, this encompasses a joust between a quantile ($\mathrm{VaR}_\alpha$) and a tail expectation ($\mathrm{ES}_\alpha$). We refer the reader to Embrechts et al. (2014) and Emmer et al. (2015) for comprehensive academic discussions and to Bank for International Settlements (2014) for a regulatory perspective in banking.
Cont et al. (2010) considered the issue of statistical robustness of risk measure estimates in the sense of Hampel (1971). They showed that a risk measure cannot be both robust in the latter sense and coherent in the sense of Artzner et al. (1999). As a compromise, they propose the risk measure ‘Range Value at Risk’ ($\mathrm{RVaR}_{\alpha,\beta}$), which is akin to an asymmetric version of the trimmed mean: One takes the average of all quantiles between two extreme levels $0 < \alpha < \beta < 1$, rather than between two symmetric levels $\alpha < 1-\alpha$ where $\alpha \in (0, 1/2)$ (see Section 2 for definitions). Setting $\alpha = \beta$ renders $\mathrm{VaR}_\beta$ and $\alpha = 0$ leads to $\mathrm{ES}_\beta$. The arguments provided in Huber and Ronchetti (2009, p. 59) imply that $\mathrm{RVaR}_{\alpha,\beta}$ has a breakdown point of $\min\{\alpha, 1-\beta\}$, which means it is a robust (and hence not coherent) risk measure, unless it degenerates to $\mathrm{RVaR}_{0,\beta} = \mathrm{ES}_\beta$ (or if $0 < \alpha < \beta = 1$). Moreover, $\mathrm{RVaR}_{\alpha,\beta}$ belongs to the wide class of distortion risk measures (Kusuoka, 2001). For further contributions to robustness in the context of risk measures, we refer the reader to Krätschmer et al. (2012, 2014), Kou et al. (2013), Embrechts et al. (2015) and Zähle (2016). Since the influential article Cont et al. (2010), RVaR has gained increasing attention in the risk management literature (see Embrechts et al. (2018a, b) for extensive studies) as well as in econometrics (Barendse, 2017), where RVaR sometimes has the alternative denomination Interquantile Expectation.
The property of a statistical functional to have an $M$-estimator on the population level has become known as elicitability (Osband, 1985; Lambert et al., 2008; Gneiting, 2011). More specifically, we say that a functional $T$ is elicitable if there is a scoring function $S$ such that $T(F) = \arg\min_{x} \int S(x,y)\,\mathrm{d}F(y)$. Vice versa, a scoring function $S$ is called strictly consistent for $T$ if its expectation is uniquely minimised in $x$ at $T(F)$. Examples for elicitable functionals are given by the mean with $S(x,y) = (x-y)^2$ and the median with $S(x,y) = |x-y|$ and their asymmetric versions, expectiles and quantiles. From a game theoretic point of view, strict consistency of a scoring function amounts to incentive compatibility, rewarding truthful and honest forecasts. Besides its importance for $M$-estimation and regression, e.g. quantile regression (Koenker and Bassett, 1978; Koenker, 2005) or expectile regression (Newey and Powell, 1987), the notions of elicitability and strict consistency are crucial for forecast evaluation (Engelberg et al., 2009; Murphy and Daan, 1985). If forecasts take the form of probability distributions or densities, one often uses the term scoring rule rather than scoring function and propriety rather than consistency (Gneiting and Raftery, 2007).
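The link between scoring functions and the functionals they elicit can be checked numerically. The following Python sketch minimises the average score over a grid of candidate forecasts for a toy sample. The sample, the grid, the level 0.3 and the helper names are illustrative assumptions; the asymmetric absolute loss used for the quantile is the standard pinball loss.

```python
from statistics import mean, median


def squared(x, y):
    return (x - y) ** 2


def absolute(x, y):
    return abs(x - y)


def pinball(level):
    """Asymmetric absolute loss whose expectation is minimised by the level-quantile."""
    def score(x, y):
        return ((1.0 if y <= x else 0.0) - level) * (x - y)
    return score


def best_forecast(score, sample, grid):
    """Grid point with the smallest average score over the sample."""
    return min(grid, key=lambda x: mean([score(x, y) for y in sample]))


sample = [0.0, 1.0, 2.0, 3.0, 14.0]    # hypothetical observations
grid = [i / 10 for i in range(0, 141)]  # candidate forecasts 0.0, 0.1, ..., 14.0

print(best_forecast(squared, sample, grid), mean(sample))     # squared loss -> mean
print(best_forecast(absolute, sample, grid), median(sample))  # absolute loss -> median
print(best_forecast(pinball(0.3), sample, grid))              # pinball loss -> 0.3-quantile
```

A forecaster who is evaluated with one of these scores minimises the expected penalty by reporting the corresponding functional of the predictive distribution, which is the incentive compatibility mentioned above.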
Convex level sets (CxLS) of a functional are known to be necessary for its elicitability. This shows that variance is generally not elicitable, and Expected Shortfall also fails to have the CxLS property (Weber, 2006; Gneiting, 2011). Steinwart et al. (2014) showed that for continuous one-dimensional functionals, the CxLS property is basically also sufficient for elicitability; cf. Lambert (2013), Bellini and Bignozzi (2015), Delbaen et al. (2016), as well as Heinrich (2014) for the role of the continuity assumption. The revelation principle (Osband, 1985; Gneiting, 2011) asserts that any bijection of an elicitable functional is elicitable. This implies that the pair (mean, variance), being a bijection of the first two moments, is elicitable even though the variance alone fails to be elicitable. Similarly, Fissler and Ziegel (2016) showed that the pair $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ is elicitable, with the structural difference that the revelation principle is not applicable in this instance. This gave rise to the finding that the minimal expected score and its minimiser are jointly elicitable; see Frongillo and Kash (2015) and Brehmer (2017).
In the context of quantitative finance and particularly in the debate about which risk measure is best in practice, elicitability has gained considerable attention (Emmer et al., 2015; Ziegel, 2016; Davis, 2016). Especially, the role of elicitability for backtesting purposes has been highly debated (Gneiting, 2011; Acerbi and Székely, 2014, 2017; Fissler et al., 2016; Nolde and Ziegel, 2017).
1.4 Elicitability of Range Value at Risk
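For $0 < \alpha < \beta < 1$, one has the identity
$$\mathrm{RVaR}_{\alpha,\beta} = \frac{\beta\,\mathrm{ES}_\beta - \alpha\,\mathrm{ES}_\alpha}{\beta - \alpha}, \tag{1.1}$$
and the CxLS property of the pair $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ implies the CxLS property of the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ (Wang and Wei, 2018, Example 7), leading to the question whether this triplet is elicitable or not. Invoking the elicitability of the pairs $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ and $(\mathrm{VaR}_\beta, \mathrm{ES}_\beta)$, the identity at (1.1) and the revelation principle establishes the elicitability of the quadruples $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{ES}_\alpha, \mathrm{RVaR}_{\alpha,\beta})$ and $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{ES}_\beta, \mathrm{RVaR}_{\alpha,\beta})$. This approach has already been used in the context of regression in Barendse (2017).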
A fortiori, we show that the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ is elicitable (Theorem 3.4) under weak regularity conditions. Besides the obvious advantage that this reduces the elicitation complexity (Lambert et al., 2008; Frongillo and Kash, 2015) or elicitation order (Fissler and Ziegel, 2016), it is particularly advantageous since $\mathrm{RVaR}_{\alpha,\beta}(F)$, $0 < \alpha < \beta < 1$, exists for any distribution $F$, while $\mathrm{ES}_\alpha(F)$ and $\mathrm{ES}_\beta(F)$ only exist if the (left) tail of the distribution is integrable. Since $\mathrm{RVaR}$ is often used for robustness purposes, safeguarding against outliers and heavy-tailedness, the latter advantage becomes particularly important.
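We would like to point out the structural difference between the elicitability result of $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ provided in this paper and the one concerning $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ in Fissler and Ziegel (2016) as well as the more general results of Frongillo and Kash (2015) and Brehmer (2017). While $\mathrm{ES}_\alpha$ corresponds to the negative of a minimum of an expected score which is strictly consistent for $\mathrm{VaR}_\alpha$, it turns out that $\mathrm{RVaR}_{\alpha,\beta}$ can be represented as the difference of minima of strictly consistent scoring functions for $\mathrm{VaR}_\alpha$ and $\mathrm{VaR}_\beta$, respectively (Lemma 3.3). As a consequence, the class of strictly consistent scoring functions for the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ turns out to be less flexible than the one for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$; see Remark 3.9 for details. One particular implication is that there are essentially no strictly consistent scoring functions for $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ which are also translation invariant or positively homogeneous; see Section 4.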
The paper is organised as follows. In Section 2, we introduce the relevant notation and definitions concerning RVaR, scoring functions and elicitability. The main results establishing the elicitability of the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ (Theorems 3.4 and 3.7) and related findings are presented in Section 3. Section 4 shows that there are basically no strictly consistent scoring functions for $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ which are positively homogeneous or translation invariant. In Section 5, we establish a mixture representation of the strictly consistent scoring functions in the spirit of Ehm et al. (2016). This result allows one to compare forecasts simultaneously with respect to all consistent scoring functions in terms of Murphy diagrams. We demonstrate the applicability of our results and compare the discrimination ability of different scoring functions in a simulation study presented in Section 6. The paper finishes in Section 7 with a discussion of our results in the context of $M$-estimation and compares them to other suggestions in the statistical literature, such as variants of a trimmed least squares procedure (Koenker and Bassett, 1978; Ruppert and Carroll, 1980; Rousseeuw, 1984). A list of assumptions similar to the ones used in Fissler and Ziegel (2016) can be found in the Appendix.
2 Notation and Definitions
2.1 Definition of Range Value at Risk
We would like to recall that there are different sign conventions in the literature about risk measures. In this paper we use the following convention: If a random variable $Y$ models the losses and gains, then positive values of $Y$ represent gains and negative values of $Y$ losses. Since we consider law-invariant risk measures only, thus defining risk measures directly as functionals of the distribution of $Y$, corresponding comments apply. Moreover, if $\rho$ is a risk measure, we assume that $\rho(Y) \in \mathbb{R}$ corresponds to the maximal amount of money one can withdraw such that the position $Y - \rho(Y)$ is still acceptable. Hence, negative values of $\rho$ correspond to risky positions.
Definition 2.1 (Value at Risk).
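Let $F$ be a probability distribution function on $\mathbb{R}$. For any $\alpha \in (0,1)$ we define the Value at Risk of $F$ at level $\alpha$ via
$$\mathrm{VaR}_\alpha(F) := \inf\{x \in \mathbb{R} : \alpha \le F(x)\}.$$
Moreover, we use the common convention that $\mathrm{VaR}_0(F)$ corresponds to the infimum of the support of $F$ and that $\mathrm{VaR}_1(F)$ is defined as the supremum of the support of $F$.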
Definition 2.2 (Range Value at Risk).
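Let $F$ be a probability distribution function on $\mathbb{R}$. For $0 \le \alpha \le \beta \le 1$ we define the Range Value at Risk of $F$ at levels $\alpha, \beta$ via
$$\mathrm{RVaR}_{\alpha,\beta}(F) := \begin{cases} \dfrac{1}{\beta-\alpha} \displaystyle\int_\alpha^\beta \mathrm{VaR}_\gamma(F)\,\mathrm{d}\gamma, & \text{if } \alpha < \beta, \\[1ex] \mathrm{VaR}_\beta(F), & \text{if } \alpha = \beta. \end{cases}$$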
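Note that our parametrisation of $\mathrm{RVaR}_{\alpha,\beta}$ differs from the one in Embrechts et al. (2018b). One can verify that $\mathrm{RVaR}_{\alpha,\beta}(F) \in \mathbb{R}$ if $0 < \alpha < \beta < 1$. For $\beta \in (0,1)$, $\mathrm{RVaR}_{0,\beta}(F) \in \mathbb{R} \cup \{-\infty\}$ and it is finite if and only if $\int_{-\infty}^0 |y|\,\mathrm{d}F(y) < \infty$. Similarly, for $\alpha \in (0,1)$ it holds that $\mathrm{RVaR}_{\alpha,1}(F) \in \mathbb{R} \cup \{\infty\}$ and it is finite if and only if $\int_0^\infty |y|\,\mathrm{d}F(y) < \infty$. $\mathrm{RVaR}_{0,1}(F)$ exists only if $\int_{-\infty}^0 |y|\,\mathrm{d}F(y) < \infty$ or $\int_0^\infty |y|\,\mathrm{d}F(y) < \infty$. If $F$ has a finite first moment, then $\mathrm{RVaR}_{0,1}(F)$ coincides with the first moment of $F$.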
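One can generalise this identity for $0 \le \alpha < \beta \le 1$ and obtains the alternative representation
$$\mathrm{RVaR}_{\alpha,\beta}(F) = \frac{1}{\beta-\alpha}\bigg(\int_{(\mathrm{VaR}_\alpha(F),\,\mathrm{VaR}_\beta(F)]} y\,\mathrm{d}F(y) + \mathrm{VaR}_\alpha(F)\big(F(\mathrm{VaR}_\alpha(F)) - \alpha\big) - \mathrm{VaR}_\beta(F)\big(F(\mathrm{VaR}_\beta(F)) - \beta\big)\bigg), \tag{2.2}$$
where we used the usual convention that $F(-\infty) = 0$, $F(\infty) = 1$ and $0 \cdot \infty = 0$. If $F$ is continuous at its $\alpha$- and $\beta$-quantiles in the sense that $F(\mathrm{VaR}_\alpha(F)) = \alpha$ and $F(\mathrm{VaR}_\beta(F)) = \beta$, then the correction terms in (2.2) vanish and one has that
$$\mathrm{RVaR}_{\alpha,\beta}(F) = \mathbb{E}_F\big[Y \mid \mathrm{VaR}_\alpha(F) < Y \le \mathrm{VaR}_\beta(F)\big],$$
which justifies an alternative name for RVaR, namely Interquantile Expectation.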
Definition 2.3 (Expected Shortfall).
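Let $F$ be a probability distribution function on $\mathbb{R}$. For any $\alpha \in (0,1)$ we define the Expected Shortfall of $F$ at level $\alpha$ via
$$\mathrm{ES}_\alpha(F) := \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_\gamma(F)\,\mathrm{d}\gamma \in \mathbb{R} \cup \{-\infty\}.$$
Let $F$ be a distribution function with finite left tail and $0 < \alpha < \beta < 1$. Then one obtains the identity
$$\mathrm{RVaR}_{\alpha,\beta}(F) = \frac{\beta\,\mathrm{ES}_\beta(F) - \alpha\,\mathrm{ES}_\alpha(F)}{\beta - \alpha}. \tag{2.1}$$
If $F$ has a finite left tail ($\int_{-\infty}^0 |y|\,\mathrm{d}F(y) < \infty$), then one could use the right hand side of (2.1) as a definition of $\mathrm{RVaR}_{\alpha,\beta}(F)$. However, in line with our discussion in the introduction, $\mathrm{RVaR}_{\alpha,\beta}(F)$ always exists and is finite for $0 < \alpha < \beta < 1$ even if the right hand side of (2.1) is not defined.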
Interestingly, Theorem 2 in Embrechts et al. (2018b) establishes that $\mathrm{RVaR}$ can be written as an inf-convolution of $\mathrm{VaR}$ and $\mathrm{ES}$ at appropriate levels. Note that this would rather amount to a sup-convolution in our context due to different sign conventions.
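For $\alpha \in (0, 1/2)$, $\mathrm{RVaR}_{\alpha,1-\alpha}$ corresponds to the $\alpha$-trimmed mean and has a close connection to the $\alpha$-Winsorized mean $W_\alpha$ (Huber and Ronchetti, 2009, pp. 57–59) via
$$W_\alpha(F) := (1-2\alpha)\,\mathrm{RVaR}_{\alpha,1-\alpha}(F) + \alpha\,\mathrm{VaR}_\alpha(F) + \alpha\,\mathrm{VaR}_{1-\alpha}(F).$$
It is easy to verify that for any distribution function $F$ and $0 \le \alpha \le \beta \le 1$ one obtains the inequality
$$\mathrm{VaR}_\alpha(F) \le \mathrm{RVaR}_{\alpha,\beta}(F) \le \mathrm{VaR}_\beta(F). \tag{2.3}$$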
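The relations of this subsection can be verified on an empirical distribution. The Python sketch below assumes that VaR is the lower quantile of the empirical distribution function, that RVaR is the average of the quantile function between the two levels, and that ES at level $\alpha$ is RVaR between $0$ and $\alpha$. The sample, the levels and all function names are hypothetical choices for illustration.

```python
import math


def var(sample, level):
    """Lower quantile inf{x : F_n(x) >= level} of the empirical distribution, 0 < level <= 1."""
    xs = sorted(sample)
    return xs[math.ceil(level * len(xs)) - 1]


def quantile_integral(sample, a, b):
    """Exact integral of the empirical quantile function over (a, b)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        # The empirical quantile function equals xs[i] on (i/n, (i+1)/n].
        overlap = min(b, (i + 1) / n) - max(a, i / n)
        if overlap > 0:
            total += x * overlap
    return total


def rvar(sample, a, b):
    return quantile_integral(sample, a, b) / (b - a)


def es(sample, level):
    return rvar(sample, 0.0, level)


sample = [-10.0, -4.0, -1.0, 0.0, 1.0, 2.0, 3.0, 9.0]  # hypothetical gains and losses
alpha, beta = 0.25, 0.75

# RVaR computed directly and via the two Expected Shortfalls.
via_es = (beta * es(sample, beta) - alpha * es(sample, alpha)) / (beta - alpha)
print(rvar(sample, alpha, beta), via_es)

# RVaR lies between the two quantiles.
print(var(sample, alpha) <= rvar(sample, alpha, beta) <= var(sample, beta))

# Winsorized mean computed directly and via the trimmed mean RVaR_{alpha, 1 - alpha}.
lo, hi = var(sample, alpha), var(sample, 1 - alpha)
direct = sum(min(max(y, lo), hi) for y in sample) / len(sample)
via_rvar = (1 - 2 * alpha) * rvar(sample, alpha, 1 - alpha) + alpha * lo + alpha * hi
print(direct, via_rvar)
```

The levels and the sample size are chosen so that all level-times-sample-size products are integers, which keeps the arithmetic exact in floating point; for general inputs one should compare with a tolerance.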
2.2 Elicitability and scoring functions
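We essentially follow the notation used in Fissler and Ziegel (2016), which follows the decision-theoretic framework used in Gneiting (2011). Let $\mathcal{F}$ be a generic class of probability distribution functions on $\mathbb{R}$. An action domain is a subset $\mathsf{A} \subseteq \mathbb{R}^k$ for $k \ge 1$. Whenever we consider a functional $T\colon \mathcal{F} \to \mathsf{A}$, we tacitly assume that $T(F)$ is well-defined for all $F \in \mathcal{F}$ and is an element of $\mathsf{A}$. The set $T(\mathcal{F})$ corresponds to the image $\{T(F) : F \in \mathcal{F}\}$. For any subset $M \subseteq \mathbb{R}^k$ we denote with $\operatorname{int}(M)$ the largest open subset of $M$. Moreover, $\operatorname{conv}(M)$ denotes the convex hull of the set $M$.
We say that a function $a\colon \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}$-integrable if it is measurable and $\int |a(y)|\,\mathrm{d}F(y) < \infty$ for all $F \in \mathcal{F}$. Similarly, a function $g\colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ is called $\mathcal{F}$-integrable if $g(x,\cdot)$ is $\mathcal{F}$-integrable for all $x \in \mathsf{A}$. If $g$ is $\mathcal{F}$-integrable, we define the map
$$\bar g\colon \mathsf{A} \times \mathcal{F} \to \mathbb{R}, \qquad \bar g(x,F) := \int g(x,y)\,\mathrm{d}F(y).$$
If $g$ is sufficiently smooth in its first argument, we denote the $m$th partial derivative of $g(\cdot,y)$ with $\partial_m g(\cdot,y)$.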
Definition 2.4 (Consistency and elicitability).
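A scoring function is an $\mathcal{F}$-integrable map $S\colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$. It is called $\mathcal{F}$-consistent for a functional $T\colon \mathcal{F} \to \mathsf{A}$ if $\bar S(T(F),F) \le \bar S(x,F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. It is strictly $\mathcal{F}$-consistent for $T$ if it is consistent and if $\bar S(T(F),F) = \bar S(x,F)$ implies that $x = T(F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. Wherever it is convenient, we assume that $S(x,\cdot)$ is locally bounded for all $x \in \mathsf{A}$. A functional $T\colon \mathcal{F} \to \mathsf{A}$ is elicitable if it possesses a strictly $\mathcal{F}$-consistent scoring function.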
Definition 2.5 (Equivalence).
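Two scoring functions $S, S'\colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ are called equivalent if there is some $\mathcal{F}$-integrable function $a\colon \mathbb{R} \to \mathbb{R}$ and some $\lambda > 0$ such that $S'(x,y) = \lambda S(x,y) + a(y)$ for all $(x,y) \in \mathsf{A} \times \mathbb{R}$.
It is immediate that the above relation is indeed an equivalence relation. Moreover, if $S$ and $S'$ are equivalent, then $S$ is (strictly) $\mathcal{F}$-consistent for some functional $T$ if and only if $S'$ is (strictly) $\mathcal{F}$-consistent for $T$.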
Closely related to the concept of elicitability is the notion of identifiability.
Definition 2.6 (Identification functions and identifiability).
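An $\mathcal{F}$-integrable map $V\colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}^k$, where $\mathsf{A} \subseteq \mathbb{R}^k$, is an $\mathcal{F}$-identification function for a functional $T\colon \mathcal{F} \to \mathsf{A}$ if $\bar V(T(F),F) = 0$ for all $F \in \mathcal{F}$. It is a strict $\mathcal{F}$-identification function for $T$ if it is an identification function and if $\bar V(x,F) = 0$ implies that $x = T(F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. Wherever it is convenient, we assume that $V$ is locally bounded jointly in both arguments. A functional $T\colon \mathcal{F} \to \mathsf{A}$ is identifiable if it possesses a strict $\mathcal{F}$-identification function.
Note that in contrast to Gneiting (2011) we assume that the functional $T$ maps to $\mathsf{A}$ rather than to the power set of $\mathsf{A}$.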
3 Elicitability and identifiability results
3.1 RVaR is not elicitable
It is well known that the mean functional is elicitable with respect to the class of probability distributions with finite mean. Value at Risk at level $\alpha \in (0,1)$ is elicitable relative to the class of probability distributions with unique $\alpha$-quantiles (that is, if $F(\mathrm{VaR}_\alpha(F) + \varepsilon) > \alpha$ for all $\varepsilon > 0$). Gneiting (2011) showed that Expected Shortfall (ES) fails to have convex level sets, which implies that ES is not elicitable. On the other hand, Fissler and Ziegel (2016) provide a positive result showing that the pair $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ is elicitable. The following proposition treats the case of $\mathrm{RVaR}_{\alpha,\beta}$ for $0 < \alpha < \beta < 1$.
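Proposition 3.1. Let $0 < \alpha < \beta < 1$. If $\mathcal{F}$ contains all measures with finite support, the following assertions hold.
(i) $(\mathrm{VaR}_\alpha, \mathrm{RVaR}_{\alpha,\beta})$ does not have convex level sets.
(ii) $(\mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ does not have convex level sets.
(iii) $\mathrm{RVaR}_{\alpha,\beta}$ does not have convex level sets.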
We start with (i). Let where and where . Define the two measures and . Then , and . On the other hand, one obtains , which shows (i). Assertion (ii) follows with a similar argument, whereas (iii) is a direct corollary of (i) or (ii). ∎
We would like to remark that Wang and Wei (2018, Example 7) provide an alternative proof that $\mathrm{RVaR}_{\alpha,\beta}$ does not have convex level sets for $0 < \alpha < \beta < 1$. Moreover, their Theorem 2 gives an alternative way of establishing assertions (i) and (ii) in Proposition 3.1.
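Corollary 3.2. Let $0 < \alpha < \beta < 1$. If $\mathcal{F}$ contains all measures with finite support, the following assertions hold.
(i) $(\mathrm{VaR}_\alpha, \mathrm{RVaR}_{\alpha,\beta})$ is not elicitable.
(ii) $(\mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ is not elicitable.
(iii) $\mathrm{RVaR}_{\alpha,\beta}$ is not elicitable.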
With similar arguments one can show the assertions of Proposition 3.1 (and Corollary 3.2) if contains all measures with compact support that are continuous with respect to the Lebesgue measure. With a continuity argument, one can extend this result to the class
containing mixtures of normal distributions.
3.2 RVaR is jointly elicitable with the corresponding quantiles
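For any $\alpha \in (0,1)$, we define $S_\alpha\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $S_\alpha(x,y) = (\mathbb{1}\{y \le x\} - \alpha)\,x - \mathbb{1}\{y \le x\}\,y$. Note that $S_\alpha$ is $\mathcal{F}$-consistent for $\mathrm{VaR}_\alpha$ if $\int_{-\infty}^{x} |y|\,\mathrm{d}F(y) < \infty$ for all $x \in \mathbb{R}$ and all $F \in \mathcal{F}$. Moreover, it is strictly $\mathcal{F}$-consistent for $\mathrm{VaR}_\alpha$ if all distributions in $\mathcal{F}$ have unique $\alpha$-quantiles.
Now let $0 < \alpha < \beta < 1$ and consider the function $V\colon \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}^3$ defined as
$$V(x_1,x_2,x_3,y) = \begin{pmatrix} \mathbb{1}\{y \le x_1\} - \alpha \\ \mathbb{1}\{y \le x_2\} - \beta \\ x_3 + \dfrac{1}{\beta-\alpha}\big(S_\beta(x_2,y) - S_\alpha(x_1,y)\big) \end{pmatrix}. \tag{3.1}$$
Using the notation $\bar S_\alpha(x,F) = \int S_\alpha(x,y)\,\mathrm{d}F(y)$, the important observation is that for any distribution $F$
$$\mathrm{RVaR}_{\alpha,\beta}(F) = -\frac{1}{\beta-\alpha}\Big(\bar S_\beta\big(\mathrm{VaR}_\beta(F),F\big) - \bar S_\alpha\big(\mathrm{VaR}_\alpha(F),F\big)\Big). \tag{3.2}$$
This observation implies an identifiability result for the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ whose proof is simple and omitted.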
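Lemma 3.3. Let $0 < \alpha < \beta < 1$. If $\mathcal{F}$ is a class of probability distributions such that $F(\mathrm{VaR}_\alpha(F)) = \alpha$ and $F(\mathrm{VaR}_\beta(F)) = \beta$ for all $F \in \mathcal{F}$, then the function $V$ at (3.1) is an $\mathcal{F}$-identification function for the triplet $T = (\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$. If moreover the $\alpha$- and $\beta$-quantiles are unique for all elements of $\mathcal{F}$, then $V$ is a strict $\mathcal{F}$-identification function for $T$.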
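The identifiability result above can be illustrated on an empirical distribution. The sketch below assumes that the quantile score is $S_\alpha(x,y) = (\mathbb{1}\{y \le x\} - \alpha)x - \mathbb{1}\{y \le x\}y$ and that the identification function has the components $\mathbb{1}\{y \le x_1\} - \alpha$, $\mathbb{1}\{y \le x_2\} - \beta$ and $x_3 + (S_\beta(x_2,y) - S_\alpha(x_1,y))/(\beta-\alpha)$; these formulas are our reading of the displays at this point, and the sample and all names are hypothetical.

```python
ALPHA, BETA = 0.25, 0.75


def indicator(y, x):
    return 1.0 if y <= x else 0.0


def s_level(level, x, y):
    """Assumed quantile score (1{y <= x} - level) * x - 1{y <= x} * y."""
    return (indicator(y, x) - level) * x - indicator(y, x) * y


def identification(x1, x2, x3, y):
    """Assumed three-dimensional identification function for (VaR, VaR, RVaR)."""
    return (
        indicator(y, x1) - ALPHA,
        indicator(y, x2) - BETA,
        x3 + (s_level(BETA, x2, y) - s_level(ALPHA, x1, y)) / (BETA - ALPHA),
    )


def mean_identification(forecast, sample):
    """Component-wise sample average of the identification function."""
    values = [identification(*forecast, y) for y in sample]
    return tuple(sum(v[j] for v in values) / len(sample) for j in range(3))


sample = [-10.0, -4.0, -1.0, 0.0, 1.0, 2.0, 3.0, 9.0]
xs = sorted(sample)
# Lower 0.25- and 0.75-quantiles and the average of the quantile function in between.
truth = (xs[1], xs[5], sum(xs[2:6]) / 4)

print(mean_identification(truth, sample))             # all components vanish
print(mean_identification((-4.0, 2.0, 1.0), sample))  # wrong RVaR forecast is detected
```

With eight observations and levels 0.25 and 0.75, the empirical distribution function hits both levels exactly at the lower quantiles, so the expected identification function vanishes at the true triplet; the quantiles are not unique in this sample, so this illustrates the non-strict part of the statement only.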
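Invoking the inequality at (2.3), the maximal sensible action domain for the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})$ is $\mathsf{A}_0 := \{x \in \mathbb{R}^3 : x_1 \le x_3 \le x_2\}$.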
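Theorem 3.4. Let $\mathcal{F}$ be a class of distributions on $\mathbb{R}$, $0 < \alpha < \beta < 1$, and $T = (\mathrm{VaR}_\alpha, \mathrm{VaR}_\beta, \mathrm{RVaR}_{\alpha,\beta})\colon \mathcal{F} \to \mathsf{A}$ where $\mathsf{A} \subseteq \{x \in \mathbb{R}^3 : x_1 \le x_3 \le x_2\}$. Let $S\colon \mathsf{A} \times \mathbb{R} \to \mathbb{R}$ be a scoring function of the form
$$\begin{aligned} S(x_1,x_2,x_3,y) ={}& (\mathbb{1}\{y \le x_1\} - \alpha)\,g_1(x_1) - \mathbb{1}\{y \le x_1\}\,g_1(y) \\ &+ (\mathbb{1}\{y \le x_2\} - \beta)\,g_2(x_2) - \mathbb{1}\{y \le x_2\}\,g_2(y) \\ &+ \phi'(x_3)\Big(x_3 + \frac{1}{\beta-\alpha}\big(S_\beta(x_2,y) - S_\alpha(x_1,y)\big)\Big) - \phi(x_3) + a(y), \end{aligned} \tag{3.3}$$
where $a\colon \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}$-integrable, $g_1, g_2\colon \mathbb{R} \to \mathbb{R}$, such that the functions $y \mapsto \mathbb{1}\{y \le x_i\}\,g_i(y)$, $i \in \{1,2\}$, are $\mathcal{F}$-integrable for all $x_i$, and $\phi\colon \mathbb{R} \to \mathbb{R}$ is convex with subgradient $\phi'$. If for all $x_3$ the functions
$$G_{1,x_3}\colon x_1 \mapsto g_1(x_1) - \frac{x_1\,\phi'(x_3)}{\beta-\alpha}, \tag{3.4}$$
$$G_{2,x_3}\colon x_2 \mapsto g_2(x_2) + \frac{x_2\,\phi'(x_3)}{\beta-\alpha} \tag{3.5}$$
are increasing, then $S$ is $\mathcal{F}$-consistent for $T$. If moreover $\phi$ is strictly convex, the functions at (3.4) and (3.5) are strictly increasing, and any distribution in $\mathcal{F}$ has unique $\alpha$- and $\beta$-quantiles, then $S$ is strictly $\mathcal{F}$-consistent for $T$.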
To simplify the notation in the proof, we shall occasionally evaluate the score on rather than on . Let , and . Then, since is increasing, is -consistent for and it is strictly -consistent if is strictly increasing and if the distributions in have unique -quantiles. Similar comments apply to the map . Hence,
with a strict inequality under the conditions for strict consistency and if . Finally,
since is convex. If is strictly convex and if , then the inequality in (3.6) is strict. ∎
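The consistency statement can be probed numerically with a minimal Python sketch. It assumes that the score has the form $S(x_1,x_2,x_3,y) = (\mathbb{1}\{y \le x_1\} - \alpha)g_1(x_1) - \mathbb{1}\{y \le x_1\}g_1(y) + (\mathbb{1}\{y \le x_2\} - \beta)g_2(x_2) - \mathbb{1}\{y \le x_2\}g_2(y) + \phi'(x_3)\big(x_3 + (S_\beta(x_2,y) - S_\alpha(x_1,y))/(\beta-\alpha)\big) - \phi(x_3)$ with $S_\alpha(x,y) = (\mathbb{1}\{y \le x\} - \alpha)x - \mathbb{1}\{y \le x\}y$, which is our reading of (3.3). The choices $g_1 = g_2 = \mathrm{id}$, $\phi(x) = (\beta-\alpha)\log\cosh(x)$, the sample and the forecast grid are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product

ALPHA, BETA = 0.25, 0.75


def indicator(y, x):
    return 1.0 if y <= x else 0.0


def s_level(level, x, y):
    """Assumed quantile score (1{y <= x} - level) * x - 1{y <= x} * y."""
    return (indicator(y, x) - level) * x - indicator(y, x) * y


def phi(x):
    """Illustrative strictly convex function whose derivative stays in (-(b - a), b - a)."""
    return (BETA - ALPHA) * math.log(math.cosh(x))


def phi_prime(x):
    return (BETA - ALPHA) * math.tanh(x)


def g(x):
    """Illustrative choice g1 = g2 = identity; x * (1 -/+ tanh(x3)) is strictly increasing."""
    return x


def score(x1, x2, x3, y):
    """Assumed score for the triplet (VaR_alpha, VaR_beta, RVaR_{alpha, beta}), with a(y) = 0."""
    quantile_part = (
        (indicator(y, x1) - ALPHA) * g(x1) - indicator(y, x1) * g(y)
        + (indicator(y, x2) - BETA) * g(x2) - indicator(y, x2) * g(y)
    )
    rvar_part = phi_prime(x3) * (
        x3 + (s_level(BETA, x2, y) - s_level(ALPHA, x1, y)) / (BETA - ALPHA)
    ) - phi(x3)
    return quantile_part + rvar_part


def mean_score(forecast, sample):
    return sum(score(*forecast, y) for y in sample) / len(sample)


# Hypothetical sample: lower 0.25-quantile -2, lower 0.75-quantile 3, RVaR 0.5.
sample = [-10.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 5.0, 9.0]
grid = list(product([-4.0, -3.0, -2.0, -1.0, 0.0],
                    [2.0, 3.0, 4.0, 5.0],
                    [0.0, 0.25, 0.5, 0.75, 1.0]))
best = min(grid, key=lambda forecast: mean_score(forecast, sample))
print(best)
```

For this sample both quantiles are unique, and the grid search recovers the triplet formed by the two empirical quantiles and the average of the quantile function between the levels. The grid deliberately contains forecasts that violate the ordering of the three components, which does not affect the minimiser.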
If , then Theorem 3.4 holds also for the action domain .
Even though the maximal sensible action domain for is , the proof of Theorem 3.4 shows that the scoring function given at (3.3) is even strictly -consistent on the Cartesian product where , , is the projection of to the th component. This enables the evaluation of forecasts ignoring the crucial inequality at (2.3).
If the scoring is of the form at (3.3) such that is strictly convex and the functions and are strictly increasing, but some distributions fail to have unique - or -quantiles, then also fails to be strictly -consistent. However, it is still strictly -consistent in the -component. That is, for
where is the full set-valued -quantile.
Making use of the relation between $\mathrm{RVaR}_{\alpha,1-\alpha}$ and the $\alpha$-Winsorized mean from Section 2.1 and the revelation principle (Osband, 1985; Gneiting, 2011; Fissler, 2017), Theorem 3.4 establishes that the triplet $(\mathrm{VaR}_\alpha, \mathrm{VaR}_{1-\alpha}, W_\alpha)$ is elicitable where $W_\alpha$ is the $\alpha$-Winsorized mean. Moreover, it gives a rich class of strictly consistent scoring functions for this triplet. The following proposition is useful to construct examples; see Section 6.
The subgradient of is necessarily bounded.
is equivalent to a scoring function of the form with a (strictly) convex function such that is bounded with , and strictly increasing functions , such that their one-sided derivatives are bounded below by one.
The proof is similar to the one of Corollary 5.5 in Fissler and Ziegel (2016). Take some with . Then, for any one obtains . One obtains that . With similar arguments one can show that .
For any , if we replace with , with , and with in the formula (3.3) for , then does not change and is (strictly) convex if and only if is (strictly) convex. Furthermore, conditions (3.4) and (3.5) hold for , , if and only if they hold for , and . By part (i) of the proposition is bounded. Therefore, we can assume without loss of generality that . By scaling of the scoring function , we obtain an equivalent scoring function where we can assume that .
Let such that . Condition (3.4) implies that
In particular, is strictly increasing, and therefore, one-sided derivatives exist everywhere and by the above inequality they are bounded below by one. The argument for works analogously.
Using Osband’s principle (Fissler and Ziegel, 2016, Theorem 3.2), one can also establish a necessary condition for strict consistency of scoring functions for the triplet . For any and , let .
Let be a class of continuously differentiable distributions on and . Assume that all distributions in have unique - and -quantiles. Let . Then, defined at (3.1) is a strict -identification function for which satisfies Assumption (V3). If Assumptions (V1), and (F1) hold and satisfies Assumption (V4), then any strictly -consistent scoring function for that satisfies assumptions (VS1) and (S2) is necessarily of the form given at (3.3) almost everywhere, where the functions , , , at (3.4) and (3.5) are strictly increasing and is strictly convex.
Let with derivative and let . Then one obtains
The partial derivatives of are given by
An adaptation of Osband’s Principle (Fissler and Ziegel, 2016, Theorem 3.2) yields the existence of continuously differentiable functions , , such that for
Since we assume that is twice continuously differentiable for any , the second order partial derivatives need to commute. Let . Then is equivalent to
This needs to hold for all . The variation in the densities implied by Assumption (V4) in combination with the surjectivity of yield that on . Similarly, evaluating and at yields
Using again Assumption (V4) as well as the surjectivity of , this implies that
So we are left with characterising for . Note that Assumption (V1) implies that for any there are two distributions such that and are linearly independent. Then, the requirement that
for all and for all implies that .
Starting with , implies that
Again, Assumption (V1) implies that there are such that and are linearly independent. Hence, we obtain that and . With the same argumentation and starting from one can show that and . That means there exist functions