1 Introduction
From the cradle to the grave, human life is full of decisions. Due to the inherent nature of time, decisions have to be made today, but at the same time, they are supposed to account for unknown and uncertain future events. However, since these future events cannot be known today, the best thing to do is to base the decisions on predictions
for these unknown and uncertain events. The call for and the usage of predictions for future events is literally ubiquitous and even dates back to ancient times. In those days, dreams, divination, and revelation were considered as respected sources for forecasts, with the most prominent example being the Delphic Oracle which was not only consulted for decisions of private life, but also for strategic political decisions concerning peace and war. With the development of natural sciences, mathematics, and in particular statistics and probability theory, the ancient metaphysical art of making qualitative forecasts turned into a sophisticated discipline of science adopting a quantitative perspective. Subfields such as meteorology, mathematical finance, or even futurology evolved.
Acknowledging that forecasts are inherently uncertain, two main questions arise:

(i) How good is a forecast in absolute terms?

(ii) How good is a forecast in relative terms?
While question (i) deals with forecast validation, this paper focuses on some aspects of question (ii), which is concerned with forecast selection, forecast comparison, or forecast ranking. Specifically, we present results on order-sensitivity and equivariance of consistent scoring functions for elicitable functionals. These results may provide guidance for choosing a specific scoring function for forecast comparison within the large class of all consistent scoring functions for an elicitable functional of interest.
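We adopt the general decision-theoretic framework following Gneiting (2011); cf. Savage (1971); Osband (1985); Lambert et al. (2008). For some number $n \ge 1$, one has

(i) observed ex post realizations $y_1, \ldots, y_n$ of a time series $(Y_t)_{t \in \mathbb{N}}$, taking values in an observation domain $\mathsf{O}$ with a $\sigma$-algebra $\mathcal{O}$;

(ii) ex ante forecasts $x_1^{(i)}, \ldots, x_n^{(i)}$, $i \in \{1, \ldots, m\}$, of $m \ge 1$ competing experts / forecasters taking values in an action domain $\mathsf{A} \subseteq \mathbb{R}^k$ for some $k \ge 1$;

(iii) a scoring (or loss) function $S \colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$. The scoring function is assumed to be negatively oriented, that is, if a forecaster reports the quantity $x \in \mathsf{A}$ and $y \in \mathsf{O}$ materializes, she is assigned the penalty $S(x, y) \in \mathbb{R}$.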
The observations can be real-valued (GDP growth for one year, maximal temperature of one day), vector-valued (wind speed, weight and height of persons), functional-valued (path of the exchange rate Euro–Swiss franc over one day), or also set-valued (area of rain on a given day, area affected by a flood). In this article, we focus on point forecasts that may be vector-valued, which is why we assume $\mathsf{A} \subseteq \mathbb{R}^k$ for some $k \ge 1$ and we equip the Borel set $\mathsf{A}$ with the Borel $\sigma$-algebra. One is typically interested in a certain statistical property of the underlying (conditional) distribution $F_t$ of $Y_t$. We assume that this property can be expressed in terms of a functional $T$ such as the mean, a certain quantile, or a risk measure. Examples of vector-valued functionals are the covariance matrix of a multivariate observation or a vector of quantiles at different levels. Common examples for scoring functions are the absolute loss $S(x, y) = |x - y|$, the squared loss $S(x, y) = (x - y)^2$ (for $\mathsf{A} = \mathsf{O} = \mathbb{R}$), or the absolute percentage loss $S(x, y) = |(x - y)/y|$ (for $\mathsf{A} = \mathsf{O} = (0, \infty)$).
Forecast comparison is done in terms of realized scores
(1.1) $\bar{\mathbf{S}}_n^{(i)} = \frac{1}{n} \sum_{t=1}^{n} S\big(x_t^{(i)}, y_t\big), \qquad i \in \{1, \ldots, m\}.$
That is, a forecaster is deemed to be the better the lower her realized score is. However, there is the following caveat: The forecast ranking in terms of realized scores not only depends on the forecasts and the realizations (as it should definitely be the case), but also on the choice of the scoring function. In order to avoid impure possibilities of manipulating the forecast ranking ex post with the data at hand, it is necessary to specify a certain scoring function before the inspection of the data. A fortiori, for the sake of transparency and in order to encourage truthful forecasts, one ought to disclose the choice of the scoring function to the competing forecasters ex ante. But still, the optimal choice of the scoring function remains an open problem. One can think of two situations:

(i) A decision-maker might be aware of his actual economic costs of utilizing misspecified forecasts. In this case, the scoring function should reflect these economic costs.

(ii) The actual economic costs might be unclear and the scoring function might be just a tool for forecast ranking. However, the directive is given in terms of the functional one is interested in.
For situation (i) described above, one should use the readily economically interpretable cost or scoring function. Therefore, the only concern is situation (ii). In this paper, we consider predictions in a one-period setting, thus dropping the time index $t$. This is justified by our objective of understanding properties of scoring functions that do not change over time, and it is common in the literature (Murphy and Daan, 1985; Diebold and Mariano, 1995; Lambert et al., 2008; Gneiting, 2011).
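As a minimal sketch of how realized scores as in (1.1) drive a forecast ranking, the following Python snippet compares two forecasters under the absolute and the squared loss. The data, the forecaster names, and the helper `realized_score` are illustrative assumptions and not taken from any study cited here. With these made-up numbers the ranking flips with the scoring function, which illustrates why the scoring function should be fixed before inspecting the data.

```python
def absolute_loss(x, y):
    return abs(x - y)


def squared_loss(x, y):
    return (x - y) ** 2


def realized_score(score, forecasts, observations):
    """Average score of one forecaster over all forecast-observation pairs."""
    pairs = list(zip(forecasts, observations))
    return sum(score(x, y) for x, y in pairs) / len(pairs)


# Hypothetical realizations and two hypothetical forecasters.
observations = [0.0, 0.0, 0.0, 4.0]
forecaster_a = [0.0, 0.0, 0.0, 0.0]  # always reports 0
forecaster_b = [1.0, 1.0, 1.0, 1.0]  # always reports 1

# Determine the winner (lower realized score) under each scoring function.
ranking = {}
for name, score in [("absolute", absolute_loss), ("squared", squared_loss)]:
    score_a = realized_score(score, forecaster_a, observations)
    score_b = realized_score(score, forecaster_b, observations)
    ranking[name] = "A" if score_a < score_b else "B"
```

Here forecaster A wins under the absolute loss and forecaster B wins under the squared loss, although forecasts and realizations are identical in both comparisons.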
Assuming the forecasters are homines oeconomici and adopting the rationale of expected utility maximization, given a concrete scoring function $S$, the most sensible action consists in minimizing the expected score $\mathbb{E}_F[S(x, Y)]$ with respect to the forecast $x$, where $Y$ follows the distribution $F$, thus issuing the Bayes act $\arg\min_{x \in \mathsf{A}} \mathbb{E}_F[S(x, Y)]$. Hence, a scoring function should be incentive compatible in that it encourages truthful and honest forecasts. In line with Murphy and Daan (1985) and Gneiting (2011), we make the following definition.
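Definition 1.1 (Consistency and elicitability).
Let $\mathcal{F}$ be a class of probability distributions on $\mathsf{O}$. A scoring function is a map $S \colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ that is $\mathcal{F}$-integrable.^{1} It is consistent for a functional $T \colon \mathcal{F} \to \mathsf{A}$ if
(1.2) $\bar{S}(T(F), F) \le \bar{S}(x, F)$
for all $F \in \mathcal{F}$ and for all $x \in \mathsf{A}$, where $\bar{S}(x, F) := \int S(x, y) \, \mathrm{d}F(y)$. It is strictly consistent for $T$ if it is consistent for $T$ and if equality in (1.2) implies $x = T(F)$. A functional $T \colon \mathcal{F} \to \mathsf{A}$ is called elicitable if there exists a strictly consistent scoring function for $T$.
^{1} We say that a function $a \colon \mathsf{O} \to \mathbb{R}$ is $\mathcal{F}$-integrable if it is $F$-integrable for each $F \in \mathcal{F}$. A function $g \colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ is $\mathcal{F}$-integrable if $g(x, \cdot)$ is $\mathcal{F}$-integrable for each $x \in \mathsf{A}$.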
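To make the definition concrete, here is a small numerical sketch, assuming a hypothetical three-point distribution and a grid of candidate forecasts (both are illustrative choices). It checks that the expected squared loss is minimized at the mean and the expected absolute loss at the median, which is what consistency for these two functionals requires.

```python
def expected_score(score, x, dist):
    """Expected score of forecast x under a discrete distribution {value: probability}."""
    return sum(p * score(x, y) for y, p in dist.items())


def squared_loss(x, y):
    return (x - y) ** 2


def absolute_loss(x, y):
    return abs(x - y)


# Hypothetical distribution with mean 1.35 and unique median 1.
dist = {0.0: 0.4, 1.0: 0.35, 4.0: 0.25}

# Candidate forecasts on a grid with step 0.01.
grid = [i / 100 for i in range(-100, 501)]

best_under_squared = min(grid, key=lambda x: expected_score(squared_loss, x, dist))
best_under_absolute = min(grid, key=lambda x: expected_score(absolute_loss, x, dist))
```

A grid search only illustrates the inequality (1.2) on finitely many forecasts and one distribution; it is not a proof of consistency.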
Clearly, elicitability and consistent scoring functions are naturally linked also to estimation problems, in particular M-estimation (Huber, 1964; Huber and Ronchetti, 2009) and regression, with prominent examples being ordinary least squares, quantile, or expectile regression (Koenker, 2005; Newey and Powell, 1987).
The necessity of utilizing strictly consistent scoring functions for meaningful forecast comparison is demonstrated in a simulation study in Gneiting (2011). However, for a given functional $T$, there is typically a whole class of strictly consistent scoring functions for it, such as all Bregman functions in the case of the mean (Savage, 1971); further examples are given below. Patton (2017) shows that the forecast ranking based on (1.1) may depend on the choice of the strictly consistent scoring function for $T$ in finite samples, and even at the population level if we compare two imperfect forecasts with each other.
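The dependence of the ranking on the choice of a strictly consistent scoring function can be sketched numerically. The snippet below builds two Bregman scores of the form $\phi(y) - \phi(x) - \phi'(x)(y - x)$, one with $\phi(t) = t^2$ (the squared loss) and one with $\phi(t) = e^t$. The distribution and the two imperfect forecasts are hypothetical values chosen for illustration; the helper names are assumptions as well.

```python
import math


def bregman_score(phi, dphi):
    """Bregman score phi(y) - phi(x) - phi'(x) * (y - x) for a convex function phi."""
    def score(x, y):
        return phi(y) - phi(x) - dphi(x) * (y - x)
    return score


def expected_score(score, x, dist):
    return sum(p * score(x, y) for y, p in dist.items())


squared = bregman_score(lambda t: t * t, lambda t: 2 * t)  # equals (x - y) ** 2
exponential = bregman_score(math.exp, math.exp)

# Hypothetical distribution with mean 1.25 and two imperfect forecasts of the mean.
dist = {0.0: 0.5, 1.0: 0.25, 4.0: 0.25}
x_low, x_high = 0.05, 2.25

squared_prefers_high = (
    expected_score(squared, x_high, dist) < expected_score(squared, x_low, dist)
)
exponential_prefers_low = (
    expected_score(exponential, x_low, dist) < expected_score(exponential, x_high, dist)
)
```

Both scores are minimized in expectation at the mean, yet they disagree on which of the two misspecified forecasts is better: the exponential Bregman score penalizes overshooting more heavily than the squared loss.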
Therefore, we naturally have a threefold elicitation problem:

(i) Is $T$ elicitable?

(ii) What is the class of strictly consistent scoring functions for $T$?

(iii) What are distinguished strictly consistent scoring functions for $T$?
Even though the denomination and the synopsis of the described problems under the term ‘elicitation problem’ are novel, there is a rich strand of literature in mathematical statistics and economics concerned with the threefold elicitation problem. Foremost, one should mention the pioneering work of Osband (1985), establishing a necessary condition for elicitability in terms of convex level sets of the functional, and a necessary representation of strictly consistent scoring functions, known as Osband’s principle (Gneiting, 2011). Whereas the necessity of convex level sets holds in broad generality, Lambert (2013) could specify sufficient conditions for elicitability for functionals taking values in a finite set, and Steinwart et al. (2014) showed sufficiency of convex level sets for real-valued functionals satisfying certain regularity conditions. Moments, ratios of moments, quantiles, and expectiles are in general elicitable, whereas other important functionals such as variance, Expected Shortfall or the mode functional are not (Savage, 1971; Osband, 1985; Weber, 2006; Gneiting, 2011; Heinrich, 2014).
Concerning subproblem (ii) of the elicitation problem, Savage (1971), Reichelstein and Osband (1984), Saerens (2000), and Banerjee et al. (2005) gave characterizations for strictly consistent scoring functions for the mean functional of a one-dimensional random variable in terms of Bregman functions. Strictly consistent scoring functions for quantiles have been characterized by Thomson (1979) and Saerens (2000). Gneiting (2011) provides a characterization of the class of strictly consistent scoring functions for expectiles. The case of vector-valued functionals apart from means of random vectors has been treated substantially less than the one-dimensional case (Osband, 1985; Banerjee et al., 2005; Lambert et al., 2008; Frongillo and Kash, 2015a, b; Fissler and Ziegel, 2016a).
The strict consistency of $S$ only justifies a comparison of two competing forecasts if one of them reports the true functional value. If both of them are misspecified, it is per se not possible to draw a conclusion which forecast is ‘closer’ to the true functional value by comparing the realized scores. To this end, some notions of order-sensitivity are desirable. According to Lambert (2013) we say that a scoring function $S$ is order-sensitive for a one-dimensional functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}$ if for any $F \in \mathcal{F}$ and any $x, z \in \mathsf{A}$ such that either $z \le x \le T(F)$ or $z \ge x \ge T(F)$, we have $\bar{S}(x, F) \le \bar{S}(z, F)$, where $\bar{S}(x, F)$ denotes the expected score of the forecast $x$ under $F$. This means that if a forecast lies between the true functional value and some other forecast, then issuing the forecast in between should yield a smaller expected score than issuing the forecast further away. In particular, order-sensitivity implies consistency. Vice versa, under weak regularity conditions on the functional, strict consistency also implies order-sensitivity if the functional is real-valued; see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4).
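The one-dimensional notion of order-sensitivity can be checked numerically on a grid. The following sketch assumes a hypothetical discrete distribution with unique median 1 and uses the absolute loss; the function `is_order_sensitive_on_grid` is an illustrative helper, not an established routine. It tests whether the expected score is non-increasing to the left of a reference point and non-decreasing to its right.

```python
def absolute_loss(x, y):
    return abs(x - y)


def expected_score(score, x, dist):
    return sum(p * score(x, y) for y, p in dist.items())


def is_order_sensitive_on_grid(score, dist, t, grid, tol=1e-12):
    """Check monotonicity of the expected score on both sides of t (t must be a grid point)."""
    values = [expected_score(score, x, dist) for x in grid]
    for i in range(len(grid) - 1):
        left_of_t = grid[i + 1] <= t
        right_of_t = grid[i] >= t
        if left_of_t and values[i + 1] > values[i] + tol:
            return False  # score increases while moving towards t from the left
        if right_of_t and values[i + 1] < values[i] - tol:
            return False  # score decreases while moving away from t to the right
    return True


# Hypothetical distribution with unique median 1.
dist = {0.0: 0.4, 1.0: 0.35, 4.0: 0.25}
grid = [i / 100 for i in range(-100, 501)]

around_median = is_order_sensitive_on_grid(absolute_loss, dist, 1.0, grid)
around_wrong_point = is_order_sensitive_on_grid(absolute_loss, dist, 2.0, grid)
```

The check succeeds around the median and fails around the arbitrary point 2, since the expected absolute loss already increases between 1 and 2.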
This article is dedicated to a thorough investigation of order-sensitive scoring functions for vector-valued functionals, thus contributing to a discussion of subproblem (iii) of the elicitation problem. Furthermore, we investigate to what extent invariance or equivariance properties of elicitable functionals are reflected in their respective consistent scoring functions.
Lambert et al. (2008) introduced a notion of componentwise order-sensitivity for the case of $\mathsf{A} \subseteq \mathbb{R}^k$. Friedman (1983) and Nau (1985) considered similar questions in the setting of probabilistic forecasts, coining the term effectiveness of scoring rules, which can be described as order-sensitivity in terms of a metric. In Section 3, we consider three notions of order-sensitivity in the higher-dimensional setting: metrical order-sensitivity, componentwise order-sensitivity, and order-sensitivity on line segments. We discuss their connections and give conditions when such scoring functions exist and of what form they are for the most relevant functionals, such as vectors of quantiles, expectiles, ratios of expectations, the pair of mean and variance, and the pair consisting of Value at Risk and Expected Shortfall, two important risk measures in banking and insurance.
Complementing our results on order-sensitivity, in Section 2, we consider the analytic properties of the expected score $x \mapsto \bar{S}(x, F)$, $x \in \mathsf{A}$, for some scoring function $S$ and some distribution $F \in \mathcal{F}$. The (strict) consistency of $S$ for some functional $T$ is equivalent to the expected score having a (unique) global minimum at $x = T(F)$. Order-sensitivity ensures monotonicity properties of the expected score. As a technical result, we show that under weak regularity assumptions on $T$, the expected score of a strictly consistent scoring function has a unique local minimum – which, of course, coincides with the global minimum at $x = T(F)$. Accompanied by a result on self-calibration, a continuity property of the inverse of the expected score, which ensures that the minimum of the expected score is well-separated in the sense of van der Vaart (1998), these two findings may be of interest in their own right in the context of M-estimation.
In Section 4, we consider functionals that have an invariance or equivariance property such as translation invariance or homogeneity. It is a natural question whether a functional that is, for example, translation equivariant has a consistent scoring function that respects this property in the sense that if we evaluate forecast performance of translated predictions and observations, the ranking of predictive performance remains the same as that of the original data. In parametric estimation problems, such a scoring function may allow one to translate the data without affecting the estimated parameter values. For one-dimensional functionals, invariance of the scoring function often determines it uniquely up to equivalence, while this is not necessarily the case for higher-dimensional functionals (Proposition 4.7 and Corollary 4.12).
2 Analytic properties of expected scores
2.1 Monotonicity
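Definition 2.1 (Mixture-continuity).
Let $\mathcal{F}$ be convex. A functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ is called mixture-continuous if for all $F, G \in \mathcal{F}$ the map
$[0, 1] \to \mathbb{R}^k, \qquad \lambda \mapsto T\big((1 - \lambda) F + \lambda G\big)$
is continuous.
It is appealing that one does not have to specify a topology on $\mathcal{F}$ to define mixture-continuity because it suffices to work with the induced Euclidean topology on $[0, 1]$ and on $\mathsf{A} \subseteq \mathbb{R}^k$.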
It turns out that mixture-continuity of a functional is strong enough to imply order-sensitivity in the case of one-dimensional functionals (see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4)), and desirable monotonicity properties of the expected scores also in higher dimensions (Propositions 2.5 and 2.7). At the same time, numerous functionals of applied relevance are mixture-continuous, and we start by giving examples and a sufficient condition (Proposition 2.2).
It is straightforward to see that the ratio of expectations is mixture-continuous. Moreover, by the implicit function theorem, one can verify the mixture-continuity of quantiles and expectiles directly under appropriate regularity conditions (e.g., in the case of quantiles, all distributions in $\mathcal{F}$ should be $C^1$ with non-vanishing derivatives). Generalizing Bellini and Bignozzi (2015, Proposition 3.4c), we give a sufficient criterion for mixture-continuity in the next proposition. Our version is not restricted to distributions with compact support (however, the image of the functional must be bounded), and we formulate the result for $k$-dimensional functionals.
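The mixture-continuity of a ratio of expectations can be illustrated with a short sketch. The functions $p(y) = y$ and $q(y) = 1 + y^2$, the two discrete distributions, and all helper names below are hypothetical choices made for this example. Along the mixture path, numerator and denominator are both linear in the mixing weight, so the ratio is a rational function with positive denominator and hence continuous.

```python
def expectation(g, dist):
    return sum(p * g(y) for y, p in dist.items())


def mixture(F, G, lam):
    """Discrete distribution (1 - lam) * F + lam * G, given as {value: probability}."""
    support = sorted(set(F) | set(G))
    return {y: (1 - lam) * F.get(y, 0.0) + lam * G.get(y, 0.0) for y in support}


def ratio_of_expectations(p, q):
    def T(dist):
        return expectation(p, dist) / expectation(q, dist)
    return T


# Hypothetical choice: T(F) = E[Y] / E[1 + Y^2].
T = ratio_of_expectations(lambda y: y, lambda y: 1.0 + y * y)
F = {0.0: 0.5, 2.0: 0.5}
G = {1.0: 1.0}

# Evaluate the functional along the mixture path on a fine grid of weights.
lams = [i / 1000 for i in range(1001)]
path = [T(mixture(F, G, lam)) for lam in lams]
largest_jump = max(abs(b - a) for a, b in zip(path, path[1:]))
```

For this particular choice the path equals $1 / (3 - \lambda)$, moving from $1/3$ at $\lambda = 0$ to $1/2$ at $\lambda = 1$ without jumps.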
Proposition 2.2.
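Let $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be an elicitable functional with a strictly consistent scoring function $S$ such that the expected score $\bar{S}(\cdot, F)$ is continuous for all $F \in \mathcal{F}$. Then $T$ is mixture-continuous on any $\mathcal{F}_0 \subseteq \mathcal{F}$ such that $\mathcal{F}_0$ is convex and the image $T(\mathcal{F}_0)$ is bounded.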
Proof.
Let be convex such that for some . Let . Define via
Then is jointly continuous, and due to the strict consistency
By virtue of the Berge Maximum Theorem (Aliprantis and Border, 2006, Theorem 17.31 and Lemma 17.6), the function is continuous. ∎
Similarly to the original proof of Bellini and Bignozzi (2015), a sufficient criterion for the continuity of $\bar{S}(\cdot, F)$ for any $F \in \mathcal{F}$ is that for all $y \in \mathsf{O}$, the score $S(x, y)$ is quasi-convex and continuous in $x$.^{2}
^{2} We remark that for $k = 1$, if a scoring function $S$ is strictly consistent for some functional $T \colon \mathcal{F} \to \mathsf{A}$ where $\mathcal{F}$ consists of all point measures on $\mathsf{O}$, then the quasi-convexity of $x \mapsto S(x, y)$ for all $y \in \mathsf{O}$ is equivalent to the order-sensitivity of $S$ for $T$.
Recall that, under appropriate regularity conditions on $\mathcal{F}$, the asymmetric piecewise linear loss $S_\alpha(x, y) = (\mathbb{1}\{y \le x\} - \alpha)(x - y)$ and the asymmetric piecewise quadratic loss $S_\tau(x, y) = |\mathbb{1}\{y \le x\} - \tau| (x - y)^2$ are strictly consistent scoring functions for the $\alpha$-quantile and the $\tau$-expectile, respectively, and both, $S_\alpha$ as well as $S_\tau$, are continuous in their first argument and convex. Hence, Proposition 2.2 yields that both quantiles and expectiles are mixture-continuous.
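As a hedged illustration of the two losses just mentioned, the sketch below writes the asymmetric piecewise linear loss as $(\mathbb{1}\{y \le x\} - \alpha)(x - y)$ and the asymmetric piecewise quadratic loss as $|\mathbb{1}\{y \le x\} - \tau|(x - y)^2$, which are the standard forms in the literature, and locates the minimizers of the expected scores on a grid. The distribution, the grid, and the helper names are assumptions for this example.

```python
def piecewise_linear_loss(alpha):
    """Asymmetric piecewise linear loss for the alpha-quantile."""
    def score(x, y):
        return ((1.0 if y <= x else 0.0) - alpha) * (x - y)
    return score


def piecewise_quadratic_loss(tau):
    """Asymmetric piecewise quadratic loss for the tau-expectile."""
    def score(x, y):
        return abs((1.0 if y <= x else 0.0) - tau) * (x - y) ** 2
    return score


def expected_score(score, x, dist):
    return sum(p * score(x, y) for y, p in dist.items())


# Hypothetical distribution: median 1, 0.9-quantile 4, mean 1.35.
dist = {0.0: 0.4, 1.0: 0.35, 4.0: 0.25}
grid = [i / 100 for i in range(-100, 501)]


def grid_minimizer(score):
    return min(grid, key=lambda x: expected_score(score, x, dist))


median = grid_minimizer(piecewise_linear_loss(0.5))
quantile_90 = grid_minimizer(piecewise_linear_loss(0.9))
expectile_50 = grid_minimizer(piecewise_quadratic_loss(0.5))  # coincides with the mean
expectile_80 = grid_minimizer(piecewise_quadratic_loss(0.8))
```

For this distribution the 0.8-expectile solves $0.35x - 0.87 = 0$ on the interval $(1, 4)$, so the grid minimizer lands next to $0.87 / 0.35 \approx 2.486$.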
Steinwart et al. (2014) used Osband’s principle (Osband, 1985) and the assumption of continuity of $T$ with respect to the total variation distance to show order-sensitivity. Bellini and Bignozzi (2015) showed that the weak continuity of a functional implies its mixture-continuity. Consequently, one can also derive the order-sensitivity in the framework of Steinwart et al. (2014) directly using only mixture-continuity.
Lambert (2013) showed that it is a harder requirement to have order-sensitivity if the image $T(\mathcal{F})$ is discrete. Then both approaches, invoking Osband’s principle or using mixture-continuity, do not work because the interior of the image of $T$ is empty. Moreover, mixture-continuity implies that the functional is constant (such that only trivial cases can be considered). Furthermore, it is proven in Lambert (2013) that for a functional $T$ with a discrete image, all strictly consistent scoring functions are order-sensitive if and only if there is one order-sensitive scoring function for $T$. In particular, there are functionals admitting strictly consistent scoring functions that are not order-sensitive, one such example being the mode functional.^{3}
^{3} Note that due to Proposition 1 in Heinrich (2014), the mode functional is elicitable relative to the class of probability measures containing unimodal discrete measures. Moreover, interpreting the mode functional as a set-valued functional, it is elicitable in the sense of Gneiting (2011, Definition 2). A strictly consistent scoring function is given by $S(x, y) = \mathbb{1}\{x \neq y\}$. The main result of Heinrich (2014) is that the mode functional is not elicitable relative to the class of unimodal probability measures with Lebesgue densities.
Let us now turn to vector-valued functionals. To understand the monotonicity properties of the expected score of a mixture-continuous elicitable functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, it is useful to consider paths $\gamma \colon [0, 1] \to \mathbb{R}^k$, $\lambda \mapsto T(\lambda F + (1 - \lambda) G)$, for $F, G \in \mathcal{F}$. If $T$ is elicitable, a classical result asserts that $T$ necessarily has convex level sets (Gneiting, 2011, Theorem 6). This implies that the level sets of $\gamma$ can only be closed intervals, including the case of singletons and the empty set. This rules out loops and some other possible pathologies of $\gamma$. Furthermore, under the assumption that $T$ is identifiable as defined below, one can even show that the path $\gamma$ is either injective or constant.
Definition 2.3 (Identifiability).
Let $\mathsf{A} \subseteq \mathbb{R}^k$. An $\mathcal{F}$-integrable function $V \colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}^k$ is said to be an identification function for a functional $T \colon \mathcal{F} \to \mathsf{A}$ if
$\bar{V}(T(F), F) := \int V(T(F), y) \, \mathrm{d}F(y) = 0$
for all $F \in \mathcal{F}$. Furthermore, $V$ is a strict identification function for $T$ if $\bar{V}(x, F) = 0$ implies $x = T(F)$ for all $F \in \mathcal{F}$ and for all $x \in \mathsf{A}$. A functional $T$ is said to be identifiable if there exists a strict identification function for $T$.
In line with Gneiting (2011, Section 2.4), one can often obtain an identification function as the gradient of a sufficiently smooth scoring function. However, the converse intuition is not so clear, at least in the higher-dimensional setting $k > 1$: not all strict identification functions can be integrated to a strictly consistent scoring function. They have to satisfy the usual integrability conditions (Königsberger, 2004, p. 185); see also Fissler and Ziegel (2016a, Corollary 3.3) and the discussion thereafter.
Lemma 2.4.
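Let $\mathcal{F}$ be convex and $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be identifiable with a strict identification function $V$. Then for any $F, G \in \mathcal{F}$, the path $\gamma \colon [0, 1] \to \mathsf{A}$, $\lambda \mapsto T(\lambda F + (1 - \lambda) G)$, is either constant or injective.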
Proof.
Let such that . For any , one has . Since is a strict identification function for , for all .
Now let and let . Since is a strict identification function, (and symmetrically .) Assume that . Define , . There are such that and . Hence,
and similarly Consequently, , which is a contradiction to the assumption that . This implies that . ∎
Proposition 2.5.
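Let $\mathcal{F}$ be convex and $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be mixture-continuous and surjective. Let $S$ be strictly consistent for $T$. Then for each $F \in \mathcal{F}$, $t = T(F)$, and each $x \in \mathsf{A}$, there is a continuous path $\gamma \colon [0, 1] \to \mathsf{A}$ such that $\gamma(0) = x$, $\gamma(1) = t$, and the function $[0, 1] \ni \lambda \mapsto \bar{S}(\gamma(\lambda), F)$ is decreasing. Additionally, for $\lambda < \lambda'$ such that $\gamma(\lambda) \neq \gamma(\lambda')$ it holds that $\bar{S}(\gamma(\lambda), F) > \bar{S}(\gamma(\lambda'), F)$.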
Proof.
Let , and . Then there is some with . Define . Clearly, and . Due to the mixturecontinuity of , the path is also continuous. The rest follows along the lines of the proof of Nau (1985, Proposition 3). Let . If , there is nothing to show. So assume that . Define , and analogously. Then, for , it holds that . The strict consistency of implies that
which is equivalent to
By strict consistency of , the lefthand side is nonnegative yielding the assertion. ∎
Remark 2.6.
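Under certain (weak) regularity conditions, the expected score of a strictly consistent scoring function has no other local minimum apart from the global one at $x = T(F)$.
Proposition 2.7.
Let $\mathcal{F}$ be convex and $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be mixture-continuous and surjective. If $S$ is strictly consistent for $T$, then for all $F \in \mathcal{F}$ the expected score $\bar{S}(\cdot, F) \colon \mathsf{A} \to \mathbb{R}$ has only one local minimum, which is at $x = T(F)$.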
Proof.
Let with . Due to the strict consistency of , the expected score has a local minimum at . Assume there is another local minimum at some . Then there is a distribution with . Consider the path . Due to Proposition 2.5 the function is decreasing and strictly decreasing when we move on the image of the path from to . Hence cannot have a local minimum at . ∎
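2.2 Self-calibration
With Proposition 2.5 it is possible to prove that, under mild regularity conditions, strictly consistent scoring functions are self-calibrated, which turns out to be useful in the context of M-estimation.
Definition 2.8 (Self-calibration).
A scoring function $S$ is called self-calibrated for a functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ with respect to a norm^{4} $\|\cdot\|$ on $\mathbb{R}^k$ if for all $\varepsilon > 0$ and for all $F \in \mathcal{F}$ there is a $\delta = \delta(\varepsilon, F) > 0$ such that for all $x \in \mathsf{A}$ and $t = T(F)$
$\bar{S}(x, F) - \bar{S}(t, F) < \delta \implies \|t - x\| < \varepsilon.$
^{4} It is straightforward to use a metric instead of a norm on $\mathsf{A}$, but in this article we only consider $\mathsf{A} \subseteq \mathbb{R}^k$, so we did not see any benefit in considering this more general case. See also the discussion before Definition 3.4.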
The notion of self-calibration was introduced by Steinwart (2007) in the context of machine learning. In a preprint version of Steinwart et al. (2014),^{5} the authors translate this concept to the setting of scoring functions as follows (using our notation):
“For self-calibrated $S$, every $\delta(\varepsilon, F)$-approximate minimizer of $\bar{S}(\cdot, F)$, approximates the desired property $T(F)$ with precision not worse than $\varepsilon$. […] In some sense order sensitivity is a global and qualitative notion while self-calibration is a local and quantitative notion.”
^{5} Available at http://users.cecs.anu.edu.au/~williams/papers/P196.pdf
In line with this quotation, self-calibration can be considered as the continuity of the inverse of the expected score at the global minimum – and as such, it is a local property of the inverse. This property ensures that convergence of the expected score to its global minimum implies convergence of the forecast to the true functional value. On the other hand, self-calibration of a scoring function is equivalent to the fact that the argmin of the expected score is a well-separated point of minimum in the sense of van der Vaart (1998, p. 45) – as such being a global property of the expected score itself. That means that for any $\varepsilon > 0$
$\inf\{\bar{S}(x, F) : x \in \mathsf{A}, \ \|x - T(F)\| \ge \varepsilon\} > \bar{S}(T(F), F).$
It is relatively straightforward to see that self-calibration implies strict consistency: Let $S$ be self-calibrated for $T$, $F \in \mathcal{F}$, and $x \in \mathsf{A}$ with $x \neq T(F)$. Then for $\varepsilon := \|x - T(F)\| > 0$ there is a $\delta > 0$ such that $\bar{S}(x, F) - \bar{S}(T(F), F) \ge \delta > 0$.
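For the squared loss and the mean, self-calibration can be verified by hand, because the excess expected score equals the squared distance to the mean, so that $\delta = \varepsilon^2$ is a valid choice. The sketch below checks this identity and the resulting separation numerically; the distribution, the value of `eps`, and the helper names are hypothetical.

```python
def expected_squared_loss(x, dist):
    return sum(p * (x - y) ** 2 for y, p in dist.items())


# Hypothetical distribution and its mean.
dist = {0.0: 0.4, 1.0: 0.35, 4.0: 0.25}
mean = sum(p * y for y, p in dist.items())


def excess_score(x):
    """Expected score of x minus the minimal expected score (attained at the mean)."""
    return expected_squared_loss(x, dist) - expected_squared_loss(mean, dist)


def delta(eps):
    # For the squared loss the excess score is (x - mean) ** 2, so eps ** 2 works.
    return eps ** 2


eps = 0.5
grid = [i / 100 for i in range(-100, 501)]

# Contrapositive of the definition: forecasts at least eps away from the mean
# have an excess score of at least delta(eps).
far_points = [x for x in grid if abs(x - mean) >= eps]
well_separated = all(excess_score(x) >= delta(eps) - 1e-12 for x in far_points)
identity_holds = all(abs(excess_score(x) - (x - mean) ** 2) < 1e-9 for x in grid)
```

The explicit form of `delta` is specific to the squared loss; for other strictly consistent scoring functions it generally depends on the distribution as well.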
In the preprint version of Steinwart et al. (2014) it is shown for $k = 1$ that order-sensitivity implies self-calibration. The next proposition shows that the kind of order-sensitivity given by Proposition 2.5 also implies self-calibration for $k \ge 1$.
Proposition 2.9.
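Let $\mathcal{F}$ be convex, $\mathsf{A} \subseteq \mathbb{R}^k$ be closed, and $T \colon \mathcal{F} \to \mathsf{A}$ be a surjective and mixture-continuous functional. If $S$ is strictly consistent for $T$ and $\bar{S}(\cdot, F)$ is continuous for all $F \in \mathcal{F}$, then $S$ is self-calibrated for $T$.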
Proof.
Let , and . Define
Due to the continuity of , the minimum is welldefined and, as a consequence of the strict consistency of for , is positive. Let . If , we have, by the definition of , that . Assume that . Then there is a distribution with . Due to Proposition 2.5 there is a continuous path such that , and such that is decreasing in . Moreover, if such that it holds that . Due to the continuity of there is some with . Then we obtain . ∎
We end this subsection about self-calibration by demonstrating its applicability in the context of M-estimation.
Theorem 2.10.
Let be an selfcalibrated scoring function for a functional . Then, the following assertion holds for all . If is a sequence of random variables with distribution such that
then
Proof.
This is a direct consequence of van der Vaart (1998, Theorem 5.7). ∎
3 Order-sensitivity
3.1 Different notions of order-sensitivity
The idea of order-sensitivity is that a forecast lying between the true functional value and some other forecast is also assigned an expected score lying between the two other expected scores. If the action domain is one-dimensional, there are only two cases to consider: both forecasts are on the left-hand side of the functional value or on the right-hand side. However, if $\mathsf{A} \subseteq \mathbb{R}^k$ for $k \ge 2$, the notion of ‘lying between’ is ambiguous. Two obvious interpretations for the multidimensional case are the componentwise interpretation and the interpretation that one forecast is the convex combination of the true functional value and the other forecast.
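Definition 3.1 (Componentwise order-sensitivity).
A scoring function $S$ is called componentwise order-sensitive for a functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $x, z \in \mathsf{A}$ we have that:
(3.1) $\forall m \in \{1, \ldots, k\} \colon \ (z_m \le x_m \le t_m \ \text{or} \ z_m \ge x_m \ge t_m) \implies \bar{S}(x, F) \le \bar{S}(z, F).$
Moreover, $S$ is called strictly componentwise order-sensitive for $T$ if $S$ is componentwise order-sensitive and if $x \neq z$ in (3.1) implies that $\bar{S}(x, F) < \bar{S}(z, F)$.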
Remark 3.2.
In economic terms, a strictly componentwise order-sensitive scoring function rewards Pareto improvements^{6} in the sense that improving the prediction performance in one component without deteriorating the prediction ability in the other components results in a lower expected score.
^{6} The definition of the Pareto principle according to Scott and Marshall (2009): “A principle of welfare economics derived from the writings of Vilfredo Pareto, which states that a legitimate welfare improvement occurs when a particular change makes at least one person better off, without making any other person worse off. A market exchange which affects nobody adversely is considered to be a ‘Pareto-improvement’ since it leaves one or more persons better off. ‘Pareto optimality’ is said to exist when the distribution of economic welfare cannot be improved for one individual without reducing that of another.”
Definition 3.3 (Order-sensitivity on line segments).
Let $\|\cdot\|$ be the Euclidean norm on $\mathbb{R}^k$. A scoring function $S$ is order-sensitive on line segments for a functional $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $v \in \{x \in \mathbb{R}^k : \|x\| = 1\}$ the map
$\{s \in [0, \infty) : t + s v \in \mathsf{A}\} \to \mathbb{R}, \qquad s \mapsto \bar{S}(t + s v, F)$
is increasing. If the map is strictly increasing, we call $S$ strictly order-sensitive on line segments for $T$.
These two notions of order-sensitivity do not allow for a comparison of any two misspecified forecasts, no matter where they are relative to the true functional value. An intuitive requirement could be ‘the closer to the true functional value the smaller the expected score’, thus calling for the notion of a metric. Since, for a fixed functional $T$ and some fixed distribution $F$, we always have a fixed reference point $T(F)$ and we have the induced vector-space structure of $\mathbb{R}^k$ on $\mathsf{A}$, we shall only work with $\ell^p$-norms $\|\cdot\|_p$, $p \in [1, \infty]$. Recall that for $x \in \mathbb{R}^k$, $\|x\|_p = \big(\sum_{m=1}^k |x_m|^p\big)^{1/p}$ for $p \in [1, \infty)$ and $\|x\|_\infty = \max_{m = 1, \ldots, k} |x_m|$. If the assertion does not depend on the choice of $p$, we shall usually omit the $p$ in the notation. For other choices of $\mathsf{A}$, it would also be interesting to replace the norm by a metric in the following definition.
Definition 3.4 (Metrical order-sensitivity).
Let $\mathsf{A} \subseteq \mathbb{R}^k$. A scoring function $S$ is metrically order-sensitive for a functional $T \colon \mathcal{F} \to \mathsf{A}$ relative to the $\ell^p$-norm, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $x, z \in \mathsf{A}$ we have that
(3.2) $\|x - t\|_p \le \|z - t\|_p \implies \bar{S}(x, F) \le \bar{S}(z, F).$
If additionally the inequalities in (3.2) are strict, we say that $S$ is strictly metrically order-sensitive for $T$ relative to $\|\cdot\|_p$.
Similarly to (strict) consistency, all three notions of (strict) order-sensitivity are preserved when considering two scoring functions that are equivalent.^{7}
^{7} Two scoring functions $S, \tilde{S}$ are equivalent if there is a positive constant $\lambda > 0$ and an $\mathcal{F}$-integrable function $a \colon \mathsf{O} \to \mathbb{R}$ such that $\tilde{S}(x, y) = \lambda S(x, y) + a(y)$, for all $(x, y) \in \mathsf{A} \times \mathsf{O}$.
The notion of componentwise order-sensitivity corresponds almost literally to the notion of accuracy-rewarding scoring functions introduced by Lambert et al. (2008). Metrically order-sensitive scoring functions have their counterparts in the field of probabilistic forecasting in effective scoring rules introduced by Friedman (1983) and further investigated by Nau (1985). Actually, the latter paper has also given the inspiration for the notion of order-sensitivity on line segments. It is obvious that any of the three notions of (strict) order-sensitivity implies (strict) consistency. The next lemma formally states this result and gives some logical implications concerning the different notions of order-sensitivity. The proof is standard and therefore omitted.
Lemma 3.5.
Let $T \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be a functional and $S \colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ a scoring function.

(i) Let $p \in [1, \infty)$. If $S$ is (strictly) metrically order-sensitive for $T$ relative to the $\ell^p$-norm, then $S$ is (strictly) componentwise order-sensitive for $T$.

(ii) If $S$ is (strictly) metrically order-sensitive for $T$ relative to the $\ell^\infty$-norm, then $S$ is componentwise order-sensitive for $T$.

(iii) If $S$ is (strictly) metrically order-sensitive for $T$ relative to the $\ell^p$-norm, then $S$ is (strictly) consistent for $T$.

(iv) If $S$ is (strictly) componentwise order-sensitive for $T$, then $S$ is (strictly) order-sensitive on line segments for $T$.

(v) If $S$ is (strictly) order-sensitive on line segments for $T$, then $S$ is (strictly) consistent for $T$.
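The difference between the componentwise and the metrical notion can be sketched for the two-dimensional mean. The snippet below, built on a hypothetical two-point distribution with illustrative weights and forecasts, compares an equally weighted sum of squared losses with an unequally weighted one. Both are sums of strictly consistent component scores, but only the equally weighted version orders the two chosen forecasts according to their Euclidean distance from the true mean.

```python
def weighted_squared_loss(weights):
    """Sum of componentwise squared losses with positive weights."""
    def score(x, y):
        return sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y))
    return score


def expected_score(score, x, dist):
    return sum(p * score(x, y) for y, p in dist.items())


def euclidean_distance(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z)) ** 0.5


# Hypothetical bivariate distribution with mean t = (1.0, 0.5).
dist = {(0.0, 0.0): 0.5, (2.0, 1.0): 0.5}
t = (1.0, 0.5)

plain = weighted_squared_loss((1.0, 1.0))
skewed = weighted_squared_loss((1.0, 4.0))

x = (2.0, 0.5)          # Euclidean distance 1.0 from t
z = (1.0, 1.4)          # Euclidean distance 0.9 from t
x_between = (1.5, 0.5)  # componentwise between t and x

# The equally weighted score follows the Euclidean distance ...
plain_follows_distance = expected_score(plain, z, dist) < expected_score(plain, x, dist)
# ... whereas the unequally weighted score prefers the forecast that is further away.
skewed_violates_distance = expected_score(skewed, x, dist) < expected_score(skewed, z, dist)
# Both scores reward the componentwise improvement from x to x_between.
both_componentwise = all(
    expected_score(s, x_between, dist) < expected_score(s, x, dist) for s in (plain, skewed)
)
```

This is a single counterexample against metrical order-sensitivity relative to the Euclidean norm for the unequally weighted score, not a characterization of either notion.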
3.2 Componentwise order-sensitivity
Under restrictive regularity assumptions, Lambert et al. (2008, Theorem 5) claim that whenever a functional has a componentwise order-sensitive scoring function, the components of the functional must be elicitable. Moreover, assuming that the measures in $\mathcal{F}$ have finite support, they assert that any componentwise order-sensitive scoring function is the sum of strictly consistent scoring functions for the components. Lemma 3.6 shows the first claim under less restrictive smoothness assumptions on the scoring function. For many common examples of functionals, the second claim can be shown relaxing the restrictive condition on $\mathcal{F}$; see Proposition 3.7 and the discussion before.
Lemma 3.6.
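Let $T = (T_1, \ldots, T_k) \colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be a $k$-dimensional functional with components $T_m \colon \mathcal{F} \to \mathsf{A}_m \subseteq \mathbb{R}$ where $\mathsf{A} = \mathsf{A}_1 \times \cdots \times \mathsf{A}_k$. If there is a strictly componentwise order-sensitive scoring function $S$ for $T$, then the components $T_m$, $m \in \{1, \ldots, k\}$, are elicitable.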
Proof.
Fix . Let and such that , for all and . Due to the strict componentwise ordersensitivity of this implies that . This in turn means that for any the map ,
(3.3) 
is a strictly consistent scoring function for . ∎
If $T_m$, $m \in \{1, \ldots, k\}$, are mixture-continuous and elicitable with strictly consistent scoring functions $S_m$, then they are order-sensitive according to Lambert (2013, Proposition 2) and Bellini and Bignozzi (2015, Proposition 3.4). Therefore, the sum $S(x_1, \ldots, x_k, y) = \sum_{m=1}^k S_m(x_m, y)$ is strictly componentwise order-sensitive for $T = (T_1, \ldots, T_k)$. More interestingly, one can establish the reverse of the last assertion. Any strictly componentwise order-sensitive scoring function must necessarily be additively separable. In Fissler and Ziegel (2016a, Section 4), we established a dichotomy for functionals with elicitable components: In most relevant cases, the functional (the corresponding strict identification function, respectively) satisfies Assumption (V4) therein (e.g., when the functional is a vector of different quantiles and / or different expectiles with the exception of the 1/2-expectile), or it is a vector of ratios of expectations with the same denominator, or it is a combination of both situations. Under some regularity conditions, Fissler and Ziegel (2016a, Propositions 4.2 and 4.4) characterize the form of strictly consistent scoring functions for the first two situations, whereas Fissler and Ziegel (2016a, Remark 4.5) is concerned with the third situation. For this latter situation, any strictly consistent scoring function must necessarily be additive for the respective blocks of the functional. And for the first situation, Fissler and Ziegel (2016a, Proposition 4.2) yields the additive form of $S$ automatically. It remains to consider the case of Fissler and Ziegel (2016a, Proposition 4.4), that is, a vector of ratios of expectations with the same denominator.
Proposition 3.7.
Let be a ratio of expectations with the same denominator, that is, for some integrable functions , such that for all .^{8}^{8}8It is no loss of generality to assume that for all in Proposition 3.7. In order to ensure that is welldefined, necessarily for all . However, Assumption (V1) implies that is convex. So if there are such that and then there is a convex combination of and such that . Consequently, either for all or for all , and by possibly changing the sign of one can assume that the first case holds. Assume that is surjective, and that is simply connected. Moreover, consider the strict identification function , and some strictly consistent scoring function such that the Assumptions (V1), (S2), (F1), and (VS1) in Fissler and Ziegel (2016a) hold. If is strictly componentwise ordersensitive for , then is of the form
(3.4) 
for almost all , where , , are strictly consistent scoring functions for , , and .
Proof.
Since, for fixed , is a polynomial in , Assumption (V3) in Fissler and Ziegel (2016a) is automatically satisfied. Let be the matrix-valued function given in Osband’s principle; see Fissler and Ziegel (2016a, Theorem 3.2). By Fissler and Ziegel (2016a, Proposition 4.4(i)) we have that
(3.5) 
for all , , where the first identity holds for almost all and the second identity for all . Moreover, the matrix is positive definite for all . If we can show that for , we can use the first part of (3.5) and deduce that for all there are positive functions , where , such that
for all . Then, we can conclude as in the proof of Fissler and Ziegel (2016a, Proposition 4.2(ii)). (Footnote 9: The arguments in Fissler and Ziegel (2016a, Proposition 4.2(ii)) use Fissler and Ziegel (2016a, Proposition 3.4). There is a flaw in the latter result which has been pointed out in Brehmer (2017). We present a corrected version of the result in Appendix A.)
Fix with and such that . Due to the strict consistency of defined at (3.3) we have that
whenever and for all . This means the map is constantly 0. Hence, for all
whenever . Using the special form of and Fissler and Ziegel (2016a, Corollary 3.3), we have for that
and by assumption . Using the surjectivity of we obtain that for all , which ends the proof. ∎
The notion of componentwise order-sensitivity has an appealing interpretation in the sense that it rewards Pareto improvements of the predictions; see Remark 3.2. The results of Lemma 3.6 and Proposition 3.7 give a clear understanding of the concept, including its limitation to functionals that consist only of elicitable components.
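The Pareto-improvement interpretation can be illustrated numerically. The following is a minimal sketch under our own assumptions, not an example from the paper: the functional is the pair (mean, median) of an empirical distribution, the additively separable score is squared error plus absolute error, and the sample and all names (`expected_score`, `a`, `b`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample; its empirical distribution plays the role of F.
# An odd sample size makes the empirical median a unique data point.
y = rng.gamma(shape=2.0, scale=1.0, size=10_001)

# Value of the functional (mean, median) at the empirical distribution.
t_mean, t_median = y.mean(), np.median(y)

def expected_score(x_mean, x_median):
    """Expected additively separable score: squared error + absolute error."""
    return np.mean((x_mean - y) ** 2) + np.mean(np.abs(x_median - y))

# Forecast b lies between forecast a and the true value in EACH component,
# so moving from a to b is a Pareto improvement.
a = (t_mean + 1.0, t_median - 0.8)
b = (t_mean + 0.4, t_median - 0.3)

score_a = expected_score(*a)
score_b = expected_score(*b)
score_t = expected_score(t_mean, t_median)

# Componentwise order-sensitivity: the expected score decreases along a -> b -> truth.
print(score_t < score_b < score_a)
```

Each summand is order-sensitive for its own component, which is why the ordering of the expected scores follows component by component.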
Ehm et al. (2016) introduced Murphy diagrams for forecast comparison of quantiles and expectiles. Murphy diagrams have the advantage that forecasts are compared simultaneously with respect to all consistent scoring functions for the respective functional. For many multivariate functionals such as ratios of expectations, the methodology cannot be readily extended because there are no mixture representations available for the class of all consistent scoring functions. Proposition 3.7 shows that when considering only componentwise order-sensitive consistent scoring functions, the situation is different and mixture representations (and hence Murphy diagrams) are readily available for forecast comparison.
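For orientation, here is a minimal sketch of the Murphy-diagram idea for a single quantile. It uses the elementary quantile scoring function of Ehm et al. (2016), S_{α,θ}(x, y) = (1{y < x} − α)(1{θ < x} − 1{θ < y}), and averages it over a sample for a grid of thresholds θ. The data, the two constant forecasts, the grid, and the function names are illustrative assumptions, not material from the paper.

```python
import numpy as np

def elementary_quantile_score(x, y, theta, alpha):
    """Elementary score S_{alpha,theta}(x, y) for the alpha-quantile (Ehm et al., 2016)."""
    below = (y < x).astype(float)
    return (below - alpha) * ((theta < x).astype(float) - (theta < y).astype(float))

def murphy_curve(x, y, thetas, alpha):
    """Average elementary score at each threshold theta (one curve of a Murphy diagram)."""
    return np.array([elementary_quantile_score(x, y, th, alpha).mean() for th in thetas])

rng = np.random.default_rng(1)
y = rng.normal(size=2000)                  # hypothetical observations
alpha = 0.5                                # target functional: the median

x_a = np.full_like(y, np.median(y))        # forecast A: the empirical median
x_b = np.full_like(y, np.median(y) + 0.5)  # forecast B: a shifted forecast

step = 0.01
thetas = np.arange(y.min() - 1.0, y.max() + 1.0, step)
curve_a = murphy_curve(x_a, y, thetas, alpha)
curve_b = murphy_curve(x_b, y, thetas, alpha)

# Forecast A is at least as good as B for every threshold theta, hence under
# every mixture of elementary scores.
dominates = bool(np.all(curve_a <= curve_b + 1e-12))

# Mixture representation: integrating the elementary scores over theta
# recovers the pinball loss (1{y < x} - alpha)(x - y), up to grid error.
pinball_a = float(np.mean(((y < x_a).astype(float) - alpha) * (x_a - y)))
riemann_a = float(curve_a.sum() * step)
print(dominates, abs(pinball_a - riemann_a) < step)
```

Plotting `curve_a` and `curve_b` against `thetas` gives the Murphy diagram for the two forecasts. For an additively separable score, one such diagram can be drawn per component.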
3.3 Metrical order-sensitivity
We start with an equivalent formulation of metrical order-sensitivity.
Lemma 3.8.
Let be convex and be mixture-continuous and surjective. Let be (strictly) consistent for . Then is (strictly) metrically order-sensitive for relative to if and only if for all , and we have the implication
(3.6) 
Proof.
Let be metrically order-sensitive for relative to . Let , , be such that . Then we have both and .
Assume that (3.6) holds and is (strictly) consistent. Let with and . Suppose that . If , (3.6) implies that and there is nothing to show. If , we can apply Proposition 2.5. There is a continuous path such that and , and the function is decreasing. Due to continuity there is a such that . Invoking (3.6) it holds that . If is strictly consistent then the latter inequality is strict. ∎
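The criterion of Lemma 3.8 (forecasts at equal distance from the true functional value receive equal expected scores) can be checked numerically in the simplest case. The sketch below is our own illustration, assuming the mean functional on the real line with the absolute distance. It contrasts squared error, whose expected score equals (x − μ)² plus the variance, with the Bregman score generated by φ(t) = exp(t), which is also strictly consistent for the mean. The sample and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical skewed sample; its empirical distribution plays the role of F.
y = rng.gamma(shape=2.0, scale=0.5, size=5000)
mu = y.mean()  # value of the mean functional at the empirical distribution

def squared_error(x, y):
    return (x - y) ** 2

def exp_bregman(x, y):
    # Bregman score with phi(t) = exp(t): phi(y) - phi(x) - phi'(x) * (y - x).
    return np.exp(y) - np.exp(x) - np.exp(x) * (y - x)

def mean_score(score, x):
    """Expected score of the constant forecast x under the empirical distribution."""
    return float(np.mean(score(x, y)))

d = 0.7  # common distance of the two forecasts mu - d and mu + d from the mean

# Squared error: equal distance gives equal expected score (up to rounding).
se_gap = mean_score(squared_error, mu + d) - mean_score(squared_error, mu - d)

# Exponential Bregman score: overshooting is penalised more than undershooting,
# so the criterion fails although the score is consistent for the mean.
br_gap = mean_score(exp_bregman, mu + d) - mean_score(exp_bregman, mu - d)

print(abs(se_gap) < 1e-9, br_gap > 0.0)
```

Both scores are minimised in expectation at `mu`, but only squared error treats forecasts at distance `d` on either side symmetrically.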
For a real-valued functional there can be at most one strictly metrically order-sensitive scoring function, up to equivalence. To show this, we use Osband’s principle and impose the corresponding regularity conditions.
Proposition 3.9.
Let be a surjective, elicitable and identifiable functional with an oriented strict identification function . If is convex and are two strictly metrically order-sensitive scoring functions for such that the Assumptions (V1), (V2), (S1), (F1) and (VS1) from Fissler and Ziegel (2016a) (with respect to both scoring functions) hold, then and are equivalent almost everywhere.