# Order-Sensitivity and Equivariance of Scoring Functions

The relative performance of competing point forecasts is usually measured in terms of loss or scoring functions. It is widely accepted that these scoring functions should be strictly consistent in the sense that the expected score is minimized by the correctly specified forecast for a certain statistical functional such as the mean, median, or a certain risk measure. Thus, strict consistency opens the way to meaningful forecast comparison, but is also important in regression and M-estimation. Usually, strictly consistent scoring functions for an elicitable functional are not unique. To give guidance on the choice of a scoring function, this paper introduces two additional quality criteria. Order-sensitivity opens the possibility to compare two deliberately misspecified forecasts given that the forecasts are ordered in a certain sense. On the other hand, equivariant scoring functions obey equivariance properties similar to those of the functional at hand, such as translation invariance or positive homogeneity. In our study, we consider scoring functions for popular functionals, putting special emphasis on vector-valued functionals, e.g. the pair (mean, variance) or (Value at Risk, Expected Shortfall).


## 1 Introduction

From the cradle to the grave, human life is full of decisions. Due to the inherent nature of time, decisions have to be made today, but at the same time, they are supposed to account for unknown and uncertain future events. However, since these future events cannot be known today, the best thing to do is to base the decisions on predictions for these unknown and uncertain events. The call for and the usage of predictions for future events is literally ubiquitous and even dates back to ancient times. In those days, dreams, divination, and revelation were considered as respected sources for forecasts, with the most prominent example being the Delphic Oracle which was not only consulted for decisions of private life, but also for strategic political decisions concerning peace and war. With the development of natural sciences, mathematics, and in particular statistics and probability theory, the ancient metaphysical art of making qualitative forecasts turned into a sophisticated discipline of science adopting a quantitative perspective. Subfields such as meteorology, mathematical finance, or even futurology evolved.

Acknowledging that forecasts are inherently uncertain, two main questions arise:

1. How good is a forecast in absolute terms?

2. How good is a forecast in relative terms?

While question (i) deals with forecast validation, this paper focuses on some aspects of question (ii) which is concerned with forecast selection, forecast comparison, or forecast ranking. Specifically, we present results on order-sensitivity and equivariance of consistent scoring functions for elicitable functionals. These results may provide guidance for choosing a specific scoring function for forecast comparison within the large class of all consistent scoring functions for an elicitable functional of interest.

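We adopt the general decision-theoretic framework following Gneiting (2011); cf. Savage (1971); Osband (1985); Lambert et al. (2008). For some number $n \ge 1$, one has

1. observed ex post realizations $y_1, \ldots, y_n$ of a time series $(Y_t)_{t \in \mathbb{N}}$, taking values in an observation domain $\mathsf{O}$ with a $\sigma$-algebra $\mathcal{O}$;

2. a family $\mathcal{F}$ of probability distributions on $(\mathsf{O}, \mathcal{O})$, containing the (conditional) distributions of $Y_t$;

3. ex ante forecasts $x_1^{(i)}, \ldots, x_n^{(i)}$, $i \in \{1, \ldots, m\}$, of $m \ge 1$ competing experts / forecasters taking values in an action domain $\mathsf{A} \subseteq \mathbb{R}^k$ for some $k \ge 1$;

4. a scoring (or loss) function $S\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$. The scoring function is assumed to be negatively oriented, that is, if a forecaster reports the quantity $x \in \mathsf{A}$ and $y \in \mathsf{O}$ materializes, she is assigned the penalty $S(x, y) \in \mathbb{R}$.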

The observations can be real-valued (GDP growth for one year, maximal temperature of one day), vector-valued (wind-speed, weight and height of persons), functional-valued (path of the exchange rate Euro–Swiss franc over one day), or also set-valued (area of rain on a given day, area affected by a flood). In this article, we focus on point forecasts that may be vector-valued, which is why we assume $\mathsf{O} \subseteq \mathbb{R}^d$ for some $d \ge 1$ and we equip the Borel set $\mathsf{O}$ with the Borel $\sigma$-algebra. One is typically interested in a certain statistical property of the underlying (conditional) distribution of $Y_t$. We assume that this property can be expressed in terms of a functional $T\colon \mathcal{F} \to \mathsf{A}$ such as the mean, a certain quantile, or a risk measure. Examples of vector-valued functionals are the covariance matrix of a multivariate observation or a vector of quantiles at different levels. Common examples for scoring functions are the absolute loss $S(x, y) = |x - y|$, the squared loss $S(x, y) = (x - y)^2$ (for $\mathsf{A} = \mathsf{O} = \mathbb{R}$), or the absolute percentage loss $S(x, y) = |(x - y)/y|$ (for $\mathsf{A} = \mathsf{O} = (0, \infty)$).

Forecast comparison is done in terms of realized scores

$$\bar S_n^{(i)} = \frac{1}{n} \sum_{t=1}^{n} S\big(x_t^{(i)}, y_t\big), \qquad i \in \{1, \ldots, m\}. \tag{1.1}$$

That is, a forecaster is deemed to be the better the lower her realized score is. However, there is the following caveat: The forecast ranking in terms of realized scores not only depends on the forecasts and the realizations (as it should definitely be the case), but also on the choice of the scoring function. In order to avoid impure possibilities of manipulating the forecast ranking ex post with the data at hand, it is necessary to specify a certain scoring function before the inspection of the data. A fortiori, for the sake of transparency and in order to encourage truthful forecasts, one ought to disclose the choice of the scoring function to the competing forecasters ex ante. But still, the optimal choice of the scoring function remains an open problem. One can think of two situations:

1. A decision-maker might be aware of his actual economic costs of utilizing misspecified forecasts. In this case, the scoring function should reflect these economic costs.

2. The actual economic costs might be unclear and the scoring function might be just a tool for forecast ranking. However, the directive is given in terms of the functional one is interested in.

For situation (i) described above, one should use the readily economically interpretable cost or scoring function. Therefore, the only concern is situation (ii). In this paper, we consider predictions in a one-period setting, thus dropping the index $t$. This is justified by our objective to understand the properties of scoring functions, which do not change over time, and is common in the literature (Murphy and Daan, 1985; Diebold and Mariano, 1995; Lambert et al., 2008; Gneiting, 2011).
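The realized scores in (1.1) are straightforward to compute. The following is a minimal sketch, assuming made-up observations and two hypothetical forecasters who issue constant forecasts; the names `realized_score`, `forecaster_1` and `forecaster_2` are illustrative and not taken from the paper. It shows how the forecast ranking can change with the choice of the scoring function.

```python
from statistics import fmean

def absolute_loss(x, y):
    return abs(x - y)

def squared_loss(x, y):
    return (x - y) ** 2

def realized_score(score, forecasts, observations):
    """Average score (1/n) * sum_t S(x_t, y_t); lower is better."""
    return fmean(score(x, y) for x, y in zip(forecasts, observations))

# Hypothetical data: four realizations and two forecasters issuing constant forecasts.
observations = [0.0, 0.0, 0.0, 4.0]
forecasts = {
    "forecaster_1": [1.0, 1.0, 1.0, 1.0],  # equals the sample mean
    "forecaster_2": [0.0, 0.0, 0.0, 0.0],  # equals the sample median
}

realized = {
    loss.__name__: {
        name: realized_score(loss, xs, observations)
        for name, xs in forecasts.items()
    }
    for loss in (absolute_loss, squared_loss)
}

for loss_name, scores in realized.items():
    best = min(scores, key=scores.get)
    print(loss_name, scores, "-> best:", best)
```

In this toy example the absolute loss prefers the second forecaster while the squared loss prefers the first, which is why the scoring function should be fixed and disclosed before the data are inspected.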

Assuming the forecasters are homines oeconomici and adopting the rationale of expected utility maximization, given a concrete scoring function $S$, the most sensible action consists in minimizing the expected score $\mathbb{E}_F[S(x, Y)]$ with respect to the forecast $x$, where $Y$ follows the distribution $F$, thus issuing the Bayes act $\operatorname{arg\,min}_{x \in \mathsf{A}} \mathbb{E}_F[S(x, Y)]$. Hence, a scoring function should be incentive compatible in that it encourages truthful and honest forecasts. In line with Murphy and Daan (1985) and Gneiting (2011), we make the following definition.

###### Definition 1.1 (Consistency and elicitability).

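A scoring function is a map $S\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ that is $\mathcal{F}$-integrable. (We say that a function $a\colon \mathsf{O} \to \mathbb{R}$ is $\mathcal{F}$-integrable if it is $F$-integrable for each $F \in \mathcal{F}$. A function $g\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ is $\mathcal{F}$-integrable if $g(x, \cdot)$ is $\mathcal{F}$-integrable for each $x \in \mathsf{A}$.) It is $\mathcal{F}$-consistent for a functional $T\colon \mathcal{F} \to \mathsf{A}$ if

$$\bar S(T(F), F) \le \bar S(x, F) \tag{1.2}$$

for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$, where $\bar S(x, F) := \int S(x, y)\,\mathrm{d}F(y)$. It is strictly $\mathcal{F}$-consistent for $T$ if it is $\mathcal{F}$-consistent for $T$ and if equality in (1.2) implies $x = T(F)$. A functional $T\colon \mathcal{F} \to \mathsf{A}$ is called elicitable if there exists a strictly $\mathcal{F}$-consistent scoring function for $T$.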

Clearly, elicitability and consistent scoring functions are naturally linked also to estimation problems, in particular, M-estimation (Huber, 1964; Huber and Ronchetti, 2009) and regression with prominent examples being ordinary least squares, quantile, or expectile regression (Koenker, 2005; Newey and Powell, 1987).

The necessity of utilizing strictly consistent scoring functions for meaningful forecast comparison is impressively demonstrated in terms of a simulation study in Gneiting (2011). However, for a given functional $T$, there is typically a whole class of strictly consistent scoring functions for it, such as all Bregman functions in case of the mean (Savage, 1971); further examples are given below. Patton (2017) shows that the forecast ranking based on (1.1) may depend on the choice of the strictly consistent scoring function for $T$ in finite samples, and even at the population level if we compare two imperfect forecasts with each other.
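The dependence of the ranking on the scoring function can be made concrete at the population level. The sketch below assumes a hypothetical two-point distribution and two Bregman-type scoring functions for the mean, $S(x, y) = \phi(y) - \phi(x) - \phi'(x)(y - x)$ with $\phi(u) = u^2$ and $\phi(u) = e^u$; the distribution, the two forecasts and all function names are illustrative choices, not taken from Patton (2017).

```python
import math

def bregman_score(phi, dphi):
    """Return S(x, y) = phi(y) - phi(x) - phi'(x) * (y - x) for a convex phi."""
    return lambda x, y: phi(y) - phi(x) - dphi(x) * (y - x)

squared = bregman_score(lambda u: u * u, lambda u: 2.0 * u)  # equals (x - y)^2
exponential = bregman_score(math.exp, math.exp)              # phi = exp

# Hypothetical distribution F: Y is -1 or +1 with probability 1/2 each, so the mean is 0.
support, probs = [-1.0, 1.0], [0.5, 0.5]

def expected_score(score, x):
    return sum(p * score(x, y) for y, p in zip(support, probs))

forecast_a, forecast_b = -1.0, 0.9  # two misspecified forecasts of the mean

for name, score in (("squared", squared), ("exponential", exponential)):
    print(
        name,
        expected_score(score, 0.0),
        expected_score(score, forecast_a),
        expected_score(score, forecast_b),
    )
```

Both expected scores are smallest at the true mean 0, as consistency requires, but the squared loss ranks `forecast_b` ahead of `forecast_a` while the exponential Bregman score ranks them the other way round.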

Therefore, we naturally have a threefold elicitation problem:

1. Is $T$ elicitable?

2. What is the class of strictly $\mathcal{F}$-consistent scoring functions for $T$?

3. What are distinguished strictly $\mathcal{F}$-consistent scoring functions for $T$?

Even though the denomination and the synopsis of the described problems under the term ‘elicitation problem’ are novel, there is a rich strand of literature in mathematical statistics and economics concerned with the threefold elicitation problem. Foremost, one should mention the pioneering work of Osband (1985), establishing a necessary condition for elicitability in terms of convex level sets of the functional, and a necessary representation of strictly consistent scoring functions, known as Osband’s principle (Gneiting, 2011). Whereas the necessity of convex level sets holds in broad generality, Lambert (2013) could specify sufficient conditions for elicitability for functionals taking values in a finite set, and Steinwart et al. (2014) showed sufficiency of convex level sets for real-valued functionals satisfying certain regularity conditions. Moments, ratios of moments, quantiles, and expectiles are in general elicitable, whereas other important functionals such as variance, Expected Shortfall or the mode functional are not (Savage, 1971; Osband, 1985; Weber, 2006; Gneiting, 2011; Heinrich, 2014).

Concerning subproblem (ii) of the elicitation problem, Savage (1971), Reichelstein and Osband (1984), Saerens (2000), and Banerjee et al. (2005) gave characterizations for strictly consistent scoring functions for the mean functional of a one-dimensional random variable in terms of Bregman functions. Strictly consistent scoring functions for quantiles have been characterized by Thomson (1979) and Saerens (2000). Gneiting (2011) provides a characterization of the class of strictly consistent scoring functions for expectiles. The case of vector-valued functionals apart from means of random vectors has been treated substantially less than the one-dimensional case (Osband, 1985; Banerjee et al., 2005; Lambert et al., 2008; Frongillo and Kash, 2015a, b; Fissler and Ziegel, 2016a).

The strict consistency of $S$ only justifies a comparison of two competing forecasts if one of them reports the true functional value. If both of them are misspecified, it is per se not possible to draw a conclusion which forecast is ‘closer’ to the true functional value by comparing the realized scores. To this end, some notions of order-sensitivity are desirable. According to Lambert (2013) we say that a scoring function $S$ is $\mathcal{F}$-order-sensitive for a one-dimensional functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}$ if for any $F \in \mathcal{F}$ and any $x, z \in \mathsf{A}$ such that either $z \le x \le T(F)$ or $z \ge x \ge T(F)$, we have $\bar S(x, F) \le \bar S(z, F)$. This means that if a forecast lies between the true functional value and some other forecast, then issuing the forecast in-between should yield a smaller expected score than issuing the forecast further away. In particular, order-sensitivity implies consistency. Vice versa, under weak regularity conditions on the functional, strict consistency also implies order-sensitivity if the functional is real-valued; see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4).
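One-dimensional order-sensitivity can be checked numerically on a grid. The sketch below assumes a hypothetical discrete distribution and uses the asymmetric piecewise linear (pinball) loss for the median; the helper `is_order_sensitive` is an illustrative name and only tests the defining implication on finitely many points, so it is a sanity check rather than a proof.

```python
def pinball_loss(alpha):
    """Asymmetric piecewise linear loss, a standard consistent score for the alpha-quantile."""
    return lambda x, y: ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def expected_score(score, x, support, probs):
    return sum(p * score(x, y) for y, p in zip(support, probs))

def is_order_sensitive(score, t, grid, support, probs, tol=1e-12):
    """Check on a grid: z <= x <= t or t <= x <= z implies E S(x, Y) <= E S(z, Y)."""
    value = {x: expected_score(score, x, support, probs) for x in grid}
    for x in grid:
        for z in grid:
            between = (z <= x <= t) or (t <= x <= z)
            if between and value[x] > value[z] + tol:
                return False
    return True

# Hypothetical distribution: Y uniform on {0, 1, 2, 3, 4}; its median is 2.
support = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.2] * 5
grid = [i / 4 for i in range(-4, 21)]  # -1.0, -0.75, ..., 5.0

# Reference point equal to the median, then a reference point that is not the median.
print(is_order_sensitive(pinball_loss(0.5), 2.0, grid, support, probs))
print(is_order_sensitive(pinball_loss(0.5), 1.0, grid, support, probs))
```

The check succeeds when the reference point is the functional value elicited by the score and fails otherwise, in line with the remark that order-sensitivity implies consistency.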

This article is dedicated to a thorough investigation of order-sensitive scoring functions for vector-valued functionals, thus contributing to a discussion of subproblem (iii) of the elicitation problem. Furthermore, we investigate to which extent invariance or equivariance properties of elicitable functionals are reflected in their respective consistent scoring functions.

Lambert et al. (2008) introduced a notion of componentwise order-sensitivity for the case of $\mathsf{A} \subseteq \mathbb{R}^k$. Friedman (1983) and Nau (1985) considered similar questions in the setting of probabilistic forecasts, coining the term of effectiveness of scoring rules which can be described as order-sensitivity in terms of a metric. In Section 3, we consider three notions of order-sensitivity in the higher-dimensional setting: metrical order-sensitivity, componentwise order-sensitivity, and order-sensitivity on line segments. We discuss their connections and give conditions when such scoring functions exist and of what form they are for the most relevant functionals, such as vectors of quantiles, expectiles, ratios of expectations, the pair of mean and variance, and the pair consisting of Value at Risk and Expected Shortfall, two important risk measures in banking and insurance.

Complementing our results on order-sensitivity, in Section 2, we consider the analytic properties of the expected score $x \mapsto \bar S(x, F)$, $x \in \mathsf{A}$, for some scoring function $S$ and some distribution $F \in \mathcal{F}$. The (strict) consistency of $S$ for some functional $T$ is equivalent to the expected score having a (unique) global minimum at $x = T(F)$. Order-sensitivity ensures monotonicity properties of the expected score. As a technical result, we show that under weak regularity assumptions on $T$, the expected score of a strictly consistent scoring function has a unique local minimum, which, of course, coincides with the global minimum at $x = T(F)$. Accompanied with a result on self-calibration, a continuity property of the inverse of the expected score, which ensures that the minimum of the expected score is well-separated in the sense of van der Vaart (1998), these two findings may be of interest in their own right in the context of M-estimation.

In Section 4, we consider functionals that have an invariance or equivariance property such as translation invariance or homogeneity. It is a natural question whether a functional that is, for example, translation equivariant has a consistent scoring function that respects this property in the sense that if we evaluate forecast performance of translated predictions and observations, the ranking of predictive performance remains the same as that of the original data. In parametric estimation problems, such a scoring function may allow one to translate the data without affecting the estimated parameter values. For one-dimensional functionals, invariance of the scoring function often determines it uniquely up to equivalence while this is not necessarily the case for higher-dimensional functionals (Proposition 4.7 and Corollary 4.12).

## 2 Analytic properties of expected scores

### 2.1 Monotonicity

###### Definition 2.1 (Mixture-continuity).

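Let $\mathcal{F}$ be convex. A functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ is called mixture-continuous if for all $F, G \in \mathcal{F}$ the map

$$[0, 1] \to \mathbb{R}^k, \qquad \lambda \mapsto T((1 - \lambda) F + \lambda G)$$

is continuous.

It is appealing that one does not have to specify a topology on $\mathcal{F}$ to define mixture-continuity because it suffices to work with the induced Euclidean topology on $[0, 1]$ and on $\mathsf{A}$.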

It turns out that mixture-continuity of a functional is strong enough to imply order-sensitivity in the case of one-dimensional functionals (see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4)), and desirable monotonicity properties of the expected scores also in higher dimensions (Propositions 2.5 and 2.7). At the same time, numerous functionals of applied relevance are mixture-continuous, and we start by giving examples and a sufficient condition (Proposition 2.2).

It is straightforward to see that the ratio of expectations is mixture-continuous. Moreover, by the implicit function theorem, one can verify the mixture-continuity of quantiles and expectiles directly under appropriate regularity conditions (e.g., in the case of quantiles, all distributions in $\mathcal{F}$ should be $C^1$ with non-vanishing derivatives). Generalizing Bellini and Bignozzi (2015, Proposition 3.4c), we give a sufficient criterion for mixture-continuity in the next proposition. Our version is not restricted to distributions with compact support (however, the image of the functional must be bounded), and we formulate the result for $k$-dimensional functionals.
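To see why regularity conditions are needed for quantiles, it helps to look at a case where they fail. The following sketch assumes two point masses, $F = \delta_0$ and $G = \delta_1$, and compares the mean with the lower median along the mixture $(1 - \lambda) F + \lambda G$; the helper names are illustrative, and the lower quantile is one particular convention for selecting a quantile of a discrete distribution.

```python
def mixture(lam, probs_f, probs_g):
    """Probabilities of (1 - lam) * F + lam * G on a common finite support."""
    return [(1.0 - lam) * pf + lam * pg for pf, pg in zip(probs_f, probs_g)]

def mean(support, probs):
    return sum(p * y for y, p in zip(support, probs))

def lower_quantile(alpha, support, probs):
    """Smallest support point x with F(x) >= alpha (support must be sorted)."""
    cumulative = 0.0
    for y, p in zip(support, probs):
        cumulative += p
        if cumulative >= alpha:
            return y
    return support[-1]

support = [0.0, 1.0]
probs_f, probs_g = [1.0, 0.0], [0.0, 1.0]   # point masses at 0 and at 1
lambdas = [i / 10 for i in range(11)]

mean_path = [mean(support, mixture(lam, probs_f, probs_g)) for lam in lambdas]
median_path = [lower_quantile(0.5, support, mixture(lam, probs_f, probs_g)) for lam in lambdas]

print(mean_path)     # moves linearly from 0 to 1
print(median_path)   # jumps from 0 to 1 once lam exceeds 1/2
```

The mean is linear in $\lambda$ and hence mixture-continuous, whereas the median of this discrete mixture jumps, so mixture-continuity of quantiles genuinely depends on the class of distributions.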

###### Proposition 2.2.

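Let $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be an elicitable functional with a strictly $\mathcal{F}$-consistent scoring function $S\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ such that $\bar S(\cdot, F)$ is continuous for all $F \in \mathcal{F}$. Then $T$ is mixture-continuous on any $\mathcal{F}_0 \subseteq \mathcal{F}$ such that $\mathcal{F}_0$ is convex and the image $T(\mathcal{F}_0)$ is bounded.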

###### Proof.

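Let $\mathcal{F}_0 \subseteq \mathcal{F}$ be convex such that $T(\mathcal{F}_0) \subseteq [-C, C]^k$ for some $C > 0$. Let $F, G \in \mathcal{F}_0$. Define $h_{F,G}\colon [-C, C]^k \times [0, 1] \to \mathbb{R}$ via

$$h_{F,G}(x, \lambda) = \bar S(x, (1 - \lambda) F + \lambda G) = (1 - \lambda)\, \bar S(x, F) + \lambda\, \bar S(x, G).$$

Then $h_{F,G}$ is jointly continuous, and due to the strict consistency

$$T((1 - \lambda) F + \lambda G) = \operatorname*{arg\,min}_{x \in [-C, C]^k} h_{F,G}(x, \lambda).$$

By virtue of the Berge Maximum Theorem (Aliprantis and Border, 2006, Theorem 17.31 and Lemma 17.6), the function $\lambda \mapsto T((1 - \lambda) F + \lambda G)$ is continuous. ∎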

Similarly to the original proof of Bellini and Bignozzi (2015), a sufficient criterion for the continuity of $\bar S(\cdot, F)$ for any $F \in \mathcal{F}$ is that for all $y \in \mathsf{O}$, the score $S(x, y)$ is quasi-convex and continuous in $x$. (We remark that for $k = 1$, if a scoring function $S$ is strictly $\mathcal{F}$-consistent for some functional $T$ where $\mathcal{F}$ consists of all point measures on $\mathsf{O}$, then the quasi-convexity of $x \mapsto S(x, y)$ for all $y \in \mathsf{O}$ is equivalent to the $\mathcal{F}$-order-sensitivity of $S$ for $T$.)

Recall that, under appropriate regularity conditions on $\mathcal{F}$, the asymmetric piecewise linear loss $S_\alpha(x, y) = (\mathbb{1}\{y \le x\} - \alpha)(x - y)$ and the asymmetric piecewise quadratic loss $S_\tau(x, y) = |\mathbb{1}\{y \le x\} - \tau|\,(x - y)^2$ are strictly consistent scoring functions for the $\alpha$-quantile and the $\tau$-expectile, respectively, and both, $S_\alpha$ as well as $S_\tau$, are continuous in their first argument and convex. Hence, Proposition 2.2 yields that both quantiles and expectiles are mixture-continuous.
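As a quick illustration of these two losses, the sketch below minimizes their expected values over a grid for a hypothetical two-point distribution. The loss formulas are written in their standard form (as in Gneiting, 2011), with the indicator convention $\mathbb{1}\{y \le x\}$ taken as an assumption; the grid search and the function names are purely illustrative.

```python
def quantile_loss(alpha):
    """Asymmetric piecewise linear loss S_alpha(x, y) = (1{y <= x} - alpha) * (x - y)."""
    return lambda x, y: ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def expectile_loss(tau):
    """Asymmetric piecewise quadratic loss S_tau(x, y) = |1{y <= x} - tau| * (x - y)^2."""
    return lambda x, y: abs((1.0 if y <= x else 0.0) - tau) * (x - y) ** 2

def grid_minimizer(score, grid, support, probs):
    """Grid point with the smallest expected score under a discrete distribution."""
    return min(grid, key=lambda x: sum(p * score(x, y) for y, p in zip(support, probs)))

# Hypothetical distribution: Y = 0 with probability 3/4 and Y = 4 with probability 1/4.
support, probs = [0.0, 4.0], [0.75, 0.25]
grid = [i / 100 for i in range(-100, 501)]   # -1.00, -0.99, ..., 5.00

median = grid_minimizer(quantile_loss(0.5), grid, support, probs)
mean = grid_minimizer(expectile_loss(0.5), grid, support, probs)
upper_expectile = grid_minimizer(expectile_loss(0.75), grid, support, probs)

print(median, mean, upper_expectile)
```

For this distribution the median is 0, the 1/2-expectile coincides with the mean 1, and the first-order condition $\tau\, \mathbb{E}(Y - x)_+ = (1 - \tau)\, \mathbb{E}(x - Y)_+$ gives 2 for the 0.75-expectile, which is what the grid search recovers.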

Steinwart et al. (2014) used Osband’s principle (Osband, 1985) and the assumption of continuity of $T$ with respect to the total variation distance to show order-sensitivity. Bellini and Bignozzi (2015) showed that the weak continuity of a functional implies its mixture-continuity. Consequently, one can also derive the order-sensitivity in the framework of Steinwart et al. (2014) directly using only mixture-continuity.

Lambert (2013) showed that it is a harder requirement to have order-sensitivity if the image $T(\mathcal{F})$ is discrete. Then both approaches, invoking Osband’s principle or using mixture-continuity, do not work because the interior of the image of $T$ is empty. Moreover, mixture-continuity implies that the functional is constant (such that only trivial cases can be considered). Furthermore, it is proven in Lambert (2013) that for a functional $T$ with a discrete image, all strictly consistent scoring functions are order-sensitive if and only if there is one order-sensitive scoring function for $T$. In particular, there are functionals admitting strictly consistent scoring functions that are not order-sensitive, one such example being the mode functional. (Note that due to Proposition 1 in Heinrich (2014), the mode functional is elicitable relative to the class of probability measures containing unimodal discrete measures. Moreover, interpreting the mode functional as a set-valued functional, it is elicitable in the sense of Gneiting (2011, Definition 2). A strictly $\mathcal{F}$-consistent scoring function is given by $S(x, y) = \mathbb{1}\{x \neq y\}$. The main result of Heinrich (2014) is that the mode functional is not elicitable relative to the class of unimodal probability measures with Lebesgue densities.)

Let us turn attention to vector-valued functionals now. To understand the monotonicity properties of the expected score of a mixture-continuous elicitable functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, it is useful to consider paths $\gamma\colon [0, 1] \to \mathsf{A}$, $\gamma(\lambda) = T(\lambda F + (1 - \lambda) G)$, for $F, G \in \mathcal{F}$. If $T$ is elicitable, a classical result asserts that $T$ necessarily has convex level sets (Gneiting, 2011, Theorem 6). This implies that the level sets of $\gamma$ can only be closed intervals including the case of singletons and the empty set. This rules out loops and some other possible pathologies of $\gamma$. Furthermore, under the assumption that $T$ is identifiable as defined below, one can even show that the path $\gamma$ is either injective or constant.

###### Definition 2.3 (Identifiability).

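Let $\mathsf{A} \subseteq \mathbb{R}^k$. An $\mathcal{F}$-integrable function $V\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}^k$ is said to be an $\mathcal{F}$-identification function for a functional $T\colon \mathcal{F} \to \mathsf{A}$ if

$$\bar V(T(F), F) = 0$$

for all $F \in \mathcal{F}$, where $\bar V(x, F) := \int V(x, y)\,\mathrm{d}F(y)$. Furthermore, $V$ is a strict $\mathcal{F}$-identification function for $T$ if $\bar V(x, F) = 0$ implies $x = T(F)$ for all $x \in \mathsf{A}$ and for all $F \in \mathcal{F}$. A functional $T$ is said to be identifiable if there exists a strict $\mathcal{F}$-identification function for $T$.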

In line with Gneiting (2011, Section 2.4), one can often obtain an identification function as the gradient of a sufficiently smooth scoring function. However, the converse intuition is not so clear, at least in the higher dimensional setting $k > 1$: Not all strict identification functions can be integrated to a strictly consistent scoring function. They have to satisfy the usual integrability conditions (Königsberger, 2004, p. 185); see also Fissler and Ziegel (2016a, Corollary 3.3) and the discussion thereafter.

###### Lemma 2.4.

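Let $\mathcal{F}$ be convex and $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be identifiable with a strict $\mathcal{F}$-identification function $V\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}^k$. Then for any $F, G \in \mathcal{F}$, the path $\gamma\colon [0, 1] \to \mathsf{A}$, $\gamma(\lambda) = T(\lambda F + (1 - \lambda) G)$, is either constant or injective.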

###### Proof.

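Let $F, G \in \mathcal{F}$ such that $T(F) = T(G) =: t$. For any $\lambda \in [0, 1]$, one has $\bar V(t, \lambda F + (1 - \lambda) G) = \lambda \bar V(t, F) + (1 - \lambda) \bar V(t, G) = 0$. Since $V$ is a strict $\mathcal{F}$-identification function for $T$, $\gamma(\lambda) = t$ for all $\lambda \in [0, 1]$.

Now let $T(F) \neq T(G)$ and let $\lambda, \lambda' \in [0, 1]$ with $\lambda \neq \lambda'$. Since $V$ is a strict $\mathcal{F}$-identification function, $\bar V(T(F), G) \neq 0$ (and symmetrically $\bar V(T(G), F) \neq 0$). Assume that $\gamma(\lambda) = \gamma(\lambda')$. Define $H_\lambda = \lambda F + (1 - \lambda) G$, $H_{\lambda'} = \lambda' F + (1 - \lambda') G$, so that $\bar V(\gamma(\lambda), H_\lambda) = 0$ and $\bar V(\gamma(\lambda), H_{\lambda'}) = \bar V(\gamma(\lambda'), H_{\lambda'}) = 0$. There are $\mu, \mu' \in \mathbb{R}$ such that $F = \mu H_\lambda + (1 - \mu) H_{\lambda'}$ and $G = \mu' H_\lambda + (1 - \mu') H_{\lambda'}$. Hence,

$$\bar V(\gamma(\lambda), F) = \mu\, \bar V(\gamma(\lambda), H_\lambda) + (1 - \mu)\, \bar V(\gamma(\lambda), H_{\lambda'}) = 0,$$

and similarly $\bar V(\gamma(\lambda), G) = 0$. Consequently, $T(F) = \gamma(\lambda) = T(G)$, which is a contradiction to the assumption that $T(F) \neq T(G)$. This implies that $\gamma(\lambda) \neq \gamma(\lambda')$. ∎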

###### Proposition 2.5.

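Let $\mathcal{F}$ be convex and $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be mixture-continuous and surjective. Let $S$ be strictly $\mathcal{F}$-consistent for $T$. Then for each $F \in \mathcal{F}$, $t = T(F)$, and each $x \in \mathsf{A}$, there is a continuous path $\gamma\colon [0, 1] \to \mathsf{A}$ such that $\gamma(0) = x$, $\gamma(1) = t$, and the function $[0, 1] \ni \lambda \mapsto \bar S(\gamma(\lambda), F)$ is decreasing. Additionally, for $\lambda < \lambda'$ such that $\gamma(\lambda) \neq \gamma(\lambda')$ it holds that $\bar S(\gamma(\lambda), F) > \bar S(\gamma(\lambda'), F)$.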

###### Proof.

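Let $F \in \mathcal{F}$, $t = T(F)$ and $x \in \mathsf{A}$. Then there is some $G \in \mathcal{F}$ with $x = T(G)$. Define $\gamma\colon [0, 1] \to \mathsf{A}$, $\gamma(\lambda) = T(\lambda F + (1 - \lambda) G)$. Clearly, $\gamma(0) = x$ and $\gamma(1) = t$. Due to the mixture-continuity of $T$, the path $\gamma$ is also continuous. The rest follows along the lines of the proof of Nau (1985, Proposition 3). Let $0 \le \lambda < \lambda' \le 1$. If $\gamma(\lambda) = \gamma(\lambda')$, there is nothing to show. So assume that $\gamma(\lambda) \neq \gamma(\lambda')$. Define $H_\lambda = \lambda F + (1 - \lambda) G$, and $H_{\lambda'}$ analogously. Then, for $\mu = (\lambda' - \lambda)/(1 - \lambda) \in (0, 1]$, it holds that $H_{\lambda'} = \mu F + (1 - \mu) H_\lambda$. The strict consistency of $S$ implies that

$$\mu\, \bar S(\gamma(\lambda'), F) + (1 - \mu)\, \bar S(\gamma(\lambda'), H_\lambda) = \bar S(\gamma(\lambda'), H_{\lambda'}) < \bar S(\gamma(\lambda), H_{\lambda'}) = \mu\, \bar S(\gamma(\lambda), F) + (1 - \mu)\, \bar S(\gamma(\lambda), H_\lambda),$$

which is equivalent to

$$\frac{1 - \mu}{\mu} \big( \bar S(\gamma(\lambda'), H_\lambda) - \bar S(\gamma(\lambda), H_\lambda) \big) < \bar S(\gamma(\lambda), F) - \bar S(\gamma(\lambda'), F).$$

By strict consistency of $S$ and since $\gamma(\lambda) = T(H_\lambda)$, the left-hand side is non-negative, yielding the assertion. ∎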

###### Remark 2.6.
1. Proposition 2.5 remains valid if $S$ is only $\mathcal{F}$-consistent. Then, we merely have that the function $\lambda \mapsto \bar S(\gamma(\lambda), F)$ is decreasing, so the last inequality in Proposition 2.5 is not necessarily strict.

2. If one assumes in Proposition 2.5 that $T$ is also identifiable, one can use the injectivity of $\gamma$ implied by Lemma 2.4 to see that the function $\lambda \mapsto \bar S(\gamma(\lambda), F)$ is strictly decreasing.

Under certain (weak) regularity conditions, the expected score of a strictly consistent scoring function has no other local minimum apart from the global one at $x = T(F)$.

###### Proposition 2.7.

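Let $\mathcal{F}$ be convex and $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be mixture-continuous and surjective. If $S$ is strictly $\mathcal{F}$-consistent for $T$, then for all $F \in \mathcal{F}$ the expected score $\bar S(\cdot, F)\colon \mathsf{A} \to \mathbb{R}$ has only one local minimum which is at $x = T(F)$.

###### Proof.

Let $F \in \mathcal{F}$ with $t = T(F)$. Due to the strict $\mathcal{F}$-consistency of $S$, the expected score $\bar S(\cdot, F)$ has a local minimum at $t$. Assume there is another local minimum at some $x \neq t$. Then there is a distribution $G \in \mathcal{F}$ with $x = T(G)$. Consider the path $\gamma(\lambda) = T(\lambda F + (1 - \lambda) G)$. Due to Proposition 2.5 the function $\lambda \mapsto \bar S(\gamma(\lambda), F)$ is decreasing and strictly decreasing when we move on the image of the path from $x$ to $t$. Hence $\bar S(\cdot, F)$ cannot have a local minimum at $x$. ∎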

### 2.2 Self-calibration

With Proposition 2.5 it is possible to prove that, under mild regularity conditions, strictly consistent scoring functions are self-calibrated which turns out to be useful in the context of M-estimation.

###### Definition 2.8 (Self-calibration).

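A scoring function $S$ is called $\mathcal{F}$-self-calibrated for a functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ with respect to a norm $\|\cdot\|$ on $\mathbb{R}^k$ if for all $\varepsilon > 0$ and for all $F \in \mathcal{F}$ there is a $\delta = \delta(\varepsilon, F) > 0$ such that for all $x \in \mathsf{A}$ and $t = T(F)$

$$\bar S(x, F) - \bar S(t, F) < \delta \implies \|t - x\| < \varepsilon.$$

(It is straightforward to use a metric instead of a norm on $\mathsf{A}$, but in this article we only consider $\mathsf{A} \subseteq \mathbb{R}^k$, so we did not see any benefit in considering this more general case. See also the discussion before Definition 3.4.)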

The notion of self-calibration was introduced by Steinwart (2007) in the context of machine learning. In a preprint version of Steinwart et al. (2014) (available at http://users.cecs.anu.edu.au/~williams/papers/P196.pdf), the authors translate this concept to the setting of scoring functions as follows (using our notation):

“For self-calibrated $S$, every $\delta$-approximate minimizer of $\bar S(\cdot, F)$, approximates the desired property $T(F)$ with precision not worse than $\varepsilon$. […] In some sense order sensitivity is a global and qualitative notion while self-calibration is a local and quantitative notion.”

In line with this quotation, self-calibration can be considered as the continuity of the inverse of the expected score at the global minimum, and as such, it is a local property of the inverse. This property ensures that convergence of the expected score to its global minimum implies convergence of the forecast to the true functional value. On the other hand, self-calibration of a scoring function is equivalent to the fact that the argmin of the expected score is a well-separated point of minimum in the sense of van der Vaart (1998, p. 45), as such being a global property of the expected score itself. That means that for any $\varepsilon > 0$

$$\inf\{\bar S(x, F) \colon \|T(F) - x\| \ge \varepsilon\} > \bar S(T(F), F).$$

It is relatively straightforward to see that self-calibration implies strict consistency: Let $S$ be $\mathcal{F}$-self-calibrated for $T$, $F \in \mathcal{F}$, $t = T(F)$ and $x \in \mathsf{A}$ with $x \neq t$. Then for $\varepsilon = \|x - t\| > 0$ there is a $\delta > 0$ such that $\bar S(x, F) - \bar S(t, F) \ge \delta > 0$.
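For the squared loss and the mean, the separation of the minimum can be made explicit: with $t = \mathbb{E}_F[Y]$ one has $\bar S(x, F) - \bar S(t, F) = (x - t)^2$, so $\delta = \varepsilon^2$ works in the definition of self-calibration. The sketch below checks this numerically for a hypothetical three-point distribution; `separation_gap` is an illustrative helper that evaluates the infimum only over a finite grid.

```python
support, probs = [0.0, 1.0, 5.0], [0.5, 0.25, 0.25]  # hypothetical distribution F

def expected_squared_loss(x):
    return sum(p * (x - y) ** 2 for y, p in zip(support, probs))

t = sum(p * y for y, p in zip(support, probs))  # mean of F, here 1.5

def separation_gap(eps, grid):
    """Smallest excess expected score among grid forecasts at distance >= eps from t."""
    return min(
        expected_squared_loss(x) - expected_squared_loss(t)
        for x in grid
        if abs(x - t) >= eps
    )

grid = [i / 100 for i in range(-500, 1001)]  # -5.00, -4.99, ..., 10.00
for eps in (0.25, 0.5, 1.0):
    print(eps, separation_gap(eps, grid))    # numerically close to eps ** 2
```

The gap is strictly positive for every $\varepsilon > 0$, which is exactly the well-separatedness of the minimum described above.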

In the preprint version of Steinwart et al. (2014) it is shown for $k = 1$ that order-sensitivity implies self-calibration. The next proposition shows that the kind of order-sensitivity given by Proposition 2.5 also implies self-calibration for $k \ge 1$.

###### Proposition 2.9.

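Let $\mathcal{F}$ be convex, $\mathsf{A} \subseteq \mathbb{R}^k$ be closed, and $T\colon \mathcal{F} \to \mathsf{A}$ be a surjective and mixture-continuous functional. If $S$ is strictly $\mathcal{F}$-consistent for $T$ and $\bar S(\cdot, F)$ is continuous for all $F \in \mathcal{F}$, then $S$ is $\mathcal{F}$-self-calibrated for $T$.

###### Proof.

Let $F \in \mathcal{F}$, $t = T(F)$ and $\varepsilon > 0$. Define

$$\delta := \min\{\bar S(z, F) - \bar S(t, F) \colon z \in \mathsf{A},\ \|z - t\| = \varepsilon\}.$$

Due to the continuity of $\bar S(\cdot, F)$, the minimum is well-defined and, as a consequence of the strict $\mathcal{F}$-consistency of $S$ for $T$, $\delta$ is positive. Let $x \in \mathsf{A}$ with $\|x - t\| \ge \varepsilon$. If $\|x - t\| = \varepsilon$, we have, by the definition of $\delta$, that $\bar S(x, F) - \bar S(t, F) \ge \delta$. Assume that $\|x - t\| > \varepsilon$. Then there is a distribution $G \in \mathcal{F}$ with $x = T(G)$. Due to Proposition 2.5 there is a continuous path $\gamma\colon [0, 1] \to \mathsf{A}$ such that $\gamma(0) = x$, $\gamma(1) = t$ and such that $\bar S(\gamma(\lambda), F)$ is decreasing in $\lambda$. Moreover, if $\lambda < \lambda'$ such that $\gamma(\lambda) \neq \gamma(\lambda')$ it holds that $\bar S(\gamma(\lambda), F) > \bar S(\gamma(\lambda'), F)$. Due to the continuity of $\gamma$ there is some $\lambda' \in (0, 1)$ with $\|\gamma(\lambda') - t\| = \varepsilon$. Then we obtain $\bar S(x, F) - \bar S(t, F) \ge \bar S(\gamma(\lambda'), F) - \bar S(t, F) \ge \delta$. ∎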

We end this subsection about self-calibration by demonstrating its applicability in the context of M-estimation.

###### Theorem 2.10.

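Let $S$ be an $\mathcal{F}$-self-calibrated scoring function for a functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$. Then, the following assertion holds for all $F \in \mathcal{F}$. If $(Y_i)_{i \in \mathbb{N}}$ is a sequence of random variables with distribution $F$ such that

$$\sup_{x \in \mathsf{A}} \left| \frac{1}{n} \sum_{i=1}^{n} S(x, Y_i) - \bar S(x, F) \right| \xrightarrow{\ \mathbb{P}\ } 0,$$

then

$$\operatorname*{arg\,min}_{x \in \mathsf{A}} \frac{1}{n} \sum_{i=1}^{n} S(x, Y_i) \xrightarrow{\ \mathbb{P}\ } T(F).$$

###### Proof.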

This is a direct consequence of van der Vaart (1998, Theorem 5.7). ∎
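The statement of Theorem 2.10 can be illustrated by a small simulation. The sketch below assumes i.i.d. observations that are uniform on $[0, 1]$, the pinball loss for the 0.25-quantile, and an action domain discretized to a grid; the seed, sample sizes and helper names are arbitrary illustrative choices, and a single simulation run is of course no substitute for the convergence in probability asserted by the theorem.

```python
import random

def pinball_loss(alpha, x, y):
    return ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def empirical_minimizer(alpha, sample, grid):
    """Grid point minimizing the empirical score sum_i S(x, Y_i) (same argmin as the average)."""
    return min(grid, key=lambda x: sum(pinball_loss(alpha, x, y) for y in sample))

rng = random.Random(0)                 # fixed seed, chosen only for reproducibility
alpha = 0.25
grid = [i / 100 for i in range(101)]   # action domain [0, 1], discretized

for n in (20, 200, 2000):
    sample = [rng.random() for _ in range(n)]           # Y_i i.i.d. uniform on [0, 1]
    print(n, empirical_minimizer(alpha, sample, grid))  # true 0.25-quantile is 0.25
```

With growing sample size the empirical minimizer settles near the true quantile 0.25, up to the grid resolution.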

## 3 Order-sensitivity

### 3.1 Different notions of order-sensitivity

The idea of order-sensitivity is that a forecast lying between the true functional value and some other forecast is also assigned an expected score lying between the two other expected scores. If the action domain is one-dimensional, there are only two cases to consider: both forecasts are on the left-hand side of the functional value or on the right-hand side. However, if $\mathsf{A} \subseteq \mathbb{R}^k$ for $k \ge 2$, the notion of ‘lying between’ is ambiguous. Two obvious interpretations for the multidimensional case are the componentwise interpretation and the interpretation that one forecast is the convex combination of the true functional value and the other forecast.

###### Definition 3.1 (Componentwise order-sensitivity).

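A scoring function $S$ is called componentwise $\mathcal{F}$-order-sensitive for a functional $T = (T_1, \ldots, T_k)\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $x, z \in \mathsf{A}$ we have that:

$$\text{For all } m \in \{1, \ldots, k\}\colon\ z_m \le x_m \le T_m(F) \ \text{ or } \ z_m \ge x_m \ge T_m(F) \implies \bar S(x, F) \le \bar S(z, F). \tag{3.1}$$

Moreover, $S$ is called strictly componentwise $\mathcal{F}$-order-sensitive for $T$ if $S$ is componentwise $\mathcal{F}$-order-sensitive and if $x \neq z$ in (3.1) implies that $\bar S(x, F) < \bar S(z, F)$.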

###### Remark 3.2.

In economic terms, a strictly componentwise order-sensitive scoring function rewards Pareto improvements in the sense that improving the prediction performance in one component without deteriorating the prediction ability in the other components results in a lower expected score. (The definition of the Pareto principle according to Scott and Marshall (2009): “A principle of welfare economics derived from the writings of Vilfredo Pareto, which states that a legitimate welfare improvement occurs when a particular change makes at least one person better off, without making any other person worse off. A market exchange which affects nobody adversely is considered to be a ‘Pareto-improvement’ since it leaves one or more persons better off. ‘Pareto optimality’ is said to exist when the distribution of economic welfare cannot be improved for one individual without reducing that of another.”)
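Componentwise order-sensitivity can also be checked on a grid. The sketch below assumes a hypothetical discrete distribution and the sum of two pinball losses as a scoring function for the pair (0.25-quantile, 0.75-quantile); it searches for pairs of forecasts that violate implication (3.1). All names are illustrative, and the check covers only finitely many points.

```python
from itertools import product

def pinball_loss(alpha, x, y):
    return ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def score(x, y):
    """Sum of quantile scores for the pair (0.25-quantile, 0.75-quantile)."""
    return pinball_loss(0.25, x[0], y) + pinball_loss(0.75, x[1], y)

# Hypothetical distribution: Y uniform on {0, 1, 2, 3, 4}, so that T(F) = (1, 3).
support, probs = [0.0, 1.0, 2.0, 3.0, 4.0], [0.2] * 5
t = (1.0, 3.0)

def expected_score(x):
    return sum(p * score(x, y) for y, p in zip(support, probs))

def between(z_m, x_m, t_m):
    return z_m <= x_m <= t_m or z_m >= x_m >= t_m

axis = [i / 2 for i in range(-2, 11)]        # -1.0, -0.5, ..., 5.0
points = list(product(axis, repeat=2))
value = {x: expected_score(x) for x in points}

# Pairs (x, z) where x is componentwise between z and t but has a larger expected score.
violations = [
    (x, z)
    for x in points
    for z in points
    if all(between(z[m], x[m], t[m]) for m in range(2))
    and value[x] > value[z] + 1e-12
]
print(len(violations))
```

No violations are found for this additively separable score, which matches the intuition that a forecast improving in every component should not be penalized.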

###### Definition 3.3 (Order-sensitivity on line segments).

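Let $\|\cdot\|$ be the Euclidean norm on $\mathbb{R}^k$. A scoring function $S$ is $\mathcal{F}$-order-sensitive on line segments for a functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $v \in \mathbb{R}^k$ with $\|v\| = 1$ the map

$$\psi\colon D = \{s \in [0, \infty) \colon t + s v \in \mathsf{A}\} \to \mathbb{R}, \qquad s \mapsto \bar S(t + s v, F)$$

is increasing. If the map $\psi$ is strictly increasing, we call $S$ strictly $\mathcal{F}$-order-sensitive on line segments for $T$.

These two notions of order-sensitivity do not allow for a comparison of any two misspecified forecasts, no matter where they are relative to the true functional value. An intuitive requirement could be ‘the closer to the true functional value the smaller the expected score’, thus calling for the notion of a metric. Since, for a fixed functional $T$ and some fixed distribution $F$, we always have a fixed reference point $T(F)$ and we have the induced vector-space structure of $\mathbb{R}^k$ on $\mathsf{A}$, we shall only work with $\ell^p$-norms $\|\cdot\|_p$, $p \in [1, \infty]$. Recall that for $x \in \mathbb{R}^k$, $\|x\|_p = \big(\sum_{m=1}^{k} |x_m|^p\big)^{1/p}$ for $p \in [1, \infty)$ and $\|x\|_\infty = \max_{m \in \{1, \ldots, k\}} |x_m|$. If the assertion does not depend on the choice of $p$, we shall usually omit the $p$ in the notation. For other choices of $\mathsf{A}$, it would be also interesting to replace the norm by a metric in the following definition.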

###### Definition 3.4 (Metrical order-sensitivity).

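Let $p \in [1, \infty]$. A scoring function $S$ is metrically $\mathcal{F}$-order-sensitive for a functional $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ relative to the $\ell^p$-norm, if for all $F \in \mathcal{F}$, $t = T(F)$, and for all $x, z \in \mathsf{A}$ we have that

$$\|x - t\|_p \le \|z - t\|_p \implies \bar S(x, F) \le \bar S(z, F). \tag{3.2}$$

If additionally the inequalities in (3.2) are strict, we say that $S$ is strictly metrically $\mathcal{F}$-order-sensitive for $T$ relative to $\|\cdot\|_p$.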
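The role of the norm in this definition can be illustrated with the squared loss for the mean of a bivariate observation, for which $\bar S(x, F) = \|x - t\|_2^2 + \text{const}$ with $t = \mathbb{E}_F[Y]$. The sketch below assumes a hypothetical four-point distribution and tests implication (3.2) on a grid for $p = 2$ and $p = 1$; the helper names are illustrative and the check is restricted to finitely many forecasts.

```python
from itertools import product

def p_norm(u, p):
    return sum(abs(c) ** p for c in u) ** (1.0 / p)

# Hypothetical distribution: Y uniform on the four points (+-1, +-1); its mean is t = (0, 0).
support = list(product([-1.0, 1.0], repeat=2))
t = (0.0, 0.0)

def expected_squared_loss(x):
    """Expected squared Euclidean distance between the forecast x and Y."""
    return sum((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2 for y in support) / len(support)

def is_metrically_order_sensitive(p, points, tol=1e-9):
    """Check on a grid: ||x - t||_p <= ||z - t||_p implies E S(x, Y) <= E S(z, Y)."""
    dist = {x: p_norm((x[0] - t[0], x[1] - t[1]), p) for x in points}
    value = {x: expected_squared_loss(x) for x in points}
    return all(
        value[x] <= value[z] + tol
        for x in points
        for z in points
        if dist[x] <= dist[z]
    )

axis = [i / 5 for i in range(-5, 6)]         # -1.0, -0.8, ..., 1.0
points = list(product(axis, repeat=2))

print(is_metrically_order_sensitive(2, points))  # Euclidean norm
print(is_metrically_order_sensitive(1, points))  # l^1 norm
```

On this grid the squared loss passes the check relative to the Euclidean norm but not relative to the $\ell^1$-norm: for instance, $(1, 0)$ is closer to $t$ than $(0.6, 0.6)$ in the $\ell^1$-norm, yet has the larger expected score.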

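Similarly to (strict) consistency, all three notions of (strict) order-sensitivity are preserved when considering two scoring functions that are equivalent. (Two scoring functions $S, \tilde S\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ are equivalent if there is a positive constant $\lambda > 0$ and an $\mathcal{F}$-integrable function $a\colon \mathsf{O} \to \mathbb{R}$ such that $\tilde S(x, y) = \lambda S(x, y) + a(y)$, for all $(x, y) \in \mathsf{A} \times \mathsf{O}$.)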

The notion of componentwise order-sensitivity corresponds almost literally to the notion of accuracy-rewarding scoring functions introduced by Lambert et al. (2008). Metrically order-sensitive scoring functions have their counterparts in the field of probabilistic forecasting in effective scoring rules introduced by Friedman (1983) and further investigated by Nau (1985). Actually, the latter paper has also given the inspiration for the notion of order-sensitivity on line segments. It is obvious that any of the three notions of (strict) order-sensitivity implies (strict) consistency. The next lemma formally states this result and gives some logical implications concerning the different notions of order-sensitivity. The proof is standard and therefore omitted.

###### Lemma 3.5.

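Let $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be a functional and $S\colon \mathsf{A} \times \mathsf{O} \to \mathbb{R}$ a scoring function.

1. Let $p \in [1, \infty)$. If $S$ is (strictly) metrically $\mathcal{F}$-order-sensitive for $T$ relative to the $\ell^p$-norm, then $S$ is (strictly) componentwise $\mathcal{F}$-order-sensitive for $T$.

2. If $S$ is (strictly) metrically $\mathcal{F}$-order-sensitive for $T$ relative to the $\ell^\infty$-norm, then $S$ is componentwise $\mathcal{F}$-order-sensitive for $T$.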

3. If $S$ is (strictly) metrically $\mathcal{F}$-order-sensitive for $T$ relative to the $\ell^\infty$-norm, then $S$ is (strictly) $\mathcal{F}$-consistent for $T$.

4. If $S$ is (strictly) componentwise $\mathcal{F}$-order-sensitive for $T$, then $S$ is (strictly) $\mathcal{F}$-order-sensitive on line segments for $T$.

5. If $S$ is (strictly) $\mathcal{F}$-order-sensitive on line segments for $T$, then $S$ is (strictly) $\mathcal{F}$-consistent for $T$.

### 3.2 Componentwise order-sensitivity

Under restrictive regularity assumptions, Lambert et al. (2008, Theorem 5) claim that whenever a functional has a componentwise order-sensitive scoring function, the components of the functional must be elicitable. Moreover, assuming that the measures in $\mathcal{F}$ have finite support, they assert that any componentwise order-sensitive scoring function is the sum of strictly consistent scoring functions for the components. Lemma 3.6 shows the first claim under less restrictive smoothness assumptions on the scoring function. For many common examples of functionals, the second claim can be shown while relaxing the restrictive condition on $\mathcal{F}$; see Proposition 3.7 and the discussion preceding it.

###### Lemma 3.6.

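Let $T = (T_1, \dots, T_k)$ be a $k$-dimensional functional with real-valued components $T_m$, $m \in \{1, \dots, k\}$. If there is a strictly componentwise $\mathcal{F}$-order-sensitive scoring function for $T$, then the components $T_m$, $m \in \{1, \dots, k\}$, are elicitable.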

###### Proof.

Fix . Let and such that , for all and . Due to the strict componentwise -order-sensitivity of this implies that . This in turn means that for any the map ,

$$(x_m, y) \mapsto S_{m,z}(x_m, y) := S(z_1, \dots, z_{m-1}, x_m, z_{m+1}, \dots, z_k, y) \tag{3.3}$$

is a strictly $\mathcal{F}$-consistent scoring function for $T_m$. ∎

If $T_m$, $m \in \{1, \dots, k\}$, are mixture-continuous and elicitable with strictly $\mathcal{F}$-consistent scoring functions $S_m$, then they are order-sensitive according to Lambert (2013, Proposition 2) and Bellini and Bignozzi (2015, Proposition 3.4). Therefore, the sum $\sum_{m=1}^k S_m(x_m, y)$ is strictly componentwise $\mathcal{F}$-order-sensitive for $T = (T_1, \dots, T_k)$. More interestingly, one can establish the converse of the last assertion: any strictly componentwise order-sensitive scoring function must necessarily be additively separable.

In Fissler and Ziegel (2016a, Section 4), we established a dichotomy for functionals with elicitable components. In most relevant cases, the functional (or rather the corresponding strict identification function) satisfies Assumption (V4) therein (e.g., when the functional is a vector of different quantiles and/or different expectiles with the exception of the 1/2-expectile), or it is a vector of ratios of expectations with the same denominator, or it is a combination of both situations. Under some regularity conditions, Fissler and Ziegel (2016a, Propositions 4.2 and 4.4) characterize the form of strictly consistent scoring functions for the first two situations, whereas Fissler and Ziegel (2016a, Remark 4.5) is concerned with the third situation. For this latter situation, any strictly consistent scoring function must necessarily be additive for the respective blocks of the functional. For the first situation, Fissler and Ziegel (2016a, Proposition 4.2) yields the additive form of $S$ automatically. It remains to consider the case of Fissler and Ziegel (2016a, Proposition 4.4), that is, a vector of ratios of expectations with the same denominator.
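The first observation of the preceding discussion, that a sum of strictly consistent scores for the components is componentwise order-sensitive, can be illustrated numerically. The sketch below uses a pair of quantiles scored by a sum of pinball losses and checks, on a finite grid, that moving every component of a forecast weakly closer to the corresponding component of $t$ (without crossing it) never increases the expected score. The simulated sample, the quantile levels, the grid, and all helper names are assumptions made for illustration only.

```python
import itertools
import random


def pinball(alpha):
    """Pinball loss, a standard consistent score for the alpha-quantile."""
    def score(x, y):
        return ((1.0 if y <= x else 0.0) - alpha) * (x - y)
    return score


def expected_score(score, x, sample):
    """Empirical expected score: average of score(x, y) over the sample."""
    return sum(score(x, y) for y in sample) / len(sample)


random.seed(1)
sample = sorted(random.gauss(0.0, 1.0) for _ in range(1001))
alphas = (0.25, 0.9)
# len(sample) * alpha is not an integer here, so each empirical quantile is unique.
t = tuple(sample[int(len(sample) * a)] for a in alphas)


def summed_score(x, y):
    """Additively separable score: one pinball loss per component."""
    return sum(pinball(a)(xm, y) for a, xm in zip(alphas, x))


def between(tm, xm, zm):
    """True if xm lies between tm and zm (inclusive)."""
    return tm <= xm <= zm or zm <= xm <= tm


offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = [
    tuple(tm + d for tm, d in zip(t, ds))
    for ds in itertools.product(offsets, repeat=2)
]
mean_score = {x: expected_score(summed_score, x, sample) for x in grid}

# Pairs (x, z) where x is componentwise closer to t than z but scores worse.
violations = [
    (x, z)
    for x in grid
    for z in grid
    if all(between(tm, xm, zm) for tm, xm, zm in zip(t, x, z))
    and mean_score[x] > mean_score[z] + 1e-9
]
print(len(violations))
```

No violations are expected, since each expected pinball loss is convex in its own argument and minimized at the corresponding empirical quantile, so it is monotone on either side of that quantile.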

###### Proposition 3.7.

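Let $T$ be a ratio of expectations with the same denominator, that is, $T(F) = \bar p(F) / \bar q(F)$ for some $\mathcal{F}$-integrable functions $p\colon \mathsf{O} \to \mathbb{R}^k$, $q\colon \mathsf{O} \to \mathbb{R}$ such that $\bar q(F) > 0$ for all $F \in \mathcal{F}$. (Footnote 8: It is no loss of generality to assume that $\bar q(F) > 0$ for all $F \in \mathcal{F}$ in Proposition 3.7. In order to ensure that $T$ is well-defined, necessarily $\bar q(F) \neq 0$ for all $F \in \mathcal{F}$. However, Assumption (V1) implies that $\mathcal{F}$ is convex. So if there are $F_1, F_2 \in \mathcal{F}$ such that $\bar q(F_1) < 0$ and $\bar q(F_2) > 0$, then there is a convex combination $G$ of $F_1$ and $F_2$ such that $\bar q(G) = 0$. Consequently, either $\bar q(F) > 0$ for all $F \in \mathcal{F}$ or $\bar q(F) < 0$ for all $F \in \mathcal{F}$, and by possibly changing the sign of $p$ and $q$ one can assume that the first case holds.) Assume that $T$ is surjective, and that $\operatorname{int}(\mathsf{A})$ is simply connected. Moreover, consider the strict $\mathcal{F}$-identification function $V(x, y) = q(y)x - p(y)$, and some strictly $\mathcal{F}$-consistent scoring function $S$ such that the Assumptions (V1), (S2), (F1), and (VS1) in Fissler and Ziegel (2016a) hold. If $S$ is strictly componentwise $\mathcal{F}$-order-sensitive for $T$, then $S$ is of the form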

$$S(x_1, \dots, x_k, y) = \sum_{m=1}^{k} S_m(x_m, y), \tag{3.4}$$

for almost all , where , , are strictly -consistent scoring functions for , , and .

###### Proof.

Due to the fact that for fixed $y$, $V(x, y)$ is a polynomial in $x$, Assumption (V3) in Fissler and Ziegel (2016a) is automatically satisfied. Let $h$ be the matrix-valued function given in Osband’s principle; see Fissler and Ziegel (2016a, Theorem 3.2). By Fissler and Ziegel (2016a, Proposition 4.4(i)) we have that

$$\partial_l h_{rm}(x) = \partial_r h_{lm}(x), \qquad h_{rl}(x) = h_{lr}(x) \tag{3.5}$$

for all , , where the first identity holds for almost all and the second identity for all . Moreover, the matrix is positive definite for all . If we can show that for , we can use the first part of (3.5) and deduce that for all there are positive functions , where , such that

$$h_{mm}(x_1, \dots, x_k) = g_m(x_m)$$

for all . Then, we can conclude like in the proof of Fissler and Ziegel (2016a, Proposition 4.2(ii)). (Footnote 9: The arguments in Fissler and Ziegel (2016a, Proposition 4.2(ii)) use Fissler and Ziegel (2016a, Proposition 3.4). There is a flaw in the latter result which has been pointed out in Brehmer (2017). We present a corrected version of the result in Appendix A.)

Fix with and such that . Due to the strict -consistency of defined at (3.3) we have that

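$$0 = \frac{\mathrm{d}}{\mathrm{d}x_l} \bar S_{l,z}(x_l, F) = \partial \bar S_{l,z}(x_l, F) = \partial_l \bar S(z_1, \dots, z_{l-1}, x_l, z_{l+1}, \dots, z_k, F)$$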

whenever and for all . This means the map is constantly 0. Hence, for all

$$\partial_r \partial_l \bar S(x, F) = 0$$

whenever . Using the special form of and Fissler and Ziegel (2016a, Corollary 3.3), we have for that

$$0 = \partial_r \partial_l \bar S(t, F) = h_{lr}(t)\, \partial_r \bar V_r(t, F) = h_{lr}(t)\, \bar q(F)$$

and by assumption . Using the surjectivity of we obtain that for all , which ends the proof. ∎

The notion of componentwise order-sensitivity has an appealing interpretation in the sense that it rewards Pareto improvements of the predictions; see Remark 3.2. The results of Lemma 3.6 and Proposition 3.7 give a clear understanding of the concept including its limitations to the case of functionals only consisting of elicitable components.

Ehm et al. (2016) introduced Murphy diagrams for forecast comparison of quantiles and expectiles. Murphy diagrams have the advantage that forecasts are compared simultaneously with respect to all consistent scoring functions for the respective functional. For many multivariate functionals such as ratios of expectations, the methodology cannot be readily extended because there are no mixture representations available for the class of all consistent scoring functions. Proposition 3.7 shows that when considering only componentwise order-sensitive consistent scoring functions, the situation is different and mixture representations (and hence Murphy diagrams) are readily available for forecast comparison.

### 3.3 Metrical order-sensitivity

###### Lemma 3.8.

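Let $\mathcal{F}$ be convex and $T\colon \mathcal{F} \to \mathsf{A} \subseteq \mathbb{R}^k$ be mixture-continuous and surjective. Let $S$ be (strictly) $\mathcal{F}$-consistent for $T$. Then $S$ is (strictly) metrically $\mathcal{F}$-order-sensitive for $T$ relative to $\|\cdot\|$ if and only if for all $F \in \mathcal{F}$, $t = T(F)$, and $x, z \in \mathsf{A}$ we have the implication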

$$\|x - t\| = \|z - t\| \implies \bar S(x, F) = \bar S(z, F). \tag{3.6}$$

###### Proof.

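Let $S$ be metrically $\mathcal{F}$-order-sensitive for $T$ relative to $\|\cdot\|$. Let $F \in \mathcal{F}$, $t = T(F)$, and $x, z \in \mathsf{A}$ such that $\|x - t\| = \|z - t\|$. Then we have both $\bar S(x, F) \le \bar S(z, F)$ and $\bar S(z, F) \le \bar S(x, F)$.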

Assume that (3.6) holds and is (strictly) -consistent. Let with and . Suppose that . If , (3.6) implies that and there is nothing to show. If , we can apply Proposition 2.5. There is a continuous path such that and , and the function is decreasing. Due to continuity there is a such that . Invoking (3.6) it holds that . If is strictly -consistent then the latter inequality is strict. ∎
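Condition (3.6) is easy to probe numerically. The following sketch, which uses an assumed toy sample and hypothetical helper names, compares expected scores at two forecasts that are equally far from the true functional value: once for the mean with the squared error, and once for the median with the absolute error. It only illustrates (3.6) for a single empirical distribution and does not verify the convexity and continuity assumptions of Lemma 3.8.

```python
def expected_score(score, x, sample):
    """Empirical expected score: average of score(x, y) over the sample."""
    return sum(score(x, y) for y in sample) / len(sample)


def squared_error(x, y):
    return (x - y) ** 2


def absolute_error(x, y):
    return abs(x - y)


# Assumed toy data: a small right-skewed empirical distribution.
sample = [0.0, 1.0, 2.0, 4.0, 13.0]
mean, median = 4.0, 2.0
d = 3.0  # common distance of both forecasts from the functional value

# Difference of expected scores at t + d and t - d; (3.6) requires it to vanish.
se_gap = (expected_score(squared_error, mean + d, sample)
          - expected_score(squared_error, mean - d, sample))
ae_gap = (expected_score(absolute_error, median + d, sample)
          - expected_score(absolute_error, median - d, sample))

print(se_gap, ae_gap)
```

For this sample the squared error yields identical expected scores at both forecasts, whereas the absolute error yields 4.2 at `median + d` and 5.0 at `median - d`. So the absolute error, although consistent for the median, violates (3.6) for this skewed distribution.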

For a real-valued functional there can be at most one strictly metrically order-sensitive scoring function, up to equivalence. To show this, we use Osband’s principle and impose the corresponding regularity conditions.

###### Proposition 3.9.

Let be a surjective, elicitable and identifiable functional with an oriented strict -identification function . If is convex and are two strictly metrically -order-sensitive scoring functions for such that the Assumptions (V1), (V2), (S1), (F1) and (VS1) from Fissler and Ziegel (2016a) (with respect to both scoring functions) hold, then and are equivalent almost everywhere.

###### Proof.

We apply Osband’s principle, that is, Fissler and Ziegel (2016a, Theorem 3.2), to $S$. Consequently, there is a function $h$ such that

$$\frac{\mathrm{d}}{\mathrm{d}x} \bar S(x, F) = h(x)\, \bar V(x, F) \tag{3.7}$$

for all , . Due to the strict -consistency of and the orientation of , it holds that . We show that actually . Applying Lemma 3.8, one has that

$$\bar S(T(F) + x, F) = \bar S(T(F) - x, F) \tag{3.8}$$

for all ,