In many fields of human endeavor, it is desirable to make forecasts for an uncertain future. Hence, forecasts should be probabilistic, presented as probability distributions over possible future outcomes (Gneiting and Katzfuss, 2014). Nonetheless, many practical situations require forecasters to issue single-valued point forecasts. In this situation, a directive is required about the specific feature or functional of the predictive distribution that is being sought, or about the loss (or scoring) function that is to be minimized (Gneiting, 2011a; Ehm et al., 2016). Examples of functionals include the mean, median, a quantile or expectile, with the latter recently attracting interest in risk management (Bellini and Di Bernardino, 2017). Examples of scoring functions include the squared error scoring function $S(x, y) = (x - y)^2$ and the absolute error scoring function $S(x, y) = |x - y|$. In the case that the directive is in the form of a statistical functional, it is critical that any scoring function used is appropriate for the task at hand. Ideally, a point forecast generated from one’s predictive distribution by using the requested functional should also minimize one’s expected score. That is, the scoring function should be consistent for the functional (Gneiting, 2011a). It is well-known that the squared error scoring function is consistent for the mean and that the absolute error scoring function is consistent for the median. Within this framework, predictive performance is assessed by computing the mean score over a number of forecast cases.
This paper studies Huber loss, and asymmetric variants of Huber loss (Definition 3.2), as a scoring function of point forecasts, along with its associated functional. The classical Huber loss function (Huber, 1964) with positive tuning parameter $a$ is given by
$$S(x, y) = \begin{cases} \tfrac{1}{2}(x - y)^2, & |x - y| \le a, \\ a|x - y| - \tfrac{1}{2}a^2, & |x - y| > a, \end{cases}$$
where $x$ is a point forecast and $y$ the corresponding realization. It applies a quadratic penalty to small errors and a linear penalty to large errors and is an intermediary between the squared error and absolute error scoring functions. Huber loss is used by the Australian Bureau of Meteorology to compare predictive performance of temperature and wind speed forecasts with a view to streamlining forecast production (Foley et al., 2019), and is described to weather forecasters in that organization as “a compromise between the absolute error and the squared error, in an attempt to use the benefits of both of these.”
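As a minimal sketch of the piecewise penalty just described, the following Python function evaluates classical Huber loss for a single forecast case. The function name `huber_loss` and the argument name `a` for the tuning parameter are illustrative choices, and the factor of one half is the usual normalization of Huber (1964), which is assumed here.

```python
def huber_loss(x: float, y: float, a: float) -> float:
    """Classical Huber loss for point forecast x and realization y.

    The tuning parameter a > 0 marks the error size at which the
    penalty switches from quadratic to linear.
    """
    if a <= 0:
        raise ValueError("tuning parameter a must be positive")
    u = abs(x - y)
    if u <= a:
        return 0.5 * u * u          # quadratic penalty for small errors
    return a * u - 0.5 * a * a      # linear penalty for large errors


small = huber_loss(1.0, 0.5, a=2.0)      # error 0.5 lies in the quadratic regime
large = huber_loss(5.0, 0.0, a=2.0)      # error 5.0 lies in the linear regime
boundary = huber_loss(2.0, 0.0, a=2.0)   # both branches agree at |x - y| = a
```

With these inputs the small error scores 0.125 and the large error scores 8.0, whereas squared error would score the large error 25.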
For a given predictive distribution, we call the set of point forecasts that minimize expected Huber loss the Huber mean of that distribution. The Huber mean is an intermediary between the median and the mean with some appealing properties. It can be described as the midpoint of the ‘central interval’ of length $2a$ of a distribution, where $a$ is the tuning parameter of the corresponding Huber loss function. The Huber mean, unlike the mean, is not dependent on the behavior of the distribution at its tails. At the same time, it accounts for more behavior in the vicinity of the center of the distribution than the median. It is therefore a robust measure of location for a distribution. More generally, the Huber functional gives the minimizers of expected asymmetric Huber loss, and is an intermediary between some $\alpha$-quantile and $\alpha$-expectile, as was also noted by Jones (1994) from the perspective of M-estimation. Basic properties of Huber means and Huber functionals are discussed in Section 2, many of which can be traced to Huber (1964) in the classical symmetric case.
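The robustness property can be illustrated on an empirical distribution. The sketch below is not taken from the paper: it assumes the usual normalization of Huber loss, uses the hypothetical name `huber_mean`, and locates a minimizer of the mean Huber loss by bisection on its derivative, which is the mean of the forecast errors capped to the interval from `-a` to `a`.

```python
from statistics import mean, median


def huber_mean(sample, a, tol=1e-10):
    """Approximate a minimizer of the mean Huber loss over `sample`.

    The derivative of the mean Huber loss with respect to the forecast x
    is the mean of the errors x - y capped to [-a, a]. It is nondecreasing
    in x, so a zero can be located by bisection.
    """
    def slope(x):
        return sum(max(-a, min(a, x - y)) for y in sample) / len(sample)

    lo, hi = min(sample), max(sample)   # slope(lo) <= 0 <= slope(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [0.0, 1.0, 2.0, 4.0, 100.0]          # one extreme value in the upper tail
heavier_tail = [0.0, 1.0, 2.0, 4.0, 1000.0]   # same sample, more extreme tail

hm = huber_mean(sample, a=3.0)
hm_heavier = huber_mean(heavier_tail, a=3.0)
```

For this sample the Huber mean is 2.5, which lies between the median (2) and the mean (21.4), and it is unchanged when the extreme value is moved from 100 to 1000.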
In the context of point forecasting, an essential property of a statistical functional is that it is elicitable; that is, that the functional generates precisely the set of minimizers of some expected score (Lambert et al., 2008). The Huber functional is shown to be elicitable for classes of probability distributions on $\mathbb{R}$ (or subintervals of $\mathbb{R}$) under weak regularity assumptions (Theorem 3.5). The class of scoring functions that are consistent for the Huber functional is also characterized, being parameterized by the set of convex functions (also Theorem 3.5). Edge cases of this characterization recover the general form of scoring functions that are consistent for quantiles (Gneiting, 2011b; Thomson, 1979) and expectiles (Gneiting, 2011a).
Determining which consistent scoring function to use is non-trivial since in practice this choice can influence forecast rankings (Murphy, 1977; Schervish et al., 1989; Merkle and Steyvers, 2013) as illustrated in Section 4.1. In the case of quantiles and expectiles, Ehm et al. (2016) gave clarity to this issue by showing that each consistent scoring function for those functionals admits a mixture representation; that is, can be expressed as a weighted average of elementary scoring functions. Likewise, each consistent scoring function for the Huber functional can be expressed as a weighted average of elementary scoring functions that are consistent for the Huber functional (Theorem 4.3). Again, the analogous results for quantiles and expectiles are recoverable as edge cases of this theorem. The Huber functional and its associated elementary scores arise naturally in optimal decision rules for investment problems with fixed up-front costs, where profits and losses are capped (Section 4.3). Such models are intermediaries between the classical simple cost–loss decision model (e.g. Richardson (2000); Wolfers and Zitzewitz (2008)) on the one hand, and investment decision models with no bounds on profits or losses (Ehm et al., 2016; Bellini and Di Bernardino, 2017) on the other. The mixture representation, along with the economic interpretation of elementary scoring functions and Murphy diagrams, aids interpreting forecast rankings in empirical situations (Ehm et al. (2016); Section 4.4). Applications include, for example, selecting a consistent scoring function that emphasizes predictive performance at the extremes of a variable’s range (Taggart, 2021).
Finally, Section 5 explores the use of Huber loss as a robust scoring function for point forecasts targeting the mean functional in situations where forecasts are judged against observation sets that are contaminated, say, by faults in the observation measurement process. While the squared error scoring function is consistent for the mean, the presence of contaminated observations can grossly distort forecast rankings based on it. The Huber loss scoring function provides a feasible alternative.
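The effect of contaminated observations on forecast rankings can be sketched with a small constructed example. All data, names and the tuning parameter below are hypothetical and chosen only for illustration. Forecast A matches the true values exactly and forecast B has a constant bias, but one observation is corrupted by a measurement fault.

```python
def squared_error(x, y):
    return (x - y) ** 2


def huber_loss(x, y, a=2.0):
    """Classical Huber loss with tuning parameter a (usual normalization assumed)."""
    u = abs(x - y)
    return 0.5 * u * u if u <= a else a * u - 0.5 * a * a


def mean_score(score, forecasts, observations):
    """Mean score of a forecast system over a finite set of forecast cases."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equally many forecasts and observations")
    return sum(score(x, y) for x, y in zip(forecasts, observations)) / len(forecasts)


truth = [20.0, 21.0, 22.0, 23.0, 24.0]
observed = [20.0, 21.0, 22.0, 23.0, 99.0]   # last observation is corrupted
forecast_a = list(truth)                    # accurate forecasts
forecast_b = [t + 3.0 for t in truth]       # forecasts with a constant bias of +3

mse_a = mean_score(squared_error, forecast_a, observed)
mse_b = mean_score(squared_error, forecast_b, observed)
huber_a = mean_score(huber_loss, forecast_a, observed)
huber_b = mean_score(huber_loss, forecast_b, observed)
```

Under squared error the biased forecast B attains the lower mean score (1044 against 1125), so the single corrupted observation reverses the ranking. Under Huber loss with tuning parameter 2 the accurate forecast A is ranked first (29.6 against 31.6).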
Conclusions are presented in Section 6 and proofs of the main results are given in the appendix.
2 Quantiles, expectiles and Huber functionals
To begin, we establish some notation. We work in a setting where forecasts are made for some quantity, where the range of possible outcomes belongs to some interval . Forecasts of the quantity can be in the form of a predictive distribution on or of a point forecast in . The realization (or observation) of the quantity will usually be denoted by .
Let denote the class of probability measures on the Borel–Lebesgue sets of and denote the subset of probability measures on . For simplicity, we do not distinguish between a measure in and its associated cumulative distribution function (CDF) . For in , write
to indicate that a random variable has distribution ; that is, whenever . Throughout, the notation indicates that the expectation is taken with respect to .
The power set of a set will be denoted . For a real-valued quantity , we denote by the quantity . The partial derivative with respect to the th argument of a function is denoted .
Whenever , define the ‘capping function’ by
That is, is capped below by and above by . Note that .
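A capping function of this kind is a clipping operation. The sketch below assumes, since the symbols are not recoverable from the text here, that the two parameters are nonnegative bounds `a` and `b` (possibly infinite) and that the argument is capped below by `-a` and above by `b`. The name `cap` is hypothetical.

```python
import math


def cap(x: float, a: float, b: float) -> float:
    """Cap x below by -a and above by b, for nonnegative a and b.

    Either bound may be math.inf, in which case that side is uncapped.
    """
    if a < 0 or b < 0:
        raise ValueError("a and b must be nonnegative")
    return max(-a, min(x, b))


capped_above = cap(5.0, a=1.0, b=2.0)                # 5.0 exceeds the upper cap
capped_below = cap(-5.0, a=1.0, b=2.0)               # -5.0 is below the lower cap
untouched = cap(0.3, a=1.0, b=2.0)                   # inside the caps
uncapped = cap(7.0, a=math.inf, b=math.inf)          # no capping at all
positive_part = cap(-4.0, a=0.0, b=math.inf)         # reduces to max(x, 0)
```

With infinite bounds the function is the identity, and with a lower bound of zero and no upper bound it returns the positive part of its argument.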
In many contexts users or issuers of forecasts want a relevant point summary of a predictive distribution . This can be generated by requesting a specific statistical functional of . Given an interval and some space of probability distributions in , a statistical functional (or simply a functional) on is a mapping (Horowitz and Manski, 2006; Gneiting, 2011a). Two important examples are quantiles and expectiles.
Suppose that and . The -quantile functional is defined by
whenever . For any , is a closed bounded interval of . The two endpoints only differ when the level set contains more than one point, so typically the functional is single valued. The median functional arises when . If is an -quantile of and is continuous at then . Figure 1 illustrates the quantiles (the median) and , where
is the exponential distribution. The aforementioned property is illustrated in the figure via the vertical dashed line segments, whose lengths are in the ratio.
Given an interval , let denote the space of probability measures
with finite first moment. The $\alpha$-expectile functional is defined by
whenever . It can be shown there is a unique solution to the defining equation, so expectiles are single valued. Expectiles were introduced by Newey and Powell (1987) in the context of least squares estimation and have recently attracted interest in financial risk management (Bellini and Di Bernardino, 2017). Expectiles share properties of both expectations and quantiles, and nest the mean functional . Using integration by parts, one can show that if and only if
The latter equation gives a geometric interpretation of the -expectile of . It is the unique point such that the $(1-\alpha)$-weighted area of the region bounded by and on the interval is equal to the $\alpha$-weighted area of the region bounded by and on the interval . Figure 1 illustrates this interpretation, via the areas of the shaded regions, for the expectiles (i.e. mean) and , where is the exponential distribution.
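The quantile and expectile calculations for an exponential distribution can be sketched numerically. The code below assumes a unit rate, since the rate used in Figure 1 is not recoverable from the text here, and all function names are hypothetical. The expectile is found from the area-balance condition: the $(1-\alpha)$-weighted integral of the CDF to the left of the point equals the $\alpha$-weighted integral of one minus the CDF to its right.

```python
import math


def exp_cdf(t: float) -> float:
    """CDF of the exponential distribution with unit rate (an assumption)."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


def exp_quantile(alpha: float) -> float:
    """The alpha-quantile, obtained by inverting the CDF."""
    return -math.log(1.0 - alpha)


def exp_expectile(alpha: float, tol: float = 1e-12) -> float:
    """The alpha-expectile, found by bisection on the area-balance condition."""
    def imbalance(x):
        lower_area = x - 1.0 + math.exp(-x)   # integral of F over (-inf, x], x >= 0
        upper_area = math.exp(-x)             # integral of 1 - F over [x, inf)
        return (1.0 - alpha) * lower_area - alpha * upper_area

    lo, hi = 0.0, 1.0
    while imbalance(hi) < 0.0:                # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


median_value = exp_quantile(0.5)     # equals log(2)
mean_value = exp_expectile(0.5)      # the 0.5-expectile is the mean, here 1
upper_expectile = exp_expectile(0.9)
```

The check that the 0.5-expectile returns the mean of the distribution is a quick sanity test of the area-balance formulation.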
Equation (2.1) can be re-written as
By modifying the parameters of the capping function , we introduce another functional.
Suppose that , , and that is an interval. Then the Huber functional is defined by
whenever . In the case when , we simplify notation and write for . The special case is called a Huber mean.
We have named the Huber functional for Peter Huber, whose loss function
$$h_a(u) = \begin{cases} \tfrac{1}{2}u^2, & |u| \le a, \\ a|u| - \tfrac{1}{2}a^2, & |u| > a, \end{cases} \tag{2.4}$$
(Huber, 1964) also bears his name. The connection between the Huber functional and Huber loss will be made explicit in Section 3. Since the Huber functional is an example of a generalized quantile (Breckling and Chambers, 1988; Jones, 1994; Bellini et al., 2014), may also be called a Huber quantile of . We note here that if and only if , where is given by
The function is an identification function (Gneiting, 2011a, Section 2.4) for , and will be used to establish important properties of the Huber functional.
As with expectiles, a routine calculation using integration by parts shows that if and only if
This gives a geometric interpretation of the Huber functional as the set of points where the -weighted area of the region bounded by and on equals the -weighted area of the region bounded by and on . In the case when , the two areas are equal. This is illustrated for the exponential distribution in Figure 1 for (when and ) and in Figure 2 for (when , and ).
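The area-balance description can be turned into a small numerical sketch for an exponential distribution. Several things here are assumptions rather than the paper's notation: a unit rate, the hypothetical parameter names `width_below` and `width_above` for the lengths of the two intervals adjoining the point, and the convention (by analogy with expectiles) that the lower region carries weight $1-\alpha$ and the upper region weight $\alpha$. In the symmetric case with equal widths and $\alpha = 1/2$ the weights cancel.

```python
import math


def integrated_cdf(x: float) -> float:
    """Integral of the unit-rate exponential CDF over (-inf, x]."""
    return x - 1.0 + math.exp(-x) if x > 0 else 0.0


def integrated_survival(x: float) -> float:
    """Integral of 1 - F over [x, inf), valid for x >= 0."""
    return math.exp(-x)


def exp_huber_functional(alpha, width_below, width_above, tol=1e-12):
    """A point balancing the weighted areas on the two adjoining intervals."""
    def imbalance(x):
        lower_area = integrated_cdf(x) - integrated_cdf(x - width_below)
        upper_area = integrated_survival(x) - integrated_survival(x + width_above)
        return (1.0 - alpha) * lower_area - alpha * upper_area

    lo, hi = 0.0, 1.0                 # imbalance(0) < 0 on the support [0, inf)
    while imbalance(hi) < 0.0:        # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


huber_mean_value = exp_huber_functional(0.5, 1.0, 1.0)
near_mean = exp_huber_functional(0.5, 50.0, 50.0)       # very wide intervals
near_median = exp_huber_functional(0.5, 1e-3, 1e-3)     # very narrow intervals
```

In this sketch the value for unit widths lies between the median $\log 2$ and the mean 1, while very wide and very narrow intervals return values close to the mean and the median respectively, in line with expectiles and quantiles arising as edge cases.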
In light of the corresponding geometric interpretations of quantiles and expectiles, and also the similarity between Equations (2.2) and (2.3), it should come as no surprise that -quantiles and -expectiles are nested as edge cases in the family of Huber means. The following proposition makes this precise and lists several other basic properties of the Huber functional. In what follows, denotes the closure in of the level set , and denotes the smallest closed interval of that contains the support of the measure .
Suppose that , , , is an interval and .
Then is a nonempty closed bounded subinterval of contained in .
If for some , then there exists in such that and .
If there exists in such that for some and satisfying , then where .
If has finite first moment then
If and whenever
Part (1) is similar to Proposition 1(a) of Bellini et al. (2014), whilst parts (4), (5) and (6) were noted, in the case of finite discrete distributions when and , by Huber (1964). The proof is given in the appendix.
Part (6) can be interpreted as saying that the Huber functional only depends on the values of the CDF away from its tails. In situations where the tail of a predictive distribution is difficult to model, but a point summary describing its broad center is desired, this property is useful. In particular, the Huber functional is invariant to the modification of outside the interval . In contrast, modification of the tails of will generally change its mean and expectile values, whilst quantile values are invariant to modifications of anywhere apart from at the quantile.
Parts (2) and (3) specify conditions on for when is multivalued. A corollary is that if each level set of on has length not exceeding then is single valued for every in . Figure 2 illustrates a distribution for which is multivalued whenever . In this particular case,
has a symmetric bi-modal probability density function (PDF), and also the property that whenever .
Note that while is in some sense an intermediary between and , the right-hand side of Figure 1 illustrates that the Huber quantile does not always lie between the corresponding quantile and expectile.
3 Scoring functions, consistency and elicitability
In this section we discuss scoring functions and their relationship to point forecasts and functionals. Two key concepts are those of consistency and elicitability. How these concepts relate to the Huber functional is the subject of Theorem 3.5, which is the first major result of this paper.
3.1 Scoring functions and Bayes’ rules
Suppose that . A function is called a scoring function if for all with whenever . The scoring function is said to be regular if (i) for each the function is measurable, and (ii) for each the function is continuous, with continuous derivative whenever .
The score can be interpreted as the loss or cost accrued when the point forecast is issued and the observation realizes. Examples of scoring functions include the squared error scoring function , the absolute error scoring function and the zero–one scoring function , for some positive . Only the first two of these are regular, whilst the zero–one scoring function fails to be regular on account of its discontinuity when . The measurability condition (i) is a technical condition that is satisfied by most (if not all) scoring functions that arise in practice.
Huber loss (2.4) gives rise to the regular scoring function . We introduce a more general version.
Suppose that , and . The generalized Huber loss function is defined by
The classical Huber loss function given by Equation (2.4) is 2. The same generalization is used by Zhao et al. (2019) for robust expectile regression. Figure 2 shows the graph of . Note that is differentiable on , with derivative
Generalized Huber loss gives rise to the regular scoring function .
Given a scoring function , a forecast system that generates point forecasts can be assessed by computing its mean score , where
over a finite set of forecast cases with corresponding observations . In this framework, if a number of competing forecast systems are being compared then the one with the lowest mean score is the best performer. Thus, given a scoring function and predictive distribution , an optimal point forecast is any in that minimizes the expected score; that is,
It has long been known that the Bayes’ rule under the squared error scoring function is the mean of $F$, and under the absolute error scoring function is any median of $F$. The Bayes’ rule under the asymmetric piecewise linear scoring function
$$S(x, y) = (\mathbb{1}\{x \ge y\} - \alpha)(x - y) \tag{3.2}$$
is a quantile (e.g. Ferguson (1967)), whilst the Bayes’ rule under the asymmetric quadratic scoring function
$$S(x, y) = |\mathbb{1}\{x \ge y\} - \alpha|\,(x - y)^2 \tag{3.3}$$
is the corresponding expectile.
To find the Bayes’ rule under the generalized Huber loss scoring function , we look for solutions to the equation . If interchanging differentiation and integration can be justified then . Using Equation (3.1), one obtains , where is the identification function given by (2.5). This implies that . So, at least formally, the Bayes’ rule under generalized Huber loss is the corresponding Huber functional of . A precise statement will be given in the next subsection.
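The formal argument can be checked numerically on an empirical distribution. The sketch below rests on assumptions, because the displayed formulas are not recoverable from the text here: the generalized Huber loss is taken to be quadratic on the interval from `-a` to `b` and linear outside it, weighted by `1 - alpha` for over-prediction and `alpha` for under-prediction, so that its derivative in the forecast is the same weight times the capped error. All function names are hypothetical.

```python
def cap(u, a, b):
    """Cap u below by -a and above by b."""
    return max(-a, min(u, b))


def generalized_huber_score(x, y, alpha, a, b):
    """Assumed form: asymmetric weight times a piecewise quadratic/linear loss."""
    u = x - y
    if u < -a:
        loss = -a * u - 0.5 * a * a
    elif u > b:
        loss = b * u - 0.5 * b * b
    else:
        loss = 0.5 * u * u
    weight = (1.0 - alpha) if x >= y else alpha
    return weight * loss


def identification(x, y, alpha, a, b):
    """Derivative of the assumed score with respect to the forecast x."""
    weight = (1.0 - alpha) if x >= y else alpha
    return weight * cap(x - y, a, b)


def root_of_mean_identification(sample, alpha, a, b, tol=1e-10):
    """Bisection on the mean identification function, nondecreasing in x."""
    lo, hi = min(sample), max(sample)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(identification(mid, y, alpha, a, b) for y in sample) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [0.0, 1.0, 2.0, 4.0, 100.0]
alpha, a, b = 0.7, 3.0, 3.0
grid = [i / 100 for i in range(0, 10001)]   # candidate forecasts on [0, 100]

# Optimal point forecast by brute force: minimize the mean score over the grid.
grid_minimizer = min(
    grid, key=lambda x: sum(generalized_huber_score(x, y, alpha, a, b) for y in sample)
)
# The same quantity from the first-order condition.
root = root_of_mean_identification(sample, alpha, a, b)
```

For this sample the two approaches agree to within the grid spacing, which is what the formal interchange of differentiation and expectation suggests.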
3.2 Consistency and elicitability
Whenever a point forecast request specifies what functional of the predictive distribution is being sought, the scoring function used to evaluate the point forecast should be appropriate for that functional.
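A scoring function $S$ is said to be consistent for a functional $T$ relative to a class $\mathcal{F}$ of probability distributions on $I$ if
$$\mathbb{E}_F S(t, Y) \le \mathbb{E}_F S(x, Y) \tag{3.4}$$
for all probability distributions $F$ in $\mathcal{F}$, all $t$ in $T(F)$ and all $x$ in $I$. The scoring function is said to be strictly consistent relative to the class $\mathcal{F}$ if it is consistent relative to the class $\mathcal{F}$ and if equality in (3.4) implies that $x \in T(F)$.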
Evaluating point forecasts with a strictly consistent scoring function rewards forecasters who give truthful point forecast quotes from carefully considered predictive distributions. This is because the requested functional of the predictive distribution coincides with the optimal point forecast (or Bayes’ rule).
The families of consistent scoring functions for quantiles and expectiles each have a standard form. Subject to slight regularity conditions, a scoring function is consistent for the quantile functional if and only if is of the form
where is a non-decreasing function (Gneiting, 2011b; Thomson, 1979; Saerens, 2000). Moreover, if is strictly increasing then is strictly consistent. The standard asymmetric piecewise linear scoring function (3.2) for quantiles (which includes, up to a multiplicative constant, the absolute error scoring function for the median) is recovered from Equation (3.5) with the choice .
Subject to standard regularity conditions, a scoring function is consistent for the expectile functional if and only if is of the form
where is a convex function with subgradient (Gneiting, 2011a). Moreover, if is strictly convex then is strictly consistent. The standard asymmetric quadratic scoring function (3.3) for expectiles (including, up to a multiplicative constant, the squared error scoring function for the mean) is recovered from (3.6) by taking . When , the function of (3.6) is known as a Bregman function.
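Consistency of the two standard families can be checked by brute force on a small sample. The sketch below uses the well-known generalized piecewise linear form $(\mathbb{1}\{x \ge y\} - \alpha)(g(x) - g(y))$ for quantiles and the Bregman-type form $|\mathbb{1}\{x \ge y\} - \alpha|\,(\phi(y) - \phi(x) + \phi'(x)(x - y))$ for expectiles. The function names, the sample and the particular choices of $g$ and $\phi$ are illustrative assumptions.

```python
import math


def gpl_score(x, y, alpha, g):
    """Generalized piecewise linear score for the alpha-quantile."""
    indicator = 1.0 if x >= y else 0.0
    return (indicator - alpha) * (g(x) - g(y))


def bregman_type_score(x, y, alpha, phi, dphi):
    """Asymmetrically weighted Bregman-type score for the alpha-expectile."""
    weight = (1.0 - alpha) if x >= y else alpha
    return weight * (phi(y) - phi(x) + dphi(x) * (x - y))


def grid_argmin(score, sample, grid):
    """Grid point with the smallest total (equivalently, mean) score."""
    return min(grid, key=lambda x: sum(score(x, y) for y in sample))


sample = [1.0, 2.0, 3.0, 4.0, 10.0]         # median 3, mean 4
grid = [i / 100 for i in range(0, 1201)]    # candidate forecasts on [0, 12]

# Two strictly increasing choices of g, both targeting the median.
median_linear = grid_argmin(lambda x, y: gpl_score(x, y, 0.5, lambda t: t), sample, grid)
median_exp = grid_argmin(
    lambda x, y: gpl_score(x, y, 0.5, lambda t: math.exp(t / 5)), sample, grid
)

# Two strictly convex choices of phi, both targeting the mean.
mean_quadratic = grid_argmin(
    lambda x, y: bregman_type_score(x, y, 0.5, lambda t: t * t, lambda t: 2 * t),
    sample, grid,
)
mean_exp = grid_argmin(
    lambda x, y: bregman_type_score(
        x, y, 0.5, lambda t: math.exp(t / 5), lambda t: math.exp(t / 5) / 5
    ),
    sample, grid,
)
```

Both quantile scores are minimized at the sample median and both expectile scores at the sample mean. Different members of a family share the same optimal point forecast even though they penalize individual errors differently.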
We will show that consistent scoring functions for the Huber functional also have a standard form. Before doing so, we introduce a critical concept related to the evaluation of point forecasts.
(Lambert et al., 2008) A statistical functional is said to be elicitable relative to a class of probability distributions if there exists a scoring function that is strictly consistent for relative to .
For example, quantiles are elicitable relative to the class , while expectiles are elicitable relative to the class of distributions in with finite first moment (Gneiting, 2011a). It is worth noting that some statistical functionals are not elicitable, including the sum of two distinct quantiles and conditional value-at-risk, a risk measure used in finance (Gneiting, 2011a).
We turn now to the Huber functional. The main thrust (subject to appropriate regularity conditions) is that the Huber functional is elicitable, and that is consistent for if and only if is of the form
where is a convex function with subgradient . Moreover, is strictly consistent if is strictly convex. The generalized Huber loss scoring function arises from Equation (3.7) with the choice . The following gives a precise statement.
Suppose that is an interval and that , and .
The Huber functional is elicitable relative to the class of probability measures when is bounded or semi-infinite, and elicitable relative to the class of probability measures with finite first moment when .
Suppose that is convex on . Then the function , defined by Equation (3.7), is a consistent scoring function for the Huber functional relative to the class of probability measures for which both and exist and are finite. If, additionally, is strictly convex then is strictly consistent for relative to the same class of probability measures.
Suppose that the scoring function is regular. If is consistent for the Huber functional relative to the class of probability measures in with compact support, then is of the form (3.7) for some convex function . Moreover, if is strictly consistent then is strictly convex.
The proof is given in the appendix.
The general form (3.7) for the consistent scoring functions of the Huber functional yields, as edge cases, the general form for the consistent scoring functions of expectiles and quantiles. To be precise, let denote the scoring function given by (3.7) when , and let and denote the consistent scoring functions of Equations (3.6) and (3.5) respectively. The relationship between and is straightforward via pointwise limit
For the other end of the spectrum we consider the rescaled consistent scoring function , and obtain the pointwise limit
where is nondecreasing because is convex. Importantly, the relevant regularity conditions ensure that every non-decreasing function in the representation (3.5) is the subderivative of some suitable convex .
The consistent scoring functions for the Huber functional thus show a mixture of the properties of the consistent scoring functions for quantiles and expectiles. Focusing on the functional for positive , the only consistent scoring function (up to a multiplicative constant) on that only depends on the difference between the forecast and observation is the classical Huber loss scoring function . This is because the only Bregman function (up to a multiplicative constant) that has the same property for is the squared error scoring function (Savage, 1971). Hence, apart from multiples of classical Huber loss, other consistent scoring functions for on penalize under- and over-prediction asymmetrically. One such example is the exponential family
4 Mixture representations and Murphy diagrams
The main theoretical tool presented in this section is the mixture representation for consistent scoring functions of the Huber functional (Theorem 4.3). Mixture representations were introduced for quantiles and expectiles by Ehm et al. (2016) and have several very useful applications, including providing insight into forecast rankings.
4.1 Ranking of forecasts
Recall from Section 3.1 that point forecasts from two competing forecast systems and can be ranked by calculating their mean scores and over a finite number of forecast cases for some scoring function . If the forecast cases are independent, a statistical test for equal predictive performance can be based on the statistic , where
for forecasts and and corresponding realizations . Corresponding $p$-values are computed and if the null hypothesis is rejected then is preferred if and is preferred otherwise (Gneiting and Katzfuss, 2014, Section 3.3). Unfortunately, forecast rankings and the results of hypothesis tests can depend on the choice of consistent scoring function (Ehm et al., 2016, pp. 506, 515–516), as we now illustrate.
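A test of this kind can be sketched as follows. The exact statistic is not recoverable from the text here, so the code assumes one common form: the square root of the number of cases times the mean score difference, divided by the root mean square of the score differences, with a two-sided $p$-value from the standard normal approximation. The function name and data are hypothetical.

```python
import math


def equal_performance_test(score, forecasts_a, forecasts_b, observations):
    """Return (t, p) for a test of equal predictive performance.

    Assumes independent forecast cases and a standard normal reference
    distribution for t. A negative t means system A has the lower mean score.
    """
    n = len(observations)
    if n == 0 or len(forecasts_a) != n or len(forecasts_b) != n:
        raise ValueError("need equally many forecasts and observations")
    diffs = [
        score(fa, y) - score(fb, y)
        for fa, fb, y in zip(forecasts_a, forecasts_b, observations)
    ]
    mean_diff = sum(diffs) / n
    sigma = math.sqrt(sum(d * d for d in diffs) / n)
    if sigma == 0.0:
        raise ValueError("all score differences are zero; statistic undefined")
    t = math.sqrt(n) * mean_diff / sigma
    p_value = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided normal p-value
    return t, p_value


def squared_error(x, y):
    return (x - y) ** 2


observations = [0.0, 0.0, 0.0, 0.0]
forecasts_a = [1.0, 0.0, 1.0, 1.0]
forecasts_b = [0.0, 1.0, 0.0, 0.0]
t_stat, p_value = equal_performance_test(
    squared_error, forecasts_a, forecasts_b, observations
)
```

In this toy example the score differences are 1, -1, 1 and 1, giving a statistic of 1 and a two-sided $p$-value of about 0.32, so equal predictive performance would not be rejected at the 5% level.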
Two forecast systems, BoM and OCF, produce point forecasts for the daily maximum temperature at Sydney Observatory Hill. The OCF system generates forecasts from a blend of bias-corrected numerical weather prediction forecasts. The BoM forecast is issued by meteorologists who have access to various information sources, including OCF. We consider forecasts for the period July 2018 to June 2020 with a lead time of one day. See Figure 3 for a sample time series of BoM and OCF forecasts with observations.
Suppose that these forecasts are targeting the Huber mean , and make the simplifying assumption that successive forecast cases are independent. If the consistent scoring function is used, then the mean score for BoM is lower than the mean score for OCF, and with a $p$-value of the null hypothesis of equal predictive performance is rejected at the 5% significance level in favor of BoM forecasts. However, if the consistent scoring function defined by Equation (3.10) is used, then OCF has the lower mean score, albeit with a $p$-value of that does not lead to rejection of the null hypothesis.
4.2 Mixture representations
In Section 3.2 it was seen that the class of consistent scoring functions for each quantile, expectile and Huber functional is very large, being parametrized either by the set of nondecreasing functions or by the set of convex functions. The following results show that this apparent multitude can, in a certain sense, be reduced to a one-parameter family of so-called elementary scoring functions.
In general, the choice of function in the representations (3.6) and (3.7) is not unique. To facilitate precise mathematical statements, a special version of will be chosen. Let denote the class of all left-continuous non-decreasing functions on , and let denote the class of all convex functions with subgradient in . This last condition will be satisfied if is chosen to be the left-hand derivative of . Denote by the class of scoring functions of the form (3.5) such that , by the class of scoring functions of the form (3.6) such that , and by the class of scoring functions of the form (3.7) such that . For most practical purposes, , and can be identified with the class of consistent scoring functions for the respective functional on .
The following important result on the representation of scoring functions that are consistent for the quantile and expectile functionals is due to Ehm et al. (2016).
(Ehm et al., 2016, Theorem 1)
Every member of the class has a representation of the form
and is a non-negative measure. The mixing measure is unique and satisfies whenever , where is the nondecreasing function in the representation (3.5). Furthermore, .
Every member of the class has a representation of the form
and is a non-negative measure. The mixing measure is unique and satisfies whenever , where is the left-hand derivative of the convex function in the representation (3.6). Furthermore, .
Both integral representations (4.2) and (4.4) hold pointwise. The functions defined by (4.3) and (4.5) are called elementary scoring functions for the quantile and expectile functionals respectively. Thus Theorem 4.2 essentially states that each scoring function that is consistent for a quantile or expectile functional can be expressed as a weighted average of corresponding elementary scoring functions. The analogous result for Huber functionals is new and stated below.
Every member of the class has a representation of the form
and is a non-negative measure. The mixing measure is unique and satisfies whenever , where is the left-hand derivative of the convex function in the representation (3.7). Furthermore, .
The proof is given in the appendix and is a simple adaptation of the proof for quantiles and expectiles.
Each function of Theorem 4.3 is called an elementary scoring function for the Huber functional, and also belongs to , as can be seen via Equation (3.7) with the choice and . The mixture representation of Equation (4.6) holds pointwise. Moreover, when , the mixture representations for the consistent scoring functions of expectiles and quantiles emerge as edge cases of Theorem 4.3 by taking limits as and as and using the dominated convergence theorem. Details are given in Remark A.1.
4.3 Economic interpretation of elementary scoring functions
Ehm et al. (2016) showed how the elementary scoring functions for quantiles have a natural economic interpretation related to binary betting and the classical simple cost–loss decision model (e.g. Richardson (2000); Wolfers and Zitzewitz (2008)). On the other hand, the elementary scoring functions for expectiles arise naturally in simple investment decisions where profits attract taxation and losses tax deduction, possibly at different rates (Ehm et al., 2016; Bellini and Di Bernardino, 2017). The elementary scoring functions for the Huber functional also admit an economic interpretation. It is the loss, relative to actions based on a perfect forecast, of an investment decision with fixed costs, possibly differential tax rates for profits versus losses, and where profits and losses are capped. This represents an intermediary position between the interpretation for quantiles (where economic losses, if they occur, are fixed irrespective of how near or far the forecast is to the realization) and that for expectiles (where there is no cap on profits or on losses). To illustrate, we give two examples. The first is an adaptation of the interpretation for the elementary scoring functions of expectiles presented by Ehm et al. (2016), while the second shows how the Huber functional and its elementary functions can arise in the context of investment decisions based on weather forecasts.
Suppose that Alexandra considers investing a fixed amount in a start-up company in exchange for an unknown future amount of the company’s profits or losses. Additionally, Alexandra takes out an option to set a limit on losses she could incur but which also imposes a limit on the profits she could receive. Alexandra will make a profit if and only if , and so adopts the decision rule to invest if and only if her point forecast of exceeds . Her pay-off structure is as follows:
If Alexandra refrains from the deal, her pay-off will be 0, independent of the outcome .
If Alexandra invests and realizes then her pay-off is negative at . Here is the monetary loss, bounded by , and the factor accounts for Alexandra’s reduction in income tax with representing the deduction rate.
If Alexandra invests and realizes then her pay-off is positive at , where denotes the tax rate that applies to her profits.
The top matrix in Table 1 shows Alexandra’s pay-off under her decision rule. The positively-oriented pay-off matrix can be reformulated as a negatively oriented regret matrix, by considering the difference between the pay-off for a (hypothetical) omniscient investor who has access to a perfect forecast and the pay-off for Alexandra. For example, if and realizes, then the omniscient investor’s pay-off is while Alexandra’s pay-off is 0, and so Alexandra’s regret is . The bottom matrix of Table 1 is Alexandra’s regret matrix, which up to a multiplication factor is the elementary score . So to minimize regret, Alexandra should invest if and only if , where , is Alexandra’s predictive distribution of the future value of the investment and . The point forecast arises if profits and losses are capped by the same value and if the rates and are equal.
Hannah runs a business selling ice creams from a mobile cart at a sports stadium. Historically, there is an approximately linear relationship between the volume of ice cream sales on any given afternoon and the observed daily maximum temperature, so that the profit from sales is modeled by , where is the observed daily maximum temperature, and . Additionally, for some positive , since total sales are limited by cart capacity, while any unsold units can be sold at a later date. If Hannah chooses to sell ice creams on any given afternoon, she must also pay a fixed cost (staff wages and stadium fees). If model assumptions are correct, Hannah will make a profit if and only if . So she adopts the decision rule to sell ice creams on any given afternoon if and only if her point forecast of the maximum temperature exceeds the decision threshold , where . Her pay-off structure is as follows.
If Hannah does not sell ice creams then her pay-off is 0.
If Hannah sells ice creams and then her profit after tax is , where denotes the tax rate. Her profit can be rewritten as .
If Hannah sells ice creams and then her loss after tax deductions is , where denotes the deduction rate, and losses are capped by since unsold ice creams go back into storage. Her loss can be rewritten as .
As with Example 4.4, these outcomes can be converted to a regret matrix, which up to a multiplication factor is the elementary score where . Consequently, her optimal decision rule is to sell ice creams if and only if , where , , is her predictive distribution of the maximum temperature and .
The essential features of Example 4.5 also arise in the context of rainfall storage and water trading. Any profits made by selling harvested water are capped by storage capacity. The predicted volume of water that is collected from any rainfall event can be modeled by , where is the predicted rainfall at a representative point within the catchment, is catchment initial loss and is determined by catchment size and continuing loss.
4.4 Forecast dominance, Murphy diagrams and choice of consistent scoring function
We return to the problem of forecast rankings with the notion of forecast dominance (Ehm et al., 2016, Section 3.2). We say that forecast system A dominates forecast system B for point forecasts targeting a specific Huber functional if the expected score of point forecasts from A is not greater than the expected score of point forecasts from B, for every consistent scoring function. In practice this is impossible to check directly because the family of consistent scoring functions, parameterized by , is very large. However, by the mixture representation of Theorem 4.3, one need only test for dominance over the family, parameterized by , of elementary scoring functions. In empirical situations, this is further reduced to checking forecast dominance for finitely many . In what follows, we consider tuples consisting of the th point forecast from systems A and B along with the corresponding observation .
Suppose that , and . The forecast system empirically dominates for predictions targeting if
whenever and in the left-hand limit as , where .
To see why, note that the score differential for the th forecast case is piecewise linear and right-continuous, and is zero unless lies between and . The only possible discontinuities are at and , and the only possible changes of slope are at , and .
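The finite check described above can be sketched in Python. Because the formula for the elementary score is not reproduced in this section, the sketch assumes a capped form: at threshold theta, the score is (1 - alpha) * min(theta - y, a) when y <= theta < x, alpha * min(y - theta, b) when x <= theta < y, and zero otherwise. This form is piecewise linear and right-continuous in theta, with jumps only at the forecast and slope changes only at y, y + a and y - b, but its normalization and the assignment of a and b to each side are assumptions. All function and parameter names are illustrative.

```python
from typing import List, Sequence, Tuple

Case = Tuple[float, float, float]  # (forecast from A, forecast from B, observation)


def elementary_score(theta: float, x: float, y: float,
                     alpha: float, a: float, b: float) -> float:
    """Assumed elementary score at threshold theta (form is an assumption)."""
    if y <= theta < x:   # over-forecast straddling theta, penalty capped by a
        return (1.0 - alpha) * min(theta - y, a)
    if x <= theta < y:   # under-forecast straddling theta, penalty capped by b
        return alpha * min(y - theta, b)
    return 0.0


def elementary_score_left_limit(theta: float, x: float, y: float,
                                alpha: float, a: float, b: float) -> float:
    """Limit of the assumed elementary score as the threshold increases to theta."""
    if y < theta <= x:
        return (1.0 - alpha) * min(theta - y, a)
    if x < theta <= y:
        return alpha * min(y - theta, b)
    return 0.0


def candidate_thresholds(cases: Sequence[Case], a: float, b: float) -> List[float]:
    """Points where the mean score differential can jump or change slope."""
    points = set()
    for x_a, x_b, y in cases:
        points.update((x_a, x_b, y, y + a, y - b))
    return sorted(points)


def empirically_dominates(cases: Sequence[Case], alpha: float,
                          a: float, b: float, tol: float = 1e-12) -> bool:
    """True if A's mean elementary score never exceeds B's.

    Values and left-hand limits are compared at every candidate point.
    Taking left-hand limits at points other than forecasts is redundant
    (the differential is continuous there) but harmless.
    """
    n = len(cases)
    for theta in candidate_thresholds(cases, a, b):
        for score in (elementary_score, elementary_score_left_limit):
            mean_a = sum(score(theta, x_a, y, alpha, a, b) for x_a, _, y in cases) / n
            mean_b = sum(score(theta, x_b, y, alpha, a, b) for _, x_b, y in cases) / n
            if mean_a > mean_b + tol:
                return False
    return True
```

Between consecutive candidate points the mean score differential is linear, so it is non-positive on the whole interval exactly when its value at the left endpoint and its left-hand limit at the right endpoint are both non-positive. That is why a finite number of comparisons suffices in this sketch.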
An empirical check for forecast dominance is aided by a Murphy diagram (Ehm et al., 2016, Section 3.3), which is a plot showing the graph of
for each forecast source, computed at each of the points of Corollary 4.6. The top left of Figure 3 presents the Murphy diagram for three different forecasts targeting the Huber mean of the daily maximum temperature at Sydney Observatory Hill (July 2018 to June 2020). The OCF and BoM forecasts were discussed in Example 4.1. For any given day, the Climate forecast is the mean of 46 observations: those from the previous 15 days and those from a 31-day period in the previous year centered on the corresponding calendar day. A lower mean score is better.
The graph in the top right of Figure 3 represents forecast performance as a skill score with respect to two reference forecasts: the perfect forecast (skill score = 1) and the Climate forecast (skill score = 0). The difference in mean elementary scores between OCF and BoM forecasts is presented in the bottom left, with pointwise 95% confidence intervals. Neither of these forecasts dominates the other.
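The curves in a Murphy diagram and the associated skill score can be sketched as follows. The elementary score is an assumed capped form (see the docstring), not a formula taken from this section, and the skill score uses the usual construction 1 - (mean score of forecast) / (mean score of reference), which yields 1 for a perfect forecast and 0 for the reference. The data, parameter values and function names are hypothetical and purely illustrative.

```python
import math
from typing import List, Sequence


def elementary_score(theta: float, x: float, y: float,
                     alpha: float = 0.5, a: float = 2.0, b: float = 2.0) -> float:
    """Assumed elementary score: capped penalty when theta lies between x and y."""
    if y <= theta < x:   # over-forecast, penalty capped by a
        return (1.0 - alpha) * min(theta - y, a)
    if x <= theta < y:   # under-forecast, penalty capped by b
        return alpha * min(y - theta, b)
    return 0.0


def murphy_curve(thetas: Sequence[float], forecasts: Sequence[float],
                 observations: Sequence[float], alpha: float = 0.5,
                 a: float = 2.0, b: float = 2.0) -> List[float]:
    """Mean elementary score at each threshold in thetas."""
    n = len(observations)
    return [
        sum(elementary_score(t, x, y, alpha, a, b)
            for x, y in zip(forecasts, observations)) / n
        for t in thetas
    ]


def skill_curve(forecast_curve: Sequence[float],
                reference_curve: Sequence[float]) -> List[float]:
    """Skill relative to a reference; undefined (NaN) where the reference scores 0."""
    return [1.0 - f / r if r > 0 else math.nan
            for f, r in zip(forecast_curve, reference_curve)]


# Hypothetical toy data (degrees Celsius), not the Sydney data of Figure 3.
observations = [18.0, 21.5, 25.0, 19.0]
system = [18.5, 21.0, 24.0, 19.5]
climate = [20.0, 20.0, 20.0, 20.0]
thetas = [16.0 + 0.25 * k for k in range(45)]  # grid from 16.0 to 27.0

system_curve = murphy_curve(thetas, system, observations)
climate_curve = murphy_curve(thetas, climate, observations)
system_skill = skill_curve(system_curve, climate_curve)
```

A regular grid is used here only for brevity; evaluating the curves at the forecast and observation breakpoints, together with left-hand limits at the forecasts, would capture every jump and kink exactly.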
Returning to Example 4.5, if Hannah’s decision rule is to sell ice creams if and only if the point forecast exceeds C, then Hannah should base her decisions on the BoM forecast, since its mean elementary score, which is proportional to economic regret, is lowest (see the top left of Figure 3 where ). But if her fixed investment costs changed, then so would her decision threshold , and the Murphy diagram indicates which forecast system historically performed better at the new threshold.
The mixture representation and Murphy diagram also give insight into why the two different scoring functions of Example 4.1 lead to different forecast rankings. The classical Huber loss scoring function is obtained from Equation (3.7) with the choice . The corresponding mixing measure is , implying that every elementary scoring function in the mixture representation (4.6) is weighted equally, and also that the area underneath each graph in the Murphy diagram (top left of Figure 3) is twice the mean Huber loss for that forecast system. On the other hand, the exponential scoring function is obtained from Equation (3.7) with the choice . In this case and so mean elementary scores in the corresponding mixture representation are weighted heavily for higher values of . Hence when scored by , a slight over-forecast of C by BoM on 19 December 2019 (OCF forecast C and the observation was C) was penalized substantially more heavily than the OCF under-forecast, resulting in a higher mean score