1 Introduction and informal statement of main result
Consider a statistical functional $T$ of the random variable $Y \sim F$, that is, a mapping $F \mapsto T(F)$, such as the mean or the median. In the theory of forecast validation, a corresponding strict identification function $V$ takes the forecast $x$ and the realisation $y$ of $Y$ as arguments, and its expectation with respect to $Y \sim F$ is zero if and only if $x$ equals the true functional value $T(F)$. This defining property makes identification functions a central tool in forecast validation through calibration tests (NoldeZiegel2017), often referred to as backtests in finance, and through forecast rationality (or optimality) tests in economics (EKT2005; DimiPattonSchmidt2019). Furthermore, these functions are fundamental to zero (Z) or generalised method of moments (GMM) estimation (Huber1967; Hansen1982; NeweyMcFadden1994), where they are often called moment functions or moment conditions. However, their statistical applications go far beyond these two fields: among others, they underlie dynamic modelling through generalised autoregressive score (GAS) models (Creal2013), isotonic regression estimates (JordanMuehlemannZiegel2019), and the derivation of anytime-valid sequential tests (casgrain2022anytime). A complete understanding of the full class of (strict) identification functions for a given functional is crucial in these applications. Our main contribution, Theorem 4, provides such a full characterisation result.

In the jargon of decision theory (Gneiting2011), the quantity of interest $Y$ attains values in an observation domain $\mathsf{Y} \subseteq \mathbb{R}^d$, which is equipped with the Borel $\sigma$-algebra. The class of potential probability distributions $F$ of $Y$ is denoted by $\mathcal{F}$. Forecasts are elements of an action domain $\mathsf{A} \subseteq \mathbb{R}^k$. Formally, the functional of interest is a potentially set-valued mapping from $\mathcal{F}$ to $\mathsf{A}$, denoted by $T \colon \mathcal{F} \to 2^{\mathsf{A}}$, where the notation indicates that the values of $T$ are subsets of $\mathsf{A}$, with the convention that we identify point-valued functionals such as the mean with the singleton containing this value. For $d = k = 1$, prime examples for $T$ are the mean or the $\alpha$-quantile, $\alpha \in (0,1)$, where the latter is interval-valued. A prime example of a multivariate functional ($k > 1$) is the mean functional in the case of multivariate observations ($d = k > 1$). For univariate observations ($d = 1$), examples are multiple quantiles at different levels, the pair (mean, variance) with the natural action domain $\mathsf{A} = \mathbb{R} \times [0, \infty)$, or the pair consisting of the quantile and the Expected Shortfall (ES) at the same level $\alpha$ with natural action domain $\mathsf{A} = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 \le x_1\}$; see Examples 2 and 3 for details.

To present the formal definition of an identification function, let us introduce the convention that a map $V \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$ is called $\mathcal{F}$-integrable if for each of its components $V_i$ the integral $\int V_i(x,y) \, \mathrm{d}F(y)$ exists and is finite for all $x \in \mathsf{A}$ and $F \in \mathcal{F}$. Moreover, we shall use the shorthand $\bar{V}(x, F) = \int V(x,y) \, \mathrm{d}F(y)$ for any $x \in \mathsf{A}$, $F \in \mathcal{F}$, where the integral is understood componentwise.

Definition 1 (Identification function and identifiability).

(i) An $\mathcal{F}$-integrable map $V \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$ is an identification function for a functional $T \colon \mathcal{F} \to 2^{\mathsf{A}}$ if $\bar{V}(x, F) = 0$ for all $F \in \mathcal{F}$ and for all $x \in T(F)$.

(ii) An $\mathcal{F}$-integrable map $V \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$ is a strict identification function for a functional $T \colon \mathcal{F} \to 2^{\mathsf{A}}$ if, for all $F \in \mathcal{F}$ and for all $x \in \mathsf{A}$,
$\bar{V}(x, F) = 0 \iff x \in T(F).$

(iii) A functional $T \colon \mathcal{F} \to 2^{\mathsf{A}}$ is called identifiable if there exists a strict identification function for it.
On the class $\mathcal{F}^1$ of distributions on $\mathbb{R}$ with a finite mean, the mean is identifiable with strict identification function $V(x,y) = x - y$. Likewise, the $\tau$-expectile, $\tau \in (0,1)$, possesses a strict identification function $V(x,y) = |\mathbb{1}\{y \le x\} - \tau|\,(x - y)$. On the class of distributions on $\mathbb{R}$ such that there exists an $x$ with $F(x) = \alpha$, the $\alpha$-quantile admits the strict identification function $V(x,y) = \mathbb{1}\{y \le x\} - \alpha$. Functionals failing to be identifiable on practically relevant classes of distributions are the variance and Expected Shortfall. On such classes $\mathcal{F}$, both of them violate the selective convex level sets property, which is necessary for identifiability (Osband1985; FisslerHlavinovaRudloff2019Theory).¹ However, the pairs (mean, variance) and (quantile, ES) turn out to be identifiable with corresponding two-dimensional strict identification functions, see Examples 2 and 3.

¹ $T$ satisfies the selective convex level sets property on $\mathcal{F}$ if for any $\lambda \in (0,1)$ and for any $F_0, F_1 \in \mathcal{F}$ such that $F_\lambda := (1-\lambda) F_0 + \lambda F_1 \in \mathcal{F}$ it holds that $T(F_0) \cap T(F_1) \subseteq T(F_\lambda)$.
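These defining properties are easy to check by simulation. The following sketch (assuming NumPy is available; the normal distribution, seed, and sample size are arbitrary illustration choices) verifies empirically that the mean of $V(x,Y)$ vanishes at the true functional value and not at a misspecified forecast:

```python
import numpy as np

# Monte Carlo illustration of Definition 1: the expected identification
# function vanishes at the true functional value.
rng = np.random.default_rng(0)
mu, sigma, alpha = 2.0, 1.5, 0.9
y = rng.normal(mu, sigma, size=1_000_000)    # draws from F = N(2, 1.5^2)

# Mean: V(x, y) = x - y.
v_at_truth = np.mean(mu - y)                 # approximately 0
v_off = np.mean((mu + 0.5) - y)              # approximately 0.5, not 0

# alpha-quantile: V(x, y) = 1{y <= x} - alpha, at the true 90% quantile.
q = mu + sigma * 1.2815515655446004          # Phi^{-1}(0.9)
v_quantile = np.mean((y <= q) - alpha)       # approximately 0
```

The misspecified forecast $\mu + 0.5$ yields a sample average of $V$ near $0.5$ rather than $0$, illustrating strictness.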
Regarding the flexibility of the class of identification functions, the following observation is immediate: if $V$ is a strict identification function for $T$, it can be multiplied with any $\mathbb{R}^{k \times k}$-valued function $h$ on $\mathsf{A}$ of full rank and remains a strict identification function for $T$. Intriguingly, Theorem 4 formally states that, subject to mild regularity conditions, the reverse is also true, and the entire class of strict identification functions for $T$ is given by

(1) $\big\{ (x,y) \mapsto h(x)\,V(x,y) \;\big|\; h \colon \mathsf{A} \to \mathbb{R}^{k \times k} \text{ with } \det(h(x)) \neq 0 \text{ for all } x \in \mathsf{A} \big\}.$
Besides its theoretical appeal, this characterisation result opens the way for diverse applications. First, it can be used to optimise the power of the (conditional) calibration tests (forecast rationality or optimality tests) studied in NoldeZiegel2017. It is further related to efficient Z- or GMM-estimation based on conditional moment conditions in the sense of Chamberlain1987 and Newey1993, where the matrix $h$ is subsumed in the choice of an optimal instrument matrix; see Theorem 3.1 and especially Remark 3.2 in DFZ2020 for details. Based on the choice of an identification function (called score by these authors) as their forcing variable, dynamic GAS models of Creal2013 determine an autoregressive model structure for a corresponding functional of interest that nests classical ARMA and GARCH models for the mean and variance. In these models, the so-called scaling matrix takes the place of the matrix $h$ in (1) and, as already called for by Creal2013, this choice “warrants separate inspection”.

The following examples discuss interesting applications of our characterisation result in (1) to vector-valued functionals.
Example 2 (Mean and variance).
The pair (mean, variance) is identifiable on the class of distributions with finite variance with the two-dimensional strict identification function
$V_1\big((x_1, x_2), y\big) = \begin{pmatrix} x_1 - y \\ x_2 - (y - x_1)^2 \end{pmatrix}.$
One can use the characterisation result (1) to produce a multitude of other strict identification functions. Motivated by the decomposition of the variance into the difference of the second moment and the squared expectation, a comparably intuitive one is

(2) $V_2\big((x_1, x_2), y\big) = \begin{pmatrix} x_1 - y \\ x_2 + x_1^2 - y^2 \end{pmatrix},$

which arises by choosing the full rank matrix $h(x_1, x_2) = \begin{pmatrix} 1 & 0 \\ 2x_1 & 1 \end{pmatrix}$.
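The identity between the two identification functions of Example 2 can be verified numerically. A minimal sketch (assuming NumPy; the Gamma(3, 2) sample, with mean 6 and variance 12, and the seed are illustration choices):

```python
import numpy as np

# Check that V2 = h V1 pointwise with h = [[1, 0], [2*x1, 1]] (det = 1),
# and that both expected identification functions vanish at the true
# (mean, variance) of the sampling distribution.
rng = np.random.default_rng(1)
y = rng.gamma(shape=3.0, scale=2.0, size=1_000_000)
x1, x2 = 6.0, 12.0                                    # true (mean, variance)

V1 = np.stack([x1 - y, x2 - (y - x1) ** 2])           # shape (2, n)
V2 = np.stack([x1 - y, x2 + x1 ** 2 - y ** 2])
h = np.array([[1.0, 0.0], [2 * x1, 1.0]])

pointwise_match = np.allclose(h @ V1, V2)             # the identity V2 = h V1
exp1, exp2 = V1.mean(axis=1), V2.mean(axis=1)         # both approximately (0, 0)
```

Expanding $2x_1(x_1 - y) + x_2 - (y - x_1)^2 = x_2 + x_1^2 - y^2$ confirms the pointwise match algebraically.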
Example 3 (Quantile and ES).
In financial mathematics, Value-at-Risk at level $\alpha \in (0,1)$ ($\mathrm{VaR}_\alpha$) denotes the lower $\alpha$-quantile, $\mathrm{VaR}_\alpha(F) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\}$. Then, the ES at level $\alpha$ of a distribution $F$ is formally defined as

(3) $\mathrm{ES}_\alpha(F) = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_\beta(F) \, \mathrm{d}\beta = \frac{1}{\alpha} \mathbb{E}_F\big[Y \mathbb{1}\{Y \le \mathrm{VaR}_\alpha(F)\}\big] + \mathrm{VaR}_\alpha(F) \Big(1 - \tfrac{1}{\alpha} F\big(\mathrm{VaR}_\alpha(F)\big)\Big).$

On any subclass of distributions $F$ satisfying $F(\mathrm{VaR}_\alpha(F)) = \alpha$ on which $\mathrm{ES}_\alpha$ is finite, e.g. on the subclass of continuous distributions with a finite mean, there is the following strict identification function for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$:
$V_1\big((x_1, x_2), y\big) = \begin{pmatrix} \mathbb{1}\{y \le x_1\} - \alpha \\ x_2 - \frac{1}{\alpha}\, y \, \mathbb{1}\{y \le x_1\} \end{pmatrix},$
where the second component naturally corresponds to a truncated expectation. Applying (1) with the full rank matrix $h(x_1, x_2) = \begin{pmatrix} 1 & 0 \\ x_1/\alpha & 1 \end{pmatrix}$, one obtains the alternative strict identification function

(4) $V_2\big((x_1, x_2), y\big) = \begin{pmatrix} \mathbb{1}\{y \le x_1\} - \alpha \\ x_2 - x_1 + \frac{1}{\alpha} (x_1 - y)\, \mathbb{1}\{y \le x_1\} \end{pmatrix}.$

The advantage of $V_2$ over $V_1$ appears when evaluating them on a discontinuous distribution $F$ with $F(\mathrm{VaR}_\alpha(F)) > \alpha$: even though the first components of $V_1$ and $V_2$ fail to be an identification function for $\mathrm{VaR}_\alpha$,² the second component of $V_2$ still vanishes in expectation when plugging in the correct values $\mathrm{VaR}_\alpha(F)$ for $x_1$ and $\mathrm{ES}_\alpha(F)$ for $x_2$. Intuitively, the second component of $V_2$ adds a correction term corresponding to the one on the right-hand side of (3). The choice (4) is already utilised by DimiBayer2019 for Z-estimation of a joint quantile and ES regression model and naturally shows up in consistent scoring functions for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$, see FisslerZiegel2016. Finally, notice that the ES is sometimes also defined as the upper average quantile, $\frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_\beta(F) \, \mathrm{d}\beta$, with $\alpha$ close to one. Then, our results apply mutatis mutandis.

² To obtain a better understanding of identifiability for the possibly set-valued $\alpha$-quantile $q_\alpha$ and its lower endpoint $\mathrm{VaR}_\alpha$, one can distinguish three cases. First, if $F$ is strictly increasing and continuous at its $\alpha$-quantile, the latter is singleton-valued and $\mathbb{1}\{y \le x\} - \alpha$ is a strict identification function both for $q_\alpha$ and for $\mathrm{VaR}_\alpha$. Second, if $F$ is flat at its set-valued $\alpha$-quantile, $\mathbb{1}\{y \le x\} - \alpha$ is still a strict identification function for the set-valued $q_\alpha$, but it is only a (non-strict) identification function for the singleton-valued $\mathrm{VaR}_\alpha$. Third, if $F$ is discontinuous at its $\alpha$-quantile such that there is no $x$ with $F(x) = \alpha$ (that is, if $F(\mathrm{VaR}_\alpha(F)) > \alpha$), neither $q_\alpha$ nor $\mathrm{VaR}_\alpha$ are identified by $\mathbb{1}\{y \le x\} - \alpha$.
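The behaviour of the two identification functions of Example 3 on a discontinuous distribution can be checked exactly, without simulation. The sketch below (assuming NumPy; the three support points and probabilities are a toy choice with $F(\mathrm{VaR}_{0.1}) = 0.2 > \alpha = 0.1$) computes the expectations of the components at the true $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$:

```python
import numpy as np

# Discrete toy distribution: P(Y=-3)=0.04, P(Y=-1)=0.16, P(Y=1)=0.80.
# VaR_0.1 = -1 with F(-1) = 0.2 > alpha, and
# ES_0.1 = 10 * (0.04*(-3) + 0.06*(-1)) = -1.8 from the integral in (3).
alpha = 0.1
vals = np.array([-3.0, -1.0, 1.0])
probs = np.array([0.04, 0.16, 0.80])

var_ = -1.0                                      # VaR_0.1 = inf{x : F(x) >= 0.1}
es = (0.04 * (-3.0) + 0.06 * (-1.0)) / alpha     # = -1.8

ind = (vals <= var_).astype(float)
v_first = probs @ ind - alpha                                    # = 0.1, not 0
v1_second = es - (probs @ (vals * ind)) / alpha                  # = 1.0, not 0
v2_second = es - var_ + (probs @ ((var_ - vals) * ind)) / alpha  # = 0.0
```

As claimed, the shared first component and the second component of $V_1$ have non-zero expectation at the true values, while the corrected second component of $V_2$ still vanishes.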
2 Formal statement of main result
The assertion of Theorem 4, and in particular its proof, parallels Osband’s principle for consistent scoring functions (FisslerZiegel2016), see also Osband1985; Gneiting2011. To the best of our knowledge, the assertion was first stated in the PhD thesis Fissler2017. We need the following assumptions.
Assumption (1).
Let $\mathcal{F}$ be a convex class of distributions on $\mathsf{Y}$ such that for every $x \in \operatorname{int}(\mathsf{A})$ there are $F_1, \ldots, F_{k+1} \in \mathcal{F}$ satisfying
$0 \in \operatorname{int}\Big(\operatorname{conv}\big(\big\{\bar{V}(x, F_1), \ldots, \bar{V}(x, F_{k+1})\big\}\big)\Big),$
where for any set $B \subseteq \mathbb{R}^k$, $\operatorname{int}(B)$ denotes the interior of $B$ and $\operatorname{conv}(B)$ denotes the convex hull of $B$.
Assumption (2).
For every $y \in \mathsf{Y}$ there exists a sequence $(F_n)_{n \in \mathbb{N}}$ of distributions $F_n \in \mathcal{F}$ that converges weakly to the Dirac measure $\delta_y$ and a compact set $K \subseteq \mathsf{Y}$ such that the support of $F_n$ is contained in $K$ for all $n \in \mathbb{N}$.
Assumption (3).
Suppose that for Lebesgue almost all $x \in \mathsf{A}$ the maps $V(x, \cdot)$ and $V'(x, \cdot)$ are locally bounded. Moreover, suppose that the complement of the set
$C = \big\{(x, y) \in \mathsf{A} \times \mathsf{Y} : V(x, \cdot) \text{ and } V'(x, \cdot) \text{ are continuous at } y\big\}$
has $(k + d)$-dimensional Lebesgue measure zero.
Assumptions (1), (2), and (3) basically correspond to Assumptions (V1), (F1), and (VS1) in FisslerZiegel2016, respectively. Assumption (1) ensures that the class $\mathcal{F}$ is sufficiently rich, implying in particular the surjectivity of $T$ onto $\operatorname{int}(\mathsf{A})$ and the fact that there are no redundancies in $V$ in the sense that all of its components are needed; see Remark 5 for some further comments. Assumptions (2) and (3) ensure that $V(x, y)$ can be approximated by a sequence of integrals $\bar{V}(x, F_n)$.
Theorem 4.
Let $T \colon \mathcal{F} \to 2^{\mathsf{A}}$ be a functional with a strict identification function $V \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$. Then the following two assertions hold:

(i) If $h \colon \mathsf{A} \to \mathbb{R}^{k \times k}$ is a matrix-valued function with $\det(h(x)) \neq 0$ for all $x \in \mathsf{A}$, then $(x, y) \mapsto h(x) V(x, y)$ is also a strict identification function for $T$.

(ii) Let $V$ satisfy Assumption (1), suppose that Assumptions (2) and (3) hold, and let $V' \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$ be an identification function for $T$. Then there is a matrix-valued function $h \colon \operatorname{int}(\mathsf{A}) \to \mathbb{R}^{k \times k}$ such that $\bar{V}'(x, F) = h(x) \bar{V}(x, F)$ for all $F \in \mathcal{F}$ and for all $x \in \operatorname{int}(\mathsf{A})$, and such that

(5) $V'(x, y) = h(x) V(x, y)$

for Lebesgue almost all $(x, y) \in \operatorname{int}(\mathsf{A}) \times \mathsf{Y}$.

If $V'$ is a strict identification function for $T$ and it also satisfies Assumption (1), then additionally $\det(h(x)) \neq 0$ for all $x \in \operatorname{int}(\mathsf{A})$. If the integrated identification functions $\bar{V}$ and $\bar{V}'$ are continuous, then also $h$ is continuous, which implies that either $\det(h(x)) > 0$ for all $x \in \operatorname{int}(\mathsf{A})$ or $\det(h(x)) < 0$ for all $x \in \operatorname{int}(\mathsf{A})$.
Proof of Theorem 4.
Part (i) is a direct consequence of the linearity of the expectation. For (ii), the proof of the existence of $h$ follows along the lines of the proof of Theorem 3.2 in FisslerZiegel2016. One just needs to replace the term $\nabla \bar{S}(x, F)$ there with $\bar{V}'(x, F)$.

If $V'$ satisfies Assumption (1) as well, one directly obtains that $h$ must have full rank on $\operatorname{int}(\mathsf{A})$ by exchanging the roles of $V$ and $V'$. If the expected identification functions $\bar{V}$ and $\bar{V}'$ are both continuous, the continuity of $h$ follows again exactly as in the proof of Theorem 3.2 in FisslerZiegel2016.

For the pointwise assertion (5), consider $(x, y) \in \operatorname{int}(\mathsf{A}) \times \mathsf{Y}$ such that both $V(x, \cdot)$ and $V'(x, \cdot)$ are continuous at $y$. (Due to Assumption (3), this holds for Lebesgue almost all $(x, y)$.) Let $(F_n)_{n \in \mathbb{N}}$ be a sequence as specified in Assumption (2). That is, $(F_n)_{n \in \mathbb{N}}$ converges weakly to $\delta_y$ and the supports of all $F_n$ are contained in some compact set $K \subseteq \mathsf{Y}$. We claim that $\bar{V}(x, F_n)$ and $\bar{V}'(x, F_n)$ converge to $V(x, y)$ and $V'(x, y)$, respectively, providing the arguments for the former convergence only. By Skorohod’s theorem, there is a sequence of random variables $(Y_n)_{n \in \mathbb{N}}$ on some probability space with distributions $Y_n \sim F_n$, such that $Y_n$ converges to $y$ almost surely. By the continuous mapping theorem, $V(x, Y_n)$ converges to $V(x, y)$ almost surely. Since $V(x, \cdot)$ is assumed to be locally bounded and since $Y_n \in K$ almost surely, $(V(x, Y_n))_{n \in \mathbb{N}}$ is bounded almost surely. Hence, we can apply the dominated convergence theorem to conclude that $\bar{V}(x, F_n) = \mathbb{E}[V(x, Y_n)] \to V(x, y)$. ∎

Remark 5.
For part (i) of Theorem 4, no surjectivity assumption is necessary. In fact, the identification functions at (2) and (4) are also strict identification functions for (mean, variance) and $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$, respectively, when considering the action domain $\mathsf{A} = \mathbb{R}^2$. However, it is obvious that part (ii) of Theorem 4 cannot hold without a surjectivity assumption. In fact, on classes of distributions with positive variance, $V'\big((x_1, x_2), y\big) = \big(x_1 - y, \, \max(x_2, 0) - (y - x_1)^2\big)^\top$ would also be a strict identification function for (mean, variance) on the action domain $\mathsf{A} = \mathbb{R}^2$, although it is not of the form (1).
On the other hand, also the richness, in particular the convexity, of $\mathcal{F}$ is needed. Just recall that on the class of symmetric distributions with strictly increasing distribution function, the mean and the median coincide. Hence, both $V(x, y) = x - y$ and $V'(x, y) = \mathbb{1}\{y \le x\} - \tfrac{1}{2}$ are strict identification functions, but they do not fulfil (5). The reason is that the class of symmetric distributions fails to be convex, unless all distributions have the same mean, in which case the interior of the action domain would be empty under surjectivity.
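This failure of (5) on a non-convex class can be made concrete numerically. A small sketch (assuming NumPy; the symmetric N(1, 2²) sample and the evaluation points are illustration choices):

```python
import numpy as np

# On a symmetric distribution the mean and the median coincide, so both
# V(x, y) = x - y and V'(x, y) = 1{y <= x} - 1/2 have zero expectation at
# the centre x = 1 ...
rng = np.random.default_rng(2)
y = rng.normal(1.0, 2.0, size=1_000_000)
e_mean = np.mean(1.0 - y)              # approximately 0
e_median = np.mean((y <= 1.0) - 0.5)   # approximately 0

# ... yet no pointwise relation V' = h(x) V can hold: at a fixed forecast x
# the ratio V'/V varies with y, so no scalar h(x) links the two.
x = 1.0
ys = np.array([-2.0, 0.0, 0.5, 3.0])
ratio = ((ys <= x) - 0.5) / (x - ys)   # not constant in y
```

Since $k = 1$ here, $h(x)$ would have to be a scalar, and the non-constant ratio rules it out.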
Remark 6.
One may wonder about the flexibility concerning the dimension of an identification function. Suppose that $V \colon \mathsf{A} \times \mathsf{Y} \to \mathbb{R}^k$ is a strict identification function for some functional $T$, which takes values in $\mathsf{A} \subseteq \mathbb{R}^k$. Clearly, for any matrix-valued function $h \colon \mathsf{A} \to \mathbb{R}^{l \times k}$, where possibly $l \neq k$, the product $hV$ is an identification function for $T$. If $l > k$ and the rank of $h(x)$ is $k$ for all $x \in \mathsf{A}$, then $hV$ is still a strict identification function. However, $hV$ will not satisfy Assumption (1), thus containing redundancies (in fact, the easiest way to construct such an $h$ is by simply copying some components of $V$). On the other hand, if $l < k$, the proof of Theorem 4 (ii) implies that $hV$ cannot be a strict identification function.
The latter statement can be exemplified by considering the systemic risk measure $\mathrm{CoVaR}_{\beta|\alpha}$ which, given a two-dimensional observation $(x, y)$ of $(X, Y)$, is defined as the $\mathrm{VaR}_\beta$ of the conditional distribution of $Y$, given that $X$ exceeds its $\mathrm{VaR}_\alpha$. Then, the pair $(\mathrm{VaR}_\alpha, \mathrm{CoVaR}_{\beta|\alpha})$ is identifiable on the class of absolutely continuous distributions on $\mathbb{R}^2$ with positive density with a corresponding strict identification function
$V\big((x_1, x_2), (x, y)\big) = \begin{pmatrix} \mathbb{1}\{x \le x_1\} - \alpha \\ \mathbb{1}\{x > x_1\} \big(\mathbb{1}\{y \le x_2\} - \beta\big) \end{pmatrix},$
see FisslerHoga2021. Due to the argument above, the one-dimensional identification function
$V'\big((x_1, x_2), (x, y)\big) = \mathbb{1}\{x > x_1\} \big(\mathbb{1}\{y \le x_2\} - \beta\big),$
suggested in BanulescuRaduETAL2021, cannot be a strict identification function for $(\mathrm{VaR}_\alpha, \mathrm{CoVaR}_{\beta|\alpha})$ on the class of absolutely continuous distributions with positive density, see FisslerHoga2021.
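The loss of strictness for $l < k$ can also be seen numerically for the (mean, variance) pair from Example 2. A sketch (assuming NumPy; the N(0, 1) sample and the misspecified point $(1, 1)$ are illustration choices):

```python
import numpy as np

# Summing the two components of the (mean, variance) identification
# function, i.e. taking h = (1, 1) with l = 1 < k = 2, destroys strictness:
# for Y ~ N(0, 1), E[(x1 - Y) + (x2 - (Y - x1)^2)] = x1 + x2 - 1 - x1^2,
# which vanishes at the truth (0, 1) but also at the wrong forecast (1, 1).
rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=1_000_000)

def hV(x1, x2, y):
    return (x1 - y) + (x2 - (y - x1) ** 2)

at_truth = np.mean(hV(0.0, 1.0, y))   # approximately 0, as required
at_wrong = np.mean(hV(1.0, 1.0, y))   # also approximately 0: not strict
```

The expectation vanishes along a whole curve of wrong forecasts, so no one-dimensional combination of the two components can be strict.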
Acknowledgements
T. Dimitriadis gratefully acknowledges support of the German Research Foundation (DFG) through grant number 502572912 and of the Heidelberg Academy of Sciences and Humanities. J. Ziegel gratefully acknowledges support of the Swiss National Science Foundation. We are very grateful to Jana Hlavinová for careful proofreading and valuable feedback on an earlier version of this paper.