Anytime-valid sequential testing for elicitable functionals via supermartingales

We design sequential tests for a large class of nonparametric null hypotheses based on elicitable and identifiable functionals. Such functionals are defined in terms of scoring functions and identification functions, which are ideal building blocks for constructing nonnegative supermartingales under the null. This in turn yields anytime valid tests via Ville's inequality. Using regret bounds from Online Convex Optimization, we obtain rigorous guarantees on the asymptotic power of the tests for a wide range of alternative hypotheses. Our results allow for bounded and unbounded data distributions, assuming that a sub-ψ tail bound is satisfied.






1 Introduction

We design sequential tests and confidence sequences for a large class of nonparametric null hypotheses based on elicitable and identifiable functionals. Such functionals include moments, quantiles, expectiles, and many other examples, all of which can be tested using the approach developed here. The null hypotheses that we cover are highly composite and nonparametric; for instance, the null could consist of all distributions whose median, say, is a given value. Our tests are sequential, or anytime valid, in the sense that data is observed sequentially through time, and at each point in time the decision to stop or continue may depend on all available data without compromising Type-I error guarantees. We also obtain guarantees on the power of our tests with respect to large composite nonparametric alternative hypotheses.

The basic mechanism we use to construct anytime valid tests rests on the notion of test (super-)martingale due to MR2849911. The idea is simple but powerful: a test statistic that is a nonnegative supermartingale if the null hypothesis is true can only reach large values with small probability. This can be quantified using Ville’s inequality. Thus if one rejects the null only when a sufficiently large value of the test statistic has been observed, Type-I error control is ensured.

The first contribution of our work is the observation that elicitable and identifiable functionals in the sense of LambertPennockETAL2008; gneiting2011making; FisslerZiegel2016 are ideal building blocks for constructing test supermartingales. Combined with a construction known as predictable mixing, one immediately obtains large families of test supermartingales which can be used as possible test statistics for the null hypothesis defined by a particular elicitable or identifiable functional.

The predictable mixing construction can be interpreted in terms of betting or trading. Finding a useful predictable mixture corresponds to determining a profitable trading strategy. The supermartingale condition under the null ensures that profits are limited if the null is true. But if the null is false, it may be possible to “bet against the null” in a way that leads to large profits and, hence, reject the null. Doing so requires two things. First, in order to bet against the null, one must specify a suitable distribution to bet on. Second, given this distribution, one must find a strategy that is likely to be profitable.

The second contribution of our work is to address these two points at once by making use of ideas from online convex optimization (OCO). We demonstrate how off-the-shelf algorithms can be used to produce strong trading strategies. This leads to powerful test supermartingales given as predictable mixtures of the basic set of test supermartingales constructed from the elicitable or identifiable functional used to specify the null hypothesis. A major advantage of this approach is that these algorithms come with performance guarantees in the form of regret bounds. These regret bounds translate into rigorous guarantees on the power of the resulting sequential test under a wide variety of alternative hypotheses.

Sequential testing goes back to MR13275. A large body of literature on the subject exists, and martingale techniques have played an important role from the beginning; see Appendix D in wau_ram_20 for an overview. The concept of a test martingale was introduced in MR2849911, and there have recently been a number of papers related to this circle of ideas, for instance howard2018timeuniform; howard2020timeuniform, which derive time-uniform confidence sequences and concentration bounds. In particular, the closely related notions of e-variables and e-processes have received significant attention; see e.g. gru_hei_koo_19; 10.1214/20-AOS2020; xu_wang_ramdas_21; ram_ruf_lar_koo_20; MR4364897 as well as Remark 2.3 below. Most closely related to our paper is the work of wau_ram_20, which develops anytime valid confidence sequences for the mean of a sequence of bounded random variables. That paper makes use of the same betting perspective, which enables the authors to obtain powerful confidence sequences. It also discusses various related strands of literature and the history of the subject. However, the authors do not consider other functionals beyond the mean, and they rely on the boundedness of the data in an essential way. Our work generalizes both of these points. Moreover, they do not make use of OCO to obtain regret bounds which then translate into statements about power. Our use of regret bounds is reminiscent of MR4364897, where regret bounds are used to derive power guarantees for the particular problem of testing exchangeability of binary sequences. Another paper related to ours is 10.1093/biomet/asab047, which treats the problem of probability forecasting. The authors rely on the betting analogy to construct sequential tests for the statistical significance of score differences of competing forecasts.

The concepts of elicitability and identifiability go back to the PhD thesis of Osband1985. The term elicitability was coined later (LambertPennockETAL2008), and it was popularized by gneiting2011making; steinwart2014elicitation; frongillo2015elicitation; FisslerZiegel2016. Prior to our work, however, no systematic approach to the sequential testing problem for elicitable and identifiable functionals had been developed. Thus our paper contributes to this literature as well, the essential point being the link between test supermartingales and the concepts of elicitability and identifiability.

The paper is organized as follows. In Section 2 we review the definition of test supermartingales and how they can be used to construct powerful anytime valid tests via predictable mixing. In Section 3 we discuss elicitability and identifiability, both as a way of specifying nonparametric null hypotheses and as the basis for constructing test supermartingales. Importantly, we show how sub-ψ tail bounds can be leveraged to handle unbounded data. Section 4 discusses how regret bounds from online convex optimization lead to statements about asymptotic power of the tests. In Section 5 we briefly discuss the related issue of confidence sequences. Section 6 contains a simulation study illustrating the techniques developed in this paper.

2 Sequential testing via supermartingales

We consider a sequential testing environment in which a discrete-time stochastic process $(X_t)_{t \in \mathbb{N}}$, taking values in some measurable space $\mathcal{X}$, is observed sequentially through time. The process $(X_t)_{t \in \mathbb{N}}$ is called the data generating process. Concrete examples of data generating processes include patient data collected from clinical trials or daily profit and loss values of a trading strategy. The filtration generated by the data generating process is denoted $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{N}_0}$, where $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$ is the information set generated by the data collected until time $t$. We let $\mathcal{F}_0$ denote the trivial $\sigma$-algebra.

A statistical hypothesis is a collection $\mathcal{H} \subseteq \mathcal{P}$, where $\mathcal{P}$ is the set of all probability distributions of the data generating process. Thus an element $P \in \mathcal{H}$ is a distribution over the entire sequence $(X_t)_{t \in \mathbb{N}}$. The hypothesis $\mathcal{H}$ encodes the belief that the realized data was governed by one of the distributions $P \in \mathcal{H}$. A (sequential) test for a given null hypothesis $\mathcal{H}_0$ is defined as an $\mathbb{F}$-stopping time $\tau$ that specifies the time at which $\mathcal{H}_0$ is rejected. The requirement that $\tau$ be a stopping time means that the decision whether to stop and reject $\mathcal{H}_0$, or to continue and observe more data, is only based on data available at the time the decision is made. Stopping times are allowed to take the value infinity, and this corresponds to the possibility that the test never rejects the null.

2.1 Anytime validity and test supermartingales

In contrast to traditional hypothesis testing with a fixed and known total sample size, the total sample size that will be observed before stopping is not known in advance. As a consequence, repeatedly evaluating a test designed for fixed finite sample sizes will generate an inflated Type-I error (or size) of the test, see for example albers2019problem; o1971present. Although techniques such as multiple comparison p-value adjustments, see for example hsu1996multiple, exist to correctly modify tests a posteriori, these methods have quickly decaying power as the number of repeated tests grows large and require that the number of tests to be performed be known in advance. In order to avoid these issues, we work with the concept of an anytime valid test, which allows arbitrary testing policies that need not be specified in advance.
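The inflation caused by repeated evaluation can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: it assumes i.i.d. standard normal data under the null, a two-sided z-test at the nominal 5% level, and hypothetical names (`peeking_rejection_rates`, `n_runs`, `n_max`) chosen only for illustration.

```python
import math
import random


def peeking_rejection_rates(n_runs=2000, n_max=200, z_crit=1.96, seed=0):
    """Estimate Type-I error of a two-sided z-test when the null (mean 0) is true.

    Returns (rate_fixed, rate_peeking): the rejection rate when testing once
    at sample size n_max, and the rate when rejecting the first time |z_n|
    exceeds z_crit at any sample size n <= n_max.
    """
    rng = random.Random(seed)
    fixed = 0
    peeking = 0
    for _ in range(n_runs):
        total = 0.0
        crossed = False
        z = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            z = total / math.sqrt(n)  # z-statistic after n observations
            if abs(z) > z_crit:
                crossed = True
        fixed += abs(z) > z_crit      # decision based on the final sample only
        peeking += crossed            # decision based on any intermediate look
    return fixed / n_runs, peeking / n_runs


rate_fixed, rate_peeking = peeking_rejection_rates()
print(rate_fixed, rate_peeking)
```

Under these assumptions the single-look rate should stay near the nominal level, while the rate with a look after every observation should be several times larger.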

Definition 2.1 (anytime validity).

Let a null hypothesis be given. We say that a test is anytime valid, or has Type-I error control, at a level if for all -stopping-times we have

A classical method for constructing anytime valid tests is based on nonnegative supermartingales. The following definition goes back to MR2849911.

Definition 2.2 (test supermartingale).

Let a null hypothesis $\mathcal{H}_0$ be given. A test supermartingale (for $\mathcal{H}_0$) is a nonnegative adapted process $W = (W_t)_{t \in \mathbb{N}_0}$ with initial value $W_0 = 1$ that is a $P$-supermartingale for all $P \in \mathcal{H}_0$.

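We recall that a random process $(W_t)_{t \in \mathbb{N}_0}$ adapted to a filtration $\mathbb{F}$ such that $\mathbb{E}_P[|W_t|] < \infty$ for all $t$ is a $P$-supermartingale ($P$-submartingale) if for all $t \in \mathbb{N}$ we have $\mathbb{E}_P[W_t \mid \mathcal{F}_{t-1}] \le W_{t-1}$ ($\ge$). A process that is both a $P$-supermartingale and a $P$-submartingale is called a $P$-martingale, and satisfies the equality $\mathbb{E}_P[W_t \mid \mathcal{F}_{t-1}] = W_{t-1}$ for all $t \in \mathbb{N}$. Here, $\mathbb{E}_P$ means that the expectation is taken with respect to the probability measure $P$.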

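Test supermartingales can be used to construct anytime valid tests. The basic tool for showing anytime validity is Ville’s inequality (MR3533075), which states that any nonnegative $P$-supermartingale $W$ with $W_0 = 1$ satisfies $P(\sup_{t \in \mathbb{N}} W_t \ge 1/\alpha) \le \alpha$ for all $\alpha \in (0,1)$. Thus, if $W$ is a test supermartingale and $\alpha \in (0,1)$ is fixed, then the test which rejects the null as soon as $W$ reaches a value above $1/\alpha$,

$$\tau = \inf\{t \in \mathbb{N} : W_t \ge 1/\alpha\}, \tag{1}$$

satisfies $P(\tau < \infty) \le \alpha$ for all $P \in \mathcal{H}_0$ and is therefore anytime valid at level $\alpha$.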
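The threshold rule described above is straightforward to implement once a wealth path is available. The sketch below is illustrative only: the helper names are hypothetical, and the example wealth process is the likelihood ratio of a Bernoulli(q) coin against a fair coin, which is a test martingale for the simple null of i.i.d. fair coin flips (an assumption made here for concreteness, not an example from the paper).

```python
def first_rejection_time(wealth, alpha):
    """Return the first t >= 1 with wealth[t] >= 1/alpha, or None if the
    threshold is never reached. wealth[0] is the initial value W_0 = 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    threshold = 1.0 / alpha
    for t in range(1, len(wealth)):
        if wealth[t] >= threshold:
            return t
    return None


def coin_likelihood_ratio(flips, q):
    """Wealth path of the likelihood ratio of Bernoulli(q) against Bernoulli(1/2)."""
    wealth = [1.0]
    for x in flips:
        factor = (q if x == 1 else 1.0 - q) / 0.5
        wealth.append(wealth[-1] * factor)
    return wealth


# Ten heads in a row: the wealth is 1.5**t, which first exceeds 1/0.05 = 20 at t = 8.
wealth = coin_likelihood_ratio([1] * 10, q=0.75)
print(first_rejection_time(wealth, alpha=0.05))  # 8
```

Returning `None` corresponds to the stopping time taking the value infinity on the observed sample.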

Remark 2.3.

A notion closely related to test supermartingales is that of an e-process, which is a nonnegative adapted process $E$ such that $\mathbb{E}_P[E_\tau] \le 1$ for all $P \in \mathcal{H}_0$ and all stopping times $\tau$ (10.1214/20-AOS2020; xu_wang_ramdas_21; ram_ruf_lar_koo_20; MR4364897). The stopping theorem implies that every test supermartingale is an e-process, but the converse is not true (ram_ruf_lar_koo_20; MR4364897). The ‘static’ or non-sequential analog of an e-process is known as an e-variable, which is a nonnegative random variable $E$ such that $\mathbb{E}_P[E] \le 1$ for all $P \in \mathcal{H}_0$. These notions have recently been studied extensively as a tool for safe inference (gru_hei_koo_19).

2.2 Power and growth

In addition to validity we are also interested in power against suitable alternative hypotheses $\mathcal{H}_1$ disjoint from $\mathcal{H}_0$. Loosely speaking this means that if the true data distribution belongs to $\mathcal{H}_1$, the test should reject quickly with high probability. For tests arising from test supermartingales via (1) this corresponds to designing $W$ to grow quickly with high probability under distributions in $\mathcal{H}_1$. This can be approached using the Growth Rate Optimal (GRO) criterion, which has recently received significant attention in the context of e-values and e-processes (gru_hei_koo_19). In our setting the GRO criterion is as follows. At each time $t$ one seeks to maximize the expected logarithmic increment conditionally on data observed so far across all test supermartingale increments. More formally one aims to solve

$$\text{maximize} \quad \mathbb{E}_Q\left[\log \frac{W_t}{W_{t-1}} \,\middle|\, \mathcal{F}_{t-1}\right] \quad \text{over all test supermartingales } W, \tag{2}$$

given the observed data $X_1, \dots, X_{t-1}$, where $Q$ is a suitable distribution. As we will see, $Q$ need not itself be the only element of $\mathcal{H}_1$, or even belong to $\mathcal{H}_1$ at all. It is a purely computational device used to guide the choice of $W$.

In implementing this idea one is faced with three key issues:

  (i) The problem (2) optimizes over the set of all test supermartingales. Solving it requires a description of this set, or of a sufficiently rich subset.

  (ii) A suitable distribution $Q$ has to be specified.

  (iii) One has to actually solve (2), at least numerically, and ideally derive performance guarantees with respect to the set of alternatives.

In this paper, we consider null hypotheses based on elicitable functionals and identifiable functionals, which admit large families of explicit test supermartingales. This addresses (i). In order to address (ii) we focus on distributions $Q$ which are not fixed in advance but rather learned in an online fashion as more and more data is observed. This idea has recently also been explored by wau_ram_20. Having dealt with (i) and (ii), the GRO criterion becomes a concrete optimization problem which we solve using methods from Online Convex Optimization (OCO). A key feature of this approach is that OCO methods come with asymptotic performance guarantees in the form of regret bounds. We employ these bounds to show that the resulting tests have asymptotic power one under a large composite nonparametric alternative hypothesis $\mathcal{H}_1$, in the sense that we obtain a test supermartingale which tends to infinity with probability one under every distribution in $\mathcal{H}_1$. Consequently, if the true data distribution is some element $Q$ of $\mathcal{H}_1$ then any test of the form (1) is guaranteed to eventually reject the null: $Q(\tau < \infty) = 1$ for all $\alpha \in (0,1)$. This addresses (iii).
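To make the idea of learning the bet online concrete, here is a minimal sketch using projected online gradient ascent on the log-wealth, one standard OCO method. Everything specific in it is an assumption made for illustration: the null is that data in $[0,1]$ have conditional mean $m$, the wealth increments are $1 + \lambda_t (x_t - m)$ as in betting-based mean tests, and the step size and function name `ogd_mean_test` are hypothetical. It is not the algorithm analyzed in the paper.

```python
def ogd_mean_test(data, m, alpha=0.05, step=0.5):
    """Sequential test of 'conditional mean equals m' for data in [0, 1].

    The bet lam_t is fixed before x_t is seen and then updated by projected
    online gradient ascent on log(1 + lam * (x_t - m)). Under the null each
    increment has conditional mean 1, so the wealth is a test martingale.
    Returns (rejection_time or None, final_wealth).
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie in (0, 1)")
    # Restricting lam to half the admissible range keeps increments >= 1/2.
    lo, hi = -0.5 / (1.0 - m), 0.5 / m
    lam, wealth = 0.0, 1.0
    for t, x in enumerate(data, start=1):
        g = x - m
        wealth *= 1.0 + lam * g           # payoff of the bet placed before x
        if wealth >= 1.0 / alpha:
            return t, wealth              # reject via the threshold rule
        grad = g / (1.0 + lam * g)        # derivative of the log-increment in lam
        lam = min(hi, max(lo, lam + step * grad))
    return None, wealth


print(ogd_mean_test([1.0] * 50, m=0.5))   # data far from the null: rejects early
print(ogd_mean_test([0.5] * 50, m=0.5))   # data consistent with the null: no rejection
```

The bet is always chosen before the corresponding observation is used, which is what makes the sequence of bets predictable.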

2.3 Test supermartingales via mixing

The null hypotheses considered in this paper will be constructed directly in terms of explicit families of test supermartingales indexed by a parameter $\theta \in \Theta$, where $\Theta$ is an (arbitrary) index set. Whenever such a family is available, it is possible to construct new test supermartingales by combining its members. For instance, it is clear that any convex combination of test supermartingales is again a test supermartingale. More generally, one can use predictably mixed test supermartingales as shown in the following lemma; see also wau_ram_20. In this way one can assemble weak test supermartingales into more powerful ones.

Lemma 2.4 (Predictably mixed supermartingale).

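Let $\{W^\theta : \theta \in \Theta\}$ be a family of test supermartingales and $(\pi_t)_{t \in \mathbb{N}}$ a predictable sequence of probability measures on $\Theta$. Then the process $W^\pi$ defined by $W^\pi_0 = 1$ and

$$W^\pi_t = W^\pi_{t-1} \int_\Theta \frac{W^\theta_t}{W^\theta_{t-1}} \, \pi_t(d\theta), \quad t \in \mathbb{N}, \tag{3}$$

is also a test supermartingale.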

To be precise, we assume here that $\Theta$ is a measurable space, and that $\theta \mapsto W^\theta_t$ is measurable for each $t$. The condition on $(\pi_t)_{t \in \mathbb{N}}$ means that for each $t$, $\pi_t$ is a probability measure on $\Theta$ that may depend on the preceding data points $X_1, \dots, X_{t-1}$. The proof of Lemma 2.4 can be found in the appendix.
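For a finite index set the mixing recursion of Lemma 2.4 reduces to a weighted sum of one-step ratios. The following sketch assumes a finite index set, strictly positive base paths, and weights supplied by the caller; the function name `mixed_wealth` and the data layout are hypothetical.

```python
def mixed_wealth(base_paths, weights):
    """Predictably mixed wealth for a finite index set.

    base_paths: dict mapping theta -> [W_0, W_1, ..., W_T] with W_0 = 1 and
                strictly positive entries (assumed, to avoid division by zero).
    weights:    list of length T; weights[t-1] is a dict theta -> pi_t({theta}).
                The caller must choose weights[t-1] using data up to time t-1
                only; this code cannot check predictability.
    Returns the list [W_0, ..., W_T] of mixed wealth values.
    """
    wealth = [1.0]
    for t, pi_t in enumerate(weights, start=1):
        if min(pi_t.values()) < 0.0 or abs(sum(pi_t.values()) - 1.0) > 1e-12:
            raise ValueError("weights at each time must form a probability vector")
        ratio = sum(
            p * base_paths[theta][t] / base_paths[theta][t - 1]
            for theta, p in pi_t.items()
        )
        wealth.append(wealth[-1] * ratio)
    return wealth


paths = {"a": [1.0, 2.0, 1.0], "b": [1.0, 0.5, 1.0]}
pis = [{"a": 0.5, "b": 0.5}, {"a": 0.0, "b": 1.0}]
print(mixed_wealth(paths, pis))  # [1.0, 1.25, 2.5]
```

In the example the mixture ends above both base paths, because the weights shift to asset "b" just before its one-step ratio is large.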

Example 1 (i.i.d. Gaussians).

Consider the hypothesis . Then for every the process defined by and for by is a test (super)martingale. The predictably mixed process in (3) then takes the form

Remark 2.5.

The mixing construction (3) admits a useful interpretation in terms of trading a portfolio of financial assets. Treating the collection of test supermartingales $W^\theta$ as a collection of tradable assets indexed by $\theta$, we may think of $W^\pi$ in (3) as the value of a portfolio trading these assets. Indeed, regard $\pi_t$ as the portfolio weights specifying the proportion of capital allocated to each asset at time $t-1$; this is observable at time $t-1$ because $(\pi_t)_{t \in \mathbb{N}}$ is a predictable sequence. The portfolio return from time $t-1$ to $t$ is the weighted average of the individual asset returns,

$$\int_\Theta \frac{W^\theta_t - W^\theta_{t-1}}{W^\theta_{t-1}} \, \pi_t(d\theta).$$

Rearranging (3) one sees that this is equal to the overall portfolio return $(W^\pi_t - W^\pi_{t-1}) / W^\pi_{t-1}$. We can think of selecting a strong allocation strategy as choosing bets against $\mathcal{H}_0$ in order to make $W^\pi$ grow quickly, eventually exceeding the threshold $1/\alpha$ to reject $\mathcal{H}_0$. Indeed, if $\mathcal{H}_0$ is false, there may exist assets $W^\theta$ which are not supermartingales, enabling one to ‘bet against the null’ by selecting $\pi_t$ with weights on these processes such that the wealth process $W^\pi$ grows on average. In contrast, if $\mathcal{H}_0$ is true, Ville’s inequality shows that, regardless of the trading strategy employed, it is unlikely (with probability bounded by $\alpha$) that our wealth ever exceeds the threshold $1/\alpha$.

In practice it is not feasible to work with general predictable sequences $(\pi_t)_{t \in \mathbb{N}}$. Instead we consider parsimonious specifications that tend to work well in experiments. A key example is the Dirac specification, $\pi_t = \delta_{\theta_t}$, where $(\theta_t)_{t \in \mathbb{N}}$ is a $\Theta$-valued predictable process. This is the simplest possible specification. In terms of the trading interpretation in Remark 2.5, a strategy of this kind chooses in each period one single asset where all capital is invested. The test supermartingale (3) simplifies to

$$W_t = \prod_{s=1}^{t} \frac{W^{\theta_s}_s}{W^{\theta_s}_{s-1}}.$$
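With a Dirac specification the wealth is simply a running product of one-step ratios, each evaluated at the parameter chosen for that period. The sketch below assumes the caller supplies a function `increment(theta, x)` returning the one-step ratio of the base process with parameter `theta` at observation `x`; this interface, the name `dirac_wealth`, and the example increment $1 + \lambda (x - 1/2)$ are illustrative assumptions.

```python
def dirac_wealth(increment, thetas, data):
    """Wealth path when all capital is placed on a single base process per period.

    increment(theta, x): one-step ratio W_t / W_{t-1} of the base process with
                         parameter theta when the new observation is x.
    thetas[t-1]:         parameter used in period t; it must be chosen from
                         data[:t-1] only (predictability is not checked here).
    """
    if len(thetas) != len(data):
        raise ValueError("need exactly one parameter per observation")
    wealth = [1.0]
    for theta_t, x_t in zip(thetas, data):
        wealth.append(wealth[-1] * increment(theta_t, x_t))
    return wealth


# Example increment for data in [0, 1] with hypothesized mean 1/2.
bet_on_mean = lambda lam, x: 1.0 + lam * (x - 0.5)
print(dirac_wealth(bet_on_mean, [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]))  # [1.0, 1.0, 1.5, 0.75]
```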


3 Specifying the null hypothesis

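We consider null hypotheses involving the value of certain statistical functionals of the (conditional) distributions of the data. For example, for a given value $z \in \mathbb{R}$ we may want to test the hypothesis

$$\mathcal{H}_0 = \{P : \text{the conditional median of } X_t \text{ given } \mathcal{F}_{t-1} \text{ equals } z \text{ for all } t \in \mathbb{N}\}.$$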

The hypotheses considered below generalize this example beyond medians to a large class of elicitable functionals and identifiable functionals. These concepts are reviewed below; they include quantiles, moments, expectiles, and many other examples. The key common feature of these hypotheses is that they can be expressed in the form

$$\mathcal{H}_0 = \{P : W^\theta \text{ is a } P\text{-supermartingale for all } \theta \in \Theta\}$$

for some explicit family $\{W^\theta : \theta \in \Theta\}$ of nonnegative processes starting at $1$, indexed by a parameter $\theta \in \Theta$ where $\Theta$ is an (arbitrary) index set. In our applications $\Theta$ will be a subset of a finite-dimensional space. Thus by construction, $\{W^\theta : \theta \in \Theta\}$ constitutes a family of ‘base’ test supermartingales for $\mathcal{H}_0$ which can be used to form other test supermartingales through predictable mixing as explained in Subsection 2.3.

3.1 Definition of elicitability and identifiability

We review the definitions as given in FisslerZiegel2016. Fix $d \in \mathbb{N}$ and a subset $A \subseteq \mathbb{R}^d$. A scoring function is simply a measurable map $S : A \times \mathcal{X} \to \mathbb{R}$. Let $\mathcal{M}$ be a class of probability distributions on $\mathcal{X}$. If for each distribution $\mu \in \mathcal{M}$ the map

$$a \mapsto \mathbb{E}_\mu[S(a, X)] \tag{6}$$

is well-defined and finite, let $T(\mu) \subseteq A$ denote the set of its minimizers. The induced map $T$ is called an elicitable functional (with respect to $\mathcal{M}$) and $S$ a strictly consistent scoring function for $T$. Here $X$ denotes the canonical random variable on $\mathcal{X}$. If $T(\mu)$ consists only of one element, that is, the minimizer in (6) is unique, we abuse notation and also use the notation $T(\mu)$ for the minimizer. Any given elicitable functional can have many different strictly consistent scoring functions.

Similarly, an identification function is a measurable map $V : A \times \mathcal{X} \to \mathbb{R}^d$. If for each distribution $\mu \in \mathcal{M}$ the map

$$a \mapsto \mathbb{E}_\mu[V(a, X)] \tag{7}$$

is well-defined and finite, let $T(\mu) \subseteq A$ denote the set of its zeros. Then $T$ is called an identifiable functional (with respect to $\mathcal{M}$) and $V$ a strict identification function for $T$. A zero of the expected identification function in (7) is understood to hold component-wise, since $V(a, x) \in \mathbb{R}^d$. These definitions of elicitability and identifiability are called higher-order elicitability by FisslerZiegel2016.

When applying an elicitable or identifiable functional to a distribution in the following, we always implicitly assume that the functional is well-defined for this distribution.

Table 1 contains some examples of commonly used functionals that happen to be both elicitable and identifiable. The presented scoring functions are standard but not strictly consistent on the maximal possible domain of definition of the respective functionals. Different choices of strictly consistent scoring functions make it possible to show elicitability of these functionals on their natural domains of definition; see gneiting2011making for details.

In the presence of convexity, the connection between scoring and identification functions can be understood through subgradients. For instance, if $T$ is an elicitable functional whose scoring function $S(a, x)$ is convex in $a$, then $T$ is also an identifiable functional with identification function $V(a, x) \in \partial_a S(a, x)$, an element of the subgradient of the scoring function with respect to $a$. For the converse, linking an identification function $V$ to a unique convex scoring function $S$, more subtle conditions are needed, and we point interested readers to rockafellar2009variational. In the absence of convexity, scoring and identification functions are still linked through gradients under sufficient differentiability assumptions which are formalized as Osband’s principle in FisslerZiegel2016.

Table 1: Examples of statistical quantities that can be expressed as elicitable and identifiable functionals. In the elicitable case, for the mean, is the class of all distributions with finite second moment; for the quantiles, it is the class of all distributions with finite first moment; for regression, , and contains all distributions on with finite expected squared norm. In the identifiable case, for the mean, is the class of all distributions with finite first moment; for the quantiles, it is the class of all distributions with continuous distribution function at the -quantile; for regression, , and contains all distributions on with finite mean.
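The definitions can be checked numerically on an empirical distribution. The sketch below assumes the textbook choices for the mean and for quantiles, namely squared error with identification function $a - x$, and the pinball loss with identification function $\mathbf{1}\{x \le a\} - \alpha$; these may differ in sign or scaling from the entries of Table 1, and all function names are hypothetical.

```python
def mean_score(a, x):
    return (a - x) ** 2                      # squared error


def mean_ident(a, x):
    return a - x                             # expected value is zero at the mean


def quantile_score(a, x, level):
    return ((x <= a) - level) * (a - x)      # pinball loss


def quantile_ident(a, x, level):
    return (x <= a) - level                  # zero mean at the quantile for continuous laws


def expected(f, a, sample, *args):
    """Expectation of f(a, X) under the empirical distribution of `sample`."""
    return sum(f(a, x, *args) for x in sample) / len(sample)


sample = [0.0, 1.0, 2.0, 3.0, 10.0]          # mean 3.2, median 2.0
grid = [i / 10 for i in range(101)]          # candidate values 0.0, 0.1, ..., 10.0

best_mean = min(grid, key=lambda a: expected(mean_score, a, sample))
best_median = min(grid, key=lambda a: expected(quantile_score, a, sample, 0.5))
print(best_mean, best_median)                # 3.2 2.0
print(expected(mean_ident, 3.2, sample))     # approximately 0
```

For the discrete sample used here the expected quantile identification function need not vanish at the median, which is consistent with the continuity requirement stated in the caption of Table 1.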

We now use the concepts of elicitability and identifiability to construct null hypotheses for the sequential testing problem. Let $T$ be either an elicitable functional or an identifiable functional with scoring function $S$ (in the elicitable case) or identification function $V$ (in the identifiable case). Given a fixed value $z \in A$ we consider the null hypothesis

$$\mathcal{H}_0 = \{P : z \in T(P(X_t \in \cdot \mid \mathcal{F}_{t-1})) \text{ for all } t \in \mathbb{N}\}. \tag{8}$$

Thus the hypothesis is that $T$ returns a set containing $z$ whenever it is applied to the conditional distribution of an observation given all earlier observations. For instance, if $T$ is the median functional, we recover the example at the beginning of this section.

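Using the definition of elicitability or identifiability, we obtain a simpler representation of $\mathcal{H}_0$ in terms of supermartingales or martingales and the scoring or identification function. Specifically, we have

$$\mathcal{H}_0 = \Big\{P : \Big(\sum_{s=1}^{t} \big(S(z, X_s) - S(z', X_s)\big)\Big)_{t \in \mathbb{N}_0} \text{ is a } P\text{-supermartingale for all } z' \in A\Big\} \tag{9}$$

in the elicitable case, and

$$\mathcal{H}_0 = \Big\{P : \Big(\sum_{s=1}^{t} V(z, X_s)\Big)_{t \in \mathbb{N}_0} \text{ is a } P\text{-martingale}\Big\} \tag{10}$$

in the identifiable case. Note that in the identifiable case, $\sum_{s=1}^{t} V(z, X_s)$ is a vector valued martingale. For convenience, if $\mathcal{H}_0$ is of the form (9) we call it an elicitable hypothesis, and if it is of the form (10) we call it an identifiable hypothesis.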

Let us spell out why, in the elicitable case, (8) essentially coincides with (9); the identifiable case is similar. Due to the definition of elicitability, $z \in T(P(X_t \in \cdot \mid \mathcal{F}_{t-1}))$ is equivalent to having $\mathbb{E}_P[S(z, X_t) \mid \mathcal{F}_{t-1}] \le \mathbb{E}_P[S(z', X_t) \mid \mathcal{F}_{t-1}]$ for all $z' \in A$. This holds for all $t \in \mathbb{N}$ if and only if the process $\sum_{s=1}^{t} (S(z, X_s) - S(z', X_s))$ is a $P$-supermartingale for every $z' \in A$. Thus the right-hand sides of (8) and (9) are essentially the same. There is one subtlety that we have neglected in this argument which is usually irrelevant in applications. Since strictly consistent scoring functions are not unique, it may happen that the elicitable functional is defined for a larger class of distributions than the one where the chosen strictly consistent scoring function in (9) has finite expectation. This means that the moment conditions on the conditional distributions in (9) may be slightly stronger than in (8). However, for many examples including the ones in Table 1, this problem does not arise since we work with score differences.

The scoring function which elicits a functional is usually not unique. For example, the class of consistent scoring functions for the mean consists of all Bregman loss functions (Savage1971). Although there are no general guidelines for how one should select a scoring function, it is often natural to give preference to scoring functions that satisfy certain additional desirable properties, a relevant example in our setting being convexity in the first argument. For the mean, ratios of expectations and quantiles, convex strictly consistent scoring functions are essentially unique, see Fissler2017, caponnetto2005note and steinwart2014elicitation. In the context of estimation in semi-parametric models for a quantile or the mean, komunjer2010semiparametric; komunjer2010efficient; dimitriadis2020efficiency show that there exist unique choices of scoring functions which maximize the asymptotic efficiency of the estimators, but these are different from the convex choices described above.

The forms of $\mathcal{H}_0$ in (9) and (10) suggest how one could construct families of base test supermartingales. If the scoring function or the identification function is uniformly bounded, this is straightforward, see Subsection 3.2. The unbounded case requires additional moment bounds, which we achieve by imposing a sub-ψ condition, see Subsection 3.3.

3.2 Uniformly bounded scoring and identification functions

We construct parametric families of test martingales when the score difference, , or the norm of the identification function, , are uniformly bounded. The proofs of the following two lemmas can be found in the appendix.

Lemma 3.1 (Test martingales for elicitable hypotheses).

Consider an elicitable hypothesis of the form (9), and assume that . For each , define the processes by

Then the collection of processes forms a family of test supermartingales.

The lower bound of appearing in the assumption that is without loss of generality since scoring functions may be rescaled by positive constants leaving the elicitable functional itself unchanged. In a similar fashion, we may construct test martingales for identifiable hypotheses as follows.

Lemma 3.2 (Test martingales for identifiable hypotheses).

Consider an identifiable hypothesis of the form (10), assume that and define . For each , define the process by

Then is convex with non-empty interior, and the set forms a family of test martingales.

Remark 3.3.

Whenever the scoring function is convex in and satisfies the conditions of Lemma 3.1, there is a direct connection between the two martingale constructions presented above. Indeed, if we let , then


for all and . Hence, each increment of the test martingale construction of Lemma 3.2 can be thought of as a linearization of the increments of the processes defined in Lemma 3.1. Moreover, equation (11) implies that for any elicitable hypothesis with convex scoring function, identifiable test martingales generated by Lemma 3.2 will dominate the elicitable test martingale of Lemma 3.1 whenever . Indeed, it is easy to verify that

and that the right-hand side produces a valid test supermartingale for all . This observation suggests that whenever an elicitable functional admits a bounded and convex scoring function, the test generated by its subgradient using Lemma 3.2 will always be more powerful than the one generated by Lemma 3.1.

Fissler2017 shows that, under suitable conditions, identification functions are unique up to multiplication with a matrix valued function in . Therefore, Remark 3.3 applies not only to a subgradient of a convex scoring function but to any identification function, as long as a convex scoring function for the respective functional exists, and with a suitable modification of the relation .

In the setting of analyzing the asymptotic efficiency of semi-parametric estimators of elicitable and identifiable functionals, a similar relation is observed, in which an estimator generated by the identification function will always be asymptotically more efficient than its elicitable counterpart (dimitriadis2020efficiency).

The uniform boundedness assumptions in Lemmas 3.1 and 3.2 may appear to be restrictive. However, they cover a number of cases of interest including the mean whenever the data generating process is bounded, see also wau_ram_20. A second relevant example is given by (vectors of) quantiles, where the uniform boundedness assumption for the identification function is met, regardless of whether the data generating process is bounded or not. Indeed, it is easy to see from the rightmost column of Table 1 that the quantile identification function is uniformly bounded. Hence the family of test martingales of Lemma 3.2 is always valid in the case of testing quantiles.
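The quantile case can be sketched as follows. The exact form of the processes in Lemma 3.2 is not reproduced here; the sketch assumes multiplicative increments of the form $1 + \lambda V(z, x)$ with the quantile identification function $V(z, x) = \mathbf{1}\{x \le z\} - \alpha$, and restricts $\lambda$ so that every increment is nonnegative. The function name and parametrization are hypothetical.

```python
def quantile_test_wealth(data, z, level, lam):
    """Wealth path prod_s (1 + lam * (1{x_s <= z} - level)).

    Under the null that the conditional level-quantile equals z (with a
    continuous conditional distribution), 1{x <= z} - level has conditional
    mean zero, so each increment has conditional mean one. Because the
    identification function only takes the values -level and 1 - level,
    no boundedness of the data is needed.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie in (0, 1)")
    lo, hi = -1.0 / (1.0 - level), 1.0 / level   # range keeping increments >= 0
    if not lo <= lam <= hi:
        raise ValueError("lam outside the range that keeps the wealth nonnegative")
    wealth = [1.0]
    for x in data:
        v = (x <= z) - level
        wealth.append(wealth[-1] * (1.0 + lam * v))
    return wealth


# Testing 'median equals 0' with lam = 1 on three observations.
print(quantile_test_wealth([-1.0, -2.0, 3.0], z=0.0, level=0.5, lam=1.0))
# [1.0, 1.5, 2.25, 1.125]
```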

3.3 Test supermartingales for sub-ψ hypotheses

In the more general case that the scoring function or identification function is unbounded, we construct families of test martingales under the assumption of a tail bound on the scoring or identification function which involves bounding the cumulant generating function. We introduce the definition of a sub-ψ process below, a notion related to those introduced in 10.1214/aop/1176996452; 10.1214/009117904000000397 but most closely related to (howard2018timeuniform, Definition 1).

Definition 3.4 (Sub-ψ Process).

Let and be -adapted processes, where the variance process is assumed to be non-negative and . We say that is sub-ψ if there is a and a nonnegative convex function ψ satisfying , where is its right derivative, and for each ,

is a supermartingale, where .

Definition 3.4 is similar to howard2018timeuniform, but there are some noteworthy differences. In particular, (howard2018timeuniform, Definition 1) is a weaker condition in that it allows the exponential process to only be upper-bounded by a supermartingale, rather than be a supermartingale itself. We make the choice of requiring the supermartingale condition in order to be able to work with the supermartingale predictable mixing introduced in Section 2.3, which would break down without this assumption. Further discussion of the sub-ψ condition and its applications in time-uniform confidence bounds can be found in howard2018timeuniform; howard2020timeuniform. In particular, we point the reader to howard2018timeuniform for a collection of commonly used functions ψ and variance processes which are valid under a wide variety of assumptions.

Typically, is the simplest possible choice of variance process, and we use it for all concrete examples in this paper. We have chosen to state our theoretical results for the more general Definition 3.4 in order to be consistent with existing literature. When , the sub-ψ condition specializes to (conditional, one-sided versions of) sub-Gaussian, sub-Gamma, sub-Exponential, sub-Bernoulli and related conditions on the increments of , obtained by choosing ψ to be the corresponding cumulant generating function. Specifically, the condition in Definition 3.4 is then equivalent to

This condition implies a bound on the right tail probabilities of the increments of a sub-ψ process . Indeed, Chernoff’s inequality (see e.g. HAGERUP1990305) states that

where is the convex conjugate of ψ.

The following lemma shows that, for a given sub-ψ process , the supermartingale property on is equivalent to the existence of a non-negative supermartingale.

Lemma 3.5.

Suppose that is an -adapted sub-ψ process. Then is a supermartingale if and only if is a supermartingale.


Let be a supermartingale. By the sub-ψ property, and denoting the forward increments of processes as , we have that

where the last inequality follows since the supermartingale property implies that .

Now, assume that is a supermartingale for all . Since we have

where in the third line we use the fact that since is convex and differentiable at zero, there exists a closed neighborhood including zero in which is continuous and hence has an integrable upper bound, allowing us to exchange the limit and the expectation by the dominated convergence theorem. Lastly, noting that by the assumed supermartingale property, we conclude that , demonstrating that is a supermartingale, as desired. ∎

We say that a family of integrable processes indexed by is sub-ψ if for each there is a function and a process such that for each , is sub-. We note that although varies with , the interval is assumed to be the same for all . Using Lemma 3.5, we construct families of test supermartingales under the assumption that the scoring or identification functions satisfy a sub-ψ condition, allowing us to extend the anytime-valid sequential testing methodology to unbounded data. The proofs of the following two lemmas can be found in the appendix.

Lemma 3.6 (Test supermartingales for sub-ψ elicitable hypotheses).

Let be an elicitable functional with scoring function and let be an elicitable hypothesis of the form (9). For every define and for some nonnegative -adapted process . If the family is sub-ψ under every measure in , then for each and , the process defined by

is an test supermartingale.

Proceeding almost identically to Lemma 3.6, we may construct test supermartingale families for identifiable hypotheses.

Lemma 3.7 (Test martingales for sub-ψ identifiable hypotheses).

Let be an identifiable functional with identification function , and let be an identifiable hypothesis of the form (10). Let , and define for each the processes and for some nonnegative -adapted process . If the family of processes is sub-ψ under every measure in , then for each and the process defined by

is an test supermartingale.

The following examples illustrate two situations where test supermartingales can be constructed for unbounded data using sub-ψ assumptions.

Example 2 (Sub-ψ mean).

Recall from Table 1 that the mean is identifiable with identification function . Now suppose that a real-valued data-generating process has conditionally sub-Gaussian increments so that for and some . Under this assumption, we have that is sub-ψ with and , allowing us to apply Lemma 3.7.
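To make the sub-Gaussian mean example concrete, the following minimal sketch simulates an exponential process of the general form exp(λ Σ(X_s − μ₀) − tλ²σ²/2) and rejects once it exceeds 1/α, as licensed by Ville's inequality. It is a sketch under stated assumptions rather than the exact construction of Lemma 3.7: we assume a one-sided null (conditional mean at most `mu0`), Gaussian data with known `sigma`, the variance process V_t = t, and a fixed betting parameter `lam` instead of a predictable one. All names are hypothetical.

```python
import math
import random

def log_test_process(xs, mu0, sigma, lam):
    """Running log of exp(lam * sum(x - mu0) - t * lam^2 * sigma^2 / 2)."""
    log_m, path = 0.0, []
    for x in xs:
        # Each factor has conditional expectation at most 1 under the assumed null.
        log_m += lam * (x - mu0) - 0.5 * (lam * sigma) ** 2
        path.append(log_m)
    return path

def rejects(xs, mu0, sigma, lam, alpha):
    # Reject as soon as the process crosses 1 / alpha (Ville's inequality).
    threshold = math.log(1.0 / alpha)
    return any(v >= threshold for v in log_test_process(xs, mu0, sigma, lam))

rng = random.Random(0)  # fixed seed for reproducibility
mu0, sigma, lam, alpha, horizon, n_paths = 0.0, 1.0, 0.5, 0.05, 200, 500

# Data generated under the null: the false alarm rate should not exceed alpha by much.
null_paths = [[rng.gauss(mu0, sigma) for _ in range(horizon)] for _ in range(n_paths)]
false_alarm_rate = sum(rejects(p, mu0, sigma, lam, alpha) for p in null_paths) / n_paths

# Data generated under an alternative with mean mu0 + 1: the test should reject.
alt_path = [rng.gauss(mu0 + 1.0, sigma) for _ in range(horizon)]
alt_rejected = rejects(alt_path, mu0, sigma, lam, alpha)

print(false_alarm_rate, alt_rejected)
```

Working with the logarithm of the process avoids numerical overflow on long paths. A fixed `lam` is the simplest choice; a predictable sequence of betting parameters would follow the same pattern with `lam` updated inside the loop.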

Example 3 (Sub-ψ regression).

Consider the hypothesis that the data follows an linear time series model, , where , is a martingale difference sequence where is sub-ψ with variance process and is unknown. We wish to test whether . In each time step, is the value of the identifiable functional . In view of the regression example of Table 1, this functional has identification function

where and where the re-scaling by is possible because is -measurable. In order to apply the testing methodology of Lemma 3.7, we show that the processes with are sub-ψ with for all with . Computing

where , we see that if are sub-ψ with , then will also be sub-ψ since , which follows from the fact that . Hence, a sufficient condition for this identifiable functional to satisfy the necessary sub-ψ condition is simply that the signed residual processes are sub-ψ.

There is a relationship between tests for identifiable and elicitable hypotheses with a convex scoring function in the sub-ψ case, analogous to the one pointed out in Remark 3.3. Indeed, let be an elicitable functional with a convex scoring function . As pointed out in Remark 3.3, is also identifiable with identification function . If we assume in addition that for a fixed , the family of processes with increments is sub-ψ with , then for any and , the process

is a valid test supermartingale according to Lemma 3.7. However, since is convex and the are increments of the sub-ψ process , we have that under ,

Hence, the processes

are valid test supermartingales which match the form of the test supermartingales presented in Lemma 3.6. Due to (3.3), however, we note that . Hence, whenever is convex and the families of processes and are both increments of a sub-ψ process, we find that the tests generated by the identification function according to Lemma 3.7 will always be more powerful than those generated by the scoring function according to Lemma 3.6, yielding a conclusion analogous to that in Remark 3.3.

Remark 3.8 (Bridging the sub-ψ and bounded test supermartingales).

Although the sub-ψ and uniformly bounded hypothesis testing methodologies may appear disjoint, there are in fact some connections which are worth highlighting.

The first and arguably most important remark is that all processes with bounded increments are sub-Gaussian, and hence sub-ψ. Indeed, whenever a process satisfies , it follows by Hoeffding’s lemma (see e.g. hoeffding1994probability; hertz2020improved) that


where the second and third expressions in the above inequality are the cumulant generating functions of a Gaussian random variable. Hence, processes with bounded increments are sub-ψ, where ψ is given by either of the expressions in (12).
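The classical form of Hoeffding's lemma can be checked numerically. The sketch below assumes a mean-zero increment supported on two points a < 0 < b and compares its log moment generating function with the Gaussian cumulant generating function λ²(b − a)²/8. The two-point distribution, the grid of λ values, and the function names are illustrative assumptions, and the refined bound of hertz2020improved is not reproduced here.

```python
import math

def log_mgf_two_point(lam, a, b):
    """Log moment generating function of a mean-zero variable on {a, b}, a < 0 < b."""
    p_b = -a / (b - a)   # probability of b, chosen so that the mean is zero
    p_a = b / (b - a)    # probability of a
    return math.log(p_a * math.exp(lam * a) + p_b * math.exp(lam * b))

def hoeffding_cgf(lam, a, b):
    # Gaussian cumulant generating function with variance proxy (b - a)^2 / 4.
    return lam ** 2 * (b - a) ** 2 / 8.0

a, b = -1.0, 3.0
lams = [k / 10.0 for k in range(-50, 51)]
gaps = [hoeffding_cgf(lam, a, b) - log_mgf_two_point(lam, a, b) for lam in lams]
min_gap = min(gaps)   # nonnegative if the bound holds on the grid

print(min_gap)
```

The gap vanishes at λ = 0 and is positive elsewhere on the grid, which is consistent with bounded increments being sub-Gaussian with variance proxy (b − a)²/4.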

There exists a deeper connection between the two as follows. Let be a supermartingale difference process and let . Define for each and the family of processes , where the collection of compensators with satisfy for all and . Since the function