Conformal testing: binary case with Markov alternatives

by Vladimir Vovk, et al.

We continue the study of conformal testing in binary model situations. In this note we consider Markov alternatives to the null hypothesis of exchangeability. We propose two new classes of conformal test martingales; one class is statistically efficient in our experiments, and the other class partially sacrifices statistical efficiency to gain computational efficiency.




1 Introduction

This note treats a problem similar to the one considered in [2]: we would like to test online the null hypothesis of exchangeability of binary observations under Markov alternatives.

The simplest way of online hypothesis testing is to use test martingales, which are defined as nonnegative processes with initial value 1 that are martingales under the null hypothesis; see, e.g., [3]. Such processes, for the null hypothesis of exchangeability, can be constructed using the method of conformal prediction [6], and we will refer to them as conformal test martingales. A previous paper [4] constructs custom-made conformal test martingales for a different class of alternative hypotheses, namely changepoint alternatives.

The method of [2], which is specifically devoted to Markov alternatives, is more general: instead of a test martingale the authors construct a “safe e-process” (to be defined in the next section). Safe e-processes are closely related to test martingales and admit a similar interpretation as the capital of a gambler trying to discredit the null hypothesis. Our methods give similar results to the methods of [2] in the model situations that we consider (following [2]). The advantage of our methods is that they extend easily to the usual setting of machine learning, where each observation is a pair consisting of a potentially complex object and its label.

In this note we only design conformal test martingales for a simple alternative hypothesis (a specific probability measure). This is different from [2], who are interested in testing against the composite alternative Markov hypothesis. As in [2], we could mix our conformal test martingales over the possible alternative hypotheses, but we leave this step for future research.

2 Model situations

Figure 1: The process of [2] and the Simple Jumper in the large scenario. Left panel: the hard case. Right panel: the easy case.

This section introduces the model situations considered in this paper, following [2, Section 4.2]. Our data consist of binary observations generated from a Markov model. We will use the notation

for the probability distribution of a Markov chain with the transition probabilities

for transitions and for transitions ; the probability that the first observation is 1 will always be assumed . In the hard case, the model is , and in the easy case, the model is . The number of observations is (as in [2]) or or ; we will refer to these scenarios as large, medium, and small, respectively.

In all our experiments we use 2021 as the seed for the NumPy pseudorandom number generator. (This, however, does not make the trajectories in our plots comparable between different scenarios.) The dependence on the seed will be explored in boxplots reported in Section 5; the seed affects not only the data but also the values of conformal martingales, which are randomized processes, given the data.
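As a minimal sketch of how data of this kind could be generated, the following Python code samples a binary Markov chain with NumPy. Only the seed 2021 comes from the text. The function name `simulate_markov`, its argument names, the transition probabilities 0.1 and 0.9, the initial probability 0.5, the sample size 1000, and the use of `numpy.random.default_rng` are all illustrative assumptions, not details taken from this note.

```python
import numpy as np


def simulate_markov(p_1_given_0, p_1_given_1, n, rng, p_first=0.5):
    """Sample n binary observations from a two-state Markov chain.

    p_1_given_0: probability of moving from 0 to 1 (hypothetical name)
    p_1_given_1: probability of moving from 1 to 1 (hypothetical name)
    p_first:     probability that the first observation is 1 (assumed value)
    """
    z = np.empty(n, dtype=int)
    z[0] = rng.random() < p_first
    for i in range(1, n):
        p_one = p_1_given_1 if z[i - 1] == 1 else p_1_given_0
        z[i] = rng.random() < p_one
    return z


# Seed 2021 as in the text; the Generator API is an assumption.
rng = np.random.default_rng(2021)
# Placeholder parameter values, chosen only for illustration.
z = simulate_markov(0.1, 0.9, 1000, rng)
```

Because the same generator would also drive the randomization of the conformal p-values, changing the seed changes both the data and the martingale values, as noted above.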


be the Bernoulli distribution on

with parameter : . Set . Our null hypothesis is the IID model, under which the observations are generated from with unknown parameter .

Ramdas et al. construct a safe e-process : namely, under any , is dominated by a test martingale with respect to , in the sense that for all and . The trajectories of their process for the two cases, hard and easy, are shown in Figure 1 (they coincide with those in Figure 4 in [2] apart from using base 10 logarithms and a different randomly generated dataset). The figure also shows trajectories of the Simple Jumper martingale (see, e.g., [5]) for various values of the jumping rate; it performs poorly in this context.

3 Two benchmarks

In this section we will discuss possible benchmarks that we can use for evaluating the quality of our conformal test martingales. The upper benchmark is

where is the set of all infinite sequences of binary observations starting from , and are the actual observations. The lower benchmark is


(the maximum likelihood estimate) and

is the number of 1s among . By definition, .

Figure 2: The two benchmarks, process, Bayes–Kelly conformal test martingale, and its simplified version in the large scenario. Left panel: hard case. Right panel: easy case.
Figure 3: The analogue of Figure 2 for the last 1000 observations.

The trajectories of the upper and lower benchmarks are shown in Figure 2 in red and green; the figure also shows the trajectory of the process discussed in the previous section, and the other two trajectories should be ignored for now. The two benchmarks coincide or almost coincide. Figure 3 shows the same trajectories “under the lens”, over the last 1000 observations.

4 Bayesian conformal testing

In this section we will use a Bayesian method that is statistically efficient in our experiments but whose computational efficiency will be greatly improved in the next section. The p-values are generated as described in [4]; in particular, we are using the identity nonconformity measure (the nonconformity score of an observation is ). Under the alternative hypothesis, the p-values are generated by a completely specified stochastic mechanism. According to [1, Theorem 2], the optimal (in the Kelly-type sense of that paper) betting functions are given by the density of the predictive distribution of conditional on knowing . Let us find these predictive distributions. We will use the notation , where , for the uniform probability distribution on the interval (so that its density is ).

We are in a typical situation of Bayesian statistics. The Bayesian parameter is the binary sequence

of observations, and the prior distribution on the parameter is . The Bayesian observations are the conformal p-values . Given the parameter, the distribution of is

where is the number of 1s among the first observations.
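The conditional distribution just described can be illustrated with a short sketch. It assumes the standard smoothed conformal p-value with the identity nonconformity measure: the number of scores strictly greater than the current one, plus a uniform random fraction of the ties, divided by the number of observations so far. Under that assumption the p-value is uniform on the interval from 0 to the current fraction of 1s when the current observation is 1, and uniform on the rest of the unit interval when it is 0. The function name `conformal_p_values` is hypothetical.

```python
import numpy as np


def conformal_p_values(z, rng):
    """Smoothed conformal p-values for a binary sequence z (identity scores)."""
    z = np.asarray(z)
    n = np.arange(1, len(z) + 1)   # number of observations seen so far
    k = np.cumsum(z)               # number of 1s among the first n observations
    theta = rng.random(len(z))     # independent tie-breaking randomization
    # Scores strictly greater than the current one: none if z_n = 1, the k ones if z_n = 0.
    greater = np.where(z == 1, 0, k)
    # Scores equal to the current one (including itself).
    equal = np.where(z == 1, k, n - k)
    return (greater + theta * equal) / n
```

For example, `conformal_p_values([1, 0, 1, 1, 0], np.random.default_rng(2021))` returns five numbers in the unit interval, each lying on the side of the running fraction of 1s that corresponds to the current observation.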

Let , where , , and

, be the total posterior probability of the parameter values

for which and ; we will use them as the weights when computing the predictive distributions for the p-values. We can compute the weights recursively in as follows. We start from

At each step , first we compute the unnormalized weights

where is the likelihood defined by

and then we normalize them:

Given the posterior weights for the previous step, we can find the predictive distribution for as

where we use the shorthand . Therefore, the betting functions for the resulting Bayes–Kelly conformal test martingale are

Figure 4: The Bayes–Kelly and Bayes–Kelly simplified conformal test martingales, the -process, and the two benchmarks in the medium scenario. Left panel: hard case. Right panel: easy case.

For experimental results, see Figure 4, in addition to Figure 2. The Bayes–Kelly conformal test martingale appears to be very close to the two benchmarks. Its simplified version is described in the next section. The relatively poor performance of the -process in the left panel of Figure 4 should not be interpreted as it being inferior to the Bayes–Kelly conformal test martingale: remember that works against all Markov alternatives, whereas the other processes in Figures 2–8 are adapted to the specific alternative hypothesis ( in the hard case and in the easy case).
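The recursive scheme of this section can be sketched as a forward filter over the pair (number of 1s so far, last observation). The code below is a reconstruction under stated assumptions, not the authors' implementation: it assumes the uniform conditional distributions of the p-values described above and a Markov alternative given by two transition probabilities. All function and argument names, and the default initial probability 0.5, are hypothetical. The value of the betting function at the observed p-value is the normalizing constant of each step, so the log martingale is the running sum of their logarithms.

```python
import numpy as np


def step_prior(w, n, trans, p_first):
    """Distribution of (k_n, z_n) before seeing p_n, given the weights w of step n-1."""
    prior = np.zeros_like(w)
    if n == 1:
        prior[0, 0] = 1 - p_first   # first observation is 0, so k = 0
        prior[1, 1] = p_first       # first observation is 1, so k = 1
    else:
        prior[:, 0] = w @ trans[:, 0]           # z_n = 0 leaves k unchanged
        prior[1:, 1] = (w @ trans[:, 1])[:-1]   # z_n = 1 increases k by one
    return prior


def p_value_density(p, n, size):
    """Density of p_n at p for each (k, z): U[0, k/n] if z = 1, U[k/n, 1] if z = 0."""
    k = np.arange(size)
    dens = np.zeros((size, 2))
    ones = (k >= 1) & (k <= n) & (p * n <= k)
    dens[ones, 1] = n / k[ones]
    zeros = (k <= n - 1) & (p * n >= k)
    dens[zeros, 0] = n / (n - k[zeros])
    return dens


def bayes_kelly_log_martingale(p, p_1_given_0, p_1_given_1, p_first=0.5):
    """Return the log10 martingale trajectory and the final posterior weights."""
    p = np.asarray(p, dtype=float)
    # trans[z_prev, z_next]: transition matrix of the assumed Markov alternative
    trans = np.array([[1 - p_1_given_0, p_1_given_0],
                      [1 - p_1_given_1, p_1_given_1]])
    w = np.zeros((len(p) + 1, 2))   # w[k, z]: posterior mass of (k ones, last observation z)
    log_s = np.empty(len(p))
    total = 0.0
    for n in range(1, len(p) + 1):
        prior = step_prior(w, n, trans, p_first)
        unnormalized = prior * p_value_density(p[n - 1], n, len(w))
        bet = unnormalized.sum()    # predictive density (betting function) at p_n
        w = unnormalized / bet      # normalized posterior weights
        total += np.log10(bet)
        log_s[n - 1] = total
    return log_s, w
```

Each step costs time proportional to the number of observations, so the whole run is quadratic in the sample size, which is consistent with the remark that this method is statistically efficient but computationally heavier than its simplified version.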

5 Simplified Bayesian conformal testing

Figure 5: The weights , , at the last step for the Bayes–Kelly conformal test martingale in the medium scenario (the hard case on the left and easy on the right).

In this section we consider a radical simplification of the Bayes–Kelly conformal test martingale (1). We still assume that the Markov chain is symmetric, as in our model situations. If we assume that the weights , , are concentrated at

(1) will simplify to


Figure 5 shows the weights (averaged over ) for the last step of the Bayes–Kelly conformal test martingale in the medium scenario ( observations). They are indeed concentrated around values of not so different from .

Figure 6: The analogue of Figures 2 and 4 for the small scenario.

As a second step, we make (2) straightforward to compute by setting

(If , then with high probability.) The performance of the simplified version is shown in Figures 2–4 and 6. It is usually worse than that of the Bayes–Kelly conformal test martingale and the two benchmarks, but is comparable on the log scale apart from the right panel of Figure 6.

Figure 7: Boxplots based on runs for the final values of the two benchmarks (upper and lower ), the Bayes–Kelly conformal test martingale (BK), and its simplified version (sBK) in the medium scenario. Left panel: hard case. Right panel: easy case.
Figure 8: The analogue of Figure 7 for runs in the small scenario.

The right panel of Figure 6 and Figures 7 and 8 show that the statistical performance of the simplified Bayes–Kelly martingale particularly suffers in the easy case. The notches in the boxplots in Figures 7 and 8 indicate confidence intervals for the median.


  • [1] Valentina Fedorova, Ilia Nouretdinov, Alex Gammerman, and Vladimir Vovk. Plug-in martingales for testing exchangeability on-line. In John Langford and Joelle Pineau, editors, Proceedings of the Twenty Ninth International Conference on Machine Learning, pages 1639–1646. Omnipress, 2012.
  • [2] Aaditya Ramdas, Johannes Ruf, Martin Larsson, and Wouter Koolen. How can one test if a binary sequence is exchangeable? Fork-convex hulls, supermartingales, and Snell envelopes. Technical Report arXiv:2102.00630 [math.ST], e-Print archive, July 2021 (version 4).
  • [3] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, Bayes factors, and p-values. Statistical Science, 26:84–101, 2011.
  • [4] Vladimir Vovk. Conformal testing in a binary model situation. Proceedings of Machine Learning Research, 152:131–150, 2021. COPA 2021.
  • [5] Vladimir Vovk. Testing randomness online. Statistical Science, 36:595–611, 2021.
  • [6] Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005.