1 Introduction
This note treats a problem similar to the one considered in [2]: we would like to test online the null hypothesis of exchangeability of binary observations under Markov alternatives.
The simplest approach to online hypothesis testing is to use test martingales, which are defined as nonnegative processes with initial value 1 that are martingales under the null hypothesis; see, e.g., [3]. Such processes for the null hypothesis of exchangeability can be constructed using the method of conformal prediction [6], and we will refer to them as conformal test martingales. A previous paper [4] constructs custom-made conformal test martingales for a different kind of alternative hypothesis, that of a changepoint.
The method of [2], which is specifically devoted to Markov alternatives, is more general: instead of a test martingale, the authors construct a “safe e-process” (to be defined in the next section). Safe e-processes are closely related to test martingales and admit a similar interpretation as the capital of a gambler trying to discredit the null hypothesis. Our methods give results similar to those of [2] in the model situations that we consider (following [2]). The advantage of our methods is that they extend easily to the usual setting of machine learning, where the observations are pairs consisting of a potentially complex object and its label.

In this note we only design conformal test martingales for a simple alternative hypothesis (a specific probability measure). This differs from [2], whose authors are interested in testing against the composite alternative Markov hypothesis. As in [2], we could mix our conformal test martingales over the possible alternative hypotheses, but we leave this step for future research.

2 Model situations
This section introduces the model situations considered in this paper, following [2, Section 4.2]. Our data consist of binary observations $z_1, z_2, \ldots \in \{0,1\}$ generated from a Markov model. We will use the notation $M_{\pi_{01},\pi_{10}}$ for the probability distribution of the Markov chain with transition probability $\pi_{01}$ for transitions $0\to1$ and $\pi_{10}$ for transitions $1\to0$; the probability that the first observation is 1 will always be assumed to be $1/2$. In the hard case, the model is , and in the easy case, the model is . The number of observations is (as in [2]) or or ; we will refer to these scenarios as large, medium, and small, respectively.

In all our experiments we use 2021 as the seed for the NumPy pseudorandom number generator. (This, however, does not make the trajectories in our plots comparable between different scenarios.) The dependence on the seed will be explored in the boxplots reported in Section 5; the seed affects not only the data but also the values of conformal test martingales, which are randomized processes given the data.
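The data-generating mechanism just described can be sketched as follows. This is a minimal sketch: the function name `generate_markov` is ours, the switching probability `pi` is a placeholder for whichever transition probabilities a scenario uses, and we assume a symmetric chain whose first observation is 1 with probability 1/2.

```python
import numpy as np

def generate_markov(n, pi, seed=2021):
    """Sample n binary observations from a symmetric Markov chain.

    Here pi is the probability of switching state (used for both
    transitions 0 -> 1 and 1 -> 0), and the first observation is
    1 with probability 1/2.
    """
    rng = np.random.default_rng(seed)
    z = np.empty(n, dtype=int)
    z[0] = rng.integers(2)  # first observation: 0 or 1 with equal probability
    for i in range(1, n):
        # switch state with probability pi, otherwise stay
        z[i] = 1 - z[i - 1] if rng.random() < pi else z[i - 1]
    return z
```

Fixing the seed (2021, as in our experiments) makes the generated dataset reproducible.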
Let $B_\pi$ be the Bernoulli distribution on $\{0,1\}$ with parameter $\pi\in[0,1]$: $B_\pi(\{1\}):=\pi$. Our null hypothesis is the IID model, under which the observations are generated independently from $B_\pi$ with an unknown parameter $\pi$. Ramdas et al. construct a safe e-process: namely, under any $\pi$, it is dominated by a test martingale with respect to the IID measure with parameter $\pi$, in the sense that the value of the e-process never exceeds the value of that test martingale at any time. The trajectories of their process for the two cases, hard and easy, are shown in Figure 1 (they coincide with those in Figure 4 of [2], apart from using base 10 logarithms and a different randomly generated dataset). The figure also shows trajectories of the Simple Jumper martingale (see, e.g., [5]) for various values of the jumping rate; it performs poorly in this context.
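For concreteness, the Simple Jumper can be sketched as follows. This is a minimal implementation based on the description in [5]: capital is spread over the constant betting functions $1+\epsilon(p-1/2)$, $\epsilon\in\{-1,0,1\}$, with jumps between them at rate $J$; details of the parameterization may differ from [5].

```python
import numpy as np

def simple_jumper(pvalues, J=0.01):
    """Simple Jumper test martingale (cf. [5]) with jumping rate J.

    Mixes the constant betting functions f_eps(p) = 1 + eps * (p - 1/2)
    over eps in {-1, 0, 1}, redistributing a fraction J of the capital
    between them at each step.  Returns the trajectory of values.
    """
    capital = {eps: 1.0 / 3 for eps in (-1, 0, 1)}
    path = []
    for p in pvalues:
        total = sum(capital.values())
        for eps in capital:
            # jump: keep a (1 - J) fraction, share a J fraction equally
            capital[eps] = (1 - J) * capital[eps] + (J / 3) * total
            # bet with the constant betting function for this eps
            capital[eps] *= 1 + eps * (p - 0.5)
        path.append(sum(capital.values()))
    return np.array(path)
```

Since each betting function integrates to 1 over $[0,1]$ and the jump step preserves total capital, the process is a test martingale under the null hypothesis of i.i.d. uniform p-values.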
3 Two benchmarks
In this section we discuss possible benchmarks that can be used for evaluating the quality of our conformal test martingales. The upper benchmark is defined in terms of the set of all infinite sequences of binary observations starting from the actual observations $z_1,\dots,z_n$. The lower benchmark is defined in terms of $\hat\pi_n := k_n/n$ (the maximum likelihood estimate of the Bernoulli parameter), where $k_n$ is the number of 1s among $z_1,\dots,z_n$. By definition, the upper benchmark is at least as large as the lower one.

The trajectories of the upper and lower benchmarks are shown in Figure 2 in red and green; the figure also shows the trajectory of the process discussed in the previous section, while the other two trajectories should be ignored for now. The two benchmarks coincide or almost coincide. Figure 3 shows the same trajectories “under the lens”, over the last 1000 observations.
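One natural reading of the lower benchmark, consistent with the definitions above, is the likelihood ratio of the alternative Markov measure to the Bernoulli measure with the maximum-likelihood parameter $k_n/n$. The following is a sketch under that assumption; the function names and the two-parameter Markov likelihood are ours.

```python
import numpy as np

def markov_loglik(z, pi01, pi10):
    """Log-likelihood of a binary sequence z under the Markov model
    (the first observation is 1 with probability 1/2)."""
    ll = np.log(0.5)
    for prev, cur in zip(z[:-1], z[1:]):
        if prev == 0:
            ll += np.log(pi01 if cur == 1 else 1 - pi01)
        else:
            ll += np.log(pi10 if cur == 0 else 1 - pi10)
    return ll

def lower_benchmark_log10(z, pi01, pi10):
    """Log10 likelihood ratio of the Markov alternative to the Bernoulli
    measure with the maximum-likelihood parameter k_n / n."""
    z = np.asarray(z)
    n, k = len(z), int(z.sum())
    pi_hat = k / n
    # Bernoulli log-likelihood at the MLE; 0 * log(0) is treated as 0
    bern = (k * np.log(pi_hat) if k else 0.0) + \
           ((n - k) * np.log(1 - pi_hat) if n - k else 0.0)
    return (markov_loglik(z, pi01, pi10) - bern) / np.log(10)
```

The base 10 logarithm matches the scale used in our plots.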
4 Bayesian conformal testing
In this section we use a Bayesian method that is statistically efficient in our experiments but whose computational efficiency will be greatly improved in the next section. The p-values are generated as described in [4]; in particular, we use the identity nonconformity measure (the nonconformity score of an observation is the observation itself). Under the alternative hypothesis, the p-values are generated by a completely specified stochastic mechanism. According to [1, Theorem 2], the optimal (in the Kelly-type sense of that paper) betting functions are given by the density of the predictive distribution of the current p-value conditional on knowing the previous ones. Let us find these predictive distributions. We will use the notation $U_{a,b}$, where $0\le a<b\le 1$, for the uniform probability distribution on the interval $[a,b]$ (so that its density is $1/(b-a)$ on $[a,b]$).
We are in a typical situation of Bayesian statistics. The Bayesian parameter is the binary sequence $(z_1, z_2, \ldots)$ of observations, and the prior distribution on the parameter is the alternative probability measure. The Bayesian observations are the conformal p-values $p_1, p_2, \ldots$. Given the parameter, the distribution of $p_n$ is $U_{0,k_n/n}$ if $z_n=1$ and $U_{k_n/n,1}$ if $z_n=0$, where $k_n$ is the number of 1s among the first $n$ observations.
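The smoothed conformal p-values under the identity nonconformity measure can be computed as follows; this is a sketch, and the function name and explicit tie-breaking variable are ours.

```python
import numpy as np

def conformal_pvalues(z, rng):
    """Smoothed conformal p-values for binary observations under the
    identity nonconformity measure (the score of an observation is itself)."""
    z = np.asarray(z)
    p = np.empty(len(z))
    for n in range(1, len(z) + 1):
        scores = z[:n]
        gt = int(np.sum(scores > scores[n - 1]))   # strictly larger scores
        eq = int(np.sum(scores == scores[n - 1]))  # ties, including the current one
        p[n - 1] = (gt + rng.random() * eq) / n    # random tie-breaking
    return p
```

With a binary observation $z_n=1$ no score exceeds it and the $k_n$ ones tie with it, giving $p_n\sim U_{0,k_n/n}$; with $z_n=0$ the $k_n$ ones exceed it, giving $p_n\sim U_{k_n/n,1}$. A generator such as `np.random.default_rng(2021)` matches the seeding convention of our experiments.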
Let $w_n(k,z)$, where $n\in\{1,2,\ldots\}$, $k\in\{0,\ldots,n\}$, and $z\in\{0,1\}$, be the total posterior probability of the parameter values for which $k_n=k$ and $z_n=z$; we will use these as the weights when computing the predictive distributions for the p-values. We can compute the weights recursively in $n$ as follows. We start from $w_1(1,1)=w_1(0,0)=1/2$ (whatever the value of $p_1$, its density is 1 under both $z_1=1$ and $z_1=0$). At each step $n>1$, first we compute the unnormalized weights
$$
  \bar w_n(k,z) := \sum_{z'\in\{0,1\}} w_{n-1}(k-z,z')\,\pi(z\mid z')\,g_n(p_n\mid k,z),
$$
where $\pi(z\mid z')$ is the transition probability from $z'$ to $z$ under the alternative and $g_n$ is the likelihood defined by
$$
  g_n(p\mid k,z) :=
  \begin{cases}
    u_{0,k/n}(p) & \text{if } z=1,\\
    u_{k/n,1}(p) & \text{if } z=0,
  \end{cases}
$$
$u_{a,b}$ being the density of $U_{a,b}$; and then we normalize them:
$$
  w_n(k,z) := \frac{\bar w_n(k,z)}{\sum_{k',z'} \bar w_n(k',z')}.
$$
Given the posterior weights for the previous step, we can find the predictive density for $p_n$ as
$$
  f_n(p) := \sum_{k,z'} w_{n-1}(k,z')
  \bigl( \pi(1\mid z')\, u_{0,(k+1)/n}(p) + \pi(0\mid z')\, u_{k/n,1}(p) \bigr).
$$
Therefore, the betting functions for the resulting Bayes–Kelly conformal test martingale are
$$
  f_n, \qquad n=1,2,\ldots. \qquad (1)
$$
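Under the assumptions of this section (identity nonconformity measure, a Markov alternative with transition probabilities we denote `pi01` and `pi10`, and a first observation equal to 1 with probability 1/2), the Bayes–Kelly recursion can be sketched as follows. The function and variable names are ours, and this is an illustrative implementation rather than the authors' code.

```python
import numpy as np

def bayes_kelly(pvalues, pi01, pi10):
    """Bayes-Kelly conformal test martingale against the Markov
    alternative with transition probabilities pi01 (0->1) and pi10 (1->0).

    w[(k, z)] is the posterior weight of parameter sequences with k ones
    among the first n observations and z as the n-th observation.
    """
    trans = {(0, 1): pi01, (0, 0): 1 - pi01, (1, 0): pi10, (1, 1): 1 - pi10}

    def dens(p, k, n, z):
        # density of p_n given (k_n, z_n): U_{0,k/n} if z_n = 1, U_{k/n,1} if z_n = 0
        if z == 1:
            return n / k if p <= k / n else 0.0
        return n / (n - k) if p > k / n else 0.0

    N = len(pvalues)
    path = np.empty(N)
    m = 1.0  # p_1 is uniform on [0, 1] either way, so the first factor is 1
    w = {(1, 1): 0.5, (0, 0): 0.5}
    path[0] = m
    for n in range(2, N + 1):
        p = pvalues[n - 1]
        new_w = {}
        # propagate through the Markov transition, then weight by the density
        for (k_prev, z_prev), weight in w.items():
            for z in (0, 1):
                k = k_prev + z
                new_w[(k, z)] = new_w.get((k, z), 0.0) + \
                    weight * trans[(z_prev, z)] * dens(p, k, n, z)
        f = sum(new_w.values())  # predictive density of p_n: the bet
        m *= f
        w = {key: val / f for key, val in new_w.items()}  # normalize
        path[n - 1] = m
    return path
```

Note a built-in sanity check: when `pi01 = pi10 = 0.5` the alternative is exchangeable, so the p-values are i.i.d. uniform and the martingale stays at 1.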
For experimental results, see Figure 4, in addition to Figure 2. The Bayes–Kelly conformal test martingale appears to be very close to the two benchmarks. Its simplified version is described in the next section. The relatively poor performance of the e-process of [2] in the left panel of Figure 4 should not be interpreted as its being inferior to the Bayes–Kelly conformal test martingale: remember that it works against all Markov alternatives, whereas the other processes in Figures 2–8 are adapted to the specific alternative hypothesis (one value of the transition parameter in the hard case and another in the easy case).
5 Simplified Bayesian conformal testing
In this section we consider a radical simplification of the Bayes–Kelly conformal test martingale (1). We still assume that the Markov chain is symmetric, as in our model situations. If we assume that the weights are concentrated at a single value of $k$, (1) simplifies to
(2) 
Figure 5 shows the weights (averaged over ) for the last step of the Bayes–Kelly conformal test martingale in the medium scenario ( observations). They are indeed concentrated around values of not so different from .
As a second step, we make (2) straightforward to compute by setting
(If , then with high probability.) The performance of the simplified version is shown in Figures 2–4 and 6. It is usually worse than that of the Bayes–Kelly conformal test martingale and the two benchmarks, but is comparable on the log scale apart from the right panel of Figure 6.
References
[1] Valentina Fedorova, Ilia Nouretdinov, Alex Gammerman, and Vladimir Vovk. Plug-in martingales for testing exchangeability on-line. In John Langford and Joelle Pineau, editors, Proceedings of the Twenty-Ninth International Conference on Machine Learning, pages 1639–1646. Omnipress, 2012.
[2] Aaditya Ramdas, Johannes Ruf, Martin Larsson, and Wouter Koolen. How can one test if a binary sequence is exchangeable? Fork-convex hulls, supermartingales, and Snell envelopes. Technical Report arXiv:2102.00630 [math.ST], arXiv.org e-Print archive, July 2021 (version 4).

[3] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, Bayes factors, and p-values. Statistical Science, 26:84–101, 2011.
[4] Vladimir Vovk. Conformal testing in a binary model situation. Proceedings of Machine Learning Research, 152:131–150, 2021. COPA 2021.
[5] Vladimir Vovk. Testing randomness online. Statistical Science, 36:595–611, 2021.
[6] Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005.