1 Introduction
Conformal prediction is based on the notion of a p-value. At this time p-values are widely discussed (see, e.g., [1]), and several alternatives to p-values have been proposed. Perhaps the most popular alternative is Bayes factors, which, when stripped of their Bayesian context, are referred to as e-values in [10]. In fact, e-values were used (under the name of i-values) in discussions of precursors of conformal prediction in the 1990s; one early description is [3]. In this note it will be convenient to distinguish between conformal e-prediction (using e-values) and conformal p-prediction (standard conformal prediction, using p-values). Conformal e-prediction was superseded by conformal (p-)prediction mainly for two reasons:

1. Conformal predictions can be packaged as prediction sets [9, Section 2.2], in which case the property of validity of conformal predictors is very easy to state: we just say that the probability of error is at most $\epsilon$ for a prespecified significance level $\epsilon$ [9, Proposition 2.3].
2. In the online mode of prediction, smoothed conformal predictors make errors independently, and so small probabilities of error manifest themselves, with high probability, as a low frequency of errors [9, Corollary 2.5].
These are important advantages of conformal p-prediction, which do not appear to be counterbalanced by any clear disadvantages.
Whereas the notion of a conformal e-predictor does not appear particularly useful, using e-values in place of p-values in cross-conformal prediction [8] has a clear advantage. Cross-conformal predictors are not provably valid [8, Appendix], and this sometimes even shows in experimental results [6]. The limits of the violations of validity are given by Rüschendorf’s result (see, e.g., [11]): when merging p-values coming from different folds by taking their arithmetic mean (which is essentially what cross-conformal predictors do), the arithmetic mean has to be multiplied by 2 in order to guarantee validity. In the recent method of jackknife+, introduced in [2] and closely related to cross-conformal prediction, there is a similar factor of 2 [2, Theorem 1], which cannot be removed in general [2, Theorem 2].
The situation with e-values is different: the arithmetic mean of e-values is always an e-value. This is an obvious fact, but it is shown in [10] that the arithmetic mean is the only useful merging rule. Therefore, the version of cross-conformal prediction based on e-values, which we call cross-conformal e-prediction in this note, is always valid.
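This merging property is easy to check numerically. The following sketch (not from the paper; the particular e-variable is an illustrative assumption) simulates e-values with expected value 1 under the null and confirms that their arithmetic mean again has expected value at most 1, with no factor of 2 needed:

```python
# Illustrative simulation: the arithmetic mean of e-values is an e-value.
# We use e = exp(Z - 1/2) with Z ~ N(0, 1), which satisfies E[e] = 1.
import math
import random

random.seed(0)

def evalue():
    return math.exp(random.gauss(0.0, 1.0) - 0.5)

n_folds, n_trials = 5, 100_000
merged = [sum(evalue() for _ in range(n_folds)) / n_folds
          for _ in range(n_trials)]
avg = sum(merged) / n_trials
print(f"empirical expectation of the merged e-value: {avg:.3f}")
# Close to 1: merging by arithmetic mean preserves validity.
```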
2 Conformal e-predictors
Suppose we are given a training set consisting of labelled objects $z_1,\dots,z_{n-1}$, where $z_i=(x_i,y_i)$, and our goal is to predict the label $y_n$ of a new object $x_n$. In this note we consider predictors of the following type: for each potential label $y$ for $x_n$ we would like to have a number reflecting the plausibility of $y$ being the true label of $x_n$. An example is conformal transducers [9, Section 2.5], which, in the terminology of this note, may be called conformal p-predictors. The output of a conformal p-predictor is the full conformal prediction for the label of $x_n$; e.g., it determines the prediction set at each significance level. We will sometimes write $f(z_1,\dots,z_{n-1},x_n,y)$, where $y\in\mathbf{Y}$, instead of $f(z_1,\dots,z_{n-1},(x_n,y))$.
We will use the notation $\mathbf{X}$ for the object space and $\mathbf{Y}$ for the label space (both assumed nonempty). These are the measurable spaces from which the objects and labels, respectively, are drawn. Full observations $z=(x,y)$ are drawn from $\mathbf{Z}:=\mathbf{X}\times\mathbf{Y}$. For any nonempty set $A$, $A^+$ will be the set of all nonempty finite sequences of elements of $A$.
A conformal e-predictor is a function $f$ that maps any finite sequence $(z_1,\dots,z_n)\in\mathbf{Z}^+$, $n\ge1$, to a finite sequence $f(z_1,\dots,z_n)=(f_1,\dots,f_n)$ of nonnegative numbers with average at most 1,

$$\frac{1}{n}\sum_{i=1}^n f_i \le 1, \qquad (1)$$

and that satisfies the following property of equivariance: for any $n$, any permutation $\pi$ of $\{1,\dots,n\}$, any $(z_1,\dots,z_n)\in\mathbf{Z}^n$, and any $i\in\{1,\dots,n\}$,

$$f(z_{\pi(1)},\dots,z_{\pi(n)})_i = f(z_1,\dots,z_n)_{\pi(i)}.$$
The conformal e-predictor proposed in [4] is

$$f(z_1,\dots,z_n)_i := \frac{n\,1_{\{i\in\mathrm{SV}\}}}{|\mathrm{SV}|},$$

where $\mathrm{SV}$ is the set of indices of support vectors: $i\in\mathrm{SV}$ if and only if $z_i$ is a support vector for the SVM constructed from $z_1,\dots,z_n$ as training set. When given a training set $z_1,\dots,z_{n-1}$ and a new object $x_n$, this conformal e-predictor goes through all potential labels $y$ for $x_n$, for each $y$ constructs an SVM from $z_1,\dots,z_{n-1},(x_n,y)$, and outputs $f(z_1,\dots,z_{n-1},(x_n,y))_n$. This makes it computationally inefficient. The following obvious proposition asserts the validity of conformal e-predictors.
Proposition 1.
For any $n$, if $z_1,\dots,z_n$ are IID, then

$$\mathbb{E}\bigl[f(z_1,\dots,z_n)_n\bigr] \le 1.$$
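As a sanity check, the toy conformal e-predictor below (a sketch with $\mathbf{Z}=\mathbb{R}$ and a made-up nonconformity score; none of the details are from the paper) normalizes equivariant scores so that they average exactly 1, and Monte Carlo estimation confirms the bound of Proposition 1:

```python
# Hedged sketch: a toy conformal e-predictor obtained by normalizing
# equivariant nonconformity scores so that they average exactly 1,
# plus a Monte Carlo check of Proposition 1 (E[f_n] <= 1 under IID).
import random

random.seed(1)

def conformal_e(zs):
    # Equivariant nonconformity scores: distance to the sample mean.
    m = sum(zs) / len(zs)
    alphas = [abs(z - m) for z in zs]
    total = sum(alphas)
    n = len(zs)
    if total == 0:
        return [1.0] * n                    # degenerate case: all scores equal
    return [n * a / total for a in alphas]  # averages exactly to 1, so (1) holds

n, trials = 10, 50_000
acc = 0.0
for _ in range(trials):
    zs = [random.gauss(0, 1) for _ in range(n)]
    acc += conformal_e(zs)[-1]              # value at the last observation
print(f"empirical E[f_n] = {acc / trials:.3f}")
# For this predictor the expectation is exactly 1, which meets the bound.
```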
3 Split conformal e-predictors
Let us fix a measurable space $\Sigma$ (a summary space). A $\Sigma$-valued split conformity measure is a measurable function $A:\mathbf{Z}^+\times\mathbf{Z}\to\Sigma$. Intuitively, $A(\zeta,z)$ encodes how well $z$ conforms to the sequence $\zeta$. A normalizing transformation is an equivariant measurable function $N$ that maps every nonempty finite sequence of elements of $\Sigma$ to a finite sequence, of the same length, of nonnegative numbers whose average is at most 1 (i.e., satisfying (1)).
To apply split conformal e-prediction to a training set $z_1,\dots,z_{n-1}$, we split it into two parts, the training set proper $z_1,\dots,z_m$ and the calibration set $z_{m+1},\dots,z_{n-1}$. For a new object $x_n$ and a potential label $y$ for it, we set

$$f(z_1,\dots,z_{n-1},(x_n,y))_n := N(\sigma_{m+1},\dots,\sigma_n)_{n-m}, \qquad (2)$$

where the sequence $\sigma_{m+1},\dots,\sigma_n$ is defined using the following steps: compute the summaries $\sigma_i := A((z_1,\dots,z_m),z_i)$ for the calibration examples $i=m+1,\dots,n-1$, and compute the summary $\sigma_n := A((z_1,\dots,z_m),(x_n,y))$ for the postulated new observation.
For many choices of $A$ and $N$, the split conformal e-predictor (2) will be computationally efficient; this is the case when:

- processing the training set proper only once, we can find an easily computable rule transforming $z$ into $A((z_1,\dots,z_m),z)$;

- the normalizing transformation $N$ is easily computable.
An example of an easily computable normalizing transformation is

$$N(\sigma_1,\dots,\sigma_k)_i := \frac{k\,\sigma_i}{\sigma_1+\dots+\sigma_k},$$

where the summary space is supposed to be $\Sigma := [0,\infty)$.
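A minimal sketch of split conformal e-prediction under these choices, assuming a toy 1-nearest-neighbour conformity measure (all function names, and the data, are illustrative, not from the paper):

```python
# Hedged sketch of split conformal e-prediction: summaries for the
# calibration set and the postulated new observation are normalized by
# N(s_1,...,s_k)_i = k * s_i / (s_1 + ... + s_k), as in the text.

def summary(train_proper, z):
    """Nonconformity summary: distance from z to the nearest
    training example carrying the same label."""
    x, y = z
    dists = [abs(x - xi) for xi, yi in train_proper if yi == y]
    return min(dists) if dists else float("inf")

def normalize(sigmas):
    k, total = len(sigmas), sum(sigmas)
    if total == 0:
        return [1.0] * k        # degenerate case: all summaries are zero
    return [k * s / total for s in sigmas]

def split_conformal_e(train_proper, calibration, x_new, y_new):
    """E-value for the postulated label y_new of x_new; with this choice
    of summary, larger values mean the label conforms less."""
    sigmas = [summary(train_proper, z) for z in calibration]
    sigmas.append(summary(train_proper, (x_new, y_new)))
    return normalize(sigmas)[-1]

train_proper = [(0.0, "a"), (0.1, "a"), (5.0, "b"), (5.2, "b")]
calibration = [(0.2, "a"), (5.1, "b"), (0.3, "a")]
e_good = split_conformal_e(train_proper, calibration, 0.15, "a")
e_bad = split_conformal_e(train_proper, calibration, 0.15, "b")
print(e_good, e_bad)  # the implausible label "b" gets a much larger value
```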
Proposition 1, our statement of validity, continues to hold for split conformal e-predictors.
4 Cross-conformal e-predictors
A $\Sigma$-valued split conformity measure $A$ is a $\Sigma$-valued cross-conformity measure if $A(\zeta,z)$ does not depend on the order of the elements of its first argument $\zeta$. Given such an $A$ and a normalizing transformation $N$, the corresponding cross-conformal e-predictor (CCEP) is defined as follows. The training sequence $z_1,\dots,z_{n-1}$ is randomly split into $K$ nonempty multisets (folds) $z_{S_k}$, $k=1,\dots,K$, of equal (or as equal as possible) sizes, where $K\ge2$ is a parameter of the algorithm, $(S_1,\dots,S_K)$ is a partition of the index set $\{1,\dots,n-1\}$, and $z_{S_k}$ consists of all $z_i$, $i\in S_k$. For each $k\in\{1,\dots,K\}$ and each potential label $y$ of the new object $x_n$, find the output $f_k(y)$ of the split conformal e-predictor on the new object $x_n$ and its postulated label $y$ with $z_{S_{-k}}$ as training set proper and $z_{S_k}$ as calibration set, where $S_{-k}:=\{1,\dots,n-1\}\setminus S_k$ is the complement to the fold $S_k$. The corresponding CCEP is defined by

$$f(y) := \frac{1}{K}\sum_{k=1}^K f_k(y).$$

(A slight modification of this definition, still provably valid, replaces the arithmetic mean by the weighted mean with weights proportional to the sizes of the folds.)
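The construction can be sketched as follows; this is a hedged toy implementation (the conformity measure, fold assignment, and all names are illustrative assumptions, not the paper's):

```python
# Hedged sketch of a cross-conformal e-predictor (CCEP): run the split
# conformal e-predictor with each fold as calibration set and its
# complement as training set proper, then average the K e-values.
import random

def summary(train_proper, z):
    # Toy conformity measure: distance to the nearest same-label example.
    x, y = z
    dists = [abs(x - xi) for xi, yi in train_proper if yi == y]
    return min(dists) if dists else float("inf")

def split_e(train_proper, calibration, x_new, y_new):
    sigmas = [summary(train_proper, z) for z in calibration]
    sigmas.append(summary(train_proper, (x_new, y_new)))
    k, total = len(sigmas), sum(sigmas)
    return 1.0 if total == 0 else k * sigmas[-1] / total

def ccep(train, x_new, y_new, K=3, seed=0):
    idx = list(range(len(train)))
    random.Random(seed).shuffle(idx)          # random split into folds
    folds = [idx[k::K] for k in range(K)]     # sizes as equal as possible
    es = []
    for fold in folds:
        calibration = [train[i] for i in fold]
        proper = [train[i] for i in idx if i not in fold]
        es.append(split_e(proper, calibration, x_new, y_new))
    return sum(es) / K                        # arithmetic mean of e-values

train = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (0.3, "a"), (0.4, "a"),
         (5.0, "b"), (5.1, "b"), (5.2, "b"), (5.3, "b"), (5.4, "b")]
e_a = ccep(train, 0.15, "a")
e_b = ccep(train, 0.15, "b")
print(e_a, e_b)  # the implausible label "b" receives a much larger e-value
```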
Proposition 1 still holds for cross-conformal e-predictors; this trivially follows from the fact that the arithmetic mean of e-values is an e-value.
Remark 2.
To compare the outputs of cross-conformal p-predictors (CCPPs) and CCEPs, we can use the rough transformation discussed in [10]: a p-value of $p$ roughly corresponds to an e-value of $1/p$. Under this transformation, the arithmetic average of e-values corresponds to the harmonic average of p-values, and the harmonic average is always less than or equal to the arithmetic average [5, Theorem 16]. This suggests that CCEPs produce better results than CCPPs do. In the opposite direction, the arithmetic average of p-values corresponds to the harmonic average of e-values, which again suggests that CCEPs produce better results than CCPPs do.
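A quick numeric illustration of this correspondence (the per-fold p-values are made up for the example):

```python
# Under the rough correspondence e = 1/p, averaging e-values amounts to
# taking the harmonic mean of the p-values, which never exceeds their
# arithmetic mean.
p = [0.01, 0.04, 0.20]                       # per-fold p-values (made up)
e = [1 / pk for pk in p]                     # corresponding e-values
merged_e = sum(e) / len(e)                   # CCEP-style merging
harmonic_p = len(p) / sum(1 / pk for pk in p)
arithmetic_p = sum(p) / len(p)
print(merged_e, harmonic_p, arithmetic_p)
# The harmonic mean is below the arithmetic mean: the e-value route
# corresponds to the smaller, i.e. sharper, merged p-value.
```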
5 Validity in the time domain
Advantage 1 of conformal p-prediction listed in Section 1 is that, when discussing validity, we can talk about probabilities instead of p-values. This advantage mostly disappears when we move to conformal e-prediction: validity has to be defined in terms of e-values (however, it has been argued [7] that e-values are more intuitive than p-values). In this section we discuss advantage 2, which requires the online mode of prediction. We will see that it still holds, albeit in a weakened form.
The notion of validity asserted in Proposition 1 (applied to CCEPs) is stated in the “space domain”: the output of the CCEP on the true label is an e-value, i.e., its mean value at a fixed time over the probability space does not exceed 1. Now we will complement the validity in the space domain by validity in the time domain, assuming that the CCEP is bounded.
In the online prediction protocol, we observe an object $x_1$, apply the CCEP to compute the values $f(y)$ for all possible labels $y\in\mathbf{Y}$, observe the true label $y_1$, observe another object $x_2$, apply the CCEP to compute the values $f(y)$ for all possible labels $y\in\mathbf{Y}$, observe the true label $y_2$, etc.
Remark 3.
Let the values of the CCEP for the true labels be $e_1,e_2,\dots$. It follows from, e.g., [9, Lemma 3.15] that, if the observations are IID,

$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^N e_n \le 1 \quad \text{almost surely}.$$

We can see that the long-term time average of the values for the true labels is bounded above by 1. In this sense they are time-wise e-values.
Acknowledgments
This research was partially supported by AstraZeneca and Stena Line.
References
[1] Special issue on p-values. American Statistician, 73(Supplement 1), 2019.
[2] Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. Predictive inference with the jackknife+. Technical Report arXiv:1905.02928 [stat.ME], arXiv.org e-Print archive, December 2019.

[3] Alex Gammerman, Vladimir Vapnik, and Vladimir Vovk. Transduction in pattern recognition, 1997. Manuscript submitted to the Fifteenth International Joint Conference on Artificial Intelligence in January 1997. Extended version published as [4].
[4] Alex Gammerman, Vladimir Vovk, and Vladimir Vapnik. Learning by transduction. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 148–155, San Francisco, CA, 1998. Morgan Kaufmann.
 [5] G. H. Hardy, John E. Littlewood, and George Pólya. Inequalities. Cambridge University Press, Cambridge, second edition, 1952.

[6] Henrik Linusson, Ulf Norinder, Henrik Boström, Ulf Johansson, and Tuve Löfström. On the calibration of aggregated conformal predictors. Proceedings of Machine Learning Research, 60:154–173, 2017.
[7] Glenn Shafer. The language of betting as a strategy for statistical and scientific communication. The Game-Theoretic Probability and Finance project, http://probabilityandfinance.com, Working Paper 54, October 2019 (first posted March 2019).
[8] Vladimir Vovk. Cross-conformal predictors. Annals of Mathematics and Artificial Intelligence, 74:9–28, 2015.
 [9] Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005.
[10] Vladimir Vovk and Ruodu Wang. Combining e-values and p-values. Technical Report arXiv:1912.06116 [math.ST], arXiv.org e-Print archive, December 2019.
[11] Vladimir Vovk and Ruodu Wang. Combining p-values via averaging. Technical Report arXiv:1212.4966 [math.ST], arXiv.org e-Print archive, October 2019. Journal version: Biometrika (to appear).