1 Introduction
Our recent paper [6] gives a generic procedure for turning e-merging functions into discovery matrices and applies it to arithmetic mean. Using arithmetic mean is very natural in the case of arbitrary dependence between the base e-values, at least in the symmetric case, since arithmetic mean essentially dominates any e-merging function [6, Theorem 5.1]. But in this note we will show that in the case of independent e-values we can greatly improve on arithmetic mean.
2 Discovery matrices for independent evalues
To make our exposition self-contained, we start from basic definitions (see our previous papers [4, 5, 6] exploring e-values for further information).
An e-variable $E$ on a probability space $(\Omega,\mathcal{A},Q)$ is a nonnegative extended random variable $E:\Omega\to[0,\infty]$ such that $\mathbb{E}^Q[E]\le 1$. A measurable function $F:[0,\infty)^K\to[0,\infty)$ for an integer $K\ge 1$ is an ie-merging function if, for any probability space and any independent e-variables $E_1,\dots,E_K$ on it, the extended random variable $F(E_1,\dots,E_K)$ is an e-variable. We will only consider ie-merging functions that are increasing in each argument and are symmetric (do not depend on the order of their arguments).
Important examples of ie-merging functions [5] are
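$$U_n(e_1,\dots,e_K) := \frac{1}{\binom{K}{n}} \sum_{\{k_1,\dots,k_n\}\subseteq\{1,\dots,K\}} e_{k_1}\cdots e_{k_n}, \qquad n\in\{1,\dots,K\}. \tag{1}$$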
We will refer to them as the U-statistics (they are the standard U-statistics with product as kernel). The statistics $U_1$ play a special role since they belong to the narrower class of e-merging functions, meaning that $U_1(E_1,\dots,E_K)$ is an e-variable whenever $E_1,\dots,E_K$ are e-variables (not necessarily independent).
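To make (1) concrete, the following minimal Python sketch evaluates a U-statistic with product kernel by brute force over all subsets of size n. The function name `u_statistic` is ours, for illustration only, and the sketch assumes 1 ≤ n ≤ K; its cost grows like the binomial coefficient, so it is meant for small examples rather than for the simulations.

```python
from itertools import combinations
from math import comb, prod


def u_statistic(e, n):
    """Brute-force U-statistic with product kernel (hypothetical helper).

    Averages the products of all size-n subsets of the e-values in `e`.
    Assumes 1 <= n <= len(e).
    """
    K = len(e)
    if not 1 <= n <= K:
        raise ValueError("this sketch only covers 1 <= n <= K")
    total = sum(prod(subset) for subset in combinations(e, n))
    return total / comb(K, n)


if __name__ == "__main__":
    e_values = [1.0, 2.0, 3.0]
    print(u_statistic(e_values, 1))  # arithmetic mean of the e-values
    print(u_statistic(e_values, 2))  # average of the pairwise products
    print(u_statistic(e_values, 3))  # product of all e-values
```

With n = 1 the function reduces to the arithmetic mean, and with n = K to the product of all the e-values.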
Multiple hypothesis testing using $U_1$ was explored in [5, 6], and in this note we will mainly concentrate on $U_2$. It will be convenient to generalize (1) to the case $n>K$; namely, we set
(we are mostly interested in the case $n=2$ and $K=1$).
Let us fix the underlying sample space $(\Omega,\mathcal{A})$, which is simply a measurable space. Let $\mathfrak{P}(\Omega)$ be the set of all probability measures on the sample space. A simple hypothesis is $Q\in\mathfrak{P}(\Omega)$ and a (composite) hypothesis is $H\subseteq\mathfrak{P}(\Omega)$. An e-variable w.r. to a hypothesis $H$ is an extended random variable $E:\Omega\to[0,\infty]$ such that $\mathbb{E}^Q[E]\le 1$ for all $Q\in H$. It is clear that any ie-merging function transforms independent e-variables w.r. to $H$ (i.e., independent e-variables w.r. to any $Q\in H$) to an e-variable w.r. to $H$.
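An e-value is a value taken by an e-variable. Let us fix $K$, hypotheses $H_1,\dots,H_K$, and independent e-variables $E_1,\dots,E_K$ w.r. to $H_1,\dots,H_K$, respectively. (The e-variables are required to be independent under any $Q\in\mathfrak{P}(\Omega)$.) An e-test is a family $E^Q$, $Q\in\mathfrak{P}(\Omega)$, of nonnegative extended random variables such that $\mathbb{E}^Q[E^Q]\le 1$ for all $Q$.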
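As a sanity check on the definition of an e-variable, the following sketch estimates the expectation of a likelihood ratio under the null by Monte Carlo. Everything specific here is an assumption made for illustration: the Gaussian null N(0,1), the alternative N(delta,1) with `delta = -1.0`, the sample size, and the helper names; none of these values is taken from the simulation settings of this note.

```python
import math
import random


def likelihood_ratio(x, delta):
    """Likelihood ratio of N(delta, 1) to N(0, 1) at x (illustrative helper)."""
    return math.exp(delta * x - delta ** 2 / 2)


def estimate_null_expectation(delta, n_samples, seed=0):
    """Monte Carlo estimate of the mean of the likelihood ratio under N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # observation drawn from the null
        total += likelihood_ratio(x, delta)
    return total / n_samples


if __name__ == "__main__":
    # The estimate should be close to 1, the bound in the definition.
    print(estimate_null_expectation(delta=-1.0, n_samples=100_000))
```

A likelihood ratio has expectation exactly 1 under the null, so the estimate should fluctuate around 1 by an amount that shrinks as the sample size grows.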
Let us say that a measurable function $D$ is a discovery matrix if there exists an e-test $E^Q$, $Q\in\mathfrak{P}(\Omega)$, such that, for all and all ,
(2) 
where $\wedge$ and $\vee$ stand for “and” and “or”, respectively. To emphasize that we interpret $D(e_1,\dots,e_K)$ as a matrix, we write its values as $D_{r,j}(e_1,\dots,e_K)$. The intuition behind (2) is that if $D_{r,j}(e_1,\dots,e_K)$ is large and we reject the $r$ hypotheses with the largest $e_k$, we can count on at least $j$ true discoveries.
Algorithm 1 is one way of constructing a discovery matrix based on a family of ie-merging functions $F_k:[0,\infty)^k\to[0,\infty)$, $k\in\{1,\dots,K\}$. It uses the notation $F(e_I)$, where $I\subseteq\{1,\dots,K\}$ and $F$ is a symmetric function of $|I|$ arguments, to mean the value of $F$ on the sequence of $e_i$, $i\in I$, arranged in any order. The algorithm is an obvious modification of Algorithm 2 in [6]; now we apply it to arbitrary ie-merging functions (such as $U_2$) rather than just to arithmetic mean (i.e., $U_1$). As in [6], the e-values are assumed to be ordered, without loss of generality.
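Since the loop structure may be easier to follow in code, here is a hedged Python sketch of a closed-testing-style construction of this kind. It is not a transcription of Algorithm 1. We assume that, for e-values sorted in decreasing order, entry (r, j) is the minimum of the merging function over the r - j + 1 smallest of the r largest e-values, extended by each tail of the remaining e-values. This is where the minimum over the relevant index sets is attained when the merging function is symmetric and increasing and does not decrease when an e-value at least as large as all the present ones is added (as is the case for arithmetic mean). The names `discovery_matrix` and `merge` are hypothetical.

```python
from statistics import mean


def discovery_matrix(e, merge):
    """Sketch of a closed-testing-style discovery matrix (hypothetical helper).

    `e` holds the e-values and `merge` maps a non-empty list of e-values to a
    merged e-value. Returns a K x K list of lists D with D[r-1][j-1] filled in
    for 1 <= j <= r <= K; the entries above the diagonal stay None.
    """
    e = sorted(e, reverse=True)  # largest e-values first
    K = len(e)
    D = [[None] * K for _ in range(K)]
    for r in range(1, K + 1):
        for j in range(1, r + 1):
            core = e[j - 1:r]  # the r - j + 1 smallest among the r largest
            best = merge(core)
            for i in range(r, K):  # extend by each tail e[i:] outside the top r
                best = min(best, merge(core + e[i:]))
            D[r - 1][j - 1] = best
    return D


if __name__ == "__main__":
    # Demonstration with arithmetic mean as the merging function.
    for row in discovery_matrix([4.0, 2.0, 1.0], mean):
        print(row)
```

The sketch makes on the order of K cubed calls to `merge` and recomputes each merged value from scratch, so it favours readability over speed.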
3 A toy simulation study
In this section we run Algorithm 1 applied to $U_1$ and $U_2$. Slightly generalizing the explanation in [6, Appendix B in Working Paper 27], we can see that the discovery matrix can be computed in time . For $U_1$, the time can be improved from to [6, Appendix B in Working Paper 27]. For $U_2$, we can easily improve the time to by noticing that
This is sufficient to cope with the case that we usually use in our simulation studies.
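One identity that makes the pairwise-product average cheap to evaluate expresses it through the sum and the sum of squares of the e-values; we assume here, without being able to confirm it from the text, that this is the kind of shortcut meant above. The sketch below compares the linear-time formula with the brute-force average; the function names are ours and K ≥ 2 is assumed.

```python
from itertools import combinations


def u2_brute_force(e):
    """Average of e_k * e_l over all unordered pairs k < l (needs len(e) >= 2)."""
    pairs = list(combinations(e, 2))
    return sum(a * b for a, b in pairs) / len(pairs)


def u2_fast(e):
    """Same quantity via ((sum e)^2 - sum e^2) / (K (K - 1)), in linear time."""
    K = len(e)
    s1 = sum(e)
    s2 = sum(x * x for x in e)
    return (s1 * s1 - s2) / (K * (K - 1))


if __name__ == "__main__":
    e_values = [0.5, 2.0, 3.0, 10.0]
    print(u2_brute_force(e_values), u2_fast(e_values))  # the two should agree
```

Because the fast formula depends on the e-values only through two running sums, both sums can be updated incrementally as e-values are added to a set.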
We generate the base e-values as in Section 3 of [6]: the null hypothesis is , , the first observations are generated from , the last from , all independently, and the base e-variables are the likelihood ratios
The results are shown in Figure 1 (whose left panel is identical to the left panel of Figure 2 in [6]); they are much better for $U_2$. Each panel shows the lower triangular matrix $D_{r,j}$, the left for $U_1$ and the right for $U_2$. The colour scheme used in this figure is inspired by Jeffreys’s [3, Appendix B] (as in [6]):

The entries with $D_{r,j}$ below 1 are shown in dark green; there is no evidence that there are at least $j$ true discoveries among the $r$ hypotheses with the largest e-values.

The entries $D_{r,j}\in(1,\sqrt{10})\approx(1,3.16)$ are shown in green. For them the evidence is poor.

The entries $D_{r,j}\in(\sqrt{10},10)\approx(3.16,10)$ are shown in yellow. The evidence is substantial.

The entries $D_{r,j}\in(10,10^{3/2})\approx(10,31.6)$ are shown in red. The evidence is strong.

The entries $D_{r,j}\in(10^{3/2},100)\approx(31.6,100)$ are shown in dark red. The evidence is very strong.

Finally, the entries $D_{r,j}>100$ are shown in black, and for them the evidence is decisive.
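The colour scheme amounts to binning each entry on a logarithmic scale. A minimal sketch follows, assuming the half-integer powers of 10 from Jeffreys's scale (1, 10^{1/2}, 10, 10^{3/2}, 100) as thresholds; the function name and the treatment of entries lying exactly on a threshold (assigned to the higher grade) are our choices for illustration.

```python
from bisect import bisect_right

# Thresholds of Jeffreys's scale (assumed): 1, 10^0.5, 10, 10^1.5, 100.
THRESHOLDS = [10 ** (k / 2) for k in range(5)]
COLOURS = ["dark green", "green", "yellow", "red", "dark red", "black"]


def colour_of_entry(value):
    """Map a discovery-matrix entry to a colour of the figure (hypothetical helper)."""
    return COLOURS[bisect_right(THRESHOLDS, value)]


if __name__ == "__main__":
    for value in (0.5, 2.0, 5.0, 20.0, 50.0, 200.0):
        print(value, colour_of_entry(value))
```

Using `bisect_right` on a sorted threshold list keeps the mapping in one place, so changing the scale only means editing `THRESHOLDS` and `COLOURS`.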
It is interesting that after the crude e-to-p calibration our method produces p-values that look even better than the p-values produced by the GWGS procedure (in the terminology of [6]) designed specifically for p-values: see Figure 2.
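For concreteness, a crude e-to-p calibration can be sketched as follows, assuming it is the map e ↦ min(1, 1/e) studied in [5]; the function name is ours.

```python
import math


def e_to_p(e):
    """Calibrate an e-value into a p-value via min(1, 1/e) (assumed calibrator)."""
    if math.isnan(e) or e < 0:
        raise ValueError("an e-value must be a nonnegative number")
    if e == 0:
        return 1.0  # no evidence against the null
    return min(1.0, 1.0 / e)  # 1/inf evaluates to 0.0


if __name__ == "__main__":
    for e in (0.5, 1.0, 20.0, 200.0):
        print(e, e_to_p(e))
```

Under this assumption an e-value of 20 corresponds to a p-value of 0.05, and any e-value not exceeding 1 is mapped to the uninformative p-value 1.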
In Figure 2 we use what we called Fisher’s scale in [6], but now we extend it by two further thresholds, one of which is 0.005, as advocated by [1]. Our colour scheme is:

P-values above 0.05 are shown in green; they are not significant.

P-values between 0.01 and 0.05 are shown in yellow; they are significant but not highly significant.

P-values between 0.005 and 0.01 are shown in red; they are highly significant (but fail to attain the more stringent criterion of significance advocated in [1]).

P-values below 0.005 but above the lowest threshold are shown in dark red.

P-values below the lowest threshold are shown in black; they can be regarded as providing decisive evidence against the null hypothesis (to use Jeffreys’s expression).
4 An attempt at a theoretical explanation
We start from an alternative representation of $U_2$, which will shed some light on the expected performance of our algorithm.
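Lemma 4.1.
For any $e_1,\dots,e_K\in[0,\infty)$ with $K\ge 2$,
$$U_2(e_1,\dots,e_K) = \bar e^2 - \frac{s^2}{K}, \tag{3}$$
where $\bar e := \frac{1}{K}\sum_{k=1}^K e_k$ is the sample mean and $s^2 := \frac{1}{K-1}\sum_{k=1}^K (e_k-\bar e)^2$ is the sample variance of $e_1,\dots,e_K$.
Proof.
By definition,
$$U_2(e_1,\dots,e_K) = \frac{2}{K(K-1)}\sum_{1\le k<l\le K} e_k e_l = \frac{1}{K(K-1)}\left(\Bigl(\sum_{k=1}^K e_k\Bigr)^2 - \sum_{k=1}^K e_k^2\right) = \frac{K^2\bar e^2 - (K-1)s^2 - K\bar e^2}{K(K-1)} = \bar e^2 - \frac{s^2}{K},$$
where the third equality uses $\sum_{k=1}^K e_k^2 = (K-1)s^2 + K\bar e^2$. ∎
Corollary 4.2.
For any $e_1,\dots,e_K\in[0,\infty)$,
$$s^2 \le K\bar e^2.$$
For some $e_1,\dots,e_K$ the inequality holds as equality.
Proof.
The first statement follows from $U_2\ge 0$, and an example for the second one is $e_1=1$, $e_2=\dots=e_K=0$. ∎
According to Corollary 4.2,
$$\mathrm{rv}(e_1,\dots,e_K) := \frac{s^2}{K\bar e^2},$$
which we will call the relative (sample) variance of $e_1,\dots,e_K$, is a dimensionless quantity in the interval $[0,1]$. When $\bar e=0$, we set $\mathrm{rv}(e_1,\dots,e_K):=0$. The relative variance is zero if and only if all $e_k$ coincide, and it is 1 if and only if all but one $e_k$ are zero.
Using the notion of relative variance, we can rewrite (3) as
$$U_2(e_1,\dots,e_K) = \bar e^2\,\bigl(1-\mathrm{rv}(e_1,\dots,e_K)\bigr) = U_1(e_1,\dots,e_K)^2\,\bigl(1-\mathrm{rv}(e_1,\dots,e_K)\bigr).$$
We can see that the method of this paper based on $U_2$ has a potential for improving on the method of [6], but the best it can achieve is squaring the entries of the discovery matrix. An entry is squared if the multiset of e-values on which the infimum in the algorithm of [6] is attained consists of a single value. Otherwise we suffer as the e-values become more diverse.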
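The relation between the pairwise-product average, the arithmetic mean, and the relative variance is easy to check numerically. The sketch below assumes the relative variance is the sample variance (with divisor K - 1) divided by K times the squared mean, set to 0 when the mean is 0; the helper names are ours and K ≥ 2 is assumed.

```python
from itertools import combinations
from statistics import mean, variance


def relative_variance(e):
    """Sample variance over K * mean^2, with the convention 0 when the mean is 0."""
    m = mean(e)
    if m == 0:
        return 0.0
    return variance(e) / (len(e) * m * m)


def u2(e):
    """Average of the products of all unordered pairs of e-values."""
    pairs = list(combinations(e, 2))
    return sum(a * b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for e in ([2.0, 2.0, 2.0], [1.0, 0.0, 0.0], [0.5, 2.0, 3.0, 10.0]):
        rv = relative_variance(e)
        # The last two printed numbers should agree: U_2 versus mean^2 * (1 - rv).
        print(rv, u2(e), mean(e) ** 2 * (1 - rv))
```

The first two inputs illustrate the extreme cases: identical e-values give relative variance 0 (the squared mean is attained), while a single nonzero e-value gives relative variance 1 (the pairwise-product average vanishes).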
5 Conclusion
The most natural direction of further research is to find computationally efficient procedures for computing discovery matrices based on $U_n$, $n>2$.
Acknowledgments
We are grateful to Yuri Gurevich for useful discussions. In our simulation studies we used Python and R, including the package hommel [2].
V. Vovk’s research has been partially supported by Astra Zeneca and Stena Line. R. Wang is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03823, RGPAS-2018-522590).
References
 [1] Daniel J. Benjamin, James O. Berger, Magnus Johannesson, Brian A. Nosek, Eric-Jan Wagenmakers, Richard Berk, Kenneth A. Bollen, Björn Brembs, Lawrence Brown, Colin Camerer, David Cesarini, Christopher D. Chambers, Merlise Clyde, Thomas D. Cook, Paul De Boeck, Zoltan Dienes, Anna Dreber, Kenny Easwaran, Charles Efferson, Ernst Fehr, Fiona Fidler, Andy P. Field, Malcolm Forster, Edward I. George, Richard Gonzalez, Steven Goodman, Edwin Green, Donald P. Green, Anthony Greenwald, Jarrod D. Hadfield, Larry V. Hedges, Leonhard Held, Teck Hua Ho, Herbert Hoijtink, Daniel J. Hruschka, Kosuke Imai, Guido Imbens, John P. A. Ioannidis, Minjeong Jeon, James Holland Jones, Michael Kirchler, David Laibson, John List, Roderick Little, Arthur Lupia, Edouard Machery, Scott E. Maxwell, Michael McCarthy, Don Moore, Stephen L. Morgan, Marcus Munafó, Shinichi Nakagawa, Brendan Nyhan, Timothy H. Parker, Luis Pericchi, Marco Perugini, Jeff Rouder, Judith Rousseau, Victoria Savalei, Felix D. Schönbrodt, Thomas Sellke, Betsy Sinclair, Dustin Tingley, Trisha Van Zandt, Simine Vazire, Duncan J. Watts, Christopher Winship, Robert L. Wolpert, Yu Xie, Cristobal Young, Jonathan Zinman, and Valen E. Johnson. Redefine statistical significance: We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries (Comment). Nature Human Behaviour, 2:6–10, 2018.
 [2] Jelle J. Goeman, Rosa Meijer, and Thijmen Krebs. hommel: Methods for closed testing with Simes inequality, in particular Hommel’s method, 2019. R package version 1.5, available on CRAN.
 [3] Harold Jeffreys. Theory of Probability. Oxford University Press, Oxford, third edition, 1961.
 [4] Vladimir Vovk. Non-algorithmic theory of randomness. Technical Report arXiv:1910.00585 [math.ST], arXiv.org e-Print archive, October 2019. The conference version is to appear in: Fields of Logic and Computation III: Essays Dedicated to Yuri Gurevich on the Occasion of His 80th Birthday, ed. by Andreas Blass, Patrick Cégielski, Nachum Dershowitz, Manfred Droste, and Bernd Finkbeiner. Springer, 2020.
 [5] Vladimir Vovk and Ruodu Wang. Combining e-values and p-values. Technical Report arXiv:1912.06116 [math.ST], arXiv.org e-Print archive, December 2019.
 [6] Vladimir Vovk and Ruodu Wang. True and false discoveries with e-values. Technical Report arXiv:1912.13292 [math.ST], arXiv.org e-Print archive, December 2019. For the latest version, see alrw.net, Working Paper 27.