1 Introduction
The joint distribution of order statistics of stochastically independent random variables plays a pivotal role in the theory of empirical processes and in nonparametric statistics; see, e. g., Shorack and Wellner (2009) and Dickhaus (2018). For instance, the exact finitesample null distributions of classical goodnessoffit tests like the KolmogorovSmirnov and the Cramérvon Mises test as well as those of modern ”higher criticism” goodnessoffit tests rely on such joint distributions; cf. Gontscharuk et al. (2015), Gontscharuk et al. (2016a), and Finner and Gontscharuk (2018) for recent developments and further references. In simultaneous statistical inference, the joint distribution of ordered
values is needed to analyze the type I and type II error behaviour of stepwise rejective multiple test procedures; cf. Chapter 5 of
Dickhaus (2014).In the case that are identically distributed (we refer to this case as a onegroup model), classical recursive methods like Bolshev’s recursion, Noe’s recursion, and Steck’s recursion allow for computing the joint cumulative distribution function (cdf) of exactly; cf. Section 9.3 of Shorack and Wellner (2009). A generalization of Steck’s recursion to twogroup models has been introduced by Blanchard et al. (2014). The other aforementioned recursions can be generalized in an analogous manner, as we will demonstrate in Section 3 of the present work.
While conceptually appealing, numerical properties of the aforementioned recursions are not well understood yet, and existing implementations into computer software often refer to ruleofthumbtype upper bounds on such that the respective implementation is trustworthy. For example, Art B. Owen reports in his implementation of the twosided version of Noe’s recursion in C (see https://www.stat.washington.edu/jaw/RESEARCH/SOFTWARE/BERKJONES/BJRBJCCode/noe.c) that the recursion works well for
but ”For larger n (eg 1800 or more) […] unexplained odd behavior.” Similarly, in the R Package mutoss (cf.
Blanchard et al. (2010)) the following comment is made on the implementation of Bolshev’s recursion: ”Because of numerical issues should not be greater than 100.” Recently, Moscovich and Nadler (2017) introduced a computational method for onegroup models. However, they do not consider the numerical accuracy of their approach rigorously.In this work, we contribute to the analysis of the numerical accuracy and the computational complexity of existing approaches for computing the joint distribution of in a mathematically rigorous manner. Furthermore, we provide novel computational techniques for one and twogroup models which are guaranteed to provide accurate results for arbitrary sample size . The rest of the material is structured as follows. In Section 2, we introduce the relevant quantities. The (generalized) recursions for one and twogroup models are provided in Section 3, together with a rigorous analysis of their computational complexities and their numerical properties. Our proposed exact computational methods rely on rational arithmetic (Section 4) and on faithful rounding (Section 5), respectively. Applications in multiple hypothesis testing are given in Section 6, and we conclude with a discussion in Section 7. Lengthy proofs as well as pseudo code for the considered algorithms are deferred to the Appendix.
2 Order Statistics
Throughout the following sections, we let for a natural number . Consider stochastically independent, realvalued random variables
, which are all driven by the same probability measure
. Let , and recursively definefor . Then we call the order statistics of , which we will denote by in the remainder. The random variable will be called the
th order statistic of the random vector
. Let denote the marginal cdf of for . This paper will present methods for the quick and numerically stable calculation ofassuming that where are two continuous distribution functions on and with denoting the number of ’s distributed according to , . Since it holds that
it follows that , where . Therefore, it is sufficient to consider the calculation of for an arbitrary continuous distribution function and argument . In the sequel, we suppress the dependence on and notationally, and write for notational convenience.
As outlined in the Introduction, for there exist many well known recursions (see e. g. Section 9.3 of Shorack and Wellner (2009)) for computing . There are also newer approaches based on numerical integration(see Moscovich et al. (2016)) or based on the Poisson process (see Moscovich and Nadler (2017)). Unfortunately, the former cannot be easily generalized to the case
, since the Lebesgue density of an order statistic is in general not piecewise constant. The latter is very fast due to usage of the Fourier transform, but numerically unstable for small values of the
’s. This can for instance be demonstrated using the thresholds of the wellknown linear stepup test (cf. Benjamini and Hochberg (1995)) for control of the false discovery rate (FDR); see Figure 1. Glueck et al. (2008a) proposed an algorithm with exponential complexity (cf. (Glueck et al., 2008a, Theorem 4.2)), resulting in a very high computational effort for moderate or large values of . However, the method of Glueck et al. (2008a) can be used to compute variate marginal distributions for , because in such cases the complexity of their approach reduces to .Since we are mostly concerned with the full joint distribution, we extend the approach suggested by Blanchard et al. (2014) and provide generalizations of Bolshev’s and Noe’s recursions. We compare them to the generalization of Steck’s recursion proposed by Blanchard et al. (2014)
and demonstrate that the Bolshev recursion is suitable for exact computations in rational arithmetic, whereas Noe’s recursion is numerically stable when computed in fixedprecision floating point arithmetic.
All our numerical calculations were performed on a Windows 7 machine with an Intel(R) Core(TM) i74790 CPU with 32 gigabytes of RAM.
3 The Generalized Recursions
Let , . Furthermore, let and be jointly stochastically independent. Let for and
(1) 
where , , denotes the th order statistic of , and is an increasing sequence with values in . The following subsections provide formulas for efficiently calculating , and we discuss their computational and numerical properties.
3.1 Generalization of Bolshev’s Recursion
Lemma 1 (Generalization of Bolshev’s Recursion).
The function from (1) satisfies the recursion
where
(2) 
Moreover, we have the following recursive relationships for .
For or this is simply the wellknown Bolshev recursion.
3.2 Generalization of Steck’s Recursion
Lemma 2 (Generalization of Steck’s Recursion).
Let . Then from (1) satisfies the recursion
where
(3) 
Letting
we can write
(4)  
(5) 
Furthermore, we have the following recursion for .
for and .
Proof.
See (Blanchard et al., 2014, Proposition 1) ∎
3.3 Generalization of Noe’s Recursion
Lemma 3 (Generalization of Noe’s Recursion).
Let , , and for
for .
Letting
we can write
(6) 
3.4 Computational Complexity and Numerical Properties
The computational complexity (defined to be the number of elementary arithmetic operations on floating point numbers) of each of the aforementioned recursions is given by the following lemma.
Lemma 4.
The proposed recursions can be implemented using

[leftmargin=!,labelwidth=Bolshev]
 Bolshev

 Steck

 Noe

elementary arithmetic operations (addition, subtraction, multiplication, division) and memory (assuming fixedprecision storage of all results).
The results of Lemma 4 suggest that Noe’s recursion might not be the best choice. However, for small values of the ’s, Bolshev’s recursion and Steck’s recursion are inherently numerically unstable. Consider for example and
Then both recursions, when implemented in double precision floating point arithmetic, result in negative values and huge relative errors (cf. Table 1) which can be explained by inaccurate or catastrophic cancellation, respectively.
Exact Probability  Steck  Rel. Err. (Steck)  Bolshev  Rel. Err. (Bolshev)  

2  9.76562E04  9.76562E04  0.00000E+00  9.76562E04  0.00000E+00 
3  9.53674E07  9.53674E07  0.00000E+00  9.53674E07  0.00000E+00 
4  9.31323E10  9.31323E10  0.00000E+00  9.31323E10  0.00000E+00 
5  9.09495E13  9.09495E13  0.00000E+00  9.09495E13  0.00000E+00 
6  8.88178E16  8.88178E16  0.00000E+00  8.88178E16  0.00000E+00 
7  8.67362E19  8.67362E19  0.00000E+00  1.73472E18  1.00000E+00 
8  8.47033E22  8.47033E22  0.00000E+00  1.10114E20  1.20000E+01 
9  8.27181E25  8.27181E25  0.00000E+00  6.85071E21  8.28100E+03 
10  8.07794E28  8.07794E28  0.00000E+00  2.70517E20  3.34884E+07 
11  7.88861E31  7.88861E31  0.00000E+00  1.12683E16  1.42842E+14 
12  4.33103E30  1.75898E20  4.06134E+09  2.83880E16  6.55456E+13 
Noe’s recursion, if implemented in a reasonable manner, never results in negative values. Furthermore by (Jeannerod and Rump, 2018, Equation (3)) the relative error is bounded (if the coefficients are computed with a bounded relative error) since all summands are nonnegative.
Remark 1.
Noe’s recursion can be easily parallelized since the can be, for any fixed , computed in parallel.
4 Exact Evaluation of Bolshev’s Recursion
If only elementary arithmetic operations are utilized and the number of such operations is not too large it is feasible to exactly evaluate expressions using rational arithmetic.^{1}^{1}1Our C++ implementation is based on The GNU Multiple Precision Arithmetic Library. We show that this is indeed the case for the Bolshev recursion as well its generalization for the twogroup case presented in Section 3.1.
First we consider the case where , hence : Even though Bolshev’s recursion involves binomial coefficients our proposed Algorithm 1 (cf. the Appendix) for the onegroup case evaluates it using only
elementary arithmetic operations (addition, subtraction, multiplication, division). For an illustration, considering the sequence we observed the execution times depicted in Figure 2.
Remark 2.
Our proposed Algorithm 2 (cf. the Appendix) implements the twogroup case in elementary arithmetic operations. Consequently, for equal sample sizes the number of operations is of . Notice that this is a marked improvement over the exponential complexity reported by (Glueck et al., 2008a, Theorem 4.2). Figure 3 illustrates the observed execution time for calculating and , where .
For not necessarily equal sample sizes and , our implementation of Algorithm 2 needs arithmetic operations.
Since the cdf of many interesting distributions is not available in a closed form the thresholds might either not be exactly calculable or simply not exactly representable as rational numbers. Lemma 5 analyzes the error propagation when and / or are inexact.
Lemma 5.
Let
(7) 
and denote by approximations thereof, which are obtained by replacing and by approximations and . If for it holds that for the it follows that for all
where denotes the approximation of obtained by using and instead of and .
5 Faithfully Rounded Evaluation of Noe’s Recursion
We implemented faithfully rounded^{2}^{2}2That is, the result is either exact (if the exact value is a floating point number) or it is one of the two closest floating point numbers. floatingpoint computations as described by Rump and Lange (2017) as a portable singleheader C++11 library.^{3}^{3}3Available at https://github.com/jvschroeder/PairArithmetic/. Utilizing this library we implemented the generalization of Noe’s recursion presented in Section 3.3 obtaining faithfully rounded results if no underflow occurs.^{4}^{4}4In our experience this is usually the case if the values of are not too close to the smallest (in absolute value) normal double, which equals on most computer architectures. In case of an underflow the results are smaller than the true values of , but never less than zero. Parallelization was implemented using Intel®Threading Building Blocks (TBB).
Notice that Noe’s recursion (and our generalization thereof) satisfies the NIC principle (No Inaccurate Cancellation, cf. (Rump and Lange, 2017, Definition 2.2)), that is there are no sums where at least one summand is not an input to the algorithm and the summands have opposite signs. Thus, by examining the evaluation tree (cf. (Rump and Lange, 2017, Definition 2.2)) of a concrete implementation, it is possible to calculate a number according to Equation (11) of Rump and Lange (2017). The result will be faithfully rounded if no under or overflow occurs, and (when utilising a double precision floating point numbers). For our concrete implementation we obtain , provided that . Thus (assuming that no over or underflow occurs) the result is guaranteed to be faithfully rounded if . Our implementation could be, in terms of , significantly improved by using binary summation. For example, for we obtain , while the corresponding number of in the case of binary summation would equal . The latter improvement however comes at an additional computational cost, and may be considered mostly of theoretical interest since the calculation for already takes approximately minutes on a core Intel CPU. Figure 3 compares the runtime of our implementation of our generalization of Noe’s recursion to that of the algorithm from the previous section. It becomes apparent that Noe’s recursion with faithful rounding is much faster then Bolshev’s recursion implemented in rational arithmetic. For practical applications, we therefore recommend Noe’s recursion with faithful rounding, at least if a fixed numerical precision is sufficient.
6 Applications in Multiple Hypothesis Testing
As discussed by Roquain and Villers (2011), the values of for all and are important building blocks for calculating the joint distribution of the number of rejections and the number of false rejections for stepup multiple tests. The random variables and play an important role when analyzing the type I and type II error behavior of such multiple tests. One important observation is, that the previously discussed recursions for calculating also calculate all such ’s as intermediate results.
Following Blanchard et al. (2014) we consider null hypotheses which are simultaneously under consideration under one and the same statistical model. We assume that associated values are available on which the multiple test operates. Furthermore, we assume that (regarded as random variables) are jointly distributed according to one of the following models

The ’s are stochastically independent with marginal distributions
where denotes the number of true null hypotheses among and is a given continuous cdf on .

Let
denote a binomially distributed random variable,
. Conditionally on , the ’s are jointly distributed according to .
A multiple test operating on is a measurable mapping , where hypothesis is rejected iff . Under denote by a constant random variable. Then the (random) number of rejections of the multiple test is given by , and
is the (random) number of false rejections (type I errors).
In the following we will consider stepup procedures with critical values such that . The corresponding decision rule can be written as
Summarizing results of Roquain and Villers (2011), the joint distribution of and for any stepup procedure has the following properties.
Lemma 6.
Let .

Under the unconditional model it holds that
where and .

Under the conditional model it holds that
where and .
Combining Lemma 6 with the previously discussed efficient evaluation of it is possible to calculate various summary statistics pertaining to the joint distribution of under the above models.
Definition 1.

The FDR of is given by the expectation of the false discovery proportion (FDP) of , which is given by

Considering the number of correct rejections the average power of is given by
(8) where the convention is utilized and where (under ) or (under ), respectively.

The power is the probability of rejecting at least of the false hypotheses:
(9) where, again, the convention is utilised and where (under ) or (under ), respectively.
In order to provide some numerical illustrations, we first consider the average power (cf. (8)) under where
(10) 
and . This is the setting considered in (Glueck et al., 2008b, Table 2) where the average power for was calculated for the BenjaminiHochberg procedure (controlling the FDR at ) for independent twosided one sample ztests. In our notation, the BenjaminiHochberg (linear stepup test) procedure equals with for . Table 2 illustrates the results obtained for . Due to space constraints only the first six columns (corresponding to ) are presented. The calculation of the full table (not presented here) took less than a second for an one magnitude larger (50 instead of 5) than the one considered by Glueck et al. (2008b). Figure 4 illustrates the time needed to calculate one row of such a table corresponding to some when utilizing our proposed algorithms.
As a second example, we consider the computation of from (9). Again, we choose as in the BenjaminiHochberg case. An asymptotic approximation of this quantity for
where denotes the distribution function of a noncentral chisquared random variable with degrees of freedom and noncentrality parameter , is given in (Izmirlian, 2018, Table 3). Our results can be used to calculate . Table 3 gives the faithfully rounded values for the power for the parameters considered in (Izmirlian, 2018, Table 3).
We conclude by giving an example for the exact distribution of the FDP which shows why the FDR is not always an appropriate summary statistic. Consider again the multiple twosided ztest described in Glueck et al. (2008b), that is given by (10), for and . It is clear that in Figure 5 the distribution of the FDP is neither symmetric about its mean (the FDR, which is depicted as dotted vertical line) nor concentrated around the FDR. A similar argumentation has been used by, among others, Blanchard et al. (2014) and Delattre and Roquain (2015)
in order to motivate the computation of the full distribution of the FDP and to control its quantiles. The latter task is inherently computationally demanding.
2  0.56539  0.50342  

3  0.54576  0.49842  0.44439  
4  0.53446  0.49583  0.45256  0.40451  
5  0.52712  0.49440  0.45819  0.41837  0.37494  
6  0.52201  0.49357  0.46241  0.42840  0.39148  0.35175 
7  0.51827  0.49310  0.46574  0.43606  0.40399  0.36955 
8  0.51543  0.49285  0.46846  0.44214  0.41383  0.38349 
9  0.51320  0.49273  0.47073  0.44710  0.42178  0.39473 
10  0.51142  0.49270  0.47266  0.45124  0.42836  0.40398 
11  0.50997  0.49272  0.47434  0.45475  0.43390  0.41174 
12  0.50877  0.49278  0.47580  0.45777  0.43863  0.41833 
13  0.50776  0.49286  0.47709  0.46039  0.44271  0.42401 
14  0.50690  0.49295  0.47823  0.46268  0.44627  0.42895 
15  0.50616  0.49306  0.47925  0.46472  0.44941  0.43329 
16  0.50552  0.49316  0.48018  0.46653  0.45219  0.43712 
17  0.50497  0.49327  0.48101  0.46815  0.45467  0.44053 
18  0.50447  0.49338  0.48177  0.46962  0.45690  0.44358 
19  0.50404  0.49348  0.48246  0.47094  0.45891  0.44634 
20  0.50365  0.49358  0.48309  0.47215  0.46074  0.44883 
21  0.50330  0.49368  0.48367  0.47325  0.46240  0.45109 
22  0.50298  0.49378  0.48421  0.47427  0.46392  0.45316 
23  0.50269  0.49387  0.48471  0.47520  0.46532  0.45505 
24  0.50243  0.49395  0.48517  0.47605  0.46660  0.45679 
25  0.50219  0.49404  0.48559  0.47685  0.46779  0.45840 
26  0.50198  0.49412  0.48599  0.47759  0.46889  0.45988 
27  0.50178  0.49420  0.48637  0.47828  0.46991  0.46126 
28  0.50159  0.49427  0.48671  0.47892  0.47086  0.46254 
29  0.50142  0.49434  0.48704  0.47952  0.47175  0.46373 
30  0.50126  0.49441  0.48735  0.48008  0.47258  0.46485 
31  0.50111  0.49447  0.48764  0.48060  0.47336  0.46589 
32  0.50097  0.49453  0.48791  0.48110  0.47409  0.46687 
33  0.50084  0.49459  0.48817  0.48157  0.47478  0.46779 
34  0.50072  0.49465  0.48841  0.48201  0.47542  0.46866 
35  0.50060  0.49470  0.48864  0.48242  0.47604  0.46948 
36  0.50050  0.49475  0.48886  0.48282  0.47661  0.47025 
37  0.50040  0.49480  0.48907  0.48319  0.47716  0.47098 
38  0.50030  0.49485  0.48926  0.48354  0.47768  0.47167 
39  0.50021  0.49489  0.48945  0.48388  0.47817  0.47233 
40  0.50012  0.49494  0.48963  0.48420  0.47864  0.47295 
41  0.50004  0.49498  0.48980  0.48451  0.47909  0.47354 
42  0.49996  0.49502  0.48996  0.48480  0.47951  0.47411 
43  0.49989  0.49506  0.49012  0.48508  0.47992  0.47465 
44  0.49982  0.49509  0.49027  0.48534  0.48031  0.47516 
45  0.49975  0.49513  0.49041  0.48559  0.48068  0.47565 
46  0.49969  0.49516  0.49055  0.48584  0.48103  0.47612 
47  0.49963  0.49520  0.49068  0.48607  0.48137  0.47657 
48  0.49957  0.49523  0.49081  0.48629  0.48169  0.47700 
49  0.49951  0.49526  0.49093  0.48651  0.48201  0.47741 
50  0.49946  0.49529  0.49104  0.48672  0.48230  0.47781 
, common variance
, , cf. (10)) when hypotheses are true. The bold values are exactly those in (Glueck et al., 2008b, Table 2).Eff Sz.  n  est. pwr  pwr  Diff in std  

1  0.60000  5  70  0.24900  0.26691  1.23987 
2  0.60000  5  80  0.39600  0.39977  0.24081 
3  0.60000  5  90  0.53800  0.53479  0.18527 
4  0.60000  5  100  0.65700  0.65626  0.05057 
5  0.60000  20  50  0.02800  0.02538  0.48379 
6  0.60000  20  60  0.15700  0.13890  1.69096 
7  0.60000  20  70  0.37800  0.36864  0.61103 
8  0.60000  20  80  0.59900  0.62231  1.50514 
9  0.60000  60  40  0.00200  0.00143  0.43439 
10  0.60000  60  50  0.09900  0.08584  1.49443 
11  0.60000  60  60  0.49200  0.49307  0.06522 
12  0.60000  100  30  0.00000  0.00000  Inf 
13  0.60000  100  40  0.00600  0.00658  0.22041 
14  0.60000  100  50  0.27000  0.30726  2.80406 
15  0.80000  5  40  0.25200  0.26037  0.59552 
16  0.80000  5  50  0.50200  0.49870  0.19588 
17  0.80000  5  60  0.70400  0.70951  0.39928 
18  0.80000  20  30  0.03600  0.03969  0.60858 
19  0.80000  20  40  0.35700  0.36732  0.64187 
20  0.80000  60  20  0.00000  0.00004  0.18826 
21  0.80000  60  30  0.14700  0.15775  0.88238 
22  0.80000  100  20  0.00300  0.00013  8.59402 
23  0.80000  100  30  0.50600  0.48563  1.42549 
24  1.00000  5  30  0.39200  0.39421  0.14658 
25  1.00000  20  20  0.04500  0.04534  0.05420 
26  1.00000  20  30  0.61400  0.63660  1.40588 
27  1.00000  60  20  0.22500  0.19941  2.00156 
28  1.00000  100  20  0.58600  0.57569  0.64138 
under the null hypothesis and
under the alternative where , . The results are under the RMmodel where. For comparison the fourth column contains the Monte Carlo estimates (sample size
) given in (Izmirlian, 2018, Table 3). The following column gives the faithfully rounded power calculated using our method and the last column states the absolute difference between the previous two columns dived by the standard deviation of the Monte Carlo estimation (which was estimated using 300 replicates). As expected most Monte Carlo approximations are within one or two standard deviations of the faithfully rounded result. For the twelfth row no value is given since the estimated standard deviation was zero. This is not unexpected since the faithfully rounded result is
which is two orders magnitude smaller than .