# Efficient Calculation of the Joint Distribution of Order Statistics

We consider the problem of computing the joint distribution of order statistics of stochastically independent random variables in one- and two-group models. While recursive formulas for evaluating the joint cumulative distribution function of such order statistics exist in the literature for a longer time, their numerical implementation remains a challenging task. We tackle this task by presenting novel generalizations of known recursions which we utilize to obtain exact results (calculated in rational arithmetic) as well as faithfully rounded results. Finally, some applications in stepwise multiple hypothesis testing are discussed.

• 2 publications
• 13 publications
11/22/2021

### A Dynamic Programming Algorithm to Compute Joint Distribution of Order Statistics on Graphs

Order statistics play a fundamental role in statistical procedures such ...
07/29/2022

### Risk aggregation with FGM copulas

We offer a new perspective on risk aggregation with FGM copulas. Along t...
09/03/2020

### Inaccuracy measures for concomitants of GOS in Morgenstern family

In this paper, we obtain a measure of inaccuracy between rth concomitant...
06/17/2019

### Efficient computation of the cumulative distribution function of a linear mixture of independent random variables

For a variant of the algorithm in [Pit19] (arXiv:1903.10816) to compute ...
12/31/2020

### Perfect Gibbs Sampling of Order Constrained Non-IID Ordered Random Variates with Application to Bayesian Principal Components Analysis

Order statistics arising from m independent but not identically distribu...
12/05/2020

### Deep Archimedean Copulas

A central problem in machine learning and statistics is to model joint d...
07/15/2021

### Computing Permanents on a Trellis

The problem of computing the permanent of a matrix has attracted interes...

## 1 Introduction

The joint distribution of order statistics of stochastically independent random variables plays a pivotal role in the theory of empirical processes and in nonparametric statistics; see, e. g., Shorack and Wellner (2009) and Dickhaus (2018). For instance, the exact finite-sample null distributions of classical goodness-of-fit tests like the Kolmogorov-Smirnov and the Cramér-von Mises test as well as those of modern ”higher criticism” goodness-of-fit tests rely on such joint distributions; cf. Gontscharuk et al. (2015), Gontscharuk et al. (2016a), and Finner and Gontscharuk (2018) for recent developments and further references. In simultaneous statistical inference, the joint distribution of ordered

-values is needed to analyze the type I and type II error behaviour of stepwise rejective multiple test procedures; cf. Chapter 5 of

Dickhaus (2014).

In the case that are identically distributed (we refer to this case as a one-group model), classical recursive methods like Bolshev’s recursion, Noe’s recursion, and Steck’s recursion allow for computing the joint cumulative distribution function (cdf) of exactly; cf. Section 9.3 of Shorack and Wellner (2009). A generalization of Steck’s recursion to two-group models has been introduced by Blanchard et al. (2014). The other aforementioned recursions can be generalized in an analogous manner, as we will demonstrate in Section 3 of the present work.

While conceptually appealing, numerical properties of the aforementioned recursions are not well understood yet, and existing implementations into computer software often refer to rule-of-thumb-type upper bounds on such that the respective implementation is trustworthy. For example, Art B. Owen reports in his implementation of the two-sided version of Noe’s recursion in C (see https://www.stat.washington.edu/jaw/RESEARCH/SOFTWARE/BERKJONES/BJ-RBJ-C-Code/noe.c) that the recursion works well for

but ”For larger n (eg 1800 or more) […] unexplained odd behavior.” Similarly, in the R Package mutoss (cf.

Blanchard et al. (2010)) the following comment is made on the implementation of Bolshev’s recursion: ”Because of numerical issues should not be greater than 100.” Recently, Moscovich and Nadler (2017) introduced a computational method for one-group models. However, they do not consider the numerical accuracy of their approach rigorously.

In this work, we contribute to the analysis of the numerical accuracy and the computational complexity of existing approaches for computing the joint distribution of in a mathematically rigorous manner. Furthermore, we provide novel computational techniques for one- and two-group models which are guaranteed to provide accurate results for arbitrary sample size . The rest of the material is structured as follows. In Section 2, we introduce the relevant quantities. The (generalized) recursions for one- and two-group models are provided in Section 3, together with a rigorous analysis of their computational complexities and their numerical properties. Our proposed exact computational methods rely on rational arithmetic (Section 4) and on faithful rounding (Section 5), respectively. Applications in multiple hypothesis testing are given in Section 6, and we conclude with a discussion in Section 7. Lengthy proofs as well as pseudo code for the considered algorithms are deferred to the Appendix.

## 2 Order Statistics

Throughout the following sections, we let for a natural number . Consider stochastically independent, real-valued random variables

, which are all driven by the same probability measure

. Let , and recursively define

 ij :=argmini∈IjXi, Ij :=Ij−1∖{ij−1}

for . Then we call the order statistics of , which we will denote by in the remainder. The random variable will be called the

-th order statistic of the random vector

. Let denote the marginal cdf of for . This paper will present methods for the quick and numerically stable calculation of

 ΨG1,G2n1,n2(b):=P(X1:n≤b1,…,Xn:n≤bn),b=(b1,…,bn)⊤∈Rn,

assuming that where are two continuous distribution functions on and with denoting the number of ’s distributed according to , . Since it holds that

 G1(Xi)∼{Uni[0,1],Fi=G1,G2∘G−11,Fi=G2,

it follows that , where . Therefore, it is sufficient to consider the calculation of for an arbitrary continuous distribution function and argument . In the sequel, we suppress the dependence on and notationally, and write for notational convenience.

As outlined in the Introduction, for there exist many well known recursions (see e. g. Section 9.3 of Shorack and Wellner (2009)) for computing . There are also newer approaches based on numerical integration(see Moscovich et al. (2016)) or based on the Poisson process (see Moscovich and Nadler (2017)). Unfortunately, the former cannot be easily generalized to the case

, since the Lebesgue density of an order statistic is in general not piece-wise constant. The latter is very fast due to usage of the Fourier transform, but numerically unstable for small values of the

’s. This can for instance be demonstrated using the thresholds of the well-known linear step-up test (cf. Benjamini and Hochberg (1995)) for control of the false discovery rate (FDR); see Figure 1. Glueck et al. (2008a) proposed an algorithm with exponential complexity (cf. (Glueck et al., 2008a, Theorem 4.2)), resulting in a very high computational effort for moderate or large values of . However, the method of Glueck et al. (2008a) can be used to compute -variate marginal distributions for , because in such cases the complexity of their approach reduces to .

Since we are mostly concerned with the full joint distribution, we extend the approach suggested by Blanchard et al. (2014) and provide generalizations of Bolshev’s and Noe’s recursions. We compare them to the generalization of Steck’s recursion proposed by Blanchard et al. (2014)

and demonstrate that the Bolshev recursion is suitable for exact computations in rational arithmetic, whereas Noe’s recursion is numerically stable when computed in fixed-precision floating point arithmetic.

All our numerical calculations were performed on a Windows 7 machine with an Intel(R) Core(TM) i7-4790 CPU with 32 gigabytes of RAM.

## 3 The Generalized Recursions

Let , . Furthermore, let and be jointly stochastically independent. Let for and

 Ψ(i1,i2) :=P(X1:M≤b1,…,Xi:M≤bi1+i2), (1)

where , , denotes the -th order statistic of , and is an increasing sequence with values in . The following subsections provide formulas for efficiently calculating , and we discuss their computational and numerical properties.

### 3.1 Generalization of Bolshev’s Recursion

###### Lemma 1 (Generalization of Bolshev’s Recursion).

The function from (1) satisfies the recursion

 Ψ(m1,m2) =1−∑0≤k1≤m10≤k2≤m2k1+k2

where

 M(m1,m2)k1,k2:=(m1k1)(m2k2)(1−bk1+k2+1)m1−k1⋅(1−F(bk1+k2+1))m2−k2. (2)

Moreover, we have the following recursive relationships for .

 M(m1+1,m2)k1,k2 =⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩1k2=m2∧k1=m1+1M(m1,m2)m1,k2+1⋅k2+1m2−k2⋅(1−F(bm1+(k2+1)+1))k2

For or this is simply the well-known Bolshev recursion.

### 3.2 Generalization of Steck’s Recursion

###### Lemma 2 (Generalization of Steck’s Recursion).

Let . Then from (1) satisfies the recursion

 Ψ(m1,m2) =(bm1+m2)m1F(bm1+m2)m2−∑0≤k1≤m10≤k2≤m2k1+k2≤m1+m2−2M(m1,m2)k1,k2⋅Ψ(k1,k2),

where

 (3)

Letting

 a(k,j):=(kj)~{}~{}and~{}~{}a(k,j)={1k=jj+1k−j×a(k,j+1)j

we can write

 M(m1,m2)0,j (4) M(m1,m2)j,m2 =a(m1,j)⋅(bm1+m2−bj+m2+1)m1−j. (5)

Furthermore, we have the following recursion for .

 M(m1,m2)k1+1,k2−1 =M(m1,m2)k1,k2×F(bm1+m2)−F(bk1+k2+1)bm1+m2−bk1+k2+1×m1−k1k1+1×m2−k2+1k2

for and .

###### Proof.

See (Blanchard et al., 2014, Proposition 1)

### 3.3 Generalization of Noe’s Recursion

###### Lemma 3 (Generalization of Noe’s Recursion).

Let , , and for

 Qi1,i2(m) :=∑0≤k1≤i10≤k2≤i2m−1≤k1+k2Mi1,i2k1,k2(m)⋅Qk1,k2(m−1), Mi1,i2k1,k2(m) :=(i1k1)(i2k2)×(bm−bm−1)i1−k1×(F(bm)−F(bm−1))i2−k2

for .

Then the function from (1) satisfies

 Ψ(i1,i2) =Qi1,i2(i1+i2)

for and .

Letting

 a(m),1(j):=(bm−bm−1)j~{}~{}and~{}~{}a(m),2(j):=(F(bm)−F(bm−1))j,

we can write

 Mi1,i2k1,k2(m)=(i1k1)(i2k2)×a(m),1(i1−k1)×a(m),2(i2−k2). (6)

### 3.4 Computational Complexity and Numerical Properties

The computational complexity (defined to be the number of elementary arithmetic operations on floating point numbers) of each of the aforementioned recursions is given by the following lemma.

###### Lemma 4.

The proposed recursions can be implemented using

[leftmargin=!,labelwidth=Bolshev]

Bolshev

Steck

Noe

elementary arithmetic operations (addition, subtraction, multiplication, division) and memory (assuming fixed-precision storage of all results).

The results of Lemma 4 suggest that Noe’s recursion might not be the best choice. However, for small values of the ’s, Bolshev’s recursion and Steck’s recursion are inherently numerically unstable. Consider for example and

 bi :={2−10if i≤10,2−1if i=11.

Then both recursions, when implemented in double precision floating point arithmetic, result in negative values and huge relative errors (cf. Table 1) which can be explained by inaccurate or catastrophic cancellation, respectively.

Noe’s recursion, if implemented in a reasonable manner, never results in negative values. Furthermore by (Jeannerod and Rump, 2018, Equation (3)) the relative error is bounded (if the coefficients are computed with a bounded relative error) since all summands are non-negative.

###### Remark 1.

Noe’s recursion can be easily parallelized since the can be, for any fixed , computed in parallel.

## 4 Exact Evaluation of Bolshev’s Recursion

If only elementary arithmetic operations are utilized and the number of such operations is not too large it is feasible to exactly evaluate expressions using rational arithmetic.111Our C++ implementation is based on The GNU Multiple Precision Arithmetic Library. We show that this is indeed the case for the Bolshev recursion as well its generalization for the two-group case presented in Section 3.1.

First we consider the case where , hence : Even though Bolshev’s recursion involves binomial coefficients our proposed Algorithm 1 (cf. the Appendix) for the one-group case evaluates it using only

 \#Operations=n+(n+1)+n∑k=2[2+k−1∑j=16]=3n2+n−1

elementary arithmetic operations (addition, subtraction, multiplication, division). For an illustration, considering the sequence we observed the execution times depicted in Figure 2.

###### Remark 2.

Our proposed Algorithm 2 (cf. the Appendix) implements the two-group case in elementary arithmetic operations. Consequently, for equal sample sizes the number of operations is of . Notice that this is a marked improvement over the exponential complexity reported by (Glueck et al., 2008a, Theorem 4.2). Figure 3 illustrates the observed execution time for calculating and , where .

For not necessarily equal sample sizes and , our implementation of Algorithm 2 needs arithmetic operations.

Since the cdf of many interesting distributions is not available in a closed form the thresholds might either not be exactly calculable or simply not exactly representable as rational numbers. Lemma 5 analyzes the error propagation when and / or are inexact.

###### Lemma 5.

Let

 xi:=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩b1if i=1bi−bi−1if 1

and denote by approximations thereof, which are obtained by replacing and by approximations and . If for it holds that for the it follows that for all

 ~Ψ(i1,i2)∈Ψ(i1,i2)⋅((1−ε)i1+i2,(1+ε)i1+i2),

where denotes the approximation of obtained by using and instead of and .

## 5 Faithfully Rounded Evaluation of Noe’s Recursion

We implemented faithfully rounded222That is, the result is either exact (if the exact value is a floating point number) or it is one of the two closest floating point numbers. floating-point computations as described by Rump and Lange (2017) as a portable single-header C++11 library.333Available at https://github.com/jvschroeder/PairArithmetic/. Utilizing this library we implemented the generalization of Noe’s recursion presented in Section 3.3 obtaining faithfully rounded results if no underflow occurs.444In our experience this is usually the case if the values of are not too close to the smallest (in absolute value) normal double, which equals on most computer architectures. In case of an underflow the results are smaller than the true values of , but never less than zero. Parallelization was implemented using Intel®Threading Building Blocks (TBB).

Notice that Noe’s recursion (and our generalization thereof) satisfies the NIC principle (No Inaccurate Cancellation, cf. (Rump and Lange, 2017, Definition 2.2)), that is there are no sums where at least one summand is not an input to the algorithm and the summands have opposite signs. Thus, by examining the evaluation tree (cf. (Rump and Lange, 2017, Definition 2.2)) of a concrete implementation, it is possible to calculate a number according to Equation (11) of Rump and Lange (2017). The result will be faithfully rounded if no under- or overflow occurs, and (when utilising a double precision floating point numbers). For our concrete implementation we obtain , provided that . Thus (assuming that no over- or underflow occurs) the result is guaranteed to be faithfully rounded if . Our implementation could be, in terms of , significantly improved by using binary summation. For example, for we obtain , while the corresponding number of in the case of binary summation would equal . The latter improvement however comes at an additional computational cost, and may be considered mostly of theoretical interest since the calculation for already takes approximately minutes on a core Intel CPU. Figure 3 compares the runtime of our implementation of our generalization of Noe’s recursion to that of the algorithm from the previous section. It becomes apparent that Noe’s recursion with faithful rounding is much faster then Bolshev’s recursion implemented in rational arithmetic. For practical applications, we therefore recommend Noe’s recursion with faithful rounding, at least if a fixed numerical precision is sufficient.

## 6 Applications in Multiple Hypothesis Testing

As discussed by Roquain and Villers (2011), the values of for all and are important building blocks for calculating the joint distribution of the number of rejections and the number of false rejections for step-up multiple tests. The random variables and play an important role when analyzing the type I and type II error behavior of such multiple tests. One important observation is, that the previously discussed recursions for calculating also calculate all such ’s as intermediate results.

Following Blanchard et al. (2014) we consider null hypotheses which are simultaneously under consideration under one and the same statistical model. We assume that associated -values are available on which the multiple test operates. Furthermore, we assume that (regarded as random variables) are jointly distributed according to one of the following models

The ’s are stochastically independent with marginal distributions

 pi∼{Uni[0,1]if 1≤i≤m0,Fif m0+1≤i≤m,

where denotes the number of true null hypotheses among and is a given continuous cdf on .

Let

denote a binomially distributed random variable,

. Conditionally on , the ’s are jointly distributed according to .

A multiple test operating on is a measurable mapping , where hypothesis is rejected iff . Under denote by a constant random variable. Then the (random) number of rejections of the multiple test is given by , and

is the (random) number of false rejections (type I errors).

In the following we will consider step-up procedures with critical values such that . The corresponding decision rule can be written as

 SUt(p) :=[max({0}∪{i∈[m]:pi:m≤ti})],~{}~{}where~{}~{}[0]:=∅.

Summarizing results of Roquain and Villers (2011), the joint distribution of and for any step-up procedure has the following properties.

###### Lemma 6.

Let .

1. Under the unconditional model it holds that

 Pm,π0,F(V(SUt,p)=j,R(SUt,p)=k) =(mk)(kj)~πj0(1−~π0)k−jG(tk)kΨUni[0,1],Fm−k,0(1−G(tm),…,1−G(tk+1))

where and .

2. Under the conditional model it holds that

 Pm,m0,F(V(SUt,p)=j,R(SUt,p)=k) =(m0j)(m−m0k−j)tjk(F(tk))k−jΨUni[0,1],¯Fm−k−(m0−j),m0−j(1−tm,…,1−tk+1),

where and .

Combining Lemma 6 with the previously discussed efficient evaluation of it is possible to calculate various summary statistics pertaining to the joint distribution of under the above models.

###### Definition 1.

• The FDR of is given by the expectation of the false discovery proportion (FDP) of , which is given by

 FDP(SUt,p):=V(SUt,p)R(SUt,p)∨1.
• Considering the number of correct rejections the average power of is given by

 Powavg(SUt):=E[R(SUt,p)−V(SUt,p)m−M0], (8)

where the convention is utilized and where (under ) or (under ), respectively.

• The power is the probability of rejecting at least of the false hypotheses:

 Powλ(SUt):=P(R(SUt,p)−V(SUt,p)m−M0≥λ) (9)

where, again, the convention is utilised and where (under ) or (under ), respectively.

In order to provide some numerical illustrations, we first consider the average power (cf. (8)) under where

 F(t):=1+Φ(Φ−1(t2)−√N)−Φ(Φ−1(1−t2)−√N) (10)

and . This is the setting considered in (Glueck et al., 2008b, Table 2) where the average power for was calculated for the Benjamini-Hochberg procedure (controlling the FDR at ) for independent two-sided one sample z-tests. In our notation, the Benjamini-Hochberg (linear step-up test) procedure equals with for . Table 2 illustrates the results obtained for . Due to space constraints only the first six columns (corresponding to ) are presented. The calculation of the full table (not presented here) took less than a second for an one magnitude larger (50 instead of 5) than the one considered by Glueck et al. (2008b). Figure 4 illustrates the time needed to calculate one row of such a table corresponding to some when utilizing our proposed algorithms.

As a second example, we consider the computation of from (9). Again, we choose as in the Benjamini-Hochberg case. An asymptotic approximation of this quantity for

 F(t):=Fν,μ(F−1ν,0(t2))−Fν,μ(−F−1ν,0(t2)),

where denotes the distribution function of a non-central chi-squared random variable with degrees of freedom and non-centrality parameter , is given in (Izmirlian, 2018, Table 3). Our results can be used to calculate . Table 3 gives the faithfully rounded values for the -power for the parameters considered in (Izmirlian, 2018, Table 3).

We conclude by giving an example for the exact distribution of the FDP which shows why the FDR is not always an appropriate summary statistic. Consider again the multiple two-sided z-test described in Glueck et al. (2008b), that is given by (10), for and . It is clear that in Figure 5 the distribution of the FDP is neither symmetric about its mean (the FDR, which is depicted as dotted vertical line) nor concentrated around the FDR. A similar argumentation has been used by, among others, Blanchard et al. (2014) and Delattre and Roquain (2015)

in order to motivate the computation of the full distribution of the FDP and to control its quantiles. The latter task is inherently computationally demanding.