# Controlling false discovery exceedance for heterogeneous tests

Several classical methods exist for controlling the false discovery exceedance (FDX) for large scale multiple testing problems, among them the Lehmann-Romano procedure ([LR] below) and the Guo-Romano procedure ([GR] below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions of the p-values F_i, 1≤ i≤ m, are all equal. In many applications, however, the data are heterogeneous which leads to heterogeneous null distribution functions. Ignoring this heterogeneity usually induces a conservativeness for the aforementioned procedures. In this paper, we develop three new procedures that incorporate the F_i's, while ensuring the FDX control. The heterogeneous version of [LR], denoted [HLR], is based on the arithmetic average of the F_i's, while the heterogeneous version of [GR], denoted [HGR], is based on the geometric average of the F_i's. We also introduce a procedure [PB], that is based on the Poisson-binomial distribution and that uniformly improves [HLR] and [HGR], at the price of a higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the F_i's can be particularly simple in the case of FDX control, and does not require any further correction term. The performances of the new proposed procedures are illustrated by real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the p-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts.

## Authors

• 4 publications
• 14 publications
02/28/2020

### A New Procedure for Controlling False Discovery Rate in Large-Scale t-tests

This paper is concerned with false discovery rate (FDR) control in large...
08/23/2020

### False discovery rate envelope for functional test statistics

False discovery rate (FDR) is a common way to control the number of fals...
07/11/2018

### Differentially Private False Discovery Rate Control

Differential privacy provides a rigorous framework for privacy-preservin...
06/04/2019

### On Benjamini-Hochberg procedure applied to mid p-values

Multiple testing with discrete p-values routinely arises in various scie...
10/04/2021

### Online multiple testing with super-uniformity reward

Valid online inference is an important problem in contemporary multiple ...
05/11/2018

### False Discovery Rate Control Under Reduced Precision Computation

The mitigation of false positives is an important issue when conducting ...
08/28/2021

### ZAP: Z-value Adaptive Procedures for False Discovery Rate Control with Side Information

Adaptive multiple testing with covariates is an important research direc...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

### 1.1. Background

When many statistical tests are performed simultaneously, a ubiquitous way to account for the erroneous rejections of the procedure is the false discovery proportion (FDP), that is, the proportion of errors in the rejected sets, as introduced in the seminal paper BenjaminiHochberg95. Most of the related literature studies the expected value of this quantity, which is the false discovery rate (FDR), e.g., building procedures that improve the original Benjamini-Hochberg procedure by trying to adapt to some underlying structure of the data. In particular, a fruitful direction is to take into account the heterogeneous structure of the different tests. Heterogeneity may originate from various sources. The two main examples we have in mind, and which have been intensively investigated in the statistical literature recently, is heterogeneity caused by -value weighting and discrete data.

The -value weighting is a popular approach that can be traced back to Holm1979 and that has been further developed specifically for FDR in, e.g., GRW2006; BR2008EJS; HZZ2010; ZZ2014; Ramdas2017. Here, the heterogeneity can be for instance driven by sample size, groups, or more generally by some covariates. In particular, finding optimal weighting in the sense of maximizing the number of true rejections has been investigated in WR2006; RDV2006; RW2009; Ign2016; Durand2017, either from independent weighting or from the same data set. As a result, the weighted -values have heterogeneous null distribution functions that must be properly taken into account by multiple testing procedures.

On the other hand, multiple testing for discrete distributions is a well identified research field Tar1990; WestWolf1997; Gilbert05 that has received a growing attention in the last decade, see, e.g., Heyse2011; Heller2012; Dickhaus2012; Habiger2015; CDS2015; Doehler2016; CDH2018; DDR2018; DDR2019

and references therein. The most typical setting is the case for which each test is performed according to a contingency table. In that situation, the heterogeneity is induced by the fact that marginal counts naturally vary from one table to another. The approach is then to suitably combine the heterogeneous null distributions to compensate the natural conservativeness of individual discrete tests. Namely, Heyse’s approach

Heyse2011 is to consider the transformation

 (1) ¯¯¯¯F(t)=m−1m∑i=1Fi(t),t∈[0,1],

and to apply BH to the transformed -values . Unfortunately, the latter does not rigorously control the FDR, as it has been proven in Doehler2016; DDR2018. Appropriate corrections of the expression have been proposed in DDR2018 in order to recover rigorous FDR control.

### 1.2. FDX control

A common criticism of FDR is that it captures only the average behavior of the FDP. In particular, controlling the FDR does not prevent the FDP from possessing undesirable fluctuations and we may aim to stochastically control of FDP in other ways. A classical approach is to control the false probability exceedance (FDX) in the following sense: for

,

 (2) FDX=P(FDP>α)≤ζ.

This corresponds to control the

-quantile of the FDP distribution at level

, see, e.g., GW2004; PGVW2004; Korn2004; LR2005; GW2006; RW2007; GHS2014; DR2015. Let us also mention that the probabilistic fluctuation of the FDP process is of interest in its own, see, e.g., Neu2008; RV2011; DR2011; DR2015a; DJ2019.

Among multiple testing procedures, step-down procedures have been shown to be particularly useful for FDX control. Two prominent step-down procedures have been proven to control the FDX under different distributional assumptions:

• The Lehmann-Romano procedure , introduced in LR2005, is defined as the step-down procedure with critical values

 (3) τ{\tiny LR}ℓ=ζ⌊αℓ⌋+1m(ℓ),1≤ℓ≤m,

where we denote

 (4) m(ℓ)=m−ℓ+⌊αℓ⌋+1.

It has been shown to control the FDX under various dependence assumptions between the -values, e.g., when each -value under the null is independent of the family of the -values under the alternative (Theorem 3.1 in LR2005), which we will refer to (Indep0) below, or when the Simes inequality holds true among the family of true null -values (Theorem 3.2 in LR2005). 111Under the latter condition, it has also been proven later that the step-up version of , that is, the step-up procedure using the critical values (3) also controls the FDX, see the proof of Theorem 3.1 in GHS2014.

• The procedure has been improved by the Guo-Romano procedure , see GR2007, defined as the step-down procedure with critical values

 (5) τ{\tiny GR}ℓ =max{t∈[0,1]:P(Bin[m(ℓ),t]≥⌊αℓ⌋+1)≤ζ},1≤ℓ≤m,

where denotes any variable following a binomial distribution with parameters and . While making more rejections, the procedure controls the FDX under a stronger assumption: the null -value family and the alternative -value family are independent and that the null -values are independent, which we will refer to (Indep) below.

### 1.3. Contributions

The global aim of the paper is to improve procedures and by incorporating the null distribution functions of the -values while maintaining rigorous FDX control. More specifically, the contributions of this work are as follows:

• we introduce the heterogeneous Lehmann Romano procedure , which controls the FDX under (Indep0) and is an uniform improvement of (when the marginals of the null -values are super-uniform, see (SuperUnif) further on);

• we introduce the heterogeneous Guo Romano procedure , which controls the FDX under (Indep) and is an uniform improvement of (under (SuperUnif));

• at the price of additional computational complexity, we introduce the Poisson-binomial procedure , which controls the FDX under (Indep) and is a uniform improvement of and ;

• we apply this new technology to weighted -values to provide the first weighted procedures that control the FDX (to our knowledge), called and . They are able to improve their non-weighted counterparts and , respectively, see Section 4;

• in the discrete context, our new procedures re-named , are shown to be uniform improvements with respect to the continuous procedures and , respectively. To the best of our knowledge, these are the first FDX controlling procedures tailored specifically to discrete -value distributions. The amplitude of the improvement can be substantial, as we show both with simulated and real data examples, see Section 5.

The paper is organized as follows: Section 2 introduces the statistical setting, the procedures and FDX criterion, as well as a shortcut to compute our step-down procedures without evaluating the critical values. Section 3 is the main section of the paper, which introduces the new heterogeneous procedures and their FDX controlling properties. Our methodology is then applied in two particular frameworks: new weighted procedures controlling the FDX are derived in Section 4 while Section 5 is devoted to the case where the tests are discrete. Both sections include numerical illustrations. A discussion is provided in Section 6 and most of technical details are deferred to Section 7. Appendix A gives additional numerical details for simulations.

## 2. Framework

### 2.1. Setting

We use here a classical formal setting for heterogeneous nulls, see, e.g., DDR2018. We observe , defined on an abstract probabilistic space, valued in an observation space and of distribution that belongs to a set of possible distributions. We consider null hypotheses for , denoted , , and we denote the corresponding set of true null hypotheses by . We also denote by the complement of in and by the number of true nulls.

We assume that there exists a set of

-values that is, a set of random variables

, valued in . We introduce the following dependence assumptions between the -values:

 (Indep0) for all P∈P, {pi(X),i∈H0(P)} is independent of {pi(X),i∈H1(P)}; (Indep) (Indep0) holds and for all P∈P, {pi(X),i∈H0(P)} consists of independent variables.

Note that (Indep0) and (Indep) are both satisfied when all the -values , , are mutually independent in the model

. The (maximum) null cumulative distribution function of each

-value is denoted

 (6) Fi(t)=supP∈P:i∈H0(P){PX∼P(pi(X)≤t)},t∈[0,1],1≤i≤m.

We let that we supposed to be known and we consider the following possible situations for the functions in :

 (Cont) for all i∈{1,…,m}, Fi is continuous on [0,1] (Discrete) for all i∈{1,…,m}, there exists% some countable set Ai⊂[0,1] such thatFi is a step function, right continuous, that jumps only at some % points of Ai.

The case (Discrete) typically arises when for all and , . Throughout the paper, we will assume that we are either in the case (Cont) or (Discrete) and we denote , with by convention when (Cont) holds. For comparison with the homogeneous case, we also let

 (SuperUnif) for all i∈{1,…,m}, Fi(t)≤t for all t∈[0,1].

### 2.2. False Discovery Exceedance and step-down procedures

In general, a multiple testing procedure is defined as a random subset which corresponds to the indices of the rejected nulls. For , the false discovery exceedance of is defined as follows:

 (7) FDXα(R,P)=PX∼P(|R(X)∩H0(P)||R(X)|∨1>α),P∈P.

In this paper, we consider particular multiple testing procedures, called step-down procedures. Given some -value family and some non-decreasing sequence , the step-down procedure with critical values rejects the null hypotheses corresponding to the set

 (8) R ={i∈{1,…,m}:pi(X)≤τ^ℓ} (9) ˆℓ =max{ℓ∈{0,…,m}:∀ℓ′≤ℓ,pσ(ℓ′)≤τℓ′}, (convention pσ(0)=0),

for which denotes the -values ordered increasingly (for some data-dependent permutation ).

### 2.3. Transformation function family and computational shortcut

In this paper, the critical values will be obtained by inverting some functional, that is,

 (10) τℓ=ξ−1ℓ(ζ)=max{t∈A:ξℓ(t)≤ζ},(τℓ=0 if the set is empty),1≤ℓ≤m,

for , , a given set of functions. In order for (10) to be well-defined and to be nondecreasing, we will say that the function set is a transformation function family if it satisfies the following conditions:

 (11) for all ℓ∈{1,…,m}, ξℓ is a non-decreasing function;for all t∈[0,1] and all ℓ∈{1,…,m−1}, we have ξℓ+1(t)≤ξℓ(t);in case (???), for all ℓ∈{1,…,m}, ξℓ is % continuous on [0,1].

For instance, the critical values of the procedure can be rewritten as (10) for the functions

 (12) ξ{\tiny LR}ℓ(t)=m(ℓ)⌊αℓ⌋+1t,t∈[0,1],1≤ℓ≤m.

We easily check that the function set is a family of transformation functions (in the sense of (11)). Indeed, is non-increasing both in and . A second example is given by the procedure for which

 (13) ξ{\tiny GR}ℓ(t) =P(Bin[m(ℓ),t]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1],

can be proved to form a family of transformation functions. Indeed, the only non-obvious argument to prove (11) is that for a fixed , and we have . This comes from the fact that is non-increasing both in and .

Finally, because of the inversion, computing the critical values via (10) can be time consuming. Fortunately, computing the critical values is actually not necessary if we are solely interested in determining the rejection set given by (8). As the following result shows, we may determine by working directly with the transformation functions.

###### Proposition 2.1.

Let us consider any transformation function family and the corresponding critical values , , defined by (10). Then, for all , with -probability , the step-down procedure with critical values can equivalently written as

 (14) R ={i∈{1,…,m}:~pi≤ζ}; (15) ~pi =max1≤ℓ≤mpσ(ℓ)≤pi{ξℓ(pσ(ℓ))},1≤i≤m.

Proposition 2.1 is proved in Section 7.2.

## 3. New FDX controlling procedures

In this section, we introduce new procedures that control the false discovery exceedance at some level , that is,

 (16) for all P∈P, FDXα(R,P)≤ζ,

while incorporating the family in an appropriate way.

### 3.1. Tool

Our main tool is the following bound: For any step-down procedure with critical values , we have

 (17) supP∈P{FDXα(R,P)} ≤B(τ,α) (18) for B(τ,α) =sup1≤ℓ≤msupP∈P|H0(P)|≤m(ℓ)PX∼P⎛⎝∑i∈H0(P)1{pi(X)≤τℓ}≥⌊αℓ⌋+1⎞⎠.

Inequality (17) is valid under the distributional assumption (Indep0). This bound comes from a reformulation of Theorem 5.2 in Roq2011 in our heterogenous framework, see Theorem 7.1 below. Our new procedures are derived by further upper-bounding via various probabilistic devices. More specifically, we will introduce several transformation function families such that for all ,

 B(τ,α)≤sup1≤ℓ≤m{ξℓ(τℓ)}.

According to (17), the step-down procedure using the corresponding critical values (10) will then control the FDX in the sense of (16).

### 3.2. Heterogeneous Lehmann-Romano procedure

By using the Markov inequality, we obtain

 (19) B(τ,α) ≤sup1≤ℓ≤msupP∈P|H0(P)|≤m(ℓ)∑i∈H0(P)Fi(τℓ)⌊αℓ⌋+1=sup1≤ℓ≤m∑m(ℓ)j=1(F(τℓ))(j)⌊αℓ⌋+1,

where denotes the values of ordered decreasingly. Bounding the above quantity by entails the following procedure.

###### Definition 3.1.

The heterogeneous Lehmann-Romano procedure, denoted by , is defined as the step-down procedure using the critical values defined by

 (20) τ{\tiny HLR}ℓ =max{t∈A:ξ{\tiny HLR}ℓ(t)≤ζ},1≤ℓ≤m; (21) ξ{\tiny HLR}ℓ(t) =∑m(ℓ)j=1(F(t))(j)⌊αℓ⌋+1,1≤ℓ≤m,t∈[0,1],

where denotes the values of ordered decreasingly and is defined by (4).

The quantity is thus similar to , in which has been replaced by the average of the largest values of . To check that the functions form a transformation function family in the sense of (11), we note that is non-increasing in (averaging on smaller values makes the average smaller) and continuous in under (Cont) (because is continuous and is -Lipschitz).

In the classical case (SuperUnif), we have for all and . Hence, is less conservative than in that situation. A technical detail is that this only holds almost surely because the range in (20) can be different from in the case (Discrete). This entails the following result.

###### Proposition 3.1.

In the setting defined in Section 2.1, the procedure satisfies the following

• Under (Indep0), controls the FDX in the sense (16);

• Under (SuperUnif), the set of nulls rejected by contains the one of with -probability , for all .

### 3.3. Poisson-binomial procedure

Here, we propose to bound (18) by using the Poisson-binomial distribution. To this end, recall that the Poisson-Binomial distribution of parameters , denoted below, corresponds to the distribution of , where the are all independent and each

follows a Bernoulli distribution of parameter

for .

First note that for all and , we have that is stochastically upper bounded by a Bernoulli variable of parameter , see (6). As a consequence, by assuming (Indep), we have for all critical values ,

 B(τ,α) ≤sup1≤ℓ≤msupA⊂{1,…,m}|A|≤m(ℓ)P(PBin[(Fi(τℓ))i∈A]≥⌊αℓ⌋+1) (22) =sup1≤ℓ≤mP(PBin[((F(τℓ))(j))1≤j≤m(ℓ)]≥⌊αℓ⌋+1).

Bounding the latter by leads to the following procedure.

###### Definition 3.2.

The Poisson-binomial procedure, denoted by , is defined as the step-down procedure using the critical values

 (23) τ{\tiny PB}ℓ =max{t∈A:ξ{\tiny PB}ℓ(t)≤ζ},1≤ℓ≤m; (24) ξ{\tiny PB}ℓ(t) =P(PBin[((F(t))(j))1≤j≤m(ℓ)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1],

where denotes the values of ordered decreasingly and is defined by (4).

Let us now check that is a transformation function family, that is, satisfy (11). The continuity assumption holds because, under (Cont), the mapping is continuous (argument similar to above) and the cumulative distribution function of is a continuous function of . The monotonic property comes from the fact that is non-increasing both in and .

Since under (SuperUnif), the distribution is stochastically smaller than the distribution , the following holds.

###### Proposition 3.2.

In the setting defined in Section 2.1, the procedure satisfies the following

• Under (Indep), controls the FDX in the sense (16);

• Under (SuperUnif), the set of nulls rejected by contains the one of with -probability , for all .

However, in general, the procedure is computationally demanding, even with the shortcut mentioned in Section 2.3. This comes from the computation of which involves the distribution function of a Poisson-binomial variable. In the next section, we make slightly more conservative for recovering the computational price of .

### 3.4. Heterogeneous Guo-Romano procedure

In this section, we further upper-bound (25) by using that any random variable is stochastically upper-bounded by a random variable (see Example 1.A.25 in Shaked). This yields

 (25) B(τ,α) ≤sup1≤ℓ≤mP(Bin[m(ℓ),~Fm(ℓ)(τℓ)]≥⌊αℓ⌋+1),

where we let

 (26) ~Fj(t)=1−⎛⎝j∏j′=1(1−(F(t))(j′))⎞⎠1/j,1≤j≤m,t∈[0,1],

where denotes the values of ordered decreasingly.

This reasoning suggests another heterogeneous procedure, based on the binomial distribution. Since also uses the binomial device, we name this new procedure the heterogeneous Guo-Romano procedure.

###### Definition 3.3.

The heterogeneous Guo-Romano procedure, denoted by , is defined as the step-down procedure using the critical values defined by

 (27) τ{\tiny HGR}ℓ =max{t∈A:ξ{\tiny HGR}ℓ(t)≤ζ},1≤ℓ≤m; (28) ξ{\tiny HGR}ℓ(t) =P(Bin[m(ℓ),~Fm(ℓ)(t)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1],

where is defined in (26) and is defined by (4).

The condition (11) also holds in that case. However, the proof of monotonicity of is slightly more involved than above and is deferred to Lemma 7.1. In addition, since under (SuperUnif) we have , we deduce that , although more conservative than , is still a uniform improvement over .

###### Proposition 3.3.

In the setting defined in Section 2.1, the procedure satisfies the following

• Under (Indep), controls the FDX in the sense (16);

• Under (SuperUnif), the set of nulls rejected by contains the one of with -probability , for all .

###### Remark 3.1.

The numerical results in Sections 4 and 5 suggest that the conservatism of with respect to is usually quite small. In addition, since the computational effort required by is comparable to that of , the gain in efficiency may be great, especially for large . We therefore think that may be especially useful for very high dimensional heterogeneous data.

###### Remark 3.2.

We can also define a non-adaptive version of , defined as the step-down procedure of critical values (10) based on the transformation functional

 ξℓ(t) =P(Bin[m,~F(t)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1],

where . While being more conservative than , it still controls the FDX in the sense (16) . So, while controlling the FDR is linked to the arithmetic average of the ’s (Heyse’s procedure, see text below (1)), this shows that controlling the FDX is linked to geometric averaging. In addition, this shows that the situation is even more simple for FDX, because no further correction is needed here, whereas the arithmetic average should be slightly modified in order to yield a rigorous FDR control DDR2018.

## 4. Application to weighting

It is well known that -value weighting can improve the power of multiple testing procedures, see, e.g., GRW2006; RW2009; Ign2016; Dur2017; Ramdas2017 and references therein. However, to the best of our our knowledge, except for the augmentation approach described in GRW2006, no methods are available that incorporate weighting for FDX control. We show in this section that such methods can be obtained directly from the bounds on introduced in Section 3.

Throughout this section, we assume that we have at hand a -value family satisfying (Cont) and (SuperUnif). As explained in our introduction section (see references therein), while the null distributions of the -values are typically uniform, the point is that they can have heterogeneous alternative distributions, so that it could be desirable to weigh the -values in some way. For this, we consider a fixed weight vector . The ordered weights are denoted , the average weight is denoted and the average over the largest weights is denoted by .

Since the heterogeneous procedures , and introduced in Section 3 yield valid control for any collection of distribution functions , it is possible to use very flexible weighting schemes. In order to limit the scope of this paper, we consider only two simple types of weighting approaches in more detail:

• for arithmetic mean weighting (abbreviated in what follows as AM), we define the weighted -value family as

 (29) pwi=pi¯w/wi,1≤i≤m.

The weighted -values thus have the heterogeneous distribution functions

 (30) FAMi(t) =(wi¯wt)∧1,1≤i≤m,

under the null. This corresponds to classical weighting approaches established for FWER and FDR control.

• for geometric mean weighting (abbreviated as GM), we define

 (31) pwi=1−(1−pi)¯w/wi,1≤i≤m.

The weighted -values therefore have the following heterogeneous distribution functions under the null:

 (32) FGMi(t) =1−(1−t)wi/¯w,1≤i≤m.

Thus, combining these two weighting approaches with the three heterogeneous procedures introduced in the previous section yields a total of six weighted procedures which we discuss in more detail below. Note that a Taylor expansion provides for small values of . Therefore, we expect that AM and GM procedures will yield similar rejection sets for small -values.

### 4.1. Weighted Lehmann-Romano procedures

Using (17) and (19) (Markov device), we get that any step-down procedure using the weighted -values (29) and critical values has a FDX smaller than or equal to

 1⌊αℓ⌋+1m(ℓ)∑j=1(FAM(t))(j)=1⌊αℓ⌋+1m(ℓ)∑j=1((w(j)¯wt)∧1)≤m(ℓ)⌊αℓ⌋+1×¯¯¯¯wm(ℓ)¯wt=:ξ{\tiny wLR-AM}ℓ(t).

Since the form a transformation function family, bounding the latter by leads to an FDX controlling procedure, that we call the AM-weighted Lehmann-Romano procedure, denoted by in the sequel. It thus corresponds to the step-down procedure using the weighted -values (29) and the critical values

 τ{\tiny wLR-AM}ℓ =ζ⌊αℓ⌋+1∑m(ℓ)j=1w(j)¯w=τ{\tiny LR}ℓ⋅¯w¯¯¯¯wm(ℓ),1≤ℓ≤m.

In particular, if the weight vector is uniform, that is, for all , then reduces to .

Similarly to above, using the GM weighting (31) gives an FDX smaller than or equal to

 1⌊αℓ⌋+1m(ℓ)∑j=1(FGM(t))(j)=1⌊αℓ⌋+1m(ℓ)∑j=1(1−(1−t)w(j)/¯w)=:ξ{\tiny wLR-GM}ℓ(t).

This gives rise to the GM-weighted Lehmann-Romano procedure, denoted , defined as the step-down procedure using the weighted -values (31) and the critical values

 τ{\tiny wLR-GM}ℓ =max{t∈[0,1]:ξ{\tiny wLR-GM}ℓ(t)≤ζ},1≤ℓ≤m.

In general, no domination relationship holds between and . Finally, again, in case of uniform weighting, reduces to .

### 4.2. Weighted Poisson-binomial procedures

Applying the strategy of Section 3.3 with the c.d.f. sets and , we can use the two transformation function families given by

 ξ{\tiny wPB-AM}ℓ(t) =P(PBin[((w(j)¯wt)∧1)1≤j≤m(ℓ)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1]; ξ{\tiny wPB-GM}ℓ(t) =P(PBin[(1−(1−t)w(j)/¯w)1≤j≤m(ℓ)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1],

to define new step-down procedures, denoted and respectively, that both ensure FDX control.

### 4.3. Weighted Guo-Romano procedures

We apply here the strategy of Section 3.4 for the c.d.f. sets and . According to (26), let us define

 ~FAMj(t) =1−⎛⎝j∏j′=1((1−(w(j′)¯wt)∧1)⎞⎠1/j,1≤j≤m,t∈[0,1]; ~FGMj(t) =1−⎛⎝j∏j′=1(1−t)w(j′)/¯w⎞⎠1/j=1−(1−t)¯¯¯wj/¯w,1≤j≤m,t∈[0,1].

This gives rise to the transformation function families

 ξ{\tiny wGR-AM}ℓ(t) =P(Bin[m(ℓ),~F%AMm(ℓ)(t)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1]; ξ{\tiny wGR-GM}ℓ(t) =P(Bin[m(ℓ),~F%GMm(ℓ)(t)]≥⌊αℓ⌋+1),1≤ℓ≤m,t∈[0,1].

Critical values and are obtained via (10) from families and , respectively. This yields two new FDX controlling step-down procedures that are denoted by and , respectively. Note that, similar to arithmetic weighting for the procedure, geometric weighting leads to a simple transformation of the original critical values, given by

 (33) τ{\tiny wGR-GM}ℓ =1−(1−τ{\tiny GR}ℓ)¯w/¯¯¯wm(ℓ).

Thus, this particular procedure combines simplicity with a close relationship to the original Guo-Romano procedure. By contrast, as for the heterogeneous version, the weighted Poisson-binomial procedures require the evaluation of the Poisson-binomial distribution function which may be computationally demanding for large . The weighted Guo-Romano procedures, on the other hand, while possibly sacrificing some power, only require evaluation of the standard binomial distribution.

### 4.4. Analysis of RNA-Seq data

We revisit an analysis of the RNA-Seq data set ’airway’ using results from the independent hypothesis weighting (IHW) approach (for details, see Ign2016 and the vignette accompanying its software implementation). Loosely speaking, this method aims to increase power by assigning a weight to each hypothesis and subsequently applying e.g. the Bonferroni or the Benjamini-Hochberg procedure to the weighted -values while aiming for control of FWER or FDR.

In what follows, we present some results for weighted FDX control, using the procedures introduced in Sections 4.2 and 4.3. For this data set we have and the weights are taken from the output of the ihw function from the bioconductor package ’IHW’. For the sake of illustration we assume the -values to be independent. A large portion (about ) of these weights are 0, Figure 1 presents a histogram of the (strictly) positive weights.

Table 1 shows that controlling the mean (i.e. FDR) or the median of the FDP leads to similar number of rejections.

For both error rates, incorporating weights leads to similar gains in power. For weighted FDX control, the more conservative weighted Guo-Romano procedures exhibit only a slight loss of power with respect to the weighted Poisson-binomial approaches. The difference between arithmetic and geometric weighting is negligible for this data.

Figure 2 indicates that for the FDX controlling procedures, the mapping of the confidence level to the number of rejections is quite flat. This means that statements about the FDP can be made with high confidence without losing too much power. For instance, requiring that with confidence at least still allows for 4145 and 4771 rejections using the and procedures.

## 5. Application to discrete tests

### 5.1. Discrete FDX procedures

Discrete FDX procedures can be defined in a straightforward way by directly using the distribution functions of the discretely distributed -values. The prototypical example we have in mind are multiple conditional tests like Fisher’s exact test. In this case, discreteness and heterogeneity arise from conditioning on the observed table margins. We denote the resulting heterogeneous procedures from section 3 by (for ), (for [HPB]) and (for ).

### 5.2. Simulation study

We now investigate the power of the , and procedures in a simulation study similar to those described in (Gilbert05), (Heller2012) and (DDR2018). We focus on comparing the performance of the new discrete procedures to their continuous counterparts. Since the analysis with is computationally demanding, we are also interested in investigating the performance of the slightly more conservative, but numerically more efficient procedure. Finally, as above, we also include (Benjamini-Hochberg procedure) as a benchmark.

#### 5.2.1. Simulated Scenarios

We simulate a two-sample problem in which a vector of independent binary responses (“adverse events”) is observed for each subject in two groups, where each group consists of subjects. Then, the goal is to simultaneously test the null hypotheses ”, , where and are the success probabilities for the th binary response in group 1 and 2, respectively. Before we describe the simulation framework in more detail, we explain how this set-up leads to discrete and heterogeneous -value distributions. Suppose we have simulated two vectors of dimension where each component represents a count in . This data can be represented by contingency tables. Now each hypothesis is tested using Fisher’s exact test (two-sided) for each contingency table, which is performed by conditioning on the (simulated) pair of marginal counts. Thus, we can determine for every contingency table the discrete distribution function of the

-values for Fisher’s exact test under the null hypothesis. For differing (simulated) contingency tables, these induced distributions will generally be heterogeneous and our inference is conditionally on the marginal counts.

We take where and data are generated so that the response is at positions for both groups, at positions for both groups and at positions for group 1 and at positions for group 2 where represents weak, moderate and strong effects, respectively. The null hypothesis is true for the and positions while the alternative hypothesis is true for the positions. We also take different configurations for the proportion of false null hypotheses, is set to be , and of the value of , which represents small, intermediate and large proportion of effects, respectively (the proportion of true nulls is , , , respectively). Then, is set to be , and of the number of true nulls (that is, ) and is taken accordingly as .

For each of the 54 possible parameter configurations specified by and , Monte Carlo trials are performed, that is, data sets are generated and for each data set, an unadjusted two-sided -value from Fisher’s exact test is computed for each of the positions, and the multiple testing procedures mentioned above are applied at level

. The power of each procedure was estimated as the fraction of the

false null hypotheses that were rejected, averaged over the simulations (TDP, true discovery proportion). Note that while our procedures are designed to control the FDP conditionally on the marginal counts, our power results are presented in an unconditional way for the sake of simplicity. For random number generation the R-function rbinom was used. The two-sided -values from Fisher’s exact test were computed using the R-function fisher.test.

#### 5.2.2. Results

Table 3 in Appendix A shows that the (average) power of the compared procedures depends primarily on the strength of the signal . More specifically, Figure 3 contains some typical plots of the simulation results.

• For , the power of , and is practically zero, whereas the discrete procedures are able to reject at least a few hypotheses, see panel (a) of Figure 3.

• For , the power of and stays close to zero, performs slighty better and the discrete variants perform best as illustrated in panel (b) of Figure 3.

• For , the power of stays close to zero, while now rejects a significant amount of hypotheses. The and procedures perform best. If there is a large amount of alternatives, performs better than (see panel (c) of Figure 3) in the other cases, is outperformed by (see panel (d) of Figure 3).

• There is no relevant difference in power between the procedures and .

In summary, these results show that for and , significant improvements are possible by using discreteness.

### 5.3. Analysis of pharmacovigilance data

We revisit the analysis of pharmacovigilance data from Heller2012 presented in DDR2018. This data set is obtained from a database for reporting, investigating and monitoring adverse drug reactions due to the Medicines and Healthcare products Regulatory Agency in the United Kingdom. It contains the number of reported cases of amnesia as well as the total number of adverse events reported for each of the drugs in the database. For a more detailed description of the data which is contained in the R-packages discreteMTP and DJDR2018 we refer to Heller2012. The works Heller2012 and DDR2018 investigate the association between reports of amnesia and suspected drugs by performing for each drug a Fisher’s exact test (one-sided) for testing association between the drug and amnesia while adjusting for multiplicity by using several (discrete) FDR procedures. Applying the Benjamini-Hochberg procedure to this data set yields candidate drugs which could be associated with amnesia. Using the discrete FDR controlling procedures from DDR2018 yields candidate drugs.

In what follows, we investigate the performance of the , , , and procedures for analyzing this data set. First, we compare these procedures when the goal is control of the median FDX instead of FDR at the level, i.e., we require . Figure 4