When many statistical tests are performed simultaneously, a ubiquitous way to account for the erroneous rejections of a procedure is the false discovery proportion (FDP), that is, the proportion of errors in the rejected set, as introduced in the seminal paper BenjaminiHochberg95. Most of the related literature studies the expected value of this quantity, namely the false discovery rate (FDR), e.g., by building procedures that improve on the original Benjamini-Hochberg procedure by adapting to some underlying structure of the data. In particular, a fruitful direction is to take into account the heterogeneous structure of the different tests. Heterogeneity may originate from various sources. The two main examples we have in mind, both intensively investigated in the recent statistical literature, are heterogeneity caused by p-value weighting and heterogeneity caused by discrete data.
P-value weighting is a popular approach that can be traced back to Holm1979 and that has been further developed specifically for the FDR in, e.g., GRW2006; BR2008EJS; HZZ2010; ZZ2014; Ramdas2017. Here, the heterogeneity can be driven, for instance, by sample size, by groups, or more generally by some covariates. In particular, finding an optimal weighting, in the sense of maximizing the number of true rejections, has been investigated in WR2006; RDV2006; RW2009; Ign2016; Durand2017, with weights built either from independent data or from the same data set. As a result, the weighted p-values have heterogeneous null distribution functions that must be properly taken into account by multiple testing procedures.
On the other hand, multiple testing for discrete distributions is a well-identified research field Tar1990; WestWolf1997; Gilbert05 that has received growing attention in the last decade, see, e.g., Heyse2011; Heller2012; Dickhaus2012; Habiger2015; CDS2015; Doehler2016; CDH2018; DDR2018; DDR2019
and references therein. The most typical setting is the one where each test is based on a contingency table. In that situation, the heterogeneity is induced by the fact that the marginal counts naturally vary from one table to another. The approach is then to suitably combine the heterogeneous null distributions to compensate for the natural conservativeness of individual discrete tests. Namely, Heyse's approach Heyse2011 is to consider the transformation
and to apply BH to the transformed p-values. Unfortunately, the latter does not rigorously control the FDR, as proven in Doehler2016; DDR2018. Appropriate corrections of the expression have been proposed in DDR2018 in order to recover rigorous FDR control.
1.2. FDX control
A common criticism of the FDR is that it captures only the average behavior of the FDP. In particular, controlling the FDR does not prevent the FDP from exhibiting undesirable fluctuations, and we may aim to control the FDP stochastically in other ways. A classical approach is to control the false discovery exceedance (FDX) in the following sense: for $\gamma, \alpha \in (0,1)$,
This corresponds to controlling the $(1-\alpha)$-quantile of the FDP distribution at level $\gamma$, see, e.g., GW2004; PGVW2004; Korn2004; LR2005; GW2006; RW2007; GHS2014; DR2015. Let us also mention that the probabilistic fluctuation of the FDP process is of interest in its own right, see, e.g., Neu2008; RV2011; DR2011; DR2015a; DJ2019.
Among multiple testing procedures, step-down procedures have been shown to be particularly useful for FDX control. Two prominent step-down procedures have been proven to control the FDX under different distributional assumptions:
The Lehmann-Romano procedure , introduced in LR2005, is defined as the step-down procedure with critical values
where we denote
It has been shown to control the FDX under various dependence assumptions between the p-values, e.g., when each p-value under the null is independent of the family of p-values under the alternative (Theorem 3.1 in LR2005), which we will refer to as (Indep0) below, or when the Simes inequality holds among the family of true null p-values (Theorem 3.2 in LR2005). [1] Under the latter condition, it has also been proven later that the step-up version, that is, the step-up procedure using the critical values (3), also controls the FDX, see the proof of Theorem 3.1 in GHS2014.
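In the notation of LR2005, these critical values have the closed form alpha_i = (floor(gamma*i) + 1) * alpha / (m + floor(gamma*i) + 1 - i). A minimal Python sketch (the function name is ours):

```python
import math

def lr_critical_values(m, alpha, gamma):
    """Lehmann-Romano step-down critical values for FDX control,
    alpha_i = (floor(gamma*i) + 1) * alpha / (m + floor(gamma*i) + 1 - i)."""
    crit = []
    for i in range(1, m + 1):
        k = math.floor(gamma * i) + 1  # one more than the allowed false rejections
        crit.append(k * alpha / (m + k - i))
    return crit
```

The sequence is non-decreasing in i, as required for a step-down procedure.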
The procedure has been improved by the Guo-Romano procedure , see GR2007, defined as the step-down procedure with critical values
where denotes any variable following a binomial distribution with parameters and . While making more rejections, the procedure controls the FDX under a stronger assumption: the null p-value family and the alternative p-value family are independent of each other, and the null p-values are mutually independent, which we will refer to as (Indep) below.
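The Guo-Romano critical values have no closed form, but since the binomial tail probability P(Bin(n, t) >= k) is increasing in t, each one can be found by bisection. A minimal Python sketch of this computation (function names are ours):

```python
import math

def binom_tail(n, k, t):
    """P(Bin(n, t) >= k), computed from the exact binomial pmf."""
    return sum(math.comb(n, j) * t ** j * (1 - t) ** (n - j)
               for j in range(k, n + 1))

def gr_critical_values(m, alpha, gamma, tol=1e-10):
    """Guo-Romano step-down critical values: the largest t such that
    P(Bin(m - i + k_i, t) >= k_i) <= alpha, with k_i = floor(gamma*i) + 1."""
    crit = []
    for i in range(1, m + 1):
        k = math.floor(gamma * i) + 1
        n = m - i + k
        lo, hi = 0.0, 1.0
        while hi - lo > tol:  # the tail probability is increasing in t
            mid = (lo + hi) / 2
            if binom_tail(n, k, mid) <= alpha:
                lo = mid
            else:
                hi = mid
        crit.append(lo)
    return crit
```

By Markov's inequality, each value dominates the corresponding Lehmann-Romano critical value, which is consistent with the claim that the Guo-Romano procedure makes more rejections.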
The global aim of this paper is to improve both procedures by incorporating the null distribution functions of the p-values while maintaining rigorous FDX control. More specifically, the contributions of this work are as follows:
at the price of additional computational complexity, we introduce the Poisson-binomial procedure , which controls the FDX under (Indep) and is a uniform improvement of and ;
we apply this new technology to weighted p-values to provide the first weighted procedures that control the FDX (to our knowledge), called and . They improve their non-weighted counterparts and , respectively, see Section 4;
in the discrete context, our new procedures, renamed , are shown to be uniform improvements over the continuous procedures and , respectively. To the best of our knowledge, these are the first FDX controlling procedures tailored specifically to discrete p-value distributions. The magnitude of the improvement can be substantial, as we show with both simulated and real data examples, see Section 5.
The paper is organized as follows: Section 2 introduces the statistical setting, the procedures and the FDX criterion, as well as a shortcut to compute our step-down procedures without evaluating the critical values. Section 3 is the main section of the paper; it introduces the new heterogeneous procedures and their FDX controlling properties. Our methodology is then applied in two particular frameworks: new weighted procedures controlling the FDX are derived in Section 4, while Section 5 is devoted to the case where the tests are discrete. Both sections include numerical illustrations. A discussion is provided in Section 6 and most of the technical details are deferred to Section 7. Appendix A gives additional numerical details for the simulations.
We use here a classical formal setting for heterogeneous nulls, see, e.g., DDR2018. We observe , defined on an abstract probability space, taking values in an observation space, and with distribution belonging to a set of possible distributions. We consider null hypotheses for , denoted , , and we denote the corresponding set of true null hypotheses by . We also denote by the complement of in and by the number of true nulls.
We assume that there exists a set of p-values, that is, a set of random variables taking values in . We introduce the following dependence assumptions between the p-values:
|(Indep0)||for all , is independent of ;|
|(Indep)||(Indep0) holds and for all , consists of independent variables.|
The (maximum) null cumulative distribution function of each p-value is denoted
We let , which we suppose to be known, and we consider the following possible situations for the functions in :
|(Cont)||for all , is continuous on|
The case (Discrete) typically arises when for all and , . Throughout the paper, we will assume that we are either in the case (Cont) or (Discrete) and we denote , with by convention when (Cont) holds. For comparison with the homogeneous case, we also let
2.2. False Discovery Exceedance and step-down procedures
In general, a multiple testing procedure is defined as a random subset which corresponds to the indices of the rejected nulls. For , the false discovery exceedance of is defined as follows:
In this paper, we consider particular multiple testing procedures called step-down procedures. Given some p-value family and some non-decreasing sequence of critical values , the step-down procedure with these critical values rejects the null hypotheses corresponding to the set
where denotes the p-values ordered increasingly (for some data-dependent permutation ).
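In code, a step-down procedure sorts the p-values, finds the largest k such that the j-th smallest p-value is at most the j-th critical value for all j up to k, and rejects the corresponding hypotheses. A minimal sketch (the function name is ours):

```python
def step_down(pvals, crit):
    """Step-down procedure: find k_hat = max{k : p_(j) <= alpha_j for all
    j <= k} and reject the hypotheses with the k_hat smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_hat = 0
    for k in range(m):
        if pvals[order[k]] <= crit[k]:
            k_hat = k + 1
        else:
            break
    return set(order[:k_hat])
```

Here `crit` is any non-decreasing sequence of critical values, e.g., those of the Lehmann-Romano or Guo-Romano procedures.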
2.3. Transformation function family and computational shortcut
In this paper, the critical values will be obtained by inverting some functional, that is,
for , , a given set of functions. In order for (10) to be well-defined and to be nondecreasing, we will say that the function set is a transformation function family if it satisfies the following conditions:
For instance, the critical values of the procedure can be rewritten as (10) for the functions
We easily check that the function set is a family of transformation functions (in the sense of (11)). Indeed, is non-increasing both in and . A second example is given by the procedure for which
can be proved to form a family of transformation functions. Indeed, the only non-obvious argument to prove (11) is that for a fixed , and we have . This comes from the fact that is non-increasing both in and .
Finally, because of the inversion, computing the critical values via (10) can be time-consuming. Fortunately, this computation is actually not necessary if we are solely interested in determining the rejection set given by (8). As the following result shows, we may determine this set by working directly with the transformation functions.
Let us consider any transformation function family and the corresponding critical values , , defined by (10). Then, for all , with -probability , the step-down procedure with these critical values can equivalently be written as
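A sketch of this shortcut: because each transformation function is nondecreasing in its second argument and the critical value is the largest t at which the function does not exceed alpha, the comparison of the k-th smallest p-value with the k-th critical value is equivalent to evaluating the functional at that p-value and comparing with alpha, so no inversion is required. For concreteness, we pair the generic routine with a functional whose inversion returns the Lehmann-Romano critical values (function names are ours):

```python
import math

def step_down_shortcut(pvals, G, alpha):
    """Run a step-down procedure directly from the transformation
    functionals: the test p_(k) <= tau_k is replaced by G(k, p_(k)) <= alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_hat = 0
    for k in range(1, m + 1):
        if G(k, pvals[order[k - 1]]) <= alpha:
            k_hat = k
        else:
            break
    return set(order[:k_hat])

def lr_functional(m, gamma):
    """Functional whose inversion gives the Lehmann-Romano critical
    values: G_k(t) = (m + k_g - k) * t / k_g with k_g = floor(gamma*k) + 1."""
    def G(k, t):
        kg = math.floor(gamma * k) + 1
        return (m + kg - k) * t / kg
    return G
```

For functionals that are expensive to invert (e.g., those involving a Poisson-binomial distribution function), this avoids one bisection per critical value.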
3. New FDX controlling procedures
In this section, we introduce new procedures that control the false discovery exceedance at some level , that is,
while incorporating the family in an appropriate way.
Our main tool is the following bound: For any step-down procedure with critical values , we have
Inequality (17) is valid under the distributional assumption (Indep0). This bound comes from a reformulation of Theorem 5.2 in Roq2011 in our heterogeneous framework, see Theorem 7.1 below. Our new procedures are derived by further upper-bounding via various probabilistic devices. More specifically, we will introduce several transformation function families such that for all ,
3.2. Heterogeneous Lehmann-Romano procedure
By using Markov's inequality, we obtain
where denotes the values of ordered decreasingly. Bounding the above quantity by entails the following procedure.
The heterogeneous Lehmann-Romano procedure, denoted by , is defined as the step-down procedure using the critical values defined by
where denotes the values of ordered decreasingly and is defined by (4).
The quantity is thus similar to , in which has been replaced by the average of the largest values of . To check that the functions form a transformation function family in the sense of (11), we note that is non-increasing in (averaging on smaller values makes the average smaller) and continuous in under (Cont) (because is continuous and is -Lipschitz).
In the classical case (SuperUnif), we have for all and . Hence, is less conservative than in that situation. A technical detail is that this only holds almost surely because the range in (20) can be different from in the case (Discrete). This entails the following result.
3.3. Poisson-binomial procedure
Here, we propose to bound (18) by using the Poisson-binomial distribution. To this end, recall that the Poisson-binomial distribution with parameters , denoted below, corresponds to the distribution of , where the variables are all independent and each follows a Bernoulli distribution of parameter for .
Bounding the latter by leads to the following procedure.
The Poisson-binomial procedure, denoted by , is defined as the step-down procedure using the critical values
where denotes the values of ordered decreasingly and is defined by (4).
Let us now check that is a transformation function family, that is, that it satisfies (11). The continuity assumption holds because, under (Cont), the mapping is continuous (by an argument similar to the one above) and the cumulative distribution function of is a continuous function of . The monotonicity property comes from the fact that is non-increasing both in and .
Since under (SuperUnif), the distribution is stochastically smaller than the distribution , the following holds.
However, in general, the procedure is computationally demanding, even with the shortcut mentioned in Section 2.3. This comes from the computation of , which involves the distribution function of a Poisson-binomial variable. In the next section, we make slightly more conservative in order to recover the computational cost of .
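The Poisson-binomial distribution function has no closed form, but it can be evaluated exactly in quadratic time by dynamic programming over the Bernoulli components, which is the computation alluded to above. A minimal sketch (function names are ours):

```python
def poisson_binomial_pmf(probs):
    """Exact pmf of PB(p_1, ..., p_n) by dynamic programming: fold in one
    Bernoulli component at a time."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)   # this component fails
            new[k + 1] += q * p     # this component succeeds
        pmf = new
    return pmf

def poisson_binomial_tail(probs, k):
    """P(PB(probs) >= k), the tail probability used in the bound."""
    return sum(poisson_binomial_pmf(probs)[k:])
```

For m hypotheses this costs O(m^2) per evaluation, which explains why the procedure becomes demanding for large m.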
3.4. Heterogeneous Guo-Romano procedure
In this section, we further upper-bound (25) by using the fact that any Poisson-binomial random variable is stochastically upper-bounded by a suitable binomial random variable (see Example 1.A.25 in Shaked). This yields
where we let
where denotes the values of ordered decreasingly.
This reasoning suggests another heterogeneous procedure, based on the binomial distribution. Since also uses the binomial device, we name this new procedure the heterogeneous Guo-Romano procedure.
The condition (11) also holds in that case. However, the proof of monotonicity of is slightly more involved than above and is deferred to Lemma 7.1. In addition, since under (SuperUnif) we have , we deduce that , although more conservative than , is still a uniform improvement over .
The numerical results in Sections 4 and 5 suggest that the conservatism of with respect to is usually quite small. In addition, since the computational effort required by is comparable to that of , the gain in efficiency may be substantial, especially for large . We therefore think that may be especially useful for very high-dimensional heterogeneous data.
We can also define a non-adaptive version of , defined as the step-down procedure with critical values (10) based on the transformation functional
where . While being more conservative than , it still controls the FDX in the sense of (16). So, while controlling the FDR is linked to the arithmetic average of the 's (Heyse's procedure, see the text below (1)), this shows that controlling the FDX is linked to geometric averaging. In addition, the situation is even simpler for the FDX, because no further correction is needed here, whereas the arithmetic average must be slightly modified in order to yield rigorous FDR control DDR2018.
4. Application to weighting
It is well known that p-value weighting can improve the power of multiple testing procedures, see, e.g., GRW2006; RW2009; Ign2016; Dur2017; Ramdas2017 and references therein. However, to the best of our knowledge, except for the augmentation approach described in GRW2006, no methods are available that incorporate weighting for FDX control. We show in this section that such methods can be obtained directly from the bounds on introduced in Section 3.
Throughout this section, we assume that we have at hand a p-value family satisfying (Cont) and (SuperUnif). As explained in the introduction (see references therein), while the null distributions of the p-values are typically uniform, they can have heterogeneous alternative distributions, so that it can be desirable to weight the p-values in some way. For this, we consider a fixed weight vector . The ordered weights are denoted , the average weight is denoted , and the average over the largest weights is denoted by .
Since the heterogeneous procedures , and introduced in Section 3 yield valid control for any collection of distribution functions , it is possible to use very flexible weighting schemes. In order to limit the scope of this paper, we consider only two simple types of weighting approaches in more detail:
for arithmetic mean weighting (abbreviated in what follows as AM), we define the weighted p-value family as
The weighted p-values thus have the heterogeneous distribution functions
under the null. This corresponds to classical weighting approaches established for FWER and FDR control.
for geometric mean weighting (abbreviated as GM), we define
The weighted p-values therefore have the following heterogeneous distribution functions under the null:
Thus, combining these two weighting approaches with the three heterogeneous procedures introduced in the previous section yields a total of six weighted procedures, which we discuss in more detail below. Note that a Taylor expansion gives for small values of . Therefore, we expect the AM and GM procedures to yield similar rejection sets for small p-values.
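As a concrete illustration of the AM scheme, the sketch below assumes the classical form q_i = p_i / w_i for the weighted p-values (a convention we adopt here purely for illustration; the paper's exact definition may differ). If p_i is uniform under the null, q_i then has null distribution function t -> min(1, w_i * t):

```python
def am_weighted_pvalues(pvals, weights):
    """Arithmetic-mean (AM) weighting, assuming the classical form
    q_i = p_i / w_i (an illustrative assumption).  If p_i is uniform
    under the null, q_i has null c.d.f. F_i(t) = min(1, w_i * t)."""
    q = [p / w for p, w in zip(pvals, weights)]
    cdfs = [lambda t, w=w: min(1.0, w * t) for w in weights]
    return q, cdfs
```

The returned heterogeneous c.d.f.s are exactly the inputs required by the heterogeneous procedures of Section 3.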
4.1. Weighted Lehmann-Romano procedures
Since the form a transformation function family, bounding the latter by leads to an FDX controlling procedure, which we call the AM-weighted Lehmann-Romano procedure, denoted by in the sequel. It corresponds to the step-down procedure using the weighted p-values (29) and the critical values
In particular, if the weight vector is uniform, that is, for all , then reduces to .
Similarly to above, using the GM weighting (31) gives an FDX smaller than or equal to
This gives rise to the GM-weighted Lehmann-Romano procedure, denoted , defined as the step-down procedure using the weighted p-values (31) and the critical values
In general, no domination relationship holds between and . Finally, again, in case of uniform weighting, reduces to .
4.2. Weighted Poisson-binomial procedures
Applying the strategy of Section 3.3 with the c.d.f. sets and , we can use the two transformation function families given by
to define new step-down procedures, denoted and respectively, that both ensure FDX control.
4.3. Weighted Guo-Romano procedures
This gives rise to the transformation function families
Critical values and are obtained via (10) from families and , respectively. This yields two new FDX controlling step-down procedures that are denoted by and , respectively. Note that, similar to arithmetic weighting for the procedure, geometric weighting leads to a simple transformation of the original critical values, given by
Thus, this particular procedure combines simplicity with a close relationship to the original Guo-Romano procedure. By contrast, as for the heterogeneous version, the weighted Poisson-binomial procedures require the evaluation of the Poisson-binomial distribution function which may be computationally demanding for large . The weighted Guo-Romano procedures, on the other hand, while possibly sacrificing some power, only require evaluation of the standard binomial distribution.
4.4. Analysis of RNA-Seq data
We revisit an analysis of the RNA-Seq data set 'airway' using results from the independent hypothesis weighting (IHW) approach (for details, see Ign2016 and the vignette accompanying its software implementation). Loosely speaking, this method aims to increase power by assigning a weight to each hypothesis and subsequently applying, e.g., the Bonferroni or the Benjamini-Hochberg procedure to the weighted p-values, while aiming for control of the FWER or the FDR.
In what follows, we present some results for weighted FDX control, using the procedures introduced in Sections 4.2 and 4.3. For this data set we have , and the weights are taken from the output of the ihw function from the Bioconductor package 'IHW'. For the sake of illustration, we assume the p-values to be independent. A large portion (about ) of these weights are 0; Figure 1 presents a histogram of the (strictly) positive weights.
Table 1 shows that controlling the mean (i.e., the FDR) or the median of the FDP leads to similar numbers of rejections.
For both error rates, incorporating weights leads to similar gains in power. For weighted FDX control, the more conservative weighted Guo-Romano procedures exhibit only a slight loss of power with respect to the weighted Poisson-binomial approaches. The difference between arithmetic and geometric weighting is negligible for this data.
Figure 2 indicates that for the FDX controlling procedures, the mapping of the confidence level to the number of rejections is quite flat. This means that statements about the FDP can be made with high confidence without losing too much power. For instance, requiring that with confidence at least still allows for 4145 and 4771 rejections using the and procedures.
5. Application to discrete tests
5.1. Discrete FDX procedures
Discrete FDX procedures can be defined in a straightforward way by directly using the distribution functions of the discretely distributed p-values. The prototypical example we have in mind is multiple conditional tests such as Fisher's exact test. In this case, discreteness and heterogeneity arise from conditioning on the observed table margins. We denote the resulting heterogeneous procedures from Section 3 by (for ), (for [HPB]) and (for ).
5.2. Simulation study
We now investigate the power of the , and procedures in a simulation study similar to those described in Gilbert05; Heller2012; DDR2018. We focus on comparing the performance of the new discrete procedures to their continuous counterparts. Since the analysis with is computationally demanding, we are also interested in investigating the performance of the slightly more conservative but numerically more efficient procedure. Finally, as above, we also include (the Benjamini-Hochberg procedure) as a benchmark.
5.2.1. Simulated Scenarios
We simulate a two-sample problem in which a vector of independent binary responses (“adverse events”) is observed for each subject in two groups, where each group consists of subjects. The goal is then to simultaneously test the null hypotheses “”, , where and are the success probabilities for the th binary response in groups 1 and 2, respectively. Before describing the simulation framework in more detail, we explain how this set-up leads to discrete and heterogeneous p-value distributions. Suppose we have simulated two vectors of dimension where each component represents a count in . These data can be represented by contingency tables. Each hypothesis is then tested using Fisher's exact test (two-sided) on the corresponding contingency table, which is performed by conditioning on the (simulated) pair of marginal counts. Thus, for every contingency table, we can determine the discrete distribution function of the p-values of Fisher's exact test under the null hypothesis. For differing (simulated) contingency tables, these induced distributions will generally be heterogeneous, and our inference is conditional on the marginal counts.
We take where and data are generated so that the response is at positions for both groups, at positions for both groups and at positions for group 1 and at positions for group 2, where represents weak, moderate and strong effects, respectively. The null hypothesis is true for the and positions, while the alternative hypothesis is true for the positions. We also take different configurations for the proportion of false null hypotheses: is set to , and of the value of , which represent small, intermediate and large proportions of effects, respectively (the proportion of true nulls is , , , respectively). Then, is set to , and of the number of true nulls (that is, ) and is taken accordingly as .
For each of the 54 possible parameter configurations specified by and , Monte Carlo trials are performed, that is, data sets are generated and, for each data set, an unadjusted two-sided p-value from Fisher's exact test is computed for each of the positions, and the multiple testing procedures mentioned above are applied at level
. The power of each procedure was estimated as the fraction of the false null hypotheses that were rejected, averaged over the simulations (TDP, true discovery proportion). Note that while our procedures are designed to control the FDP conditionally on the marginal counts, our power results are presented in an unconditional way for the sake of simplicity. For random number generation, the R function rbinom was used. The two-sided p-values from Fisher's exact test were computed using the R function fisher.test.
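The conditional null distribution of each p-value can be computed exactly from the hypergeometric distribution of the table given its margins. The sketch below follows the two-sided convention of R's fisher.test (the p-value of a table is the sum of the probabilities of all tables no more likely than the observed one); the function name is ours:

```python
import math

def fisher_null_distribution(n1, n2, s):
    """Conditional null distribution of the two-sided Fisher exact test
    p-value for a 2x2 table with group sizes n1, n2 and s total successes.
    Given the margins, the group-1 count is hypergeometric; the two-sided
    p-value of a table is the sum of the probabilities of all tables no
    more likely than it.  Returns the sorted attainable p-values and
    their null probabilities."""
    lo, hi = max(0, s - n2), min(n1, s)
    total = math.comb(n1 + n2, s)
    probs = [math.comb(n1, x) * math.comb(n2, s - x) / total
             for x in range(lo, hi + 1)]
    # p-value of each table: total mass of tables at most as likely
    pvals = [sum(q for q in probs if q <= p * (1 + 1e-7)) for p in probs]
    support = sorted(set(pvals))
    mass = {p: 0.0 for p in support}
    for p, q in zip(pvals, probs):
        mass[p] += q
    return support, [mass[p] for p in support]
```

The null c.d.f. is then the step function mapping t to the total mass of attainable p-values at most t, which is exactly the heterogeneous input required by the discrete procedures above.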
Table 3 in Appendix A shows that the (average) power of the compared procedures depends primarily on the strength of the signal . Figure 3 contains some typical plots of the simulation results.
For , the power of , and is practically zero, whereas the discrete procedures are able to reject at least a few hypotheses, see panel (a) of Figure 3.
For , the power of and stays close to zero, performs slightly better, and the discrete variants perform best, as illustrated in panel (b) of Figure 3.
There is no relevant difference in power between the procedures and .
In summary, these results show that for and , significant improvements are possible by exploiting discreteness.
5.3. Analysis of pharmacovigilance data
We revisit the analysis of the pharmacovigilance data from Heller2012 presented in DDR2018. This data set is obtained from a database for reporting, investigating and monitoring adverse drug reactions, maintained by the Medicines and Healthcare products Regulatory Agency in the United Kingdom. It contains the number of reported cases of amnesia, as well as the total number of adverse events reported, for each of the drugs in the database. For a more detailed description of the data, which is contained in the R packages discreteMTP and DJDR2018, we refer to Heller2012. The works Heller2012 and DDR2018 investigate the association between reports of amnesia and suspected drugs by performing, for each drug, a Fisher's exact test (one-sided) of association between the drug and amnesia, while adjusting for multiplicity by using several (discrete) FDR procedures. Applying the Benjamini-Hochberg procedure to this data set yields candidate drugs that could be associated with amnesia. Using the discrete FDR controlling procedures from DDR2018 yields candidate drugs.
In what follows, we investigate the performance of the , , , and procedures for analyzing this data set. First, we compare these procedures when the goal is control of the median of the FDP instead of the FDR at the same level, i.e., we require . Figure 4