1 Introduction and motivation
In genomics, but also in various other fields, several tests are applied simultaneously. Throughout, let denote a family of hypotheses for big data experiments with associated -values
and vector. If nothing else is said, let
be uniformly distributed onfor each . Here, is the set of indices belonging to true hypotheses, i.e., iff is true. Analogously, introduce the index set of false hypotheses. Since the type error increases naturally when grows, corrections for multiple testing are needed. The classical Bonferroni correction, where the test size is divided by the number of tests, is too conservative, especially for large . Benjamini_Hochberg_1995 promoted to use the false discovery rate (FDR) as decision criteria for multiple testing procedures. The FDR is defined by
where is the number of false rejections and equals the number of all rejections. The step-up (SU) multiple testing procedure of Benjamini_Hochberg_1995, in short denoted by BH procedure, is based on the linear critical values and rejects all hypotheses with
and denote the order statistics. The FDR was first established under the following assumption:
Basic independence assumption (BI): The uniformly distributed -values corresponding to true hypotheses are mutually independent as well as independent of the -values belonging to false hypotheses.
At least under BI, the BH test controls the FDR for a pre-specified level . In fact, we have, see [Benjamini_Hochberg_1995, FinnerRoters2001], that
where is the number of true hypotheses and, thus, equals the cardinality of . The BH procedure was extended and modified several times in the literature [BenjaminiETAL2006, BenjaminiYekutieli2001, HeesenJanssen2016, HellerETAL2009, Sarkar2008, Storey2002, StoreyETAL2004, StoreyTibshirani2003]. Moreover, it was shown to be still FDR--controlling under positive regression dependence on each of a subset (PRDS) [BenjaminiYekutieli2001]
. However, test statistics for gene expression data may be negatively correlated like multinomial distributions when gene material is partitioned. As this example illustrates, specific dependence structure may be difficult to justify.GuoRao2008 stated: ”It is almost impossible to check from biological principles or real data sets whether the underlying test statistics satisfy the assumption of independence or positive dependence of some type”. This is in line with the statement of Efron2009: ”we expect the gene expressions to be correlated”, see also MoskvinaSchmidt2008. Consequently, the question arises:
How can the FDR be controlled under general dependent -values?
BenjaminiYekutieli2001 pointed out that for the BH procedure
always holds, even under arbitrary dependence. Consequently, the modified and corrected BH procedure, labeled by BY, with
always controls the FDR. However, this correction of logarithmic rate, , leads to a substantial loss of power under dependence compared to the classical BH procedure, in particular, when is large. In this context we want to quote Owen2005 who remarked that the BY correction ”could be very conservative”. For step-down tests with possibly less rejections GuoRao2008 provided an improvement of the correction factors by around 20-30. Unfortunately, the upper bound in (1.3) cannot be improved in general for SU tests [BenditkisETAL2018, GuoRao2008]. Although the examples attaining the upper bound have a constructive nature and are dealing with ”extremely unusal distributions” (as stated by BenjaminiHellerYekutieli2009), there is no hope to get universal FDR--controlling SU-tests. Throughout we will avoid special dependence assumptions. In contrast to that, we here focus on the case that we have mainly noise and just a few signals, a situation which is typically present in genome wide association studies.
1.1 Truncated BH procedures for rare signals
Let us consider an experiment, e.g., a genome wide association study, with rare signals, i.e., there are only a few false hypotheses and is expected to be close to . It seems plausible to anticipate that the fraction of rejections is small as well. Theoretically, this can be supported by an observation of DitzhausJanssen2019 under independence that implies for the BH procedure. When the fraction is small the majority of the large critical values does not have an impact. Having this in mind, we suggest to focus on a smaller pre-specified portion with of the critical values and to reduce the remaining ones, e.g., by truncation: for . Among other, we will show that for these truncated critical values the correction factor reduces to and, hence, the critical values
always lead to FDR control by . Depending on the choice of , this correction factor can be a significant improvement compared to the BY correction factor , see Table 1 for illustration. As for the classical Bonferroni test (), the SU test given by (1.5) may lead to rejections.
The present paper aims to compare the FDR under
the basic independence model as a benchmark and
the general dependence setting.
The paper is organized as follows. In Section 2, we first give an overview of existing step-up procedures leading to FDR--control under the basic independence assumption. Then we adapt the truncation idea from the previous section to these procedures and explain which correction factors are needed to preserve the FDR--control under dependence. By these correction factors we can quantify the price to pay for dependence. In this context, we also discuss adaptive procedures like the one of Storey2002. Moreover, we introduce an early stop procedure, for which the number of rejections is upper bounded by a pre-specified limit. Section 3 contains our main Theorems generalizing the results from Section 2. A general upper FDR bound for data dependent critical values is presented in Theorem 3.7. Special attention is devoted to new designed SU-tests for sparsity models. Their advantage is a reduction of the dependence correction factor. In this context, Example 3.8 explains how Storey’s adaptive test can be modified. In Section 4, we illustrate the new procedures’ benefits by discussing two real data examples. In particular, the role of the truncation parameter , see (1.5), is illustrated. The real data example of needleman:etal:1979 suggests to design SU-tests with enlarged critical values for very small . To overcome this difficulty, Theorem 5.12 offers another promising overall FDR--controlling modification of the Benjamini-Yekutieli procedure. All proofs are collected in Section 6.
2 First sparsity SU tests under dependence
2.1 Procedures under the basic independence assumption
We first present some known as well as a new SU procedure being (asymptotic) FDR--controlling. In the next section, we explain how to correct the corresponding critical values to preserve the FDR--control under any dependence structure.
The linear critical values introduced in Section 1 are replaced by general (ordered) critical values
with the restriction . Then the SU test procedure and the number of rejections is given by (1.1). The number of falsely rejected hypotheses is defined by the sum of all rejections of true nulls
A lot of critical values may be introduced by a non-decreasing, left continuous generating function
The function can be regarded as the inverse of so-called rejection curves [FinnerETAL2009]. The appertaining deterministic critical values are then
In the following we list some prominent examples of such critical values. Let be a fixed truncation or tuning parameter.
The BH procedure Benjamini_Hochberg_1995 is given by
The procedure of FinnerETAL2009, based on the asymptotic optimal rejection curve (AORC), corresponds to
Since we have here, the last coefficient needs to be modified such that , we refer to FinnerETAL2009 and Gontscharuk2010 for a detailed discussion.
The procedure of BlanchardRoquain2008, BlanchardRoquain2009 is defined by
It is well-known under the basic independence model that the procedures (W1) and (W3) are FDR--controlling, for the latter see Theorem 9 of BlanchardRoquain2009. The AORC procedure (W2) does not control the FDR under (BI) because the first coefficient is already too large. However, these values can be modified to achieve asymptotic FDR--control, see FinnerETAL2009 and see also HeesenJanssen2015 for some modifications. The new combination approach (W4) is FDR--controlling as well:
Let . Suppose that and . Then the procedure (W4) is FDR--controlling under the basic independence assumption.
Various FDR--controlling extensions have been established for ”positive dependent” -values, more precisely PRDS, reverse martingale models or extended BI-models with stochastically larger -values than the uniform ones for , i.e., .
Since the BH procedure does not exhaust the FDR level completely, compare to (1.2), the level in the linear BH critical values were replaced by a data depended, adjusted level , where
is an estimator for
. These so-called adaptive procedure are expected to exhaust the level better because, heuristically,. Various estimators are suggested in the literature [benjamini:hochberg:2000, BenjaminiETAL2006, BlanchardRoquain2008, BlanchardRoquain2009, HeesenJanssen2016, schweder:spjotvoll:1982, StoreyTibshirani2003, zeisel:ETAL:2011]. Exemplarily, we present the formula of the Storey estimator [Storey2002, StoreyETAL2004]:
where is a tuning parameter and denotes the empirical distribution function of the -values. Plugging-in this estimator into the BH procedure leads to FDR--control [StoreyETAL2004]. Conditions for FDR--control of linear adaptive tests were reviewed and established by HeesenJanssen2015, HeesenJanssen2016. To include these and more general adaptive procedures, we consider, additionally to the deterministic critical values, the data driven critical values
where is any estimator of . Some of our results are even valid for arbitrary, data-driven critical values
2.2 Sparsity modifications
We extend here the truncation idea introduced in Section 1.1 from the classical BH procedure to all the other procedures mentioned in the previous section. Thus, let be pre-chosen and let be one of the generating functions corresponding to the procedures (W1)–(W4). The main focus lies on the sparse signal case, where is chosen, but the results hold for any . At the end of the previous section, we introduced adaptive procedures using an estimation step. In the sparse case, we expect with close to . Throughout the remaining paper, we consider just estimators for such that
We want to point out that the Storey estimator (2.4) may become larger than . That is why we consider also the case , whereas the choice may be more plausible in applications. By
For and we obtain the Bonferroni as well as the BH tests by setting and , respectively. As in (1.5), the critical values need to be corrected under dependence. As an immediate consequence of our main Theorem 3.5, we get:
The correction in (2.9) can be separated into two parts: (1) a procedure correction listed in Table 2 and (2) a dependence structure correction , which is also affected by the choice between deterministic and adaptive critical values, see Table 3.
Let for large and . Let us just discuss the BH procedure under dependence. Then
Consequently, by truncation we reduce the dependence correction by a factor close to compared to the classical BY correction.
2.3 Early stopped multiple procedures
In many studies, e.g., dealing with genes, a screening step is necessary in a first experiment. But what shall be done when in the first experiment too many hypotheses are rejected and resources for a detailed examination are limited? Say, exceeds , whereas the close investigation is limited to genes, for example. Ad hoc, it seems plausible to select just the hypotheses corresponding the smallest -values. But, already under BI, well-known examples show that this new reduced procedure is not FDR--controlling. Our truncation idea cannot solve this limitation problem since is possible. However, our early stopped procedure ensures for any pre-specified , at least whenever the very conservative Bonferroni test with rejects not more than hypotheses. The data dependent critical values of the new procedure are given by
Consider the critical values (2.10).
Under BI we have .
Under arbitrary dependence models the corrected critical values
lead to an FDR--controlling procedure.
3 Main results
3.1 Procedures based on general generating functions
Here, we extend the results of Section 2.3 to critical values of the shape (2.8) for general generating functions (2.2) and estimators fulfilling (2.7). The tuning parameter can be freely chosen, it is also possible to set whenever . As an extension of the procedure correction , let us introduce via the right continuous inverse of :
where and is the integer part of . In comparison to Section 2.2, the procedure correction also depends on the choice between deterministic and adaptive critical values. We want to point out that most of all generating functions are convex. For these, both suprema are attained at the end point of the considered interval.
Consider the SU-tests with critical values (2.8).
(BI bound) Suppose that only depends on the -values . Then
In the deterministic case , the inequality (3.2) is still valid when is replaced by and is set equal to .
(Arbitrary dependence, deterministic case) Suppose that and, in particular, . Then
provided that .
(Arbitrary dependence and adaptivity) Suppose additionally that is absolutely continuous, i.e., there exists a Lebesgue almost sure derivative such that . In contrast to (i), we allow that the estimator may use the information of all -values. Then
The preceding Theorem provides a construction principle for SU-tests controlling the FDR at level . For this purpose, let us return to the dependence structure correction factor , see Table 3. This factor depends, as the procedure correction , on the choice between deterministic and adaptive critical values. Consequently, we get
The corrected critical values
lead to FDR--controlling procedures.
3.2 General FDR upper bound under dependence
Doehler2018 derived general upper FDR bounds for deterministic . For our purposes, we need to, among others, extend these bounds to data dependent critical values . Suppose
For each let be constant on the set , i.e., the set on which is rejected.
Under arbitrary dependence we have for the SU-procedure based on the truncated critical values
The upper bound can be rewritten as
Clearly, Theorem 3.7 includes procedures based on the original critical values , i.e., without truncation, by setting . Without the truncation assumption (2.7) Storey’s test based on (2.4) can be extremely liberal under strong dependence, see RomanoETAL2008, but a sparsity modification may be helpful. Better performances can be obtained for the choice of a small tuning parameter for Storey’s procedure, see SarkarHeller2008. Part (a) of the following Example is a modification of Proposition 17 from BlanchardRoquain2009.
Example 3.8 (Extreme dependence)
Introduce for a uniformly distributed random variabel on . We set for the -values of false hypotheses and define .
Consider the Storey estimator (2.4). The critical values are then given by
Note that we have for . Thus,
which equals when is sufficiently large. Only for we have FDR--control for all , whereas is often proposed to be close to .
(Sparsity modification) Let for some with . Then
with “ “ in the case of . Then and
A sparsity assumption expressed by the choice of close to allows the slightly increased universal FDR bound for each .
Remark 3.9 (SD tests)
A popular alternative to SU tests are step-down (SD) procedures. In case of the latter, the number of rejections is given by
Regarding (11)–(13) in BenditkisETAL2018, we can adjust our proofs for certain SD tests. To be more specific, the upper bound of Theorem 3.7 is valid for any SD test based on (data driven) critical values such that Assumption (A) is fulfilled. In particular, all other FDR upper bounds from the previous sections hold. We want to point out that Assumption (A) depends on the choice of and, thus, it may differ for the SU and the SD test, even when the same critical values are used.
4 Real data example
To illustrate the benefits of our new procedures, we apply them to two different real data examples. We focus on the truncated version (1.5) of the BH procedure, in short denoted by BH, and the early stop procedure, in short ES, from Section 2.3. These procedures are compared with the dependence corrected BH procedure (BY) of BenjaminiYekutieli2001 and the Bonferroni (Bonf) procedure, where we set for all of them.
Example 4.10 (Colon tissue)
The colon adenocarcinomas data of NottermanETAL2001 consists of gene measurements for patients on adenocarcinomas tumor and normal tissues, respectively. The data set is available, e.g., in the R package mutoss
. For each gene measurement we applied Welch’s paired t-test resulting in-values.
Example 4.11 (Dentine lead)
needleman:etal:1979 examined the neuropsychologic effects of unidentified childhood exposure to lead. Therefore, they compared the Wechsler Intelligence Scale for Children (Revised), certain verbal processing scores, the reaction times and the classroom performances of children with high dentine lead levels and children with low dentine levels. At all, there are different measurements and corresponding -values, see Tables 3, 7 and 8 in [needleman:etal:1979].
Since the BH procedure is only FDR--controlling under specific dependence structures we exclude it in our comparison. For the sake of completeness, we nevertheless want to state that and are the number of rejections in the situation of Example 4.10 and Example 4.11, respectively. Our truncated BH procedure is valid under arbitrary dependence and leads, for a variety of different , to significantly more rejections than the BY and Bonferroni procedure in both examples, see Figure 1. The plot corresponding to Example 4.11 illustrates two general phenomenons, which should be kept in mind when choosing . First, for large the improvement in comparison to the BY procedure becomes more and more negligible, which is not surprising due to the logarithmic rate of the correction factor . Second, a too small may lead to fewer rejections than the BY procedure. Choosing as the truncation parameter implies that we give a special attention to the -values below , see (1.5), ignoring the remaining ones. Hence, it is not surprising that a choice smaller than , the number of rejections for the BY procedure, may lead to fewer rejections than . In addition to expected sparse signals, the choice of can also be influenced by limited resources for further data processing steps. Under these circumstances Example 4.10 advocates to use the ES()-procedure under restriction priorities for . After some small effects the number of rejections (dotted line) is approximately linear. When the dotted line reaches the BH() curve, early stopping is obsolete, see Figures 1 and 2.
To judge the restriction appropriately, we want to point out that the Bonferroni test does not only control the FDR level but also the family-wise error rate
, the probability that at least one rejection is false. Hence, it can be seen as a minimal requirement to beat the Bonferroni method. In this context, ”to beat” means that the number of rejections should be at least. For small , as in Example 4.11, the Bonferroni method is quite powerful, e.g., it beats the BY procedure, we have whereas . Although our two procedures reject up to hypotheses for the right choice of in Example 4.11, they reject no hypotheses when and, in particular, they are worse than the Bonferroni method.
Under sparsity the smallest critical values are most important. For such designs we propose another promising SU-tests with a first coefficient closer to the Bonferroni coefficient . Introduce
This new SU-procedure, labeled as our new sparsity test of size for short SP(), is motivated by Example 4.11. Here, Bonferroni is superior to BY, which has too small coefficients at the beginning. Table 4 displays a rough approximation of the first critical values and , respectively, for different procedures when and are large.
(cf. [HeesenJanssen2015], Prop. 4.1) Under BI we have
Under arbitrary dependence .
For the normalizing coefficients obey the inequality
Since the degree of sparsity is typically unknown the truncation parameter of BH() should be not too small. But, for larger the BH() coefficients are smaller, which should be judged within a trade-off. Here, the new SP() procedure helps with increased lower coefficients. The substitution of BH() by SP() then gives more security for very sparse signals. Note that for each there exists a largest value with
For intermediate the crucial index can be obtained directly. Using the approximations and we get approximately . In contrast to BH(), the SP() procedure is at least as good as the Bonferroni test for every in the context of Example 4.11.
We first present the proofs corresponding to our main results from Section 3. After that we prove the remaining statements in order of their appearance in the paper.
6.1 Proof of Theorem 3.5
(i): First, observe that HeesenJanssen2015 already proved, see their Lemma 7.1(a), that
By our assumptions we have and
For we can conclude
Finally, we can deduce
which completes the proof of (3.2). Analogously, the statement for the deterministic case can be obtained.
for -almost all and a finite measure specified later, where may depend on . It should be mentioned that BlanchardRoquain2009 already used this technique.
(ii): For the generating function the measure needs to be specified only on if . We denote by the Dirac measure centred at , i.e., . Let
Then (6.1) holds, which implies for all that
for the probability measure . Lemma 7.1(b) of HeesenJanssen2015 with and yields . Finally, note
(iii): Since we just need to specify on . Here, we define as a mixture of a Dirac measure and a Lebesgue measure with density :
Setting again we obtain
Applying Lemma 7.1(b) of HeesenJanssen2015, now with , yields
Finally, using integration by parts we can deduce
6.2 Proof of Theorem 3.7
To simplify the notation, assume . Let and set , where only the -th coordinate is replaced by for . For the proof, we prefer to explicitly state the dependence from and write . Assumption (A) implies for all
Introduce for fixed the sets and
Below, we use the well-known inclusion for step-up tests:
For we have
The -th summand can be transformed by (6.3):
Interchanging the summation yields
6.3 Proof of Theorem 2.1
The proof is related to [HeesenJanssen2016] and relies on conditioning with respect to the -algebra
Adopting the notation of the proof for Theorem 3.7 we obtain from Fubini’s Theorem that for