Dependence correction of multiple tests with applications to sparsity

09/30/2019 ∙ by Marc Ditzhaus, et al. ∙ TU Dortmund 0

The present paper establishes new multiple procedures for simultaneous testing of a large number of hypotheses under dependence. Special attention is devoted to experiments with rare false hypotheses. This sparsity assumption is typically for various genome studies when a portion of remarkable genes should be detected. The aim is to derive tests which control the false discovery rate (FDR) always at finite sample size. The procedures are compared for the set up of dependent and independent p-values. It turns out that the FDR bounds differ by a dependency factor which can be used as a correction quantity. We offer sparsity modifications and improved dependence tests which generalize the Benjamini-Yekutieli test and adaptive tests in the sense of Storey. As a byproduct, an early stopped test is presented in order to bound the number of rejections. The new procedures perform well for real genome data examples.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction and motivation

In genomics, but also in various other fields, several tests are applied simultaneously. Throughout, let denote a family of hypotheses for big data experiments with associated -values

and vector

. If nothing else is said, let

be uniformly distributed on

for each . Here, is the set of indices belonging to true hypotheses, i.e., iff is true. Analogously, introduce the index set of false hypotheses. Since the type error increases naturally when grows, corrections for multiple testing are needed. The classical Bonferroni correction, where the test size is divided by the number of tests, is too conservative, especially for large . Benjamini_Hochberg_1995 promoted to use the false discovery rate (FDR) as decision criteria for multiple testing procedures. The FDR is defined by

where is the number of false rejections and equals the number of all rejections. The step-up (SU) multiple testing procedure of Benjamini_Hochberg_1995, in short denoted by BH procedure, is based on the linear critical values and rejects all hypotheses with


and denote the order statistics. The FDR was first established under the following assumption:

Basic independence assumption (BI): The uniformly distributed -values corresponding to true hypotheses are mutually independent as well as independent of the -values belonging to false hypotheses.

At least under BI, the BH test controls the FDR for a pre-specified level . In fact, we have, see [Benjamini_Hochberg_1995, FinnerRoters2001], that


where is the number of true hypotheses and, thus, equals the cardinality of . The BH procedure was extended and modified several times in the literature [BenjaminiETAL2006, BenjaminiYekutieli2001, HeesenJanssen2016, HellerETAL2009, Sarkar2008, Storey2002, StoreyETAL2004, StoreyTibshirani2003]. Moreover, it was shown to be still FDR--controlling under positive regression dependence on each of a subset (PRDS) [BenjaminiYekutieli2001]

. However, test statistics for gene expression data may be negatively correlated like multinomial distributions when gene material is partitioned. As this example illustrates, specific dependence structure may be difficult to justify.

GuoRao2008 stated: ”It is almost impossible to check from biological principles or real data sets whether the underlying test statistics satisfy the assumption of independence or positive dependence of some type”. This is in line with the statement of Efron2009: ”we expect the gene expressions to be correlated”, see also MoskvinaSchmidt2008. Consequently, the question arises:

  • How can the FDR be controlled under general dependent -values?

BenjaminiYekutieli2001 pointed out that for the BH procedure


always holds, even under arbitrary dependence. Consequently, the modified and corrected BH procedure, labeled by BY, with


always controls the FDR. However, this correction of logarithmic rate, , leads to a substantial loss of power under dependence compared to the classical BH procedure, in particular, when is large. In this context we want to quote Owen2005 who remarked that the BY correction ”could be very conservative”. For step-down tests with possibly less rejections GuoRao2008 provided an improvement of the correction factors by around 20-30. Unfortunately, the upper bound in (1.3) cannot be improved in general for SU tests [BenditkisETAL2018, GuoRao2008]. Although the examples attaining the upper bound have a constructive nature and are dealing with ”extremely unusal distributions” (as stated by BenjaminiHellerYekutieli2009), there is no hope to get universal FDR--controlling SU-tests. Throughout we will avoid special dependence assumptions. In contrast to that, we here focus on the case that we have mainly noise and just a few signals, a situation which is typically present in genome wide association studies.

1.1 Truncated BH procedures for rare signals

Let us consider an experiment, e.g., a genome wide association study, with rare signals, i.e., there are only a few false hypotheses and is expected to be close to . It seems plausible to anticipate that the fraction of rejections is small as well. Theoretically, this can be supported by an observation of DitzhausJanssen2019 under independence that implies for the BH procedure. When the fraction is small the majority of the large critical values does not have an impact. Having this in mind, we suggest to focus on a smaller pre-specified portion with of the critical values and to reduce the remaining ones, e.g., by truncation: for . Among other, we will show that for these truncated critical values the correction factor reduces to and, hence, the critical values


always lead to FDR control by . Depending on the choice of , this correction factor can be a significant improvement compared to the BY correction factor , see Table 1 for illustration. As for the classical Bonferroni test (), the SU test given by (1.5) may lead to rejections.

Table 1: The correction factors under sparsity for various choices of

1.2 Overview

The present paper aims to compare the FDR under

  • the basic independence model as a benchmark and

  • the general dependence setting.

Our FDR bounds differ by a factor which can be used as a correction factor for overall valid dependence procedures, similarly to (1.3) and (1.4).

The paper is organized as follows. In Section 2, we first give an overview of existing step-up procedures leading to FDR--control under the basic independence assumption. Then we adapt the truncation idea from the previous section to these procedures and explain which correction factors are needed to preserve the FDR--control under dependence. By these correction factors we can quantify the price to pay for dependence. In this context, we also discuss adaptive procedures like the one of Storey2002. Moreover, we introduce an early stop procedure, for which the number of rejections is upper bounded by a pre-specified limit. Section 3 contains our main Theorems generalizing the results from Section 2. A general upper FDR bound for data dependent critical values is presented in Theorem 3.7. Special attention is devoted to new designed SU-tests for sparsity models. Their advantage is a reduction of the dependence correction factor. In this context, Example 3.8 explains how Storey’s adaptive test can be modified. In Section 4, we illustrate the new procedures’ benefits by discussing two real data examples. In particular, the role of the truncation parameter , see (1.5), is illustrated. The real data example of needleman:etal:1979 suggests to design SU-tests with enlarged critical values for very small . To overcome this difficulty, Theorem 5.12 offers another promising overall FDR--controlling modification of the Benjamini-Yekutieli procedure. All proofs are collected in Section 6.

2 First sparsity SU tests under dependence

2.1 Procedures under the basic independence assumption

We first present some known as well as a new SU procedure being (asymptotic) FDR--controlling. In the next section, we explain how to correct the corresponding critical values to preserve the FDR--control under any dependence structure.
The linear critical values introduced in Section 1 are replaced by general (ordered) critical values


with the restriction . Then the SU test procedure and the number of rejections is given by (1.1). The number of falsely rejected hypotheses is defined by the sum of all rejections of true nulls

A lot of critical values may be introduced by a non-decreasing, left continuous generating function


The function can be regarded as the inverse of so-called rejection curves [FinnerETAL2009]. The appertaining deterministic critical values are then


In the following we list some prominent examples of such critical values. Let be a fixed truncation or tuning parameter.

  1. The BH procedure Benjamini_Hochberg_1995 is given by

  2. The procedure of FinnerETAL2009, based on the asymptotic optimal rejection curve (AORC), corresponds to

    Since we have here, the last coefficient needs to be modified such that , we refer to FinnerETAL2009 and Gontscharuk2010 for a detailed discussion.

  3. The procedure of BlanchardRoquain2008, BlanchardRoquain2009 is defined by

  4. A new procedure is given by a combination of (W2) and (W3):

It is well-known under the basic independence model that the procedures (W1) and (W3) are FDR--controlling, for the latter see Theorem 9 of BlanchardRoquain2009. The AORC procedure (W2) does not control the FDR under (BI) because the first coefficient is already too large. However, these values can be modified to achieve asymptotic FDR--control, see FinnerETAL2009 and see also HeesenJanssen2015 for some modifications. The new combination approach (W4) is FDR--controlling as well:

Theorem 2.1

Let . Suppose that and . Then the procedure (W4) is FDR--controlling under the basic independence assumption.

Various FDR--controlling extensions have been established for ”positive dependent” -values, more precisely PRDS, reverse martingale models or extended BI-models with stochastically larger -values than the uniform ones for , i.e., .
Since the BH procedure does not exhaust the FDR level completely, compare to (1.2), the level in the linear BH critical values were replaced by a data depended, adjusted level , where

is an estimator for

. These so-called adaptive procedure are expected to exhaust the level better because, heuristically,

. Various estimators are suggested in the literature [benjamini:hochberg:2000, BenjaminiETAL2006, BlanchardRoquain2008, BlanchardRoquain2009, HeesenJanssen2016, schweder:spjotvoll:1982, StoreyTibshirani2003, zeisel:ETAL:2011]. Exemplarily, we present the formula of the Storey estimator [Storey2002, StoreyETAL2004]:


where is a tuning parameter and denotes the empirical distribution function of the -values. Plugging-in this estimator into the BH procedure leads to FDR--control [StoreyETAL2004]. Conditions for FDR--control of linear adaptive tests were reviewed and established by HeesenJanssen2015, HeesenJanssen2016. To include these and more general adaptive procedures, we consider, additionally to the deterministic critical values, the data driven critical values


where is any estimator of . Some of our results are even valid for arbitrary, data-driven critical values


2.2 Sparsity modifications

We extend here the truncation idea introduced in Section 1.1 from the classical BH procedure to all the other procedures mentioned in the previous section. Thus, let be pre-chosen and let be one of the generating functions corresponding to the procedures (W1)–(W4). The main focus lies on the sparse signal case, where is chosen, but the results hold for any . At the end of the previous section, we introduced adaptive procedures using an estimation step. In the sparse case, we expect with close to . Throughout the remaining paper, we consider just estimators for such that


We want to point out that the Storey estimator (2.4) may become larger than . That is why we consider also the case , whereas the choice may be more plausible in applications. By

we can transfer any estimator into an estimator fulfilling (2.7) for any pre-chosen . In the deterministic case , we set . In the spirit of Section 1.1, we truncate the critical values (2.5):


For and we obtain the Bonferroni as well as the BH tests by setting and , respectively. As in (1.5), the critical values need to be corrected under dependence. As an immediate consequence of our main Theorem 3.5, we get:

Corollary 2.2

The corrected critical values


specified by the values in Tables 2 and 3 always lead to FDR--control.

The correction in (2.9) can be separated into two parts: (1) a procedure correction listed in Table 2 and (2) a dependence structure correction , which is also affected by the choice between deterministic and adaptive critical values, see Table 3.

Example 2.3

Let for large and . Let us just discuss the BH procedure under dependence. Then

Consequently, by truncation we reduce the dependence correction by a factor close to compared to the classical BY correction.

(W1) (W2) (W3) (W4)
Table 2: The procedure correction factor for (W1), (W2), (W3), (W4) with
BI Dependence
deterministic 1
Table 3: Comparison of correction factor for the different dependence models

2.3 Early stopped multiple procedures

In many studies, e.g., dealing with genes, a screening step is necessary in a first experiment. But what shall be done when in the first experiment too many hypotheses are rejected and resources for a detailed examination are limited? Say, exceeds , whereas the close investigation is limited to genes, for example. Ad hoc, it seems plausible to select just the hypotheses corresponding the smallest -values. But, already under BI, well-known examples show that this new reduced procedure is not FDR--controlling. Our truncation idea cannot solve this limitation problem since is possible. However, our early stopped procedure ensures for any pre-specified , at least whenever the very conservative Bonferroni test with rejects not more than hypotheses. The data dependent critical values of the new procedure are given by

Theorem 2.4

Consider the critical values (2.10).

  1. Under BI we have .

  2. Under arbitrary dependence models the corrected critical values

    lead to an FDR--controlling procedure.

3 Main results

3.1 Procedures based on general generating functions

Here, we extend the results of Section 2.3 to critical values of the shape (2.8) for general generating functions (2.2) and estimators fulfilling (2.7). The tuning parameter can be freely chosen, it is also possible to set whenever . As an extension of the procedure correction , let us introduce via the right continuous inverse of :


where and is the integer part of . In comparison to Section 2.2, the procedure correction also depends on the choice between deterministic and adaptive critical values. We want to point out that most of all generating functions are convex. For these, both suprema are attained at the end point of the considered interval.

Theorem 3.5

Consider the SU-tests with critical values (2.8).

  1. (BI bound) Suppose that only depends on the -values . Then


    In the deterministic case , the inequality (3.2) is still valid when is replaced by and is set equal to .

  2. (Arbitrary dependence, deterministic case) Suppose that and, in particular, . Then

    provided that .

  3. (Arbitrary dependence and adaptivity) Suppose additionally that is absolutely continuous, i.e., there exists a Lebesgue almost sure derivative such that . In contrast to (i), we allow that the estimator may use the information of all -values. Then


The preceding Theorem provides a construction principle for SU-tests controlling the FDR at level . For this purpose, let us return to the dependence structure correction factor , see Table 3. This factor depends, as the procedure correction , on the choice between deterministic and adaptive critical values. Consequently, we get

Corollary 3.6

The corrected critical values


lead to FDR--controlling procedures.

Note that (3.4) is scale invariant regarding and Theorem 3.5 always applies.

3.2 General FDR upper bound under dependence

Doehler2018 derived general upper FDR bounds for deterministic . For our purposes, we need to, among others, extend these bounds to data dependent critical values . Suppose

Assumption (A)

For each let be constant on the set , i.e., the set on which is rejected.

Theorem 3.7

Under arbitrary dependence we have for the SU-procedure based on the truncated critical values


The upper bound can be rewritten as


Clearly, Theorem 3.7 includes procedures based on the original critical values , i.e., without truncation, by setting . Without the truncation assumption (2.7) Storey’s test based on (2.4) can be extremely liberal under strong dependence, see RomanoETAL2008, but a sparsity modification may be helpful. Better performances can be obtained for the choice of a small tuning parameter for Storey’s procedure, see SarkarHeller2008. Part (a) of the following Example is a modification of Proposition 17 from BlanchardRoquain2009.

Example 3.8 (Extreme dependence)

Introduce for a uniformly distributed random variabel on . We set for the -values of false hypotheses and define .

  1. Consider the Storey estimator (2.4). The critical values are then given by

    Note that we have for . Thus,

    which equals when is sufficiently large. Only for we have FDR--control for all , whereas is often proposed to be close to .

  2. (Sparsity modification) Let for some with . Then

    with “ “ in the case of . Then and

    A sparsity assumption expressed by the choice of close to allows the slightly increased universal FDR bound for each .

Remark 3.9 (SD tests)

A popular alternative to SU tests are step-down (SD) procedures. In case of the latter, the number of rejections is given by

Regarding (11)–(13) in BenditkisETAL2018, we can adjust our proofs for certain SD tests. To be more specific, the upper bound of Theorem 3.7 is valid for any SD test based on (data driven) critical values such that Assumption (A) is fulfilled. In particular, all other FDR upper bounds from the previous sections hold. We want to point out that Assumption (A) depends on the choice of and, thus, it may differ for the SU and the SD test, even when the same critical values are used.

4 Real data example

To illustrate the benefits of our new procedures, we apply them to two different real data examples. We focus on the truncated version (1.5) of the BH procedure, in short denoted by BH, and the early stop procedure, in short ES, from Section 2.3. These procedures are compared with the dependence corrected BH procedure (BY) of BenjaminiYekutieli2001 and the Bonferroni (Bonf) procedure, where we set for all of them.

Example 4.10 (Colon tissue)

The colon adenocarcinomas data of NottermanETAL2001 consists of gene measurements for patients on adenocarcinomas tumor and normal tissues, respectively. The data set is available, e.g., in the R package mutoss

. For each gene measurement we applied Welch’s paired t-test resulting in


Example 4.11 (Dentine lead)

needleman:etal:1979 examined the neuropsychologic effects of unidentified childhood exposure to lead. Therefore, they compared the Wechsler Intelligence Scale for Children (Revised), certain verbal processing scores, the reaction times and the classroom performances of children with high dentine lead levels and children with low dentine levels. At all, there are different measurements and corresponding -values, see Tables 3, 7 and 8 in [needleman:etal:1979].

Since the BH procedure is only FDR--controlling under specific dependence structures we exclude it in our comparison. For the sake of completeness, we nevertheless want to state that and are the number of rejections in the situation of Example 4.10 and Example 4.11, respectively. Our truncated BH procedure is valid under arbitrary dependence and leads, for a variety of different , to significantly more rejections than the BY and Bonferroni procedure in both examples, see Figure 1. The plot corresponding to Example 4.11 illustrates two general phenomenons, which should be kept in mind when choosing . First, for large the improvement in comparison to the BY procedure becomes more and more negligible, which is not surprising due to the logarithmic rate of the correction factor . Second, a too small may lead to fewer rejections than the BY procedure. Choosing as the truncation parameter implies that we give a special attention to the -values below , see (1.5), ignoring the remaining ones. Hence, it is not surprising that a choice smaller than , the number of rejections for the BY procedure, may lead to fewer rejections than . In addition to expected sparse signals, the choice of can also be influenced by limited resources for further data processing steps. Under these circumstances Example 4.10 advocates to use the ES()-procedure under restriction priorities for . After some small effects the number of rejections (dotted line) is approximately linear. When the dotted line reaches the BH() curve, early stopping is obsolete, see Figures 1 and 2.

To judge the restriction appropriately, we want to point out that the Bonferroni test does not only control the FDR level but also the family-wise error rate

, the probability that at least one rejection is false. Hence, it can be seen as a minimal requirement to beat the Bonferroni method. In this context, ”to beat” means that the number of rejections should be at least

. For small , as in Example 4.11, the Bonferroni method is quite powerful, e.g., it beats the BY procedure, we have whereas . Although our two procedures reject up to hypotheses for the right choice of in Example 4.11, they reject no hypotheses when and, in particular, they are worse than the Bonferroni method.

Figure 1: The number of rejections corresponding to the BH (solid line) and the ES (dotted line) procedures applying to Example 4.10 (left) and 4.11 (right) are plotted for various . The height of the horizontal dashed and dot-dashed lines equal for the BY and the Bonferroni procedure, respectively.
Figure 2: The number of rejections corresponding to the BH (solid line) and the ES (dotted line) procedures applying to Example 4.10 (left) and 4.11 (right) are plotted for various . For comparison, the map is displayed (dashed line).

5 Miscellaneous

Under sparsity the smallest critical values are most important. For such designs we propose another promising SU-tests with a first coefficient closer to the Bonferroni coefficient . Introduce

This new SU-procedure, labeled as our new sparsity test of size for short SP(), is motivated by Example 4.11. Here, Bonferroni is superior to BY, which has too small coefficients at the beginning. Table 4 displays a rough approximation of the first critical values and , respectively, for different procedures when and are large.

Bon BY BH() SP()
Table 4: The first critical values of the new Benjamini-Yekutieli procedure as well as the ones discussed in Section 4
Theorem 5.12
  1. (cf. [HeesenJanssen2015], Prop. 4.1) Under BI we have

  2. Under arbitrary dependence .

  3. For the normalizing coefficients obey the inequality

Since the degree of sparsity is typically unknown the truncation parameter of BH() should be not too small. But, for larger the BH() coefficients are smaller, which should be judged within a trade-off. Here, the new SP() procedure helps with increased lower coefficients. The substitution of BH() by SP() then gives more security for very sparse signals. Note that for each there exists a largest value with


For intermediate the crucial index can be obtained directly. Using the approximations and we get approximately . In contrast to BH(), the SP() procedure is at least as good as the Bonferroni test for every in the context of Example 4.11.

6 Proofs

We first present the proofs corresponding to our main results from Section 3. After that we prove the remaining statements in order of their appearance in the paper.

6.1 Proof of Theorem 3.5

(i): First, observe that HeesenJanssen2015 already proved, see their Lemma 7.1(a), that

By our assumptions we have and

For we can conclude

Finally, we can deduce

which completes the proof of (3.2). Analogously, the statement for the deterministic case can be obtained.

The basic idea of the proof of (ii) and (iii) is to rewrite the generating function as


for -almost all and a finite measure specified later, where may depend on . It should be mentioned that BlanchardRoquain2009 already used this technique.

(ii): For the generating function the measure needs to be specified only on if . We denote by the Dirac measure centred at , i.e., . Let

Then (6.1) holds, which implies for all that

for the probability measure . Lemma 7.1(b) of HeesenJanssen2015 with and yields . Finally, note

(iii): Since we just need to specify on . Here, we define as a mixture of a Dirac measure and a Lebesgue measure with density :

Setting again we obtain

Applying Lemma 7.1(b) of HeesenJanssen2015, now with , yields

Finally, using integration by parts we can deduce

6.2 Proof of Theorem 3.7

To simplify the notation, assume . Let and set , where only the -th coordinate is replaced by for . For the proof, we prefer to explicitly state the dependence from and write . Assumption (A) implies for all


Introduce for fixed the sets and

Below, we use the well-known inclusion for step-up tests:


For we have

The -th summand can be transformed by (6.3):

Interchanging the summation yields

The equality proves (3.6) because for . The other representation of the upper bound, see (3.5), follows from a rearrangement and the trivial observation

6.3 Proof of Theorem 2.1

The proof is related to [HeesenJanssen2016] and relies on conditioning with respect to the -algebra

Adopting the notation of the proof for Theorem 3.7 we obtain from Fubini’s Theorem that for