An FDR upper bound for an adaptive one-way GBH procedure under exchangeability

10/24/2019 ∙ by Yueqiao Faith Zhang, et al.

There has been some numerical evidence on the conservativeness of an adaptive one-way GBH procedure for multiple testing of the means of equally correlated normal random variables. However, a theoretical investigation into this seems to be lacking. We provide an analytic, non-asymptotic FDR upper bound for such a procedure under the aforementioned multiple testing scenario. The bound is not tight but reasonably quantifies how large the FDR of the procedure can be. As by-products, we extend two relevant existing results to the setting of p-values that are not necessarily super-uniform.


1 Introduction

Controlling the false discovery rate (FDR, [2]) has become routine practice in multiple hypothesis testing. Recently, weighted FDR procedures such as those of [8, 9] have shown excellent performance owing to their ability to adapt to the proportion of signals or to incorporate potential structure among the hypotheses. The “adaptive one-way GBH” procedure of [8], referred to as GBH hereafter, likely represents the latest advance in designing data-adaptive weights that ensure the non-asymptotic conservativeness of the resulting procedure for grouped, weighted hypothesis testing; it reduces to Storey’s procedure of [9] when there is only one group. Even though these procedures have been shown to be conservative under independence, gauging their FDRs non-asymptotically under dependence is quite challenging. There has been some numerical evidence on the non-asymptotic conservativeness of Storey’s procedure and the GBH when they are applied to multiple testing of the means of equally correlated normal random variables; see, e.g., [5, 6, 8]. However, a theoretical investigation into this does not seem to exist in the literature. In this note, we provide an analytic, non-asymptotic FDR upper bound for the GBH in the aforementioned multiple testing scenario. The bound is not tight but quantifies how large the FDR of the GBH can be. As by-products, Lemma 3 extends Lemma 3.2 of [5], and Lemma 4 extends Lemma 1 of [8], both to the setting where p-values are not necessarily super-uniform.

We begin with the testing problem. Let be i.i.d. standard normal, where is defined to be the set for each natural number . For a constant , let for . Then ’s are exchangeable and equally correlated with correlation . We simultaneously test hypotheses versus for . This scenario has been commonly used as a “standard model” to assess the conservativeness of an FDR procedure under dependence by, e.g., [3, 6, 7, 8]. For each , consider its associated p-value , where is the CDF of the standard normal distribution. The GBH, to be applied to , is stated as follows:

  • Group hypotheses: let the non-empty sets be a partition of , and accordingly let be partitioned into for .

  • Construct data-adaptive weights: fix a , the tuning parameter, and for each and , set

    where and with being the indicator function of a set , and is the cardinality of .

  • Weight p-values and reject hypotheses: weight the p-values , into , and apply the BH procedure to at nominal FDR level .
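As an illustration only, the three steps above can be sketched in Python as follows. The weight used here is a Storey-type estimate of the proportion of true nulls within each group; this formula, the function names `bh_reject` and `adaptive_group_bh`, and the default settings are assumptions made for the sketch, and the exact weights of [8] should be taken from that reference.

```python
from collections import defaultdict


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up procedure; returns a list of booleans."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # number of rejections
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank  # keep the largest rank meeting the step-up condition
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def adaptive_group_bh(pvals, groups, alpha=0.05, lam=0.5):
    """Group-weighted BH sketch: weight p-values per group, then apply BH.

    ASSUMPTION: the weight of a group is the Storey-type quantity
    (n_g - R_g + 1) / (n_g * (1 - lam)), where n_g is the group size and
    R_g counts the group's p-values that are <= lam.
    """
    # Step 1: collect the indices belonging to each group.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    # Step 2: compute one data-adaptive weight per group and weight p-values.
    weighted = [0.0] * len(pvals)
    for idx in members.values():
        n_g = len(idx)
        r_g = sum(pvals[i] <= lam for i in idx)
        w_g = (n_g - r_g + 1) / (n_g * (1 - lam))  # assumed weight formula
        for i in idx:
            weighted[i] = w_g * pvals[i]

    # Step 3: apply BH to the weighted p-values at nominal level alpha.
    return bh_reject(weighted, alpha)
```

With a single group, this sketch amounts to BH applied to p-values scaled by one Storey-type estimate, in line with the remark that the GBH reduces to Storey’s procedure in that case.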

Here is our main result:

Theorem 1.

When and , the FDR of GBH is upper bounded by

In the theorem we restrict mainly because researchers often choose or in practice (see [6] and [8]). Also, the requirement for ensures that some integrals in the proof of Theorem 1 are finite, and the interval is obtained by solving and resulting from the calculations for the integrals in (6) when . The ratio for is partially visualized in Figure 1.

Figure 1: Ratio of the FDR upper bound to the nominal FDR level when , and . The curves from top to bottom are respectively associated with from 0.05 to 1/2 in increments of 0.05.

From Figure 1, we see that is increasing in but decreasing in . Further, the ratio is always less than when , and is less than when , making the upper bound useful for a good range of when . On the other hand, is achieved when and . However, the case of corresponds to independence among the normal random variables and hence among the p-values, for which the FDR of GBH is upper bounded by . So, is not tight. This is mainly because we used the suprema of several quantities related to ; see Lemma 2 and Lemma 3.
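The behaviour of the FDR under equal correlation can also be probed by simulation. The following Python sketch estimates the FDR by Monte Carlo in the single-group case. It assumes the common-factor representation X_i = mu_i + sqrt(1 - rho) xi_i + sqrt(rho) xi_0, one-sided p-values 1 - Phi(X_i), and a Storey-type null-proportion estimate; these forms, the function names and the default settings are assumptions of the sketch rather than definitions taken from the text.

```python
import math
import random


def simulate_pvalues(mu, rho, rng):
    """Draw one vector of p-values from an equally correlated normal model.

    ASSUMPTION: X_i = mu_i + sqrt(1 - rho) * xi_i + sqrt(rho) * xi_0 with
    i.i.d. standard normal xi's, and one-sided p-values 1 - Phi(X_i).
    """
    xi0 = rng.gauss(0.0, 1.0)  # factor shared by all coordinates
    pvals = []
    for mu_i in mu:
        x = mu_i + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0) + math.sqrt(rho) * xi0
        pvals.append(0.5 * math.erfc(x / math.sqrt(2)))  # equals 1 - Phi(x)
    return pvals


def storey_bh(pvals, alpha, lam):
    """BH applied to p-values scaled by a Storey-type estimate (assumed form)."""
    m = len(pvals)
    pi0_hat = (m - sum(p <= lam for p in pvals) + 1) / (m * (1 - lam))
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # number of rejections
    for rank, i in enumerate(order, start=1):
        if pi0_hat * pvals[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def estimate_fdr(mu, rho, alpha=0.05, lam=0.5, reps=2000, seed=0):
    """Average the false discovery proportion over `reps` simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        reject = storey_bh(simulate_pvalues(mu, rho, rng), alpha, lam)
        # A rejection is false when the corresponding mean is exactly zero.
        false_rej = sum(r and mu_i == 0 for r, mu_i in zip(reject, mu))
        total += false_rej / max(sum(reject), 1)  # proportion is 0 if nothing is rejected
    return total / reps
```

Calling `estimate_fdr` over a grid of correlations, for a fixed vector of means, gives Monte Carlo estimates that can be set against the nominal level and the ratio shown in Figure 1, subject to simulation error.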

2 Proof of Theorem 1

We provide a streamlined proof of Theorem 1 in Section 2.1 and relegate auxiliary results to Sections 2.2 and 2.3.

2.1 A streamlined proof

Let and be, respectively, the numbers of false rejections and total rejections of GBH. We first consider the conditional expectation . Since is set when , we can assume throughout the article. Let be the index set of true null hypotheses among the hypotheses, the index set of true null hypotheses for group , and the cardinality of . Further let be the vector of the p-values, and the vector obtained by excluding from . Then

(1)

where for each and

with and , and the inequality is due to the fact that is non-decreasing in for all , and .

Define , , , and for each and . Set . Then the inequality (1) implies

(2)
(3)

where is defined by Lemma 2 and the inequality (2) holds by Lemma 3. Set

Applying Lemma 4 with in place of to the expectation in (3) gives

(4)

where the last equality follows from . Let be the FDR of the GBH procedure. Then with (4) we obtain

(6)

where denotes the PDF of the standard normal distribution, with , and . Specifically, (2.1) is due to Lemma 2, (11) and (12), and in (6) we have:

and

Therefore,

where is given in the statement of Theorem 1.

2.2 An upper bound related to the probability of a conditional false rejection

For and , we have the “probability of a conditional false rejection” as

which induces the ratio

Note that is set since holds, and that . The key result in this subsection is an upper bound on (or introduced later), given by Lemma 2.

First, let us verify that is upper bounded on . Setting and gives an equivalent representation of as . Clearly, when , and when . However, is continuous for . So, attains its maximum at some and is thus bounded on .

Secondly, let us find an upper bound for . Setting with and gives another equivalent representation of as . So, it suffices to upper bound on . Clearly, when , when , , and . So , and it suffices to upper bound on . To this end, we need the following:

Lemma 1.

For ,

(7)

Proof.

By the mean value theorem,

(8)

where for some . On the other hand,

However, the identity

holds for all and . So, for all ,

where and are nonzero constants. Solving for from the above equation yields , and substituting it back into (8) gives (7). ∎

With Lemma 1, we can obtain an upper bound for (or ) as follows:

Lemma 2.

For we have , where

Proof.

We divide the argument into two cases. Case (1): (i.e., ). Regardless of the values of and , we have for . On the other hand,

for each , and . So, , and when .

Case (2): (i.e., ). We have two subcases. If , then by the same argument as above. If , then

(9)

for by [4], and

where the inequality follows from Lemma 1. Now we can easily verify that on ,

and

So,

and