A Scan Procedure for Multiple Testing

08/02/2018 ∙ by Shiyun Chen, et al. ∙ 0

In a multiple testing framework, we propose a method that identifies the interval with the highest estimated false discovery rate of P-values and rejects the corresponding null hypotheses. Unlike the Benjamini-Hochberg method, which does the same but over intervals with an endpoint at the origin, the new procedure 'scans' all intervals. In parallel with Storey et al. (2004), we show that this scan procedure provides strong control of asymptotic false discovery rate. In addition, we investigate its asymptotic false non-discovery rate, deriving conditions under which it outperforms the Benjamini-Hochberg procedure. For example, the scan procedure is superior in power-law location models.




1 Introduction

Multiple testing problems arise in a wide range of applications, and are most acute in contexts where data are large and complex, and where standard data analysis pipelines involve performing a large number of tests. Benjamini and Hochberg (1995) proposed to control the false discovery rate (FDR) as a much less conservative criterion than the family-wise error rate (FWER). They also proposed a method (referred to as the BH method henceforth) for achieving this under some conditions, such as independence of the P-values. Since then, other FDR-controlling methods have been proposed and in turn adopted by practitioners faced with large-scale testing problems. Although a number of variants have been proposed, most of these methods likewise compute a threshold from the P-values and reject the null hypotheses whose P-values fall below that threshold (Genovese and Wasserman, 2004; Storey, 2002; Storey et al., 2004). See Roquain (2011) for a survey.

A threshold approach to multiple testing is natural, stemming from the fact that the smaller a P-value is, the more evidence it provides against the null hypothesis being tested. However, we argue that this is not so obvious in the context of multiple testing, particularly in harder cases where the alternatives are not easily identified and in which most of the smallest P-values come from true null hypotheses. This was already understood by Chi (2007), who proposed a complex method which may be roughly described as applying the BH method at multiple locations in the unit interval, each location playing the role of the origin. The result is a rejection region made of possibly multiple intervals. (In the context of multiple testing, a rejection region is a subset of the unit interval which identifies the P-values whose null hypotheses are to be rejected.)

In the present paper we propose a simpler approach based on the longest interval whose estimated FDR is below the prescribed level. Compared to the method of Chi (2007), ours is simpler, and it is nonetheless shown to outperform the BH method in some settings of potential interest, such as power-law location models. The method is intuitive and can be seen as a direct extension of the approach of Storey (2002). It thus presents a sort of minimal working example where looking beyond threshold methods can be beneficial.

Scanning over intervals is a common procedure for detecting areas of interest in a point process at least since the work of Naus (1965). In this context, and its extension to discrete signals, the main task has been to test for homogeneity, and some articles have tackled such situations from a multiple testing angle (Siegmund et al., 2011; Picard et al., 2017; Benjamini and Heller, 2007; Caldas de Castro and Singer, 2006; Pacifico et al., 2007; Perone Pacifico et al., 2004). While these papers aim at controlling the FDR when scanning spatiotemporal data, here we consider a standard multiple testing situation with a priori no spatiotemporal structure, and offer scanning as a way to generalize and potentially improve upon threshold procedures.

1.1 Framework

We consider a setting where we test $n$ null hypotheses, denoted by $H_1, \dots, H_n$. The test for $H_i$ yields a P-value, denoted as $P_i$, and we assume (for simplicity) that these P-values are independent. In this context, a multiple testing procedure takes the P-values, $P_1, \dots, P_n$, and returns a subset of indices in $\{1, \dots, n\}$ representing the null hypotheses that the procedure rejects. Table 1 describes the outcome when applying some significance rule in such a setting and defines some necessary notations. We will let index the true null hypotheses.

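Given such a procedure, the false discovery rate is defined as the expected value of the false discovery proportion (Benjamini and Hochberg, 1995), namely

$$\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}], \qquad \mathrm{FDP} = \frac{\#\{\text{true null hypotheses rejected}\}}{\max(\#\{\text{null hypotheses rejected}\},\, 1)}.$$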
While the FDR of a multiple testing procedure is analogous to the type I error rate of a test procedure, the false non-discovery rate (FNR) plays the role of type II error rate and is here defined as the expected value of the false non-discovery proportion. Note that this definition differs from the one introduced in Genovese and Wasserman (2002), although there is no substantial difference.

accept null reject null total
null true
null false
Table 1: This table summarizes the outcome of applying a multiple testing procedure to a particular situation involving $n$ null hypotheses.
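The proportions behind the FDR and FNR can be computed directly once the rejected set and the true nulls are known. The following is a minimal sketch, not the authors' code: the function name `fdp_fnp` is hypothetical, and the false non-discovery proportion is taken to be the fraction of false null hypotheses that are not rejected, which is an assumed reading of the definition above.

```python
def fdp_fnp(rejected, true_nulls, n):
    """Return (FDP, FNP) for one outcome of a multiple testing procedure.

    rejected   -- indices (0..n-1) of the rejected null hypotheses
    true_nulls -- indices (0..n-1) of the true null hypotheses
    n          -- total number of hypotheses
    """
    rejected = set(rejected)
    true_nulls = set(true_nulls)
    false_nulls = set(range(n)) - true_nulls

    false_discoveries = len(rejected & true_nulls)
    missed = len(false_nulls - rejected)

    # FDP: false discoveries over rejections (0 when nothing is rejected).
    fdp = false_discoveries / max(len(rejected), 1)
    # FNP (assumed convention): missed false nulls over all false nulls.
    fnp = missed / max(len(false_nulls), 1)
    return fdp, fnp
```

Averaging the two returned values over repeated experiments gives Monte Carlo estimates of the FDR and FNR.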

1.2 Threshold procedures

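Threshold procedures are of the form

$$\{\, i : P_i \le \tau(P_1, \dots, P_n) \,\},$$

where $\tau$ is some (measurable) function with values in $[0, 1]$. As we stated earlier, most multiple testing procedures are of this form, including the BH method. Specifically, following Storey (2002), we may describe the BH method as follows. For $t \in [0, 1]$, define the following quantities (see Table 1),

$$V(t) = \#\{\, i : \text{the } i\text{-th null hypothesis is true and } P_i \le t \,\}, \qquad R(t) = \#\{\, i : P_i \le t \,\},$$

as well as

$$\mathrm{FDR}(t) = \mathbb{E}\left[ \frac{V(t)}{\max(R(t), 1)} \right].$$

This is the FDR of the procedure with rejection region $[0, t]$. It is estimated by replacing $V(t)$ by $n t$, justified by the fact that $\mathbb{E}[V(t)] \le n_0 t \le n t$, where $n$ is the total number of null hypotheses and $n_0$ is the number of true null hypotheses. (The first inequality is an equality when all the null P-values are uniformly distributed in $[0, 1]$.) This yields

$$\widehat{\mathrm{FDR}}(t) = \frac{n t}{\max(R(t), 1)},$$

and the BH method may be defined via the threshold

$$\tau_{\mathrm{BH}} = \sup\{\, t \in [0, 1] : \widehat{\mathrm{FDR}}(t) \le \alpha \,\},$$

if it is desired to control the FDR at $\alpha$.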
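Storey's formulation of the BH method described above can be sketched in code. The sketch assumes that the estimated FDR of the region [0, t] is n·t divided by the number of P-values at most t (or by 1 if there are none), and it searches for the threshold among the observed P-values only, which yields the same rejections as the largest admissible threshold. The function names are hypothetical.

```python
def estimated_fdr(pvalues, t):
    """Estimated FDR of the rejection region [0, t]: n*t / max(#{P_i <= t}, 1)."""
    n = len(pvalues)
    rejections = sum(1 for p in pvalues if p <= t)
    return n * t / max(rejections, 1)


def bh_threshold(pvalues, alpha):
    """Largest observed P-value whose estimated FDR is at most alpha (0.0 if none)."""
    n = len(pvalues)
    threshold = 0.0
    for k, p in enumerate(sorted(pvalues), start=1):
        # At the k-th smallest P-value there are at least k rejections, so the
        # estimate is at most n*p/k; this is the classical step-up condition.
        if n * p <= alpha * k:
            threshold = p
    return threshold


def bh_reject(pvalues, alpha):
    """Indices of the null hypotheses rejected by the BH method at level alpha."""
    t = bh_threshold(pvalues, alpha)
    return [i for i, p in enumerate(pvalues) if p <= t]
```

For instance, `bh_reject([0.001, 0.008, 0.039, 0.041, 0.6], 0.05)` rejects the first two hypotheses, since the third P-value already has an estimated FDR of 0.065.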

1.3 Scan procedures

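Effectively, threshold procedures examine intervals of the form $[0, t]$, where $t \in [0, 1]$. We extend this family of procedures by considering all possible intervals, thus defining scan procedures as those of the form

$$\{\, i : \sigma(P_1, \dots, P_n) \le P_i \le \tau(P_1, \dots, P_n) \,\},$$

where $\sigma$ and $\tau$ are some (measurable) functions with values in $[0, 1]$ and such that $\sigma \le \tau$ pointwise. Within this family of procedures, we define a specific procedure in analogy with the definition of the BH method given above.

For $0 \le s \le t \le 1$, define the following quantities (see Table 1),

$$V(s, t) = \#\{\, i : \text{the } i\text{-th null hypothesis is true and } s \le P_i \le t \,\}, \qquad R(s, t) = \#\{\, i : s \le P_i \le t \,\},$$

as well as

$$\mathrm{FDR}(s, t) = \mathbb{E}\left[ \frac{V(s, t)}{\max(R(s, t), 1)} \right].$$

This is the FDR of the procedure with rejection region $[s, t]$, which we estimate by replacing $V(s, t)$ with $n (t - s)$, which bounds its expectation, obtaining

$$\widehat{\mathrm{FDR}}(s, t) = \frac{n (t - s)}{\max(R(s, t), 1)},$$

and our scan procedure is defined via the interval

$$(\hat{s}, \hat{t}) = \operatorname*{arg\,max}\{\, t - s : \widehat{\mathrm{FDR}}(s, t) \le \alpha \,\}, \tag{3}$$

assuming, again, that we desire to control the FDR at $\alpha$. If there are several maximizing intervals, we choose the left-most interval.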
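The interval search just described can be sketched as a brute-force scan over pairs of P-values. This is an illustrative sketch under stated assumptions, not the authors' implementation: the estimated FDR of [s, t] is taken to be n·(t − s) divided by the number of P-values in [s, t], candidate endpoints are restricted to observed P-values (in line with Remark 1), and only pairs of distinct sorted positions are considered so that trivial single-point intervals are ignored. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def scan_interval(pvalues, alpha):
    """Longest interval [s, t] between two observed P-values whose estimated
    FDR, n*(t - s) / #{s <= P_i <= t}, is at most alpha (None if there is none)."""
    ps = sorted(pvalues)
    n = len(ps)
    best = None  # (length, s, t)
    for i in range(n):
        s = ps[i]
        first = bisect_left(ps, s)  # position of the first P-value >= s
        for j in range(i + 1, n):
            t = ps[j]
            count = bisect_right(ps, t) - first  # number of P-values in [s, t]
            length = t - s
            # The strict comparison keeps the left-most interval among ties,
            # because left endpoints are visited in increasing order.
            if n * length <= alpha * count and (best is None or length > best[0]):
                best = (length, s, t)
    return None if best is None else (best[1], best[2])


def scan_reject(pvalues, alpha):
    """Indices of the null hypotheses whose P-values fall in the selected interval."""
    interval = scan_interval(pvalues, alpha)
    if interval is None:
        return []
    s, t = interval
    return [i for i, p in enumerate(pvalues) if s <= p <= t]
```

With `[0.30, 0.301, 0.302, 0.303, 0.9, 0.6, 0.05, 0.75]` and level 0.1, the sketch selects the interval (0.30, 0.303) and rejects the four clustered hypotheses, a cluster that no interval starting at the origin can isolate. The double loop costs on the order of n² log n operations, which is fine for illustration but not tuned for large n.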

Remark 1.

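By construction, relying on basic properties of the estimated FDR function $\widehat{\mathrm{FDR}}$, we have that the endpoints $\hat{s}$ and $\hat{t}$ of the selected interval correspond to P-values, and

$$\widehat{\mathrm{FDR}}(\hat{s}, \hat{t}) \le \alpha. \tag{4}$$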

1.4 Contribution and contents

In this paper, following (Storey, 2002; Storey et al., 2004; Genovese and Wasserman, 2002), we consider an asymptotic setting where the scan procedure just defined is indeed able to control the FDR as desired. In the same framework, we also compare, in terms of FNR, the scan procedure with the BH procedure, showing that the former is superior to the latter under some specific circumstances, including in power-law location models.

The rest of the paper is organized as follows. In Section 2 we consider our scan procedure’s ability to control the FDR. This is established in an asymptotic setting. In Section 3 we analyze the asymptotic FNR of our scan procedure and compare it with that of the BH procedure. In particular, we derive sufficient conditions under which the scan procedure outperforms the BH procedure. We present the results of numerical experiments in Section 4. Section 5 is a brief discussion section. All proofs are gathered in Section 6.

2 False discovery rate

In this section we examine how the scan procedure defined in Section 1.3 is able to control the false discovery rate (FDR). We start with the following result, which shows that the estimate $\widehat{\mathrm{FDR}}(s, t)$ is a conservative point estimate of $\mathrm{FDR}(s, t)$ under any configuration, as long as the null P-values are uniformly distributed.

Theorem 1.

Suppose the P-values corresponding to true null hypotheses are uniformly distributed in $[0, 1]$. Then, for any fixed $0 \le s \le t \le 1$,

$$\mathbb{E}\big[\widehat{\mathrm{FDR}}(s, t)\big] \ge \mathrm{FDR}(s, t).$$

Large-scale multiple testing appears in many areas of application, where $n$ is typically of the order of tens or hundreds of thousands. This has led to the consideration of an asymptotic setting where $n$ tends to infinity (Storey, 2002; Storey et al., 2004; Genovese and Wasserman, 2002). In detail, the asymptotic framework we consider requires the almost sure pointwise convergence of the empirical distribution of the null P-values and of the empirical distribution of the non-null P-values, or in formula,


almost surely for any fixed , where is a continuous distribution function on the real line. We assume in addition that the following limit exists,


For the remaining results, we assume that Conditions (5)-(6) hold. Note that these conditions were also assumed in (Storey et al., 2004).

Remark 2.

This asymptotic framework generalizes the Bayesian model where the null hypotheses are true with probability $\pi_0$ and not true with probability $1 - \pi_0$, and the null P-values are uniform in $[0, 1]$ and the non-null P-values are $G$-distributed, corresponding to a mixture model where the P-values are iid with distribution function $\pi_0 t + (1 - \pi_0) G(t)$.


which is the pointwise (almost sure) limit of under the above assumptions. Our next result shows that the scan procedure controls the FDR asymptotically. Here, and everywhere else in the paper, will denote the level at which the FDR is to be controlled. We make the dependency of on explicit, but note that other quantities, such as , also depend on .

Theorem 2.

We have

Remark 3.

In our notation, is random and different from . The latter is the FDR of the scan procedure with rejection region .

We consider the maximization in (3), but based on . Indeed, let be the set of maximizers and the value of the following optimization problem

or, equivalently,

where .

We assume that and that there is such that strictly decreasing at or that strictly increasing at .

The strict monotonicity condition is true, for example, if is concave on , or more generally if is differentiable and satisfies at some .
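To get a feel for the limiting optimization problem, one can solve it numerically on a grid. The sketch below assumes that the limiting estimated FDR of an interval [s, t] is (t − s) / (F(t) − F(s)), where F(t) = π0·t + (1 − π0)·G(t), π0 is the limiting proportion of true nulls, and G is the limiting distribution function of the non-null P-values, as the mixture model of Remark 2 suggests. The normal location alternative, the parameter values, and the function names are illustrative assumptions, not taken from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def alt_cdf_normal(t, shift):
    """Distribution function at t of the P-value 1 - Phi(X) when X ~ N(shift, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return 1.0 - _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - t) - shift)


def limiting_scan_interval(alt_cdf, pi0, alpha, grid_size=400):
    """Longest grid interval [s, t] with (t - s) <= alpha * (F(t) - F(s)),
    where F(t) = pi0 * t + (1 - pi0) * alt_cdf(t); left-most among ties."""
    grid = [k / grid_size for k in range(grid_size + 1)]
    mix = [pi0 * t + (1.0 - pi0) * alt_cdf(t) for t in grid]
    best = None  # (i, j) grid indices of the best interval found so far
    for i in range(grid_size + 1):
        for j in range(i + 1, grid_size + 1):
            feasible = (grid[j] - grid[i]) <= alpha * (mix[j] - mix[i])
            # Lengths are compared through integer index gaps to avoid
            # spurious ties or reversals caused by floating-point rounding.
            if feasible and (best is None or j - i > best[1] - best[0]):
                best = (i, j)
    return None if best is None else (grid[best[0]], grid[best[1]])
```

In the normal location model the non-null distribution function is concave, so the longest admissible interval starts at the origin. For example, `limiting_scan_interval(lambda t: alt_cdf_normal(t, 2.0), pi0=0.5, alpha=0.1)` returns an interval whose left endpoint is 0.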

Theorem 3.

If (7) holds, then, almost surely, any accumulation point of belongs to .

Remark 4.

This result is analogous to Theorem 1 in (Genovese and Wasserman, 2002), which establishes a similar limit for the BH method under similar conditions. Specifically, they show that, almost surely, converges to


with . Alternatively, may also be defined as the right-most solution to the equation .
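The limiting BH threshold can be approximated numerically in the same spirit. The sketch below assumes that the estimated FDR at threshold t converges to t / F(t), with F(t) = π0·t + (1 − π0)·G(t), so that the limiting threshold is the right-most t satisfying t ≤ α·F(t). The normal location alternative, the parameter values, and the function names are hypothetical choices made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def alt_cdf_normal(t, shift):
    """Distribution function at t of the P-value 1 - Phi(X) when X ~ N(shift, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return 1.0 - _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - t) - shift)


def limiting_bh_threshold(alt_cdf, pi0, alpha, grid_size=10_000):
    """Right-most grid point t in (0, 1] with t <= alpha * F(t), where
    F(t) = pi0 * t + (1 - pi0) * alt_cdf(t); returns 0.0 if there is none."""
    threshold = 0.0
    for k in range(1, grid_size + 1):
        t = k / grid_size
        if t <= alpha * (pi0 * t + (1.0 - pi0) * alt_cdf(t)):
            threshold = t
    return threshold
```

Under these assumptions the condition t ≤ α·F(t) is equivalent to G(t) ≥ β·t with β = (1/α − π0) / (1 − π0), which gives β = 19 when π0 = 0.5 and α = 0.1. With a shift of 2, the resulting threshold lies between 0.027 and 0.03.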

3 False non-discovery rate

Having established that the scan procedure asymptotically controls the FDR at the desired level, we now turn to examining its false non-discovery rate (FNR). We do so under the same asymptotic framework.

Theorem 4.

If (7) holds, then

As could be anticipated from Theorem 3, the limiting value is the asymptotic FNR of any deterministic rule given by an interval with .

Remark 5.

This result is analogous to Theorem 3 in (Genovese and Wasserman, 2002), which establishes a similar limit for the BH method, specifically,

We now turn our attention to comparing the scan method and the BH method. The following theorem provides some sufficient conditions under which the scan procedure outperforms the BH procedure.

Theorem 5.

Assume that is differentiable. If (7) holds, and in addition


then the scan procedure has strictly smaller asymptotic FNR than the BH procedure.

The BH procedure is known to be optimal in various ways under generalized Gaussian location models (Arias-Castro and Chen, 2016; Rabinovich et al., 2017). We therefore consider power-law location models. More specifically, we consider a mixture model where


where is a continuous distribution on the real line and

. These are meant to represent the test statistics, whose large values weigh against their respective null hypotheses. In particular,

is the null distribution and is the effect size. The P-values are then computed as usual, meaning where , and are seen to follow a mixture model

Theorem 6.

Consider a mixture model (10) in the asymptotic regime defined by (6). Then the condition (5) holds. Assume in addition that has a density which can be taken to be strictly positive everywhere and such that as and as for some and some . Then there is (depending on and ) such that (9) holds for all .

Remark 6.

The result does not say anything about (7), which is also required in Theorem 4, but this condition is fulfilled except in pathological cases.

4 Numerical experiments

In this section, we perform simple simulations to see the performance of the BH and scan procedures on finite data. We consider the normal and Cauchy mixture models, as in (10).

In the set of experiments, the sample size . We draw observations from the alternative distribution , and the other from the null distribution . Each situation is repeated 100 times and we report the average FDP and FNP for each procedure together with error bars. The FDR control level was set at .
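The data-generating step of such an experiment can be sketched as follows. The sample size, the number of alternatives, and the shift used in the paper are not recoverable from the text above, so every numeric value used with this sketch is a placeholder; the function name `sample_pvalues` is also hypothetical. P-values are computed as the upper tail probability of the null distribution at the observed statistic.

```python
import math
import random


def sample_pvalues(n, n_alt, shift, model="normal", seed=0):
    """Draw n test statistics, the first n_alt of them shifted by `shift`,
    and return (pvalues, is_alt) under a normal or Cauchy location model."""
    rng = random.Random(seed)
    pvalues, is_alt = [], []
    for i in range(n):
        alt = i < n_alt
        if model == "normal":
            x = rng.gauss(0.0, 1.0)
        elif model == "cauchy":
            # Inverse-transform sampling of a standard Cauchy variable.
            x = math.tan(math.pi * (rng.random() - 0.5))
        else:
            raise ValueError("model must be 'normal' or 'cauchy'")
        if alt:
            x += shift
        # One-sided P-value: null survival function at the observed statistic.
        if model == "normal":
            p = 0.5 * math.erfc(x / math.sqrt(2.0))
        else:
            p = 0.5 - math.atan(x) / math.pi
        pvalues.append(p)
        is_alt.append(alt)
    return pvalues, is_alt
```

A call such as `sample_pvalues(2000, 500, 3.0, model="normal", seed=1)` produces one replicate; repeating it with different seeds and averaging the FDP and FNP of each procedure mimics the protocol described above.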

4.1 Normal model

In this model

is the standard normal distribution. We set

and . See Figure 1, where we have plotted , the P-value distribution under the alternative defined in (11). This is a situation where is concave, so we expect the two methods to behave similarly. This is confirmed numerically. In fact, the scan procedure was observed, in these experiments, to coincide with the BH method. (This does not happen at smaller signal-to-noise ratios, e.g., when is smaller.) See Figure 2, where we have plotted the FDP and FNP of both procedures.

Figure 1: The alternative P-value distribution in the normal mixture model with and (solid black) and the line (dashed black).
Figure 2: FDP and FNP for the BH (red) and scan (blue) methods under the normal mixture model. The methods are essentially identical. The FDR control was set at .

4.2 Cauchy model

In this model

is the Cauchy distribution. We set

and . This choice of parameters leads to a model that satisfies the condition (9) in Theorem 5. See Figure 3 for an illustration. Therefore, here we expect the scan procedure to outperform the BH procedure. This is confirmed in the numerical experiments. See Figure 4, where we have plotted the FDP and FNP of both procedures.

Figure 3: The alternative P-value distribution in the Cauchy mixture model with and (solid black) and the line (dashed black).
Figure 4: FDP and FNP for the BH (red) and scan (blue) methods under the Cauchy mixture model. The FDR control was set at .

5 Discussion

Genovese and Wasserman (2002) argue that the BH method is not optimal among threshold procedures due to its being conservative in terms of FDR control. We expect the same to be true of our scan procedure. We could have pursued an improvement analogous to how the BH method was ameliorated in (Storey, 2002; Benjamini and Hochberg, 2000) based on estimating the number of true null hypotheses ( in our notation), but we chose not to do so for the sake of simplicity and focus.

We also want to mention that the present situation, where a scan method is found to improve upon a threshold method, has a parallel in the test of the global null hypothesis. Indeed, continuing with the line of work coming out of (Ingster, 1997; Donoho and Jin, 2004), we have recently considered the problem of detecting a sparse mixture and shown that threshold tests are inferior to scan tests in power-law models, although in a somewhat different asymptotic regime (Arias-Castro and Ying, 2018).

6 Proofs

We prove our results in this section.

6.1 Proof of Theorem 1

For any ,


where (16) follows from Jensen’s inequality, based on the fact that is convex over , while (17) uses the fact that P-values are independent and uniform in under their null. The fact that the very last expression is non-negative comes from the fact that .

6.2 Some preliminaries

Henceforth, we assume that (5) and (6) hold. Before proving our main results, we establish a few auxiliary lemmas.

Lemma 1.

For any fixed , almost surely,


As is well-known, the pointwise convergences that we assume, namely (5), imply uniform convergences, so that, together with (6), we have


almost surely. Combining these also yields


When , we have

and it is thus straightforward to show that

almost surely, which establishes our claim. ∎

Lemma 2.

Almost surely,


For the first part, we have

so that we only need to prove that

But this simply comes from (19) and (6). ∎

Lemma 3.

If (7) holds, then, almost surely,


Let be as in (7), with (for example) strictly decreasing at . Then there is such that when . With probability one, converges to , and when this is the case, for sufficiently large, then implying that by definition in (3). Hence, we have shown that for any such , almost surely, and we conclude by letting . (Recall that for any .) ∎

6.3 Proof of Theorem 2

For the first part, using Lemma 2, we have

almost surely, and we conclude with (4).

The second part just follows from the first part and Fatou’s lemma.

6.4 Proof of Theorem 3

With probability one, a realization satisfies (18) with , and (22). Consider such a realization and let be an accumulation point of .

Because (22) holds, we have .

We also have , eventually, and because (18) holds, this implies that

Together with (4), we thus have

By continuity of , this implies that , in turn implying that .

We have thus established that satisfies and , and therefore belongs to by definition.

6.5 Proof of Theorem 4

By definition of in Table 1, we have

By (20) and Theorem 3 together with the fact that for any , almost surely,

We thus have, almost surely,

and we conclude using the Dominated Convergence theorem.

6.6 Proof of Theorem 5

Adapting the proof of Theorem 4, we can establish an analogous result for the BH method, specifically,


almost surely, where was defined in (8). Therefore, to compare the asymptotic FNR of the scan and the BH procedures, we need to compare and .

Define , , and . Clearly, we have . By the fact that is the right-most solution to , we have . Hence, . This, coupled with (9), implies that , so that .

Let denote the line with slope passing through the origin, and for , let denote the line parallel to at a distance below . See Figure 5 for an illustration. Because and , and by continuity of , the graph of intersects at least twice. Choosing small enough, it is therefore also the case that intersects at least twice. Let and denote the horizontal coordinates of the leftmost and rightmost intersection points, respectively. Note that as by the fact that , and by the fact that is the right-most solution to .

Figure 5: Example which satisfies Condition 9 in Theorem 5.

Moreover, as , by simple geometry arguments, we have

Since , we have and also . It follows that, for small enough, . Since, by construction,

we have , by definition of the latter.

6.7 Proof of Theorem 6

The Law of Large Numbers implies that the condition (5) holds. We thus turn to the remainder of the statement.

We note that is differentiable on , with derivative

As , we have , and by the fact that for all , as , we have that is differentiable at , with derivative .

We also have that, for any fixed , as , due to the fact that as . Hence, as . Let , so that as . Because , we have . Because the right-hand side tends to 0, we must have as . Then using the fact that as , we must have

or equivalently,

as . This is seen to imply that , where . Note that . We then have, as ,

We conclude that, for large enough, .


References

  • Arias-Castro and Chen (2016) Arias-Castro, E. and S. Chen (2016). Distribution-free multiple testing. arXiv preprint arXiv:1604.07520.
  • Arias-Castro and Ying (2018) Arias-Castro, E. and A. Ying (2018). Detection of sparse mixtures: Higher criticism and scan statistic. arXiv preprint arXiv:1802.08715.
  • Benjamini and Heller (2007) Benjamini, Y. and R. Heller (2007). False discovery rates for spatial signals. Journal of the American Statistical Association 102(480), 1272–1281.
  • Benjamini and Hochberg (1995) Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289–300.
  • Benjamini and Hochberg (2000) Benjamini, Y. and Y. Hochberg (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics 25(1), 60–83.
  • Caldas de Castro and Singer (2006) Caldas de Castro, M. and B. H. Singer (2006). Controlling the false discovery rate: a new application to account for multiple and dependent tests in local statistics of spatial association. Geographical Analysis 38(2), 180–208.
  • Chi (2007) Chi, Z. (2007). On the performance of FDR control: constraints and a partial solution. The Annals of Statistics, 1409–1431.
  • Donoho and Jin (2004) Donoho, D. and J. Jin (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics 32(3), 962–994.
  • Genovese and Wasserman (2002) Genovese, C. and L. Wasserman (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 499–517.
  • Genovese and Wasserman (2004) Genovese, C. and L. Wasserman (2004). A stochastic process approach to false discovery control. The Annals of Statistics, 1035–1061.
  • Ingster (1997) Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Mathematical Methods of Statistics 6(1), 47–69.
  • Naus (1965) Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association 60(310), 532–538.
  • Pacifico et al. (2007) Pacifico, M. P., C. Genovese, I. Verdinelli, and L. Wasserman (2007). Scan clustering: A false discovery approach. Journal of Multivariate Analysis 98(7), 1441–1469.
  • Perone Pacifico et al. (2004) Perone Pacifico, M., C. Genovese, I. Verdinelli, and L. Wasserman (2004). False discovery control for random fields. Journal of the American Statistical Association 99(468), 1002–1014.
  • Picard et al. (2017) Picard, F., P. Reynaud-Bouret, and E. Roquain (2017). Continuous testing for Poisson process intensities: A new perspective on scanning statistics. arXiv preprint arXiv:1705.08800.
  • Rabinovich et al. (2017) Rabinovich, M., A. Ramdas, M. I. Jordan, and M. J. Wainwright (2017). Optimal rates and tradeoffs in multiple testing. arXiv preprint arXiv:1705.05391.
  • Roquain (2011) Roquain, E. (2011). Type I error rate control in multiple testing: a survey with proofs. Journal de la Société Française de Statistique 152(2), 3–38.
  • Siegmund et al. (2011) Siegmund, D., N. Zhang, and B. Yakir (2011). False discovery rate for scanning statistics. Biometrika 98(4), 979–985.
  • Storey (2002) Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 479–498.
  • Storey et al. (2004) Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(1), 187–205.