Bound on FWER for correlated normal distribution

08/06/2019 ∙ by Nabaneet Das, et al.

In this paper, our main focus is to obtain an asymptotic bound on the family-wise error rate (FWER) for Bonferroni-type procedures in the simultaneous hypothesis testing problem when the observations corresponding to the individual hypotheses are correlated. In particular, we consider the sequence of null hypotheses H_0i : X_i follows N(0,1) (i = 1, 2, ..., n) and an equicorrelated structure for the sequence (X_1, ..., X_n). A distribution-free bound on FWER under the equicorrelated setup can be found in Tong (2014). But the upper bound provided in Tong (2014) is not a bounded quantity as the number of hypotheses (n) grows, and as a result FWER is highly overestimated for a particular choice of distribution (e.g. normal). In the equicorrelated normal setup, we show that FWER is asymptotically a convex function of the correlation ρ, and hence an upper bound on the FWER of the Bonferroni-α procedure is α(1 − ρ). This implies that Bonferroni's method actually controls the FWER at a much smaller level than the desired level of significance in the positively correlated case, and necessitates a correlation correction.

1 Introduction

Multiple hypothesis testing has been one of the most lively areas of research in statistics for the past few decades. The biggest challenge in this area comes from the fact that the models involve an extensive collection of unknown parameters, and one has to draw simultaneous inference on a large number of hypotheses, mainly with the goal of ensuring good overall performance (rather than focusing too much on the individual problems). Very often, data sets from modern scientific investigations in fields such as biology, astronomy and economics require such simultaneous testing of thousands of hypotheses.

Various measures of error rate have been proposed over the years. One of the hard-line frequentist approaches is to control the family-wise error rate (FWER), which is defined as the probability of making at least one false rejection in a family of hypothesis-testing problems.


Bonferroni's bound provides the classical FWER control method. However, the step-up and step-down algorithms of [11], [17], [12], [10] provide improvements over Bonferroni's method in terms of power. While Holm's procedure controls the FWER in general, the other algorithms depend heavily on the independence of the p-values of the individual hypotheses. [3], [4], [7] provide excellent reviews of the whole theory.
One of the main limitations of these classical methods that control FWER in the strong sense is their conservative nature, which results in a lack of power. A substantial improvement in power has been achieved by considering the false discovery rate (FDR) criterion proposed by [1]. See, for example, [2], [20], [18], [16], [13] for further details. [14], [7] provide excellent accounts of the literature on FDR.
However, most of this work has been done in the context of independent observations. Very little literature covers correlated variables. [14] reviews FDR control under dependence. [6] clearly shows the effects of correlation on the summary statistics by pointing out that the correlation penalty depends on the root mean square (rms) of the correlations. An excellent review of the whole work can be found in [7]. All these works make it clear that FWER or FDR should be treated more carefully when correlation is present.
Some distribution-free bounds on FWER, based on Chebyshev-type inequalities, can be found in [19]. But as these inequalities are distribution free, FWER is highly overestimated for particular choices of distribution (e.g. normal). Also, these inequalities are not of much use for a large number of hypotheses.
In our work, we consider the equicorrelated normal distribution and obtain a sharper bound on the FWER for the Bonferroni-type FWER control procedure. We show that, asymptotically (for a large number of hypotheses), FWER is a convex function of ρ, and hence the FWER of the Bonferroni-α procedure is bounded by α(1 − ρ). This suggests a necessary correlation correction in the Bonferroni procedure. While the bound provided in [19] is not a bounded quantity as the number of hypotheses grows, the bound provided in this work remains stable even as n → ∞ and gives a clearer picture of the effect of correlation on FWER. This is probably the first attempt at expressing the most classically used FWER in terms of ρ asymptotically. In a further communication, we expect to attempt the same problem in terms of the root mean square (rms) of the correlations, as in [6].

2 Description of the problem

Let X_1, X_2, ... be a sequence of observations, and let the null hypotheses be

H_0i : X_i ~ N(0,1), i = 1, 2, ..., n.

Many single hypothesis testing problems focus on one-sided tests, which reject the null hypothesis for large values of the observation. We consider here the one-sided test: reject H_0i if X_i > c (c is chosen according to the required significance level of the test). The two-sided problem can be handled in a similar manner.

Under this setup, a classical measure of the type-I error is the FWER, which is the probability of falsely rejecting at least one null hypothesis (which happens if X_i > c for some i), where the probability is computed under the intersection null hypothesis. When we compute under the intersection null hypothesis, FWER and FDR are the same. Hence this study also gives an idea of the behaviour of FDR under this setup.

FWER = P(at least one false rejection) = P(X_i > c for some i)

Suppose Corr(X_i, X_j) = ρ for all i ≠ j.
Our goal is to provide an asymptotic bound on FWER in terms of ρ.

Let, FWER

3 Main theorem

Theorem 3.1

Suppose each is being tested at size . If then, as and hence asymptotically is a convex function in [0,1] .

Note:

  1. For ρ = 0 (under independence), we must have FWER = 1 − (1 − α_n)^n, where α_n denotes the size at which each hypothesis is tested.

  2. For ρ = 1 (when X_i = X_j a.s.), we must have FWER = α_n (because one rejection would imply rejection of all null hypotheses).

Consider the line which joins the points (0, 1 − (1 − α_n)^n) and (1, α_n). The following corollary describes the asymptotic behaviour of FWER as a function of ρ.

Corollary 3.1.1

As n → ∞, FWER is bounded above by this line.
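As a small numerical illustration (a sketch, not part of the original derivation), the two endpoint values and the line joining them can be evaluated directly. We assume here that the endpoints are 1 − (1 − α_n)^n at ρ = 0 and α_n at ρ = 1, with the Bonferroni-type choice α_n = α/n; the names `alpha_n`, `fwer_independent`, `fwer_perfectly_correlated` and `chord` are ours.

```python
def fwer_independent(alpha_n: float, n: int) -> float:
    """FWER at rho = 0: n independent one-sided tests, each of size alpha_n."""
    return 1.0 - (1.0 - alpha_n) ** n


def fwer_perfectly_correlated(alpha_n: float) -> float:
    """FWER at rho = 1: all statistics coincide, so a single test decides all."""
    return alpha_n


def chord(rho: float, alpha_n: float, n: int) -> float:
    """Line joining (0, FWER at rho = 0) and (1, FWER at rho = 1) (assumed form)."""
    left = fwer_independent(alpha_n, n)
    right = fwer_perfectly_correlated(alpha_n)
    return (1.0 - rho) * left + rho * right


# Illustrative values only; alpha_n = alpha / n is the Bonferroni-type choice.
alpha, n = 0.05, 10_000
alpha_n = alpha / n
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"rho={rho:.2f}  chord={chord(rho, alpha_n, n):.6f}  "
          f"alpha*(1-rho)={alpha * (1.0 - rho):.6f}")
```

For large n the left endpoint is close to 1 − exp(−α), which is at most α, so the line lies close to or below α(1 − ρ) away from ρ = 1.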

In this section we are going to provide a proof of this theorem.

3.1 An alternate form of FWER and its derivatives

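Under the framework described above, under the intersection null hypothesis the sequence X_1, ..., X_n is exchangeable, i.e. (X_1, ..., X_n) ~ N_n(0, (1 − ρ) I_n + ρ J_n), where J_n is the n × n matrix of ones.
Then X_i = θ + Z_i for every i,
where

θ is a mean-0 normal random variable, independent of the sequence Z_1, Z_2, ...,

and the Z_i's are i.i.d. normal random variables.
Since Var(X_i) = 1 and Cov(X_i, X_j) = ρ, this implies
θ ~ N(0, ρ) and Z_i ~ N(0, 1 − ρ).

Thus,

FWER = 1 − P(X_i ≤ c for all i) = 1 − E[ Φ((c − √ρ Z) / √(1 − ρ))^n ].
(Here Z ~ N(0,1) and Φ is the c.d.f. of the N(0,1) distribution.)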
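The one-dimensional representation can be checked numerically. The following sketch assumes the form FWER = 1 − E[Φ((c − √ρ Z)/√(1 − ρ))^n] with Z ~ N(0,1), which follows from writing each X_i as a shared N(0, ρ) component plus independent N(0, 1 − ρ) noise. The function name `fwer_integral`, the truncation of the integral to [−8, 8] and the trapezoid rule are illustrative choices of ours, not the authors'.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def fwer_integral(rho: float, c: float, n: int,
                  half_width: float = 8.0, steps: int = 4000) -> float:
    """Approximate 1 - E[Phi((c - sqrt(rho) Z) / sqrt(1 - rho))^n], Z ~ N(0, 1),
    with a composite trapezoid rule on [-half_width, half_width]."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    h = 2.0 * half_width / steps
    root_rho, root_rest = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # Conditional on the shared component, the n tests are independent.
        no_rejection = STD_NORMAL.cdf((c - root_rho * z) / root_rest) ** n
        total += weight * no_rejection * STD_NORMAL.pdf(z)
    return 1.0 - h * total


# Illustrative setting: Bonferroni-type cut-off for n tests at overall level alpha.
alpha, n = 0.05, 1000
c = STD_NORMAL.inv_cdf(1.0 - alpha / n)
for rho in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"rho={rho:.1f}  FWER~{fwer_integral(rho, c, n):.5f}  "
          f"alpha*(1-rho)={alpha * (1.0 - rho):.5f}")
```

At ρ = 0 the integrand no longer depends on z, so the routine should reproduce 1 − Φ(c)^n, which gives a quick sanity check of the quadrature.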

If we define , , then .

Now, an application of dominated convergence theorem would yield,

(1)

( where and is the N(0,1) p.d.f. )
And again by D.C.T. ,

(2)

Where, and .

Let’s define, .
Then, note that,

4 Proof of the main theorem

The proof of the main theorem involves three steps.

  • Step 1 :- The 2nd and 3rd term in as . (Proof is given in appendix (lemma I)).

  • Step 2 :- Suppose at . If then, the first term is asymptotically .

  • Step 3 :- If then the first term as .

Proof of step 1 is given in appendix (lemma I). We shall proceed with the proof of step 2.

4.1 Proof of step 2

We know that, for large enough .
as (Because as ).
So, .
By lemma II in appendix , we can say that,
.




( Where )

Here we have assumed that .
First of all note that,
Suppose,
We have, and .
Since for large enough n, this means , for large n.
Also, observe that, is a decreasing function of z. Since , we can say that, .
This means, .
Since , this means
And thus, which gives us the desired result.
This completes the proof of step 2.

4.2 Proof of step 3

In this part we have assumed that . The whole real line is broken in three disjoint regions and . We’ll show that the integral in these three regions separately as .

  • Case 1 : -
    If , then .
    Using the fact that, , we can say that, .
    So ,

    Each is being tested at size and we reject if .
    So, .
    As by the condition .
    For large , we have
    So, ……………….(i)
    Now, observe that,
    and .
    So, as .

  • Case 2 :-
    For all , we must have .
    This implies,
    . So, for large , .
    This implies, in this case, for large .
    Using the fact note that,
    .
    From the previous observation (in case 1) we can say that,
    Notation :- if and such that,
    Note that, . (By (i))
    This implies, .
    Thus,

  • Case 3 : - .

    If then . (This means takes a very large negative value.
    Note that, ( A polynomial in and ).
    Since , this implies, if
    Also, note that,
    is bounded.
    And hence,

    Now we need to consider the region .
    Suppose and .

    For large n, for .


    Since

    Since this implies, and hence as .
    So,

    Note that, ( as )
    Since and as , by continuity of we can say that, and hence, as .
    .
    So,
    and hence .

This completes the proof of main theorem.

5 Conclusion

From the theorem in the previous section, we have asymptotically.
as This means for large n, FWER is bounded by in [0,1]. From this, we can conclude that,

Theorem 5.1

For large , FWER .

For large n, and this implies, FWER .

Bonferroni's method suggests testing each hypothesis at size α/n if we want to maintain the FWER at level α. This satisfies the criterion of the main theorem of Section 3.
When , then .

This implies that the FWER of Bonferroni's procedure is bounded by α(1 − ρ).
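The step from the line through the two endpoint values to the simpler bound can be spelled out as follows. This is a sketch under the assumption that the line joins (0, 1 − (1 − α_n)^n) and (1, α_n) with α_n = α/n; the symbol L_n for this line is ours.

```latex
\begin{align*}
L_n(\rho) &= (1-\rho)\,\bigl[\,1-(1-\alpha/n)^n\,\bigr] + \rho\,\frac{\alpha}{n} \\
          &\le (1-\rho)\,\alpha + \rho\,\frac{\alpha}{n}
             && \text{since } (1-\alpha/n)^n \ge 1-\alpha \text{ (Bernoulli's inequality)} \\
          &\xrightarrow[\;n\to\infty\;]{} \alpha\,(1-\rho).
\end{align*}
```

So a bound by the line translates, for large n, into the bound α(1 − ρ) up to a term of order α/n.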

6 Appendix

6.1 Lemma I

Lemma 6.1

The second and third term in as

Proof :- We shall do it only for the third term. The other one follows similarly. This proof is similar to the proof of case 2 ( ) of step 3 of the main theorem.
Third term in is .
Let, .
Then,
Since for large enough , so

. Also, if then and hence

. Let, and .
Similar to the case 2 ( ) of step 3 of the main theorem we can say that, as .
If then as .
Thus, and the third term in .

6.2 Lemma II

Lemma 6.2

as .

Proof :-
.
If then and hence, .
Since , this immediately implies that,

When for a constant .
This implies, .

A similar idea to the proof of lemma-I will tell us that, we need to consider the region corresponding to and the integral corresponding to this region follows exactly in the similar way as lemma - I.

7 Simulation results

Theorem 5.1 tells us that, for large n, the FWER of Bonferroni's method (with level of significance α) is asymptotically bounded above by α(1 − ρ). In order to verify this result empirically, some simulation results are provided in Table 1. In our simulation experiments, we considered the values of ρ and α shown in Table 1. For each combination, independent replications were made to estimate the FWER. In each replication, we generated n equicorrelated normal random variables, each with mean 0 and variance 1. Bonferroni's method rejects H_0i at level α if X_i exceeds the (1 − α/n)-th quantile of the N(0,1) distribution. In each replication we note whether or not any of the X_i's exceeds that cut-off, and the estimate is then obtained from the replications as the proportion in which at least one X_i exceeds it.
Each estimate is compared with α(1 − ρ) (the upper bound mentioned in Section 5). In almost all cases the estimate is substantially smaller than α(1 − ρ); the exceptions in Table 1 are at α = 0.01 with ρ = 0 and ρ = 0.1, where the estimate slightly exceeds the bound, although the difference is not noteworthy. All these observations suggest that, in the positively correlated setup, Bonferroni's method actually controls the FWER at a much smaller level than the desired level of significance, which makes the method more conservative in this case.

ρ     Quantity          α = 0.01   α = 0.05   α = 0.1    α = 0.4    α = 0.6    α = 0.7
0.9   Estimated FWER    9.00E-05   0.00046    0.00053    0.00221    0.00324    0.0031
      Bound α(1 − ρ)    1.00E-03   0.005      0.01       0.04       0.06       0.07
0.7   Estimated FWER    0.00101    0.00363    0.00588    0.01617    0.02149    0.023
      Bound α(1 − ρ)    0.003      0.015      0.03       0.12       0.18       0.21
0.5   Estimated FWER    0.00347    0.01156    0.01918    0.04909    0.06414    0.07042
      Bound α(1 − ρ)    0.005      0.025      0.05       0.2        0.3        0.35
0.3   Estimated FWER    0.00683    0.02523    0.04363    0.11495    0.15013    0.16494
      Bound α(1 − ρ)    0.007      0.035      0.07       0.28       0.42       0.49
0.1   Estimated FWER    0.00996    0.04367    0.07978    0.23801    0.31105    0.34295
      Bound α(1 − ρ)    0.009      0.045      0.09       0.36       0.54       0.63
0     Estimated FWER    0.01018    0.0486     0.09424    0.32914    0.45065    0.50499
      Bound α(1 − ρ)    0.01       0.05       0.1        0.4        0.6        0.7
Table 1: Simulation results
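The simulation procedure described in this section can be sketched as follows. The settings used here (n = 200, 2000 replications, the seed) are illustrative assumptions and are not the settings behind Table 1, so the printed numbers are not expected to reproduce the table. Equicorrelated normals are generated as √ρ·U + √(1 − ρ)·E_i with U and the E_i independent N(0,1); the function name `estimate_fwer` is ours.

```python
import math
import random
from statistics import NormalDist


def estimate_fwer(n: int, rho: float, alpha: float,
                  replications: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the FWER of the one-sided Bonferroni test
    for n equicorrelated N(0, 1) statistics with common correlation rho."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1.0 - alpha / n)  # (1 - alpha/n)-th quantile
    root_rho, root_rest = math.sqrt(rho), math.sqrt(1.0 - rho)
    false_rejections = 0
    for _ in range(replications):
        shared = rng.gauss(0.0, 1.0)  # component common to all X_i
        # any() stops at the first X_i that exceeds the cut-off.
        if any(root_rho * shared + root_rest * rng.gauss(0.0, 1.0) > cutoff
               for _ in range(n)):
            false_rejections += 1
    return false_rejections / replications


# Illustrative run (assumed settings, not those of Table 1).
n, alpha, replications = 200, 0.4, 2000
for rho in (0.0, 0.5, 0.9):
    est = estimate_fwer(n, rho, alpha, replications, seed=1)
    print(f"rho={rho:.1f}  estimate={est:.4f}  alpha*(1-rho)={alpha * (1 - rho):.4f}")
```

With so few replications the estimates carry visible Monte Carlo error; increasing `replications` tightens them at a proportional cost in run time.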

References

  • [1] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995.
  • [2] Yoav Benjamini and Wei Liu. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. Journal of statistical planning and inference, 82(1-2):163–170, 1999.
  • [3] Sandrine Dudoit, Juliet Popper Shaffer, Jennifer C Boldrick, et al. Multiple hypothesis testing in microarray experiments. Statistical Science, 18(1):71–103, 2003.
  • [4] Sandrine Dudoit and Mark J Van Der Laan. Multiple testing procedures with applications to genomics. Springer Science & Business Media, 2007.
  • [5] Bradley Efron. Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association, 102(477):93–103, 2007.
  • [6] Bradley Efron. Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association, 105(491):1042–1055, 2010.
  • [7] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.
  • [8] Bradley Efron et al. Size, power and false discovery rates. The Annals of Statistics, 35(4):1351–1377, 2007.
  • [9] Bradley Efron, Robert Tibshirani, John D Storey, and Virginia Tusher. Empirical bayes analysis of a microarray experiment. Journal of the American statistical association, 96(456):1151–1160, 2001.
  • [10] Yosef Hochberg. A sharper bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802, 1988.
  • [11] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics, pages 65–70, 1979.
  • [12] Gerhard Hommel. A stagewise rejective multiple test procedure based on a modified bonferroni test. Biometrika, 75(2):383–386, 1988.
  • [13] Joseph P Romano, Azeem M Shaikh, and Michael Wolf. Control of the false discovery rate under dependence using the bootstrap and subsampling. Test, 17(3):417, 2008.
  • [14] Sanat K Sarkar. On methods controlling the false discovery rate. Sankhyā: The Indian Journal of Statistics, Series A (2008-), pages 135–168, 2008.
  • [15] Sanat K Sarkar et al. Some results on false discovery rate in stepwise multiple testing procedures. The Annals of Statistics, 30(1):239–257, 2002.
  • [16] Sanat K Sarkar et al. Stepup procedures controlling generalized fwer and generalized fdr. The Annals of Statistics, 35(6):2405–2420, 2007.
  • [17] R John Simes. An improved bonferroni procedure for multiple tests of significance. Biometrika, 73(3):751–754, 1986.
  • [18] John D Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):479–498, 2002.
  • [19] Yung Liang Tong. Probability inequalities in multivariate distributions. Academic Press, 2014.
  • [20] Daniel Yekutieli and Yoav Benjamini. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1-2):171–196, 1999.