Modern scientific studies aided by high-throughput technologies, such as those in brain imaging, microarray analysis, astronomy, atmospheric science, drug discovery, and many other fields, increasingly rely on large-scale multiple testing as an integral part of statistical investigations focused on high-dimensional inference. Because many of these investigations, notably genome-wide association and neuroimaging studies, involve testing hypotheses that appear in groups, the multiple testing paradigm seems to be shifting from a single group to multiple groups of hypotheses. These groups, formed at a single level or at multiple levels to create one- or multi-way classified hypotheses, can occur naturally due to the underlying biological or experimental process, or can be created using internal or external information capturing specific features of the data. Several new questions arise with this paradigm shift. We focus on the following two, which seem the most relevant in light of the existing literature on controlling an overall measure of false discoveries across the entire collection of hypotheses:
Q1. For multiple testing of hypotheses grouped into a one-way classified form, how can one effectively capture the underlying group/classification structure, instead of simply pooling all the hypotheses into a single group, while controlling overall false discoveries across all individual hypotheses?
Q2. For hypotheses grouped into a one-way classified form in the context of post-selection inference, where groups are selected before the hypotheses in the selected groups are tested, how can one effectively capture the underlying group/classification structure to control the expected average of false discovery proportions across the selected groups?
Progress has been made toward answering Q1 (Hu et al. (2010)) and Q2 (Benjamini & Bogomolov (2014)) for one-way classified hypotheses in the framework of Benjamini-Hochberg (BH) type (Benjamini & Hochberg (1995)) false discovery rate (FDR) control. However, research addressing these questions with methodologies based on the local false discovery rate (Lfdr) (Efron et al. (2001)) is largely absent, with the exception of the recent work of Liu et al. (2016), where a method has been proposed in its oracle form to answer the following question related to Q1: when making important discoveries within each group is as important as making them across all hypotheses, how can one control falsely discovered hypotheses within each group while also controlling them across all hypotheses?
It has been demonstrated before (Sun et al. (2006); Sun & Cai (2007); Efron (2008); Ferkingstad et al. (2008); Sarkar et al. (2008); Sun & Cai (2009); Cai & Sun (2009); Hu et al. (2010); Zablocki et al. (2014); Ignatiadis et al. (2016)) that an Lfdr based approach, with its Bayesian/empirical Bayesian and decision-theoretic foundations, can yield powerful multiple testing methods that control false discoveries while effectively capturing dependence and other structures of the data in single- and multiple-group settings. However, the work of Liu et al. (2016) differs fundamentally from these works in that it takes into account the sparsity of signals both across groups and within each active group. Consequently, a group’s significance, as measured by its Lfdr, can be explicitly factored into a significance measure of each hypothesis within that group. By contrast, in those other works, such as Sun & Cai (2009) and Hu et al. (2010), the significance measure of each hypothesis within a group is adjusted for the group’s effect through the group’s size rather than through its own measure of significance.
In this article, we continue the line of research initiated in Liu et al. (2016) to answer Q1 and Q2 in an Lfdr framework. More specifically, we borrow ideas from Liu et al. (2016) to develop a unified group-adjusted multiple testing framework for one-way classified hypotheses that introduces a grouping effect into the overall false discoveries across all individual hypotheses, or into the average of within-group false discovery proportions across selected groups.
In the next section, we present the current state of knowledge closely pertinent to the present work and make remarks motivating the development of our proposed methodologies.
2 Literature Review and Motivating Remark
Suppose there are $N = \sum_{i=1}^{m} n_i$ null hypotheses that appear in $m$ non-overlapping families/groups, with $H_{ij}$ being the $j$th hypothesis in the $i$th group ($j = 1, \ldots, n_i$; $i = 1, \ldots, m$). We refer to such a layout of hypotheses as one-way classified hypotheses.
With $\theta_{ij}$ indicating the truth ($\theta_{ij} = 0$) or falsity ($\theta_{ij} = 1$) of $H_{ij}$, the Lfdr, defined by the posterior probability $\mathrm{Lfdr}_{ij} = P(\theta_{ij} = 0 \mid \mathbf{X})$, where $\mathbf{X}$ denotes the observed data, is the basic ingredient for constructing Lfdr based approaches controlling false discoveries. The single-group case (or the case ignoring the group structure) has been considered extensively in the literature, notably by Sun & Cai (2007), Cai & Sun (2009) and He et al. (2015), who focused on constructing methods that are optimal, at least in their oracle forms. These oracle methods correspond to Bayes multiple decision rules under a single-group two-class mixture model (Efron et al. (2001); Newton et al. (2004); Storey (2002)) that minimize the marginal false non-discovery rate (mFNR), a measure closely related to the false non-discovery rate (FNR) introduced in Genovese & Wasserman (2002) and Sarkar (2004), subject to controlling the marginal false discovery rate (mFDR), a measure closely related to the BH FDR and the positive FDR (pFDR) of Storey (2002). Multiple-group versions of single-group Lfdr based approaches to multiple testing have started receiving attention recently; the following seem most relevant to our work.
Cai & Sun (2009) extended their work from single to multiple groups (one-way classified hypotheses) under the following model. With the group label taking the value $i$ with some prior probability $\pi_i$, $i = 1, \ldots, m$, the pairs $(X_{ij}, \theta_{ij})$, given group $i$, are assumed to be iid random pairs with
$$\theta_{ij} \sim \mathrm{Bernoulli}(p_i), \qquad X_{ij} \mid \theta_{ij} \sim (1 - \theta_{ij}) f_{i0} + \theta_{ij} f_{i1},$$
for some given densities $f_{i0}$ and $f_{i1}$, and $p_i \in (0, 1)$. They developed a method which, in its oracle form, minimizes the mFNR subject to controlling the mFDR and is defined in terms of thresholding the conditional Lfdr's $\mathrm{CLfdr}_{ij} = (1 - p_i) f_{i0}(X_{ij}) / f_i(X_{ij})$, where $f_i = (1 - p_i) f_{i0} + p_i f_{i1}$, for $j = 1, \ldots, n_i$, $i = 1, \ldots, m$, before proposing a data-driven version of the oracle method that asymptotically maintains the oracle properties. It should be noted that the probability $\pi_i$ relates to the size of group $i$ and provides little information about the significance of the group itself.
Ferkingstad et al. (2008) brought the grouped hypotheses setting into testing a single family of hypotheses in an attempt to empower the typical Lfdr based thresholding approach by leveraging an external covariate. They partitioned the $p$-values into a number of small bins (groups) according to the ordered values of the covariate. With the underlying two-class mixture model defined separately for each bin depending on the corresponding value of the covariate, they defined the so-called covariate-modulated Lfdr as the posterior probability of a null hypothesis given the value of the covariate for the corresponding bin. They estimated the covariate-modulated Lfdr in each bin using a Bayesian approach before proposing their thresholding method, which does not necessarily control an overall measure of false discoveries such as the mFDR or the posterior FDR. Extensions of this work from a single covariate to multiple covariates can be seen in Zablocki et al. (2014) and Scott et al. (2015). Very recently, Cai et al. (2016) developed a novel grouped hypotheses testing framework for two-sample multiple testing of the differences between two highly sparse mean vectors, constructing the groups to extract sparsity information in the data by means of a carefully constructed auxiliary covariate. They proposed an Lfdr based optimal multiple testing procedure controlling the FDR as a powerful alternative to standard procedures based on the sample mean differences.
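To make the single-group Lfdr and the group-conditional Lfdr concrete, the following Python sketch computes $P(\theta = 0 \mid X = x) = (1-p) f_0(x) / \{(1-p) f_0(x) + p f_1(x)\}$ once with a pooled non-null proportion and once with group-specific proportions $p_i$, in the spirit of the CLfdr of Cai & Sun (2009). The densities $f_0 = N(0,1)$ and $f_1 = N(\mu_1, 1)$ with $\mu_1 = 2$, the two proportions and the function names are illustrative assumptions; Cai & Sun (2009) also allow group-specific densities, which this sketch does not.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def lfdr(x, p_nonnull, mu1=2.0):
    """P(theta = 0 | X = x) under (1 - p) N(0, 1) + p N(mu1, 1).

    The normal densities are an illustrative assumption, not part of the method.
    """
    null_part = (1.0 - p_nonnull) * normal_pdf(x)
    return null_part / (null_part + p_nonnull * normal_pdf(x, mean=mu1))


# Hypothetical non-null proportions for two equal-sized groups.
p_by_group = {"sparse group": 0.02, "dense group": 0.30}
# With common densities and equal group sizes, pooling gives the average proportion.
pooled_p = sum(p_by_group.values()) / len(p_by_group)

x = 2.5
print(f"pooled Lfdr (p = {pooled_p:.2f}): {lfdr(x, pooled_p):.3f}")
for name, p in p_by_group.items():
    print(f"CLfdr in {name} (p = {p:.2f}): {lfdr(x, p):.3f}")
```

Because the Lfdr decreases as the non-null proportion grows, the pooled value lies between the two group-conditional values: pooling blurs the difference between a sparse and a dense group, which is what conditioning on the group recovers.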
A sudden upsurge of research has taken place recently in selective/post-selection inference, owing to the scientific community’s realization that the lack of reproducibility of a scientist’s work is often caused by a failure to account for selection bias. When multiple hypotheses are simultaneously tested in a selective inference setting, a grouped hypotheses testing framework arises, with the tested groups being selected from a given set of groups of hypotheses. Benjamini & Bogomolov (2014) introduced the expected average of false discovery proportions across the selected groups as an appropriate error rate to control in this setting and proposed a method that controls it. Since then, a few papers have been written in this area (Peterson et al. (2016a) and Heller et al. (2017)); however, none of this research has been carried out in the Lfdr framework.
When grouping of hypotheses occurs, naturally or artificially, it can be assumed that the significance of a hypothesis is influenced by that of the group it belongs to. The Lfdr under the standard two-class mixture model, however, does not help in assessing a group’s influence on the true significance of its hypotheses. This was the main motivation behind the work of Liu et al. (2016), who considered a group-adjusted two-class mixture model that yields an explicit representation of each hypothesis-specific Lfdr as a function of its group-adjusted form and the Lfdr of the group it is associated with. This representation allows them to produce a method that provides separate control over within-group false discoveries for truly significant groups, in addition to control over false discoveries across all individual hypotheses. Their paper, as mentioned in the Introduction, motivates us to proceed further with the development of new Lfdr based multiple testing methods for one-way classified hypotheses, as described in the following section.
3 Proposed Methodologies
Let $\theta_i = 0$ (or $1$) mean that the $i$th group, and hence each (or at least one) of its component hypotheses, is non-significant (or significant). Let $\theta_{ij}$ indicate the truth ($\theta_{ij} = 0$) or falsity ($\theta_{ij} = 1$) of $H_{ij}$. We express each $\theta_{ij}$ as $\theta_{ij} = \theta_i\,\theta_{j|i}$, with $\theta_{j|i}$ indicating the truth or falsity of $H_{ij}$ conditional on the status of $\theta_i$; i.e., $\theta_{j|i} = 0$ if $\theta_i = 0$, and $\theta_{j|i} = 0$ or $1$ according to whether $H_{ij}$ is true or false, if $\theta_i = 1$. This representation of the $\theta_{ij}$’s brings the underlying group structure of the hypotheses into their binary hidden states, conditional on the binary hidden states of the groups containing them.
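Let us now recall from Liu et al. (2016) the model, under a different name, that extends the two-class mixture model (Efron et al., 2001) from single to multiple groups in the setting of one-way classified hypotheses. The following distribution, also introduced in Liu et al. (2016) under a different name, plays an important role in this model:
[Truncated Product Bernoulli (TPBern)]. A random vector $(\theta_1, \ldots, \theta_n) \in \{0, 1\}^n$ follows $\mathrm{TPBern}(\pi, n)$ if
$$P(\theta_1, \ldots, \theta_n) = \frac{\prod_{j=1}^{n} \pi^{\theta_j} (1 - \pi)^{1 - \theta_j}}{1 - (1 - \pi)^n}, \qquad \sum_{j=1}^{n} \theta_j \ge 1.$$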
When hypotheses belonging to a certain group/family are simultaneously tested, this distribution provides a natural adjustment of the commonly used product Bernoulli distribution for the set of binary hidden states of the hypotheses, conditional on the group/family itself being significant.
[Group-Adjusted Two-Class Mixture Model for One-Way Classified Hypotheses (One-Way GAMM)]. Let $\mathbf{X}_i = (X_{i1}, \ldots, X_{in_i})$ be the set of random variables associated with the $i$th group, for $i = 1, \ldots, m$. The groups are independently distributed with the following model for group $i$:
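Let $\mathrm{Lfdr}_{ij} = P(\theta_{ij} = 0 \mid \mathbf{X}_i)$, $\mathrm{Lfdr}_i = P(\theta_i = 0 \mid \mathbf{X}_i)$ and $\mathrm{Lfdr}_{j|i} = P(\theta_{j|i} = 0 \mid \mathbf{X}_i, \theta_i = 1)$ be the local FDRs corresponding to $H_{ij}$ (hypothesis), group $i$ (group), and $H_{ij}$ given $\theta_i = 1$ (conditional), respectively, under One-Way GAMM. Since $\theta_{ij} = \theta_i\,\theta_{j|i}$, it is easy to see that
$$1 - \mathrm{Lfdr}_{ij} = (1 - \mathrm{Lfdr}_i)(1 - \mathrm{Lfdr}_{j|i}),$$
showing how a hypothesis-specific local FDR factors into the local FDRs for the group and for the hypothesis conditional on the group’s significance.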
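The identity $1 - \mathrm{Lfdr}_{ij} = (1 - \mathrm{Lfdr}_i)(1 - \mathrm{Lfdr}_{j|i})$, which follows from $\theta_{ij} = \theta_i\,\theta_{j|i}$, can be checked numerically. The Python sketch below enumerates every hidden-state configuration of one small group under a version of One-Way GAMM that we assume for illustration: $\theta_i \sim \mathrm{Bernoulli}(\pi_1)$; given $\theta_i = 1$, the within-group states follow a truncated product Bernoulli distribution (product Bernoulli with success probability `pi11`, conditioned on at least one success); and $X_{ij}$ is $N(0,1)$ or $N(\mu_1, 1)$ according as $\theta_{ij} = 0$ or $1$. The names `pi1`, `pi11`, `mu1`, the normal densities and the data values are assumptions, and brute-force enumeration is only practical for small groups.

```python
import itertools
import math


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def tpbern_pmf(theta, p):
    """Product Bernoulli(p) pmf conditioned on at least one success (truncated)."""
    if sum(theta) == 0:
        return 0.0
    prod = math.prod(p if t else 1.0 - p for t in theta)
    return prod / (1.0 - (1.0 - p) ** len(theta))


def gamm_lfdrs(x, pi1, pi11, mu1=2.0):
    """Brute-force Lfdr_i, Lfdr_{j|i} and Lfdr_ij for one group under an assumed
    One-Way GAMM with N(0, 1) nulls and N(mu1, 1) non-nulls."""
    n = len(x)
    f0 = [normal_pdf(v) for v in x]
    f1 = [normal_pdf(v, mu1) for v in x]
    # Joint weight of {theta_i = 0 (all hypotheses null), data x}.
    w_null = (1.0 - pi1) * math.prod(f0)
    # Joint weights of {theta_i = 1, theta vector, data x} for every nonzero theta.
    configs = []
    for theta in itertools.product((0, 1), repeat=n):
        if sum(theta) == 0:
            continue
        lik = math.prod(f1[j] if theta[j] else f0[j] for j in range(n))
        configs.append((theta, pi1 * tpbern_pmf(theta, pi11) * lik))
    w_sig = sum(w for _, w in configs)
    total = w_null + w_sig
    lfdr_group = w_null / total  # P(theta_i = 0 | x)
    null_given_sig = [sum(w for th, w in configs if th[j] == 0) for j in range(n)]
    lfdr_cond = [v / w_sig for v in null_given_sig]            # P(theta_{j|i} = 0 | x, theta_i = 1)
    lfdr_hyp = [(w_null + v) / total for v in null_given_sig]  # P(theta_ij = 0 | x)
    return lfdr_group, lfdr_cond, lfdr_hyp


x_group = [0.3, 2.5, -0.4, 3.1]
lfdr_i, lfdr_cond, lfdr_ij = gamm_lfdrs(x_group, pi1=0.2, pi11=0.3)
print(f"Lfdr_i = {lfdr_i:.4f}")
for j, (c, h) in enumerate(zip(lfdr_cond, lfdr_ij)):
    gap = (1.0 - h) - (1.0 - lfdr_i) * (1.0 - c)  # ~0 by the factorization
    print(f"j={j}: Lfdr_j|i={c:.4f}  Lfdr_ij={h:.4f}  gap={gap:.1e}")
```

Equivalently, $\mathrm{Lfdr}_{ij} = \mathrm{Lfdr}_i + (1 - \mathrm{Lfdr}_i)\,\mathrm{Lfdr}_{j|i}$, so no hypothesis can look more significant than the group that contains it.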
Let , with , and . Then, as shown in Appendix,
When , reduces to , and so One-Way GAMM with for all represents the case of ‘no group effect’.
These results can be summarised in the following:
Let be the local FDR associated with $X_{ij}$ in group $i$ under the standard single-group two-class mixture model with being the probability of a hypothesis in the group being significant, and be the same under One-Way GAMM that incorporates a similar two-class mixture model across the groups with as the chance of a group being significant. Then, can be expressed in terms of and as follows by making use of (3.1)-(3.3), with measuring an effect due to grouping for group $i$:
for each .
The above results bring home the point that in an Lfdr based approach to testing hypotheses belonging to a group/family that itself is likely to be significant with a chance of its own, the Lfdr for the group should be separated out from that for each hypothesis before assessing the true significance of the hypothesis.
More specifically, suppose that we have a single group (i.e., $m = 1$) of hypotheses to test. Then the hypotheses should be tested after removing the confounding effect of the group’s significance, that is, by using the conditional local FDRs $\mathrm{Lfdr}_{j|i}$ or their cumulative averages, depending on whether one wishes to control the local FDR or the average local FDR (when controlling the posterior FDR). Of course, one should first test the significance of the group using its local FDR, $\mathrm{Lfdr}_i$, before proceeding to test the hypotheses in it at a level depending on that for $\mathrm{Lfdr}_i$. In particular, if one wants to control the average local FDR, say at $\alpha$, then we propose to reject the hypotheses associated with $\mathrm{Lfdr}_{(1)|i}, \ldots, \mathrm{Lfdr}_{(k)|i}$, the first $k$ increasingly ordered values of $\mathrm{Lfdr}_{j|i}$, where $k$ is such that
The equals if the group is assumed to be significant, or it can be controlled at some pre-assigned level to check if the group is significant. Clearly, when , our proposal reduces to controlling the average local FDR for a single group of hypotheses under the standard two-class mixture model without introducing any group effect. We will extend this proposal from single to multiple groups of hypotheses in the following.
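As an illustration, the Python sketch below assumes that the criterion for $k$ is that the average of $\mathrm{Lfdr}_{ij} = \mathrm{Lfdr}_i + (1 - \mathrm{Lfdr}_i)\,\mathrm{Lfdr}_{j|i}$ over the rejected hypotheses stays at or below $\alpha$ (the equality follows from $1 - \mathrm{Lfdr}_{ij} = (1 - \mathrm{Lfdr}_i)(1 - \mathrm{Lfdr}_{j|i})$). This is equivalent to requiring the average of the $k$ smallest $\mathrm{Lfdr}_{j|i}$ to be at most $(\alpha - \mathrm{Lfdr}_i)/(1 - \mathrm{Lfdr}_i)$. The function name and the toy inputs are hypothetical.

```python
def reject_within_group(lfdr_group, lfdr_cond, alpha):
    """Hypothetical sketch: indices j rejected in one group, for 0 < alpha < 1.

    Rejects the k smallest conditional Lfdr_{j|i}, with k the largest value for
    which the average of Lfdr_i + (1 - Lfdr_i) * Lfdr_{j|i} over the rejected set
    is <= alpha (an assumed form of the criterion).
    """
    if lfdr_group > alpha:
        return []  # every Lfdr_ij >= Lfdr_i > alpha, so no average can reach alpha
    bound = (alpha - lfdr_group) / (1.0 - lfdr_group)
    order = sorted(range(len(lfdr_cond)), key=lambda j: lfdr_cond[j])
    rejected, running_sum = [], 0.0
    for k, j in enumerate(order, start=1):
        running_sum += lfdr_cond[j]
        if running_sum / k > bound:
            break  # running averages of sorted values never decrease
        rejected.append(j)
    return rejected


cond = [0.01, 0.2, 0.03, 0.5]
print(reject_within_group(0.0, cond, 0.05))   # prints [0, 2]: group taken as significant
print(reject_within_group(0.04, cond, 0.05))  # prints [0]: a less certain group, tighter bound
```

The second call shows the group effect at work: the same conditional Lfdr's yield fewer rejections once some posterior mass is placed on the whole group being null.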
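We express $\delta_{ij}$, the decision rule associated with $H_{ij}$, similarly to $\theta_{ij}$, as $\delta_{ij} = \delta_i\,\delta_{j|i}$, with $\delta_i$ and $\delta_{j|i}$ being the decision rules for $\theta_i$ and $\theta_{j|i}$, respectively. This provides a two-stage approach to deciding between $\theta_{ij} = 0$ and $\theta_{ij} = 1$ simultaneously for all $(i, j)$. This paper concerns the development of such two-stage approaches, focused on controlling, at a given level $\alpha$, either the posterior expected proportion of false discoveries across all hypotheses, referred to as the total posterior FDR (total PFDR), or the posterior expected average false discovery proportion across the selected/significant groups, referred to as the selective posterior FDR (selective PFDR). In other words, we consider determining $\delta_i$ and $\delta_{j|i}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, m$, satisfying
$$\text{total PFDR} = E\left[ \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \delta_{ij} (1 - \theta_{ij})}{\max\left\{ \sum_{i=1}^{m} \sum_{j=1}^{n_i} \delta_{ij},\, 1 \right\}} \,\middle|\, \mathbf{X} \right] \le \alpha$$
or
$$\text{selective PFDR} = E\left[ \frac{1}{\max\{|\mathcal{S}|, 1\}} \sum_{i \in \mathcal{S}} \frac{\sum_{j=1}^{n_i} \delta_{ij} (1 - \theta_{ij})}{\max\left\{ \sum_{j=1}^{n_i} \delta_{ij},\, 1 \right\}} \,\middle|\, \mathbf{X} \right] \le \alpha,$$
where $\mathcal{S} = \{ i : \delta_i = 1 \}$ is the set of indices of the selected groups, with the expectations taken with respect to the $\theta$’s conditional on $\mathbf{X}$.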
For notational convenience, we will often hide the symbol in the ’s. Using (3.1), we see that the total PFDR and the selective PFDR simplify, respectively, to
where , and is the within-group posterior FDR for group .
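Because the decisions are functions of the data, both posterior error rates can be computed directly from the local FDRs: the total PFDR equals $\sum_{i,j} \delta_{ij} \mathrm{Lfdr}_{ij} / \max\{\sum_{i,j} \delta_{ij}, 1\}$, and the selective PFDR averages the within-group quantities $\sum_j \delta_{ij} \mathrm{Lfdr}_{ij} / \max\{\sum_j \delta_{ij}, 1\}$ over the selected groups (taken as 0 when no group is selected). The Python sketch below evaluates both for a given rule; the function names, the toy decisions and Lfdr values, and the choice of "groups with at least one rejection" as the selected set are illustrative assumptions.

```python
def within_group_pfdr(decisions, lfdrs):
    """Posterior FDR inside one group: sum of Lfdr over rejections / max(#rejections, 1)."""
    rejections = sum(decisions)
    return sum(d * l for d, l in zip(decisions, lfdrs)) / max(rejections, 1)


def total_pfdr(decisions, lfdr):
    """Posterior FDR pooled over all hypotheses of all groups."""
    false_part = sum(d * l for dg, lg in zip(decisions, lfdr) for d, l in zip(dg, lg))
    rejections = sum(sum(dg) for dg in decisions)
    return false_part / max(rejections, 1)


def selective_pfdr(decisions, lfdr, selected):
    """Average of within-group posterior FDRs over the selected groups (0 if none)."""
    if not selected:
        return 0.0
    return sum(within_group_pfdr(decisions[i], lfdr[i]) for i in selected) / len(selected)


# Hypothetical decisions delta_ij (1 = reject) and local FDRs Lfdr_ij for three groups.
decisions = [[1, 1, 0], [0, 0, 0], [1, 0, 0]]
lfdr = [[0.01, 0.05, 0.60], [0.90, 0.80, 0.70], [0.09, 0.50, 0.40]]
selected = [i for i, dg in enumerate(decisions) if any(dg)]  # groups with a rejection
print(round(total_pfdr(decisions, lfdr), 4))                  # prints 0.05 (= 0.15 / 3)
print(round(selective_pfdr(decisions, lfdr, selected), 4))    # prints 0.06 (= (0.03 + 0.09) / 2)
```

The two rates can differ for the same rule: a selected group with a single, fairly uncertain rejection weighs as much in the selective PFDR as a group with many confident rejections.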
The above representations of the total PFDR and the selective PFDR under One-Way GAMM provide a Group-Adjusted TEsting (GATE) framework for one-way classified hypotheses based on their local FDRs, allowing us to produce algorithms (in their oracle forms) answering Q1 and Q2. We refer to these algorithms collectively as One-Way GATE algorithms.
3.1 Answering Q1
Before we present an algorithm in its oracle form answering Q1, we state the following theorem, which drives its development and gives it an optimality property.
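Theorem 3.1. Let PFNR$(\boldsymbol{\delta})$ denote the total posterior FNR (PFNR) of a decision rule $\boldsymbol{\delta} = (\delta_{ij})$. The PFNR of the decision rule with $\delta_{ij} = I(\mathrm{Lfdr}_{ij} \le c)$, for $c$ satisfying total PFDR $= \alpha$, is always less than or equal to that of any other $\boldsymbol{\delta}$ with total PFDR $\le \alpha$.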
A proof of this theorem is given in the Appendix.
The oracle One-Way GATE 1 controls the total PFDR at $\alpha$.
This theorem can be proved using standard arguments for Lfdr based approaches to testing a single group of hypotheses (see, e.g., Sun & Cai (2007); Sarkar & Zhou (2008)). Note that the total PFDR attained by Algorithm 1 may not equal the pre-specified value of $\alpha$, so Algorithm 1 is generally sub-optimal: it is the closest to, but not necessarily the same as, the optimal rule described in Theorem 3.1.
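A minimal Python sketch of a rule of this kind, assuming, as is standard for Lfdr based procedures (e.g., Sun & Cai (2007)), that Algorithm 1 pools the group-adjusted $\mathrm{Lfdr}_{ij}$ across all groups, sorts them, and rejects the largest set of smallest values whose running average stays at or below $\alpha$. It is an illustration under that assumption, not a reproduction of Algorithm 1; the function name and inputs are hypothetical.

```python
def gate1_sketch(lfdr_by_group, alpha):
    """Hypothetical sketch of an Lfdr step-up rule over all groups.

    Pools the group-adjusted Lfdr_ij, sorts them, and rejects the largest set of
    smallest values whose running average is <= alpha. Returns the rejected
    (group, hypothesis) pairs and the attained total PFDR (the running average).
    """
    pooled = sorted(
        (value, i, j)
        for i, group in enumerate(lfdr_by_group)
        for j, value in enumerate(group)
    )
    rejected, running_sum = [], 0.0
    for k, (value, i, j) in enumerate(pooled, start=1):
        if (running_sum + value) / k > alpha:
            break  # averages of an increasing sequence never decrease
        running_sum += value
        rejected.append((i, j))
    attained = running_sum / len(rejected) if rejected else 0.0
    return rejected, attained


lfdr_by_group = [[0.001, 0.30], [0.02, 0.04, 0.90]]
rejected, attained = gate1_sketch(lfdr_by_group, alpha=0.05)
print(rejected)            # prints [(0, 0), (1, 0), (1, 1)]
print(round(attained, 4))  # prints 0.0203, below alpha = 0.05
```

Since each $\mathrm{Lfdr}_{ij}$ already carries its group's $\mathrm{Lfdr}_i$, a hypothesis in a group unlikely to be significant is pushed down the ranking even if its conditional Lfdr is small. The attained value is typically below $\alpha$, in line with the remark that Algorithm 1 is generally sub-optimal.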
When for all , i.e., when the underlying grouping of hypotheses is ineffective in the sense that a group’s own chance of being significant is no different from when it is formed by combining a set of independent hypotheses, One-Way GATE 1 reduces to the standard Lfdr based approach (like that in Sun & Cai (2007); He et al. (2015); and in many others). As we will see from simulation studies in Section 4, with increasing (or decreasing) from , i.e., when a group’s chance of being significant gets larger (or smaller) than what it is if the group consists of independent hypotheses, the standard Lfdr based approach becomes less powerful (or fails to control the error rate).
3.2 Answering Q2
There are applications in the context of selective inference of multiple groups/families of hypotheses where discovering significant groups, and hence controlling a measure of their false discoveries, is scientifically no less meaningful than making such discoveries for individual hypotheses subject to control over a similar measure of false discoveries across all of them. For instance, as Peterson et al. (2016b) noted, in a multiphenotype genome-wide association study, which often focuses on groups/families of phenotype-specific hypotheses related to different genetic variants, rejecting the group-level hypothesis corresponding to a variant is considered an important discovery in the process of identifying phenotypes that are significantly associated with that variant. They borrowed ideas from Benjamini & Bogomolov (2014) and considered a hierarchical testing method that controls this so-called between-group FDR while also controlling the expected average of false discovery proportions across significant groups (due to Benjamini & Bogomolov (2014)).
The following algorithm in its oracle form answering Q2 offers an Lfdr based alternative to the hierarchical testing method of Peterson et al. (2016b). It allows control over an Lfdr analog of the aforementioned between-group FDR for the selected groups, while controlling the selective PFDR.
The following notation is being used in this algorithm: For , , with , , being the sorted values of the Lfdr’s in group .
The oracle One-Way GATE 2 controls PFDR at subject to a control over PFDR at .
This theorem can be proved by noting that the left-hand side of (3.11) is the PFDR of the procedure produced by Algorithm 2.
denote between-group posterior FNR and within-group posterior FNR for group , respectively, for a decision rule of the form , with and , for some , .
From Theorem 3.1, we have the following optimality result regarding One-Way GATE 2: Given any ,
(i) the PFNR of the decision rule of the form with satisfying is less than or equal to that of any other with .
(ii) Given , , with , there exists an , subject to , such that, for each , of the decision rule of the form with satisfying is less than or equal to that of any other decision rule in that group for which .
It is important to note that One-Way GATE 2 without Step 1 can be used in situations where the focus is on controlling PFDR given a selection rule (or ).
4 Numerical Studies
This section presents results of numerical studies we conducted to examine the performances of One-Way GATE 1 and One-Way GATE 2 compared to their relevant competitors in their oracle forms.
4.1 One-Way GATE 1
We considered various simulation settings involving 10,000 or 100,000 hypotheses grouped into equal-sized groups to investigate the performance of One-Way GATE 1 in comparison with three competitors, all in their oracle forms. The first competitor, called the oracle Naive Method, ignores the group structure by pooling all the hypotheses together into a single group, while the other two are the oracle SC (Sun & Cai (2009)) and oracle GBH (Hu et al. (2010)) methods. With equal group sizes, they operate under our model setting as follows:
Oracle Naive Method: The single-group Lfdr based method of Sun & Cai (2007) is applied to the hypotheses pooled together into a single group under a two-class mixture model , with , where and .
Oracle SC Method: The single-group Lfdr based method of Sun & Cai (2007) is applied to the hypotheses pooled together into a single group, assuming a two-class mixture model for the hypotheses in group $i$, for each $i$.
Oracle GBH Method: $X_{ij}$ is converted to its $p$-value before the BH method, at the level prescribed by Hu et al. (2010), is applied to the weighted $p$-values of Hu et al. (2010) for the hypotheses pooled together into a single group.
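The GBH competitor builds on the BH step-up procedure. The Python sketch below implements BH and, following our reading of Hu et al. (2010), weights each $p$-value by $\pi_{i0}/(1 - \pi_{i0})$, where $\pi_{i0}$ is the oracle null proportion of its group, before running BH at level $\alpha/(1 - \pi_0)$, with $\pi_0$ the overall null proportion; these weights and this level should be checked against Hu et al. (2010). The one-sided normal $p$-value assumes that non-null means shift upward. Function names and toy inputs are hypothetical.

```python
import math


def one_sided_pvalue(x):
    """P(Z >= x) for Z ~ N(0, 1), assuming non-nulls shift the mean upward."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bh_reject(pvalues, level):
    """Benjamini-Hochberg step-up procedure: sorted indices of rejected p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * level / n:
            k_max = rank  # step-up: keep the largest passing rank
    return sorted(order[:k_max])


def gbh_reject(pvalues, groups, pi0_by_group, alpha):
    """Sketch of oracle GBH as we read Hu et al. (2010): weight each p-value by
    pi0 / (1 - pi0) of its group, then run BH at level alpha / (1 - overall pi0)."""
    n = len(pvalues)
    overall_pi0 = sum(pi0_by_group[g] for g in groups) / n
    if overall_pi0 >= 1.0:
        return []  # no non-nulls anywhere
    weighted = []
    for p, g in zip(pvalues, groups):
        pi0 = pi0_by_group[g]
        weighted.append(math.inf if pi0 >= 1.0 else p * pi0 / (1.0 - pi0))
    return bh_reject(weighted, alpha / (1.0 - overall_pi0))


pvals = [0.01, 0.20, 0.01, 0.30]
groups = [0, 0, 1, 1]
pi0_by_group = {0: 0.5, 1: 0.9}  # hypothetical oracle null proportions
print(bh_reject(pvals, 0.05))                               # prints [0, 2]
print(gbh_reject(pvals, groups, pi0_by_group, alpha=0.05))  # prints [0]
```

In the second call, the identical $p$-value 0.01 survives in the denser group but not in the sparser one, which is how group weighting redistributes rejections.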
The simulations involved independently generated triplets of observations , or ); or ), with (i) ; (ii) ’s jointly following TPBern(), with determined from (3.4) using for , , , or ; and (iii) if , and if , where or or or or .
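Data of this kind can be generated as in the Python sketch below, which assumes $\theta_i \sim \mathrm{Bernoulli}(\pi_1)$, a truncated product Bernoulli vector for the hypothesis states of a significant group, and $X_{ij} \sim N(0,1)$ under the null and $N(\mu, 1)$ otherwise. Here the within-group probability `pi11` is supplied directly rather than derived from (3.4), and the normal densities, parameter values and function names are illustrative assumptions, not the settings used in the study.

```python
import random


def sample_tpbern(n, p, rng):
    """Draw (theta_1, ..., theta_n) from product Bernoulli(p) conditioned on at
    least one 1, by rejection sampling (exact, but slow when (1 - p)**n is near 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    while True:
        theta = [1 if rng.random() < p else 0 for _ in range(n)]
        if any(theta):
            return theta


def simulate_groups(m, n, pi1, pi11, mu, seed=0):
    """Simulate m groups of n hypotheses: theta_i ~ Bernoulli(pi1), within-group
    states ~ TPBern(pi11, n) if theta_i = 1, and X_ij ~ N(0, 1) or N(mu, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        theta_group = 1 if rng.random() < pi1 else 0
        theta = sample_tpbern(n, pi11, rng) if theta_group else [0] * n
        x = [rng.gauss(mu if t else 0.0, 1.0) for t in theta]
        data.append((theta_group, theta, x))
    return data


data = simulate_groups(m=200, n=5, pi1=0.3, pi11=0.2, mu=2.0, seed=1)
n_sig = sum(tg for tg, _, _ in data)
n_nonnull = sum(sum(th) for _, th, _ in data)
print(f"{n_sig} significant groups, {n_nonnull} non-null hypotheses out of {200 * 5}")
```

Rejection sampling is used because accepting only draws with at least one success reproduces the truncated product Bernoulli distribution exactly, with no need to enumerate its support.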
The oracle versions of One-Way GATE 1, the Naive Method, SC method, and GBH method were applied to the data for testing against simultaneously for all at , and the simulated values of the total false discovery rate, the average number of true rejections, and the average number of total rejections were obtained for each of them based on 1000 replications.
Figures 1-3 and 6-14 display how the four methods compare across different values of (or ) and as the group size changes from small to large. The first three of these figures point out scenarios where One-Way GATE 1 performs better than its competitors when . The remaining figures, for larger values of , are placed in the Appendix to examine whether the comparative performance pattern among the four methods changes with increasing value of .
Figures 1-3 show that oracle One-Way GATE 1 controls the FDR well at the desired level 0.05. The oracle Naive Method also controls the FDR at the desired level. However, as expected, it is less powerful than oracle One-Way GATE 1, with the power difference growing as the group size increases.
The superior performance of oracle One-Way GATE 1 over the oracle SC method when is clearly shown by these graphs. The oracle SC method fails to control the FDR, with the resulting FDR getting as large as 0.47, when . This happens because it uses a larger value of when is small, inflating the FDR by an amount related to the value of . When is larger, it uses a smaller value of , resulting in an overly conservative method. The GBH method shows a similar pattern: it fails to control the FDR when and is overly conservative when , and this conservativeness becomes more prominent as increases. When , the SC method yields slightly more rejections, largely due to its inflated error rate. When , oracle One-Way GATE 1 performs considerably better than the oracle SC and oracle GBH methods.
As seen from Figures 6-14, oracle One-Way GATE 1 retains its improved performance over the oracle versions of the Naive, SC and GBH methods for larger values of .
4.2 One-Way GATE 2
Simulation studies were conducted to compare oracle One-Way GATE 2 to its only competitor, the BB method (Benjamini & Bogomolov (2014)) in its oracle form, which operates as follows:
Oracle BB method using Simes’ combination: $X_{ij}$ is converted to its $p$-value $P_{ij}$. With $P_{i(1)} \le \cdots \le P_{i(n_i)}$ denoting the sorted $p$-values in group $i$, let $P_i^{\mathrm{S}} = \min_{1 \le k \le n_i} n_i P_{i(k)}/k$ denote Simes’ combination of the $p$-values in group $i$ in its oracle form, for $i = 1, \ldots, m$. Let be the set of indices of the group-specific hypotheses rejected using the oracle level BH method based on $P_1^{\mathrm{S}}, \ldots, P_m^{\mathrm{S}}$. Reject the hypotheses corresponding to for all and .
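A minimal Python sketch of the BB scheme with Simes’ combination, assuming, as in Benjamini & Bogomolov (2014), that groups are selected by BH at level $\alpha$ applied to the Simes $p$-values and that each selected group is then tested by BH at the adjusted level $|\mathcal{S}|\alpha/m$. Function names and the toy $p$-values are hypothetical.

```python
def bh_reject(pvalues, level):
    """Benjamini-Hochberg step-up procedure: sorted indices of rejected p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * level / n:
            k_max = rank  # step-up: keep the largest passing rank
    return sorted(order[:k_max])


def simes(pvalues):
    """Simes' combination: min over k of n * p_(k) / k."""
    n = len(pvalues)
    return min(n * p / k for k, p in enumerate(sorted(pvalues), start=1))


def bb_simes(groups_pvalues, alpha):
    """Sketch of the Benjamini-Bogomolov scheme: select groups by BH on Simes
    p-values at level alpha, then run BH inside each selected group at level
    |S| * alpha / m. Returns {group index: rejected hypothesis indices}."""
    m = len(groups_pvalues)
    combined = [simes(pg) for pg in groups_pvalues]
    selected = bh_reject(combined, alpha)
    level = len(selected) * alpha / m
    return {i: bh_reject(groups_pvalues[i], level) for i in selected}


groups_pvalues = [[0.001, 0.2, 0.5], [0.3, 0.6, 0.9], [0.004, 0.01, 0.7]]
print(bb_simes(groups_pvalues, alpha=0.05))  # prints {0: [0], 2: [0, 1]}
```

The within-group level shrinks in proportion to the fraction of groups selected, which is the adjustment for selection; when most groups are selected, it approaches $\alpha$ and the adjustment becomes small.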
The comparison, in terms of the selective FDR, the average number of total rejections, and the average number of true rejections, was carried out under the same setting as for One-Way GATE 1. Figures 4 and 5 present the comparison for the setting where , , and and respectively and . The results for other settings are reported in Figures 15-23. First, both oracle One-Way GATE 2 and the oracle BB method are seen to control the selective FDR well.
The oracle One-Way GATE 2 is more powerful, in terms of yielding a larger number of true rejections, when the proportion of significant groups is relatively small, indicating a high level of sparsity at the between-group level. When this proportion is large, most of the groups are selected and there is little adjustment for selection in the oracle BB method, which therefore makes more rejections. When the group size is large (50), the oracle One-Way GATE 2 is more powerful than the oracle BB method; however, the latter can lead to a larger number of rejections when the group size is small (5).
5 Concluding Remarks
The primary focus of this article has been to continue the line of research in Liu et al. (2016) to answer Q1 and Q2 for one-way classified hypotheses, providing the groundwork for our broader goal of answering these questions in the setting of two-way classified hypotheses. Two-way classified settings occur in many applications. For instance, in time-course microarray experiments (see, e.g., Storey et al. (2005); Yuan & Kendziorski (2006); Sun & Wei (2011)), the hypotheses of interest can be laid out in a two-way classified form with ‘gene’ and ‘time-point’ representing the two categories of classification. In multiphenotype GWAS (Peterson et al. (2016b); Segura et al. (2012)), the families of hypotheses related to different phenotypes form one level of grouping, while the other level is formed by the families of hypotheses corresponding to different SNPs. A two-way classified structure of hypotheses also occurs in brain imaging studies (Liu et al. (2009); Stein et al. (2010); Lin et al. (2014); Barber & Ramdas (2015)). Having developed a theoretical framework that captures the underlying group effect and yields powerful approaches to multiple testing in the one-way classified setting, we can proceed to extend it to produce new and powerful Lfdr based approaches answering Q1 and Q2 in the two-way classified setting. We intend to do so in future communications. Also, we have focused in this paper on developing the GATE algorithms in their oracle forms. In practice, one can estimate the unknown quantities in these oracle methods using various estimation techniques; see, e.g., Liu et al. (2016). Additionally, one can assume hyper-priors for the parameters and use Bayesian tools to calculate the Lfdrs. We leave these for future research.
The figures associated with our numerical studies involving the BB method in its oracle form seem to suggest that this method, as proposed in Benjamini & Bogomolov (2014), can potentially be improved by plugging into it an estimated proportion of active groups. This is another important direction that we will pursue in future research.
Although these results appeared in Liu et al. (2016), we prove them here using different and simpler arguments. They are restated, without any loss of generality, for a single group with slightly different notation in the following lemma.
Conditionally given , let , , be distributed as follows: (i) , and (ii) . Let Lfdr, with , for , and Lfdr. Then,
First, note that
from which we get
When , the conditional distribution of given can be obtained similarly to that in (A.4) as follows:
Proof of Theorem 3.1. For notational simplicity, we will hide in ,