Properties of adaptively weighted Fisher's method

08/01/2019 ∙ by Yusi Fang, et al. ∙ Roche ∙ University of Pittsburgh

Meta-analysis is a statistical method for combining results from multiple clinical or genomic studies that address the same or similar research problems. It has been widely used to increase statistical power in finding clinical or genomic differences among different groups. One major category of meta-analysis combines p-values from independent studies, and Fisher's method is one of the most commonly used methods in this category. However, because of heterogeneity among studies, particularly in genomic research where thousands of features such as genes are assessed, researchers often wish to uncover this heterogeneity when identifying differentially expressed genomic features. To address this problem, Li et al. (2011) proposed the adaptively weighted (AW) Fisher's method, in which binary weights, 0 or 1, are assigned to the studies for each feature to distinguish potential zero or non-zero effect sizes. Li et al. (2011) showed some good properties of the AW-Fisher's method, such as admissibility. In this paper, we further explore some asymptotic properties of the AW-Fisher's method, including consistency of the adaptive weights and the asymptotic Bahadur optimality of the test.




1 Introduction

Meta-analysis is one of the most commonly used statistical methods for synthesizing information from multiple studies, particularly when a single study does not have enough power to draw a meaningful conclusion because of weak signals or small effect sizes. There are two commonly used approaches to combining results from different studies: (1) directly combining the effect sizes and (2) indirectly combining p-values from independent studies. Because of the heterogeneous nature of alternative distributions, slightly different study goals and different types of data in omics studies, combining p-values is often more appropriate and appealing. Commonly used p-value combining methods include Fisher’s method (Fisher, 1925), Stouffer’s method (Stouffer et al., 1949), the logit method (Lancaster, 1961), and the minimum p-value (min-P) and maximum p-value (max-P) methods (Tippett and others, 1931; Wilkinson, 1951).

In addition, in omics studies, researchers are often more interested in identifying biomarkers that are differentially expressed (DE) with consistent patterns across multiple studies. However, most p-value combining methods, such as Fisher’s method, mainly target a gain in statistical power and provide no further information about the heterogeneity of the expression patterns of detected biomarkers. This problem first gained attention in functional magnetic resonance imaging (fMRI) research (Friston et al., 2005), and many methods have been proposed to address it since then. For example, Song and Tseng (2014) proposed an ordered p-value (rOP) method to test the alternative hypothesis in which signals exist in at least a given percentage of studies; Li and Ghosh (2014) proposed a class of meta-analysis methods based on summaries of weighted ordered p-values (WOP); and Li et al. (2011) proposed an adaptively weighted (AW) Fisher’s method for gene expression data, in which a binary weight, 0 or 1, is assigned to each study in order to distinguish the potential existence of group effects. Similar ideas, such as the AW-FEM and AW-Bayesian approaches, were applied to GWAS meta-analysis (Han and Eskin, 2012; Bhattacharjee et al., 2012), where only the effect sizes in a subset of studies were assumed to be non-zero under the alternative hypothesis (Flutre et al., 2013).

The AW-Fisher’s method has an appealing feature in practice: additional information can be obtained through the estimated adaptive weights for detected DE genes. The adaptive weight estimates reflect a natural biological interpretation of whether or not a study contributes to the statistical significance of a gene in differentiating groups, and they provide a way to categorize genes in follow-up biological interpretation and exploration.

Here is a motivating example for the AW-Fisher’s method. Figure 1 shows heatmaps of gene expression for DE genes identified by Fisher’s and AW-Fisher’s methods in mouse datasets from three tissues. Fisher’s method gives no indication of how each study contributes to the statistical significance, while the adaptive weights of the AW-Fisher’s method group together the genes that share the same expression pattern, thereby providing information on gene-specific heterogeneity. This information can be very appealing in genomic data analysis and potentially very useful for interpreting biological mechanisms.

Figure 1: Heatmaps of gene expression for DE genes identified by Fisher’s and AW-Fisher’s methods in the mouse energy metabolism datasets. (A) shows the heatmap of gene expression for DE genes identified by Fisher’s method at a false discovery rate of 1% (555 genes). For each of the three tissues (brown fat, liver and heart), the group labels are shown on top of the heatmap, with black representing wild type (WT) and gray representing very long-chain acyl-CoA dehydrogenase (VLCAD) deficiency. In the heatmap, darker colors represent lower expression levels and whiter colors represent higher expression levels. (B) shows the heatmap of gene expression for DE genes identified by the AW-Fisher’s method at a false discovery rate of 1% (501 genes), where only concordant genes are shown. The AW weight categories are shown to the left of the heatmap. (C) shows two specific genes from the mouse energy metabolism datasets. Gene probe 1419130_at belongs to category of AW-Fisher’s methods. Gene probe 1422010_at belongs to category of AW-Fisher’s methods.

In this paper, we will further explore some asymptotic properties such as consistency of the adaptive weights and asymptotic Bahadur optimality (ABO) of the test (Bahadur, 1967).

This paper is organized as follows. In Section 2, the AW-Fisher’s method is briefly reviewed, and in Section 3 the consistency of the AW-Fisher weights is addressed. Asymptotic Bahadur optimality is discussed in Section 4. In Section 5, simulations are used to illustrate the consistency and the exact slopes. The paper ends with a discussion in Section 6.

2 Adaptively weighted Fisher’s method

Consider combining $K$ independent studies. Denote the effect size of study $k$ by $\theta_k$ and let the corresponding p-value be $p_k$ for $k = 1, \ldots, K$. The null and alternative hypothesis settings considered in this paper are

$$H_0: \theta_1 = \cdots = \theta_K = 0 \quad \text{versus} \quad H_A: \theta_k \neq 0 \text{ for at least one } k.$$

Fisher’s method summarizes the p-values using the statistic $T = -2\sum_{k=1}^{K} \log p_k$ to test this hypothesis setting. It has been shown that $T$ follows a $\chi^2$ distribution with $2K$ degrees of freedom if the data from different studies are independent and there is no underlying difference in any study ($\theta_k = 0$ for all $k$). However, in genomic studies, usually tens of thousands of features are considered, and the heterogeneous expression patterns are of interest. Fisher’s method does not provide any information about the potentially different expression patterns of different features. Li et al. (2011) proposed an adaptively weighted Fisher’s method to reveal this information by assigning binary weights. Let

$$T(\mathbf{P}; \mathbf{w}) = -2\sum_{k=1}^{K} w_k \log P_k,$$

where $\mathbf{w} = (w_1, \ldots, w_K) \in \{0, 1\}^K$, $w_k$ is the AW weight associated with study $k$, and $\mathbf{P} = (P_1, \ldots, P_K)$ is the random vector of p-values for the $K$ studies. Under the null distribution and conditional on $\mathbf{w}$, the significance level using Fisher’s method is $L(T(\mathbf{P}; \mathbf{w})) = 1 - F_{\chi^2_{2d(\mathbf{w})}}(T(\mathbf{P}; \mathbf{w}))$, where $d(\mathbf{w}) = \sum_{k=1}^{K} w_k$ and $F_{\chi^2_{2d(\mathbf{w})}}$ is the cumulative distribution function (CDF) of the $\chi^2$ distribution with $2d(\mathbf{w})$ degrees of freedom. The test statistic of AW-Fisher based on the p-value vector $\mathbf{P}$ is defined as

$$s(\mathbf{P}) = L(T(\mathbf{P}; \mathbf{w}^*)),$$

where the optimal weight is determined by

$$\mathbf{w}^* = \arg\min_{\mathbf{w}} L(T(\mathbf{P}; \mathbf{w})). \qquad (1)$$

Here $s$ is the mapping function from the p-value vector to the AW-Fisher test statistic. Equation (1) implies that the best adaptive weights can be obtained by comparing all $2^K - 1$ non-zero combinations of weights $\mathbf{w}$. In Li et al. (2011), a permutation algorithm was proposed to calculate the p-value $\Pr(s(\mathbf{P}) \leq s_{obs})$ under the null, where $s_{obs} = s(\mathbf{p}_{obs})$ is the observed AW statistic and $\mathbf{p}_{obs}$ is the vector of observed p-values. In Huo et al. (2017), an importance sampling technique with spline interpolation and a linear weight search scheme was proposed to overcome the computational burden of the permutation algorithm and its lack of accuracy for small p-values.
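As a rough illustration of the exhaustive weight search implied by equation (1), the following Python sketch enumerates every non-zero binary weight vector for a small number of studies and returns the smallest conditional Fisher significance level together with the weights that achieve it. The function names `chi2_sf_even` and `aw_fisher` are hypothetical, and the closed-form chi-square tail for even degrees of freedom is used purely for convenience. This is neither the permutation algorithm nor the importance sampling scheme of the cited papers, and the returned statistic is not itself a p-value.

```python
import math
from itertools import product


def chi2_sf_even(t, d):
    """Upper tail P(chi2 with 2*d degrees of freedom >= t), closed form for even df."""
    half = t / 2.0
    term, total = 1.0, 1.0
    for j in range(1, d):
        term *= half / j          # term equals (t/2)^j / j!
        total += term
    return math.exp(-half) * total


def aw_fisher(pvalues):
    """Brute-force search over the 2^K - 1 non-zero weight vectors.

    Returns (AW statistic, optimal weights), where the AW statistic is the
    smallest conditional Fisher significance level over all weight vectors.
    """
    K = len(pvalues)
    best_stat, best_w = None, None
    for w in product((0, 1), repeat=K):
        d = sum(w)
        if d == 0:
            continue              # the all-zero weight vector is excluded
        t = -2.0 * sum(wk * math.log(pk) for wk, pk in zip(w, pvalues))
        level = chi2_sf_even(t, d)
        if best_stat is None or level < best_stat:
            best_stat, best_w = level, w
    return best_stat, best_w


# Two studies with strong signals and two without (illustrative values only).
stat, weights = aw_fisher([1e-6, 2e-5, 0.8, 0.6])
print(weights, stat)
```

For these illustrative p-values the search selects only the first two studies. The enumeration grows as $2^K$, so this direct form is only practical for a small number of studies.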

3 Consistency of the weight estimates

Before we prove the consistency of AW-Fisher’s weight estimates and the asymptotic Bahadur optimality of the AW-Fisher’s method, we review exact slope and present some assumptions.

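For $K$ independent studies testing $H_0: \theta_k = 0$ with sample size $n_k$ and p-value $p_k$ for $k = 1, \ldots, K$, the statistical test for study $k$ has exact slope $c_k(\theta_k)$ if

$$-\frac{2}{n_k} \log p_k \to c_k(\theta_k) \quad \text{with probability one as } n_k \to \infty.$$

The exact slope is non-negative and measures how fast the p-value converges to zero as $n_k$ goes to infinity. If the test statistic comes from the alternative hypothesis, $c_k(\theta_k)$ is positive, while under the null $c_k(\theta_k) = 0$.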
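To make the notion of an exact slope concrete, the sketch below evaluates minus two over the sample size times the log p-value for a one-sided z-test with known unit variance. These are assumptions made only for illustration and are not taken from this section: the test is a one-sample z-test, the statistic is replaced by its typical value $\sqrt{n}\,\theta$ so that the calculation is deterministic, and the helper names `one_sided_p` and `slope_estimate` are hypothetical. For this particular test the exact slope is known to be $\theta^2$.

```python
import math


def one_sided_p(z):
    """Upper-tail p-value 1 - Phi(z), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def slope_estimate(theta, n):
    """-(2/n) * log p at the typical statistic value sqrt(n) * theta (assumed z-test)."""
    z = math.sqrt(n) * theta
    return -2.0 / n * math.log(one_sided_p(z))


theta = 0.5
estimates = {n: slope_estimate(theta, n) for n in (50, 500, 4000)}
null_estimate = slope_estimate(0.0, 4000)   # zero effect size: value shrinks to 0

for n, value in estimates.items():
    print(n, round(value, 4))
print("null", round(null_estimate, 6))
```

Under these assumptions the values decrease towards $\theta^2 = 0.25$ as $n$ grows, while the zero-effect value shrinks towards 0, which matches the statement that the slope is positive under the alternative and zero under the null.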

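When we consider consistency, we assume that the proportion of the total sample assigned to each study is asymptotically fixed, i.e.,

$$\lim_{n \to \infty} \frac{n_k}{n} = \lambda_k, \quad k = 1, \ldots, K,$$

where $n = \frac{1}{K}\sum_{k=1}^{K} n_k$ is the averaged sample size. Therefore, writing $c_k(\theta_k)$ for the exact slope of study $k$,

$$-\frac{2}{n} \log p_k \to \lambda_k c_k(\theta_k) \quad \text{with probability one as } n \to \infty.$$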
In addition, denote quantile of by , i.e., . It can be seen that as .

Next we prove the main theorem of the consistency.

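Let $\mathbf{w}^0 = (I(\theta_1 \neq 0), \ldots, I(\theta_K \neq 0))$ be the true weight vector. Then $\mathbf{w}^* \to \mathbf{w}^0$ as $n \to \infty$, where $\mathbf{w}^*$ satisfies Equation (1); i.e., asymptotically all and only the studies with non-zero effect sizes will contribute to the AW statistic.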



for as

Assume that studies have weight asymptotically. Without loss of generality, let first studies have weight .

First, we prove that there is no study with with weight . Suppose there exist such studies, say such that . Denote and represents according to Equation 1. Since

we have


i.e. the studies will eventually get a weight once . The convergence rate is .

Notice that here we only require that and to let the above argument hold.

Second, if there exist studies with zero effect size that have a weight . Without loss of generality, let . In order to have weight for these studies, one must have

Then we have

Therefore, eventually no studies with zero effect size will have weight 1, with convergence rate of . From these two arguments we can see that only those studies with non-zero effect size will eventually be assigned weight 1. Note that the convergence rate for a study with non-zero effect size to be assigned weight 1 is faster than that for a study with zero effect size to be eventually assigned weight 0.

Let be a selected subset of with weights 1, while outside the subset the weights are zero. Within the subset, assume have non-zero effect size, while the remainder have zero effect size. Furthermore, assume outside the subset with true non-zero effect size. Denote as the subset with the true non-zero effect size.

Based on the previous two results proved above, for ,then we have

as n goes to infinity.

For , Then we have

as n goes to infinity. ∎

4 The asymptotic Bahadur optimality of AW-Fisher’s method

Bahadur (1967) first discussed the asymptotic optimality of statistical tests under the conditions on exact slopes and proportional sample sizes given in the previous section. This notion, called asymptotic Bahadur optimality (ABO), compares tests through the ratio of their exact slopes, and the test with the larger exact slope is viewed as superior. Littell and Folks (1971) showed that Fisher’s method is ABO. In this paper we use the Bahadur relative efficiency as our primary measure for comparing p-value combination methods. Suppose two statistical tests are constructed to test the same hypothesis and have exact slopes $c_1(\theta)$ and $c_2(\theta)$ respectively. Then the ratio $c_1(\theta)/c_2(\theta)$ is the exact Bahadur efficiency of test 1 relative to test 2, and $c_1(\theta)/c_2(\theta) > 1$ implies that test 1 is asymptotically more efficient than test 2.

Lemma 1. For $\theta_k = 0$,

$$-\frac{2}{n_k} \log p_k \to 0$$

with probability one, i.e., if the effect size is $0$, the exact slope of the statistical test is $0$.

Proof. For $\theta_k = 0$, the p-value $p_k$ is distributed uniformly in $(0, 1)$. Since $\log p_k$ is tight and $1/n_k \to 0$, we have $-\frac{2}{n_k} \log p_k \to 0$ with probability one, i.e., $c_k(0) = 0$. ∎

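Lemma 2. Let $G_k(t) = \Pr(\chi^2_k \geq t)$, where $\chi^2_k$ follows a chi-square distribution with $k$ degrees of freedom. Then

$$-\frac{2}{t} \log G_k(t) \to 1 \quad \text{as } t \to \infty.$$

The proof is given by Bahadur and others (1960), page 283.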
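The chi-square tail behaviour used here can be checked numerically. The sketch below reads the lemma as saying that the log upper-tail probability of a chi-square variable behaves like $-t/2$ for large $t$, which is an interpretation of the stripped statement rather than a quotation of it. It is restricted to even degrees of freedom $2d$, the case that arises for Fisher-type statistics, because the tail then has a closed form that stays accurate far into the tail. The helper names are hypothetical.

```python
import math


def log_chi2_sf_even(t, d):
    """log P(chi2 with 2*d degrees of freedom >= t), closed form for even df."""
    half = t / 2.0
    term, total = 1.0, 1.0
    for j in range(1, d):
        term *= half / j          # term equals (t/2)^j / j!
        total += term
    return -half + math.log(total)


def tail_ratio(t, d):
    """-(2/t) * log P(chi2_{2d} >= t); expected to approach 1 as t grows."""
    return -2.0 / t * log_chi2_sf_even(t, d)


# Eight degrees of freedom (d = 4), evaluated at increasingly large thresholds.
ratios = [tail_ratio(t, 4) for t in (1e2, 1e4, 1e6)]
print([round(r, 5) for r in ratios])
```

The ratio approaches 1 from below, and the gap closes only at a logarithmic-over-linear rate, so moderate values of $t$ can still be noticeably away from the limit.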

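Littell and Folks (1971) showed that, given $K$ independent studies with p-values $p_1, \ldots, p_K$, sample sizes $n_1, \ldots, n_K$ and exact slopes $c_1(\theta_1), \ldots, c_K(\theta_K)$ respectively, the exact slope of Fisher’s method is $c_F(\theta) = \sum_{k=1}^{K} \lambda_k c_k(\theta_k)$, where $\lambda_k$ is the limiting sample-size proportion $n_k/n$ of study $k$ from Section 3. Let $c_{AW}(\theta)$ be the exact slope of the AW-Fisher’s method. Since Fisher’s method is ABO, i.e., $c_F(\theta)$ is the largest among all p-value combination procedures (Littell and Folks, 1973), under the assumption of asymptotically fixed sample-size proportions, to prove that the AW-Fisher’s method is also ABO it suffices to show that the exact slopes of the AW-Fisher and Fisher’s methods are the same.

Under the conditions on exact slopes and sample-size proportions, we have $c_{AW}(\theta) = c_F(\theta)$, i.e., the AW-Fisher’s method is ABO.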
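A short heuristic may help explain why the two slopes can coincide. Write $\lambda_k$ for the limiting sample-size proportion of study $k$ and $c_k(\theta_k)$ for its exact slope; this notation is assumed here, following the form of the Littell and Folks result. Studies with zero effect size have zero exact slope, so dropping them from the sum changes nothing. This is a sketch of the intuition only, not the argument of the proof:

```latex
c_F(\theta) \;=\; \sum_{k=1}^{K} \lambda_k \, c_k(\theta_k)
          \;=\; \sum_{k:\, \theta_k \neq 0} \lambda_k \, c_k(\theta_k),
\qquad \text{since } c_k(0) = 0 .
```

Because consistent adaptive weights asymptotically select exactly the studies with $\theta_k \neq 0$, this suggests that the weighted statistic keeps every term that contributes to the slope of Fisher’s method.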


Let be a subset of with size for , denote ,then for a given test statistic ,

as goes to infinity.
Since the adaptive weights are consistent, and by Lemma 1 and 2, we have


On the other hand, since

by utilizing the consistency of the adaptive weights and Lemma 3 and 4 again, we have

Therefore . On the other hand, . Therefore, and so AW-Fisher’s method is also ABO. ∎

5 Simulations

In this simulation, we use a numerical approach to evaluate and verify the convergence rate of the weight estimates. We consider $K = 4$ studies with the same sample size $n$ for all studies. The data are generated from standard normal distribution with

from control and

from treatment groups. We first generate independent identical random variable

if study has no treatment effect and if study with treatment effect. Let , then ,

under the null hypothesis and

under the alternative hypothesis, where . Therefore, p-value for study can be calculated as


where $\Phi$ is the cumulative distribution function of the standard normal distribution.

In the first simulation, we estimate the convergence rate for studies with non-zero effect size, i.e. . We have shown in Section 3 that

We evaluate this probability based on 1 million simulations for each from 200 to 1000, in which all 4 studies are considered with effect sizes for studies 1 to 4.

Given sample size , the AW weight estimate can be obtained according to equation (1). We generated a scatter plot of with respect to sample size , shown in Figure 2(a). We further fitted a curve with functional form , where and are the parameters estimated from the data. The estimates are and . The fitted curve agrees with the functional form very well.

In the second simulation, we set one study (study 4) to have effect size 0 and all other studies to have effect size 0.4 in order to estimate . As shown in Section 3, the convergence rate for a study with zero effect size is

We again use 1 million simulations to estimate this probability for for study 4. The AW weight estimate can be obtained according to equation (1). The scatter plot of against sample size is shown in Figure 2(b). We further fitted a curve with functional form , where and are the parameters estimated from the data, and the fitted curve agrees with the simulated data very well.
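A scaled-down version of the second simulation can be sketched as follows. Several details are assumptions rather than the settings used for the reported results: each study's statistic is drawn as a normal variable with mean $\sqrt{n/2}\,\theta$ and unit variance (a two-group z-statistic with $n$ samples per group), p-values are one-sided, only 2000 replicates are used instead of 1 million, and the helper names are hypothetical. The weights are found by exhaustive search over the non-zero binary weight vectors.

```python
import math
import random
from itertools import product


def chi2_sf_even(t, d):
    """Upper tail P(chi2 with 2*d degrees of freedom >= t), closed form for even df."""
    half = t / 2.0
    term, total = 1.0, 1.0
    for j in range(1, d):
        term *= half / j
        total += term
    return math.exp(-half) * total


def aw_weights(pvalues):
    """Binary weights minimising the conditional Fisher significance level."""
    best_level, best_w = None, None
    for w in product((0, 1), repeat=len(pvalues)):
        d = sum(w)
        if d == 0:
            continue
        t = -2.0 * sum(wk * math.log(pk) for wk, pk in zip(w, pvalues))
        level = chi2_sf_even(t, d)
        if best_level is None or level < best_level:
            best_level, best_w = level, w
    return best_w


def simulate_weight_freq(effects, n, reps, seed=0):
    """Fraction of replicates in which each study receives weight 1."""
    rng = random.Random(seed)
    scale = math.sqrt(n / 2.0)    # assumed two-group z-statistic, n per group
    counts = [0] * len(effects)
    for _ in range(reps):
        pvals = []
        for theta in effects:
            z = rng.gauss(scale * theta, 1.0)
            pvals.append(0.5 * math.erfc(z / math.sqrt(2.0)))  # one-sided p-value
        for k, wk in enumerate(aw_weights(pvals)):
            counts[k] += wk
    return [c / reps for c in counts]


# Studies 1 to 3 have effect size 0.4; study 4 has effect size 0.
freq = simulate_weight_freq(effects=(0.4, 0.4, 0.4, 0.0), n=200, reps=2000)
print(freq)
```

Under these assumptions the three studies with non-zero effect size receive weight 1 in nearly every replicate, while study 4 receives weight 1 only occasionally. Repeating the call over a grid of sample sizes would give the kind of frequency-versus-sample-size points that the fitted curves describe.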

Figure 2: Comparing accuracy of the new approach and permutation approach to obtain the AW p-values. The scattered plots are p-values from the two methods against the closed-form solution.

6 Conclusion

The AW-Fisher’s method proposed in Li et al. (2011) has been shown to have many good properties, such as admissibility and better overall power than the min-P, max-P and Fisher’s methods in various situations. More importantly, the adaptive weights can provide additional information about the heterogeneity of effect sizes across studies, a feature that is particularly appealing in genomic meta-analysis.

Practical aspects of using the adaptive weights, such as the uncertainty of the adaptive weight estimates, have been discussed in Huo et al. (2017). A fast algorithm for estimating accurate p-values based on importance sampling was also proposed in Huo et al. (2017).

In this paper, we further studied the asymptotic properties of AW-Fisher’s method. We have shown the consistency of adaptive weights and asymptotic Bahadur optimality to reaffirm the validity and value of AW-Fisher’s method. The asymptotic convergence rate of AW weight has been verified using simulation.


  • R. R. Bahadur (1967) Rates of convergence of estimates and test statistics. Annals of Mathematical Statistics 38 (2), pp. 303–324.
  • R. R. Bahadur et al. (1960) Stochastic comparison of tests. Annals of Mathematical Statistics 31 (2), pp. 276–295.
  • S. Bhattacharjee, P. Rajaraman, K. B. Jacobs, W. A. Wheeler, B. S. Melin, P. Hartge, M. Yeager, C. C. Chung, S. J. Chanock, N. Chatterjee, et al. (2012) A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. The American Journal of Human Genetics 90 (5), pp. 821–835.
  • R. A. Fisher (1925) Statistical methods for research workers. Genesis Publishing Pvt Ltd.
  • T. Flutre, X. Wen, J. Pritchard, and M. Stephens (2013) A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet 9 (5), pp. e1003486.
  • K. J. Friston, W. D. Penny, and D. E. Glaser (2005) Conjunction revisited. Neuroimage 25 (3), pp. 661–667.
  • B. Han and E. Eskin (2012) Interpreting meta-analyses of genome-wide association studies. PLoS Genet 8 (3), pp. e1002555.
  • H. Lancaster (1961) The combination of probabilities: an application of orthonormal functions. Australian Journal of Statistics 3 (1), pp. 20–33.
  • J. Li, G. C. Tseng, et al. (2011) An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. The Annals of Applied Statistics 5 (2A), pp. 994–1019.
  • Y. Li and D. Ghosh (2014) Meta-analysis based on weighted ordered p-values for genomic data with heterogeneity. BMC Bioinformatics 15 (1), pp. 226.
  • R. C. Littell and J. L. Folks (1971) Asymptotic optimality of Fisher’s method of combining independent tests. Journal of the American Statistical Association 66 (336), pp. 802–806.
  • R. C. Littell and J. L. Folks (1973) Asymptotic optimality of Fisher’s method of combining independent tests. Journal of the American Statistical Association 68 (341), pp. 193–194.
  • C. Song and G. C. Tseng (2014) Hypothesis setting and order statistic for robust genomic meta-analysis. The Annals of Applied Statistics 8 (2), pp. 777.
  • S. A. Stouffer, E. A. Suchman, L. C. DeVinney, S. A. Star, and R. M. Williams Jr (1949) The American soldier, Vol. I: Adjustment during army life (Studies in social psychology in World War II). Princeton University Press.
  • L. H. C. Tippett et al. (1931) The methods of statistics.
  • B. Wilkinson (1951) A statistical consideration in psychological research. Psychological Bulletin 48 (2), pp. 156.