Five Shades of Grey: Phase Transitions in High-dimensional Multiple Testing

by Zheng Gao, et al.

Motivated by marginal screening of categorical variables, we study high-dimensional multiple testing problems in which the test statistics have approximate chi-square distributions. We characterize four new phase transitions in high-dimensional chi-square models, and derive the signal sizes necessary and sufficient for statistical procedures to simultaneously control false discovery (in terms of family-wise error rate or false discovery rate) and missed detection (in terms of family-wise non-discovery rate or false non-discovery rate) in large dimensions. Remarkably, the degrees of freedom of the chi-square distributions do not affect the boundaries in any of the four phase transitions. Several well-known procedures are shown to attain these boundaries. Two new phase transitions are also identified in the Gaussian location model under one-sided alternatives. We then elucidate the nature of signal sizes in association tests by characterizing their relationship with marginal frequencies, odds ratios, and sample sizes in 2×2 contingency tables. This allows us to illustrate an interesting manifestation of the phase transition phenomena in genome-wide association studies (GWAS). We also show, perhaps surprisingly, that for a given total sample size, balanced designs in such association studies rarely deliver optimal power.
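
As a concrete reference point for the 2×2 contingency tables mentioned above, the following minimal sketch computes an odds ratio and a Pearson chi-square statistic for a single table. The abstract does not say which statistic the paper uses, so the choice of the Pearson statistic (one degree of freedom, no continuity correction) is an assumption, and the function names and the example counts are hypothetical.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] (hypothetical helper)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Uses the closed form n * (ad - bc)^2 / (row1 * row2 * col1 * col2),
    without continuity correction (an assumption, not taken from the paper).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("statistic is undefined when a margin is zero")
    return n * (a * d - b * c) ** 2 / denom


# Made-up counts: rows are cases/controls, columns are the two categories.
example_or = odds_ratio_2x2(10, 20, 30, 40)
example_chi2 = pearson_chi2_2x2(10, 20, 30, 40)
print(f"odds ratio = {example_or:.4f}, chi-square = {example_chi2:.4f}")
```

Under the null hypothesis of no association, this statistic is approximately chi-square distributed with one degree of freedom, which is the kind of approximate chi-square test statistic the abstract refers to.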

Optimal Procedures for Multiple Testing Problems

Multiple testing problems are a staple of modern statistical analysis. T...

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Identifying informative predictors in a high dimensional regression mode...

Covariate Adaptive Family-wise Error Rate Control for Genome-Wide Association Studies

The family-wise error rate (FWER) has been widely used in genome-wide as...

Optimal False Discovery Rate Control for Large Scale Multiple Testing with Auxiliary Information

Large-scale multiple testing is a fundamental problem in high dimensiona...

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...

Online multiple testing with super-uniformity reward

Valid online inference is an important problem in contemporary multiple ...

Generalized Goodness-Of-Fit Tests for Correlated Data

This paper concerns the problem of applying the generalized goodness-of-...