Five Shades of Grey: Phase Transitions in High-dimensional Multiple Testing

10/13/2019
by Zheng Gao, et al.

Motivated by marginal screening of categorical variables, we study high-dimensional multiple testing problems in which the test statistics have approximately chi-square distributions. We characterize four new phase transitions in high-dimensional chi-square models and derive the signal sizes necessary and sufficient for statistical procedures to simultaneously control false discovery (in terms of the family-wise error rate or the false discovery rate) and missed detection (in terms of the family-wise non-discovery rate or the false non-discovery rate) in large dimensions. Remarkably, the degrees of freedom of the chi-square distributions affect none of the four phase-transition boundaries. Several well-known procedures are shown to attain these boundaries. Two new phase transitions are also identified in the Gaussian location model under one-sided alternatives. We then elucidate the nature of signal sizes in association tests by characterizing their relationship with marginal frequencies, odds ratios, and sample sizes in 2×2 contingency tables. This allows us to illustrate an interesting manifestation of the phase-transition phenomena in genome-wide association studies (GWAS). We also show, perhaps surprisingly, that for a given total sample size, balanced designs in such association studies rarely deliver optimal power.
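The two kinds of error named in the abstract (false discovery and missed detection) can be made concrete with a small simulation. The sketch below draws p chi-square statistics, a few of which are signals sharing one noncentrality, converts them to p-values, and applies two rules: Bonferroni, a classical FWER-controlling rule, and Benjamini-Hochberg, a classical FDR-controlling rule. Treating these as examples of the "well-known procedures" is an assumption made for illustration. It is not a statement of exactly which procedures the paper analyzes. All function names, the settings p, s, df, alpha, and the signal sizes lam are hypothetical. The missed-detection proportion |S \ Ŝ| / max(|S|, 1) is an assumed per-realization quantity and may not match the paper's exact definition. Only the standard library is used, so the chi-square tail probability is computed from its closed form for integer degrees of freedom.

```python
import math
import random


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), integer df >= 1 (stdlib only).

    Starts from Q(1, x) = erfc(sqrt(x/2)) or Q(2, x) = exp(-x/2) and applies
    Q(nu + 2, x) = Q(nu, x) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1).
    """
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    q, nu = (math.erfc(math.sqrt(half)), 1) if df % 2 else (math.exp(-half), 2)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return min(q, 1.0)


def simulate_stats(p: int, s: int, df: int, lam: float, rng: random.Random):
    """Draw p chi-square(df) statistics; the first s are signals with noncentrality lam."""
    stats = []
    for j in range(p):
        shift = math.sqrt(lam) if j < s else 0.0
        # Noncentral chi-square: put the whole mean shift on the first coordinate.
        x = (rng.gauss(0.0, 1.0) + shift) ** 2
        x += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
        stats.append(x)
    return stats, set(range(s))


def bonferroni(pvals, alpha):
    """Reject every hypothesis whose p-value is at most alpha / m (controls FWER)."""
    m = len(pvals)
    return {j for j, pv in enumerate(pvals) if pv <= alpha / m}


def benjamini_hochberg(pvals, alpha):
    """Step-up rule: reject the k smallest p-values, k = max{i : p_(i) <= alpha * i / m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= alpha * rank / m:
            k_max = rank
    return set(order[:k_max])


def error_summary(selected, support):
    """Counts and proportions of false discoveries and missed detections (assumed definitions)."""
    false_disc = len(selected - support)
    missed = len(support - selected)
    fdp = false_disc / max(len(selected), 1)
    missed_prop = missed / max(len(support), 1)
    return false_disc, missed, fdp, missed_prop


if __name__ == "__main__":
    rng = random.Random(2019)
    p, s, df, alpha = 10_000, 20, 3, 0.05  # illustrative settings, not from the paper
    for lam in (10.0, 40.0, 80.0):         # hypothetical signal sizes
        stats, support = simulate_stats(p, s, df, lam, rng)
        pvals = [chi2_sf(x, df) for x in stats]
        for name, rule in (("Bonferroni", bonferroni), ("BH", benjamini_hochberg)):
            fd, md, fdp, mprop = error_summary(rule(pvals, alpha), support)
            print(f"lam={lam:5.1f} {name:<10} false={fd:3d} missed={md:3d} "
                  f"FDP={fdp:.2f} missed-prop={mprop:.2f}")
```

You can vary lam, p, and df to explore the trade-off numerically. The simulation does not reproduce the paper's asymptotic boundaries. It only shows the quantities those boundaries are about.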

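The abstract links signal size in 2×2 association tests to marginal frequencies, odds ratios, and sample sizes, and it claims that balanced designs are rarely optimal. A minimal sketch of that link uses the textbook large-sample approximation λ ≈ N·φ². Here φ² is the population mean-square contingency of the case/control × exposed/unexposed table. This is a standard approximation and not necessarily the exact expression derived in the paper. The inputs are the control exposure frequency p0, the odds ratio, the total size N, and the case fraction r = n_cases / N. The function names, the grid search, and the example values (including the commonly used genome-wide threshold 5e-8) are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def case_exposure_freq(p0: float, odds_ratio: float) -> float:
    """Exposure (e.g. risk-allele) frequency among cases implied by the control
    frequency p0 and an odds ratio."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def noncentrality(n_total: int, case_frac: float, p0: float, odds_ratio: float) -> float:
    """Large-sample noncentrality n_total * phi^2 of Pearson's chi-square test for a
    2x2 table with rows (cases, controls) and columns (exposed, unexposed)."""
    p1 = case_exposure_freq(p0, odds_ratio)
    r = case_frac
    cells = [[r * p1, r * (1.0 - p1)], [(1.0 - r) * p0, (1.0 - r) * (1.0 - p0)]]
    rows = [cells[0][0] + cells[0][1], cells[1][0] + cells[1][1]]
    cols = [cells[0][0] + cells[1][0], cells[0][1] + cells[1][1]]
    phi2 = sum(
        (cells[i][j] - rows[i] * cols[j]) ** 2 / (rows[i] * cols[j])
        for i in range(2)
        for j in range(2)
    )
    return n_total * phi2


def power_1df(lam: float, alpha: float) -> float:
    """P(chi-square_1(lam) > c_alpha), using chi-square_1(lam) = (Z + sqrt(lam))^2."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)  # critical value c_alpha = z^2
    root = math.sqrt(lam)
    return nd.cdf(-z - root) + nd.cdf(-z + root)


def best_case_fraction(n_total: int, p0: float, odds_ratio: float, grid: int = 999) -> float:
    """Grid search for the case fraction r in (0, 1) maximizing the noncentrality."""
    fracs = [(k + 1) / (grid + 1) for k in range(grid)]
    return max(fracs, key=lambda r: noncentrality(n_total, r, p0, odds_ratio))


if __name__ == "__main__":
    n_total, odds_ratio, alpha = 10_000, 1.2, 5e-8  # illustrative values only
    for p0 in (0.05, 0.2, 0.5):
        r_star = best_case_fraction(n_total, p0, odds_ratio)
        lam_bal = noncentrality(n_total, 0.5, p0, odds_ratio)
        lam_opt = noncentrality(n_total, r_star, p0, odds_ratio)
        print(f"p0={p0:.2f}  best r={r_star:.3f}  "
              f"power balanced={power_1df(lam_bal, alpha):.3f}  "
              f"power at best r={power_1df(lam_opt, alpha):.3f}")
```

Under this approximation, φ² = r(1−r)(p1−p0)² / (p̄(1−p̄)), where p̄ = r·p1 + (1−r)·p0. The case fraction therefore enters through the pooled frequency p̄ as well as through r(1−r). The derivative of log φ² at r = 1/2 is nonzero whenever p1 ≠ p0 and the pooled frequency at r = 1/2 differs from 1/2. In that case the maximizing r is not 1/2. This is in the spirit of the abstract's remark on balanced designs, but it is only a sketch.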
04/26/2018

Optimal Procedures for Multiple Testing Problems

Multiple testing problems are a staple of modern statistical analysis. T...
04/02/2020

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Identifying informative predictors in a high dimensional regression mode...
11/02/2020

Covariate Adaptive Family-wise Error Rate Control for Genome-Wide Association Studies

The family-wise error rate (FWER) has been widely used in genome-wide as...
03/29/2021

Optimal False Discovery Rate Control for Large Scale Multiple Testing with Auxiliary Information

Large-scale multiple testing is a fundamental problem in high dimensiona...
09/22/2021

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...
10/04/2021

Online multiple testing with super-uniformity reward

Valid online inference is an important problem in contemporary multiple ...
06/10/2018

Generalized Goodness-Of-Fit Tests for Correlated Data

This paper concerns the problem of applying the generalized goodness-of-...