Information Loss and Power Distortion from Standardizing in Multiple Hypothesis Testing

by Luella Fu, et al.

Standardization has been a widely adopted practice in multiple testing, for it takes into account the variability in sampling and makes the test statistics comparable across different study units. However, there can be a significant loss in information from basing hypothesis tests on standardized statistics rather than the full data. We develop a new class of heteroscedasticity–adjusted ranking and thresholding (HART) rules that aim to improve existing methods by simultaneously exploiting commonalities and adjusting heterogeneities among the study units. The main idea of HART is to bypass standardization by directly incorporating both the summary statistic and its variance into the testing procedure. A key message is that the variance structure of the alternative distribution, which is subsumed under standardized statistics, is highly informative and can be exploited to achieve higher power. The proposed HART procedure is shown to be asymptotically valid and optimal for false discovery rate (FDR) control. Our simulation demonstrates that HART achieves substantial power gain over existing methods at the same FDR level. We illustrate the implementation through a microarray analysis of myeloma.
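
The information loss described above can be illustrated with a small numerical sketch. This is not the HART procedure itself. It assumes a simple two-group normal model with known parameters: a null proportion of 0.9, a null mean of 0, and an alternative mean of 2 with extra spread `tau`. These values and the names `normal_pdf` and `local_fdr` are hypothetical choices made for illustration only. Under these assumptions, two study units with the same standardized statistic but different standard deviations have different posterior null probabilities, so ranking by the z-value alone discards usable information.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x, sigma, p_alt=0.1, mu_alt=2.0, tau=0.0):
    """Posterior probability that a unit is null given (x, sigma).

    Assumed model (illustrative, not taken from the paper):
      null:        x ~ N(0, sigma^2)                with probability 1 - p_alt
      alternative: x ~ N(mu_alt, sigma^2 + tau^2)   with probability p_alt
    """
    f0 = (1.0 - p_alt) * normal_pdf(x, 0.0, sigma)
    f1 = p_alt * normal_pdf(x, mu_alt, math.sqrt(sigma**2 + tau**2))
    return f0 / (f0 + f1)


# Two units with the same standardized statistic z = x / sigma = 2.
unit_a = (2.0, 1.0)   # x = 2, sigma = 1
unit_b = (1.0, 0.5)   # x = 1, sigma = 0.5

for x, sigma in (unit_a, unit_b):
    print(f"x={x}, sigma={sigma}, z={x / sigma}, lfdr={local_fdr(x, sigma):.3f}")
```

In this toy setting, the unit with the larger standard deviation has the smaller posterior null probability even though both units share z = 2, because its observed value sits at the assumed alternative mean. A rule that uses the pair (x, sigma) can separate the two units, while a rule based on z alone cannot.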






