Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations

07/03/2020
by David L. Donoho, et al.

Given two samples from possibly different discrete distributions over a common set of size N, consider the problem of testing whether these distributions are identical, versus the following rare/weak perturbation alternative: the frequencies of N^{1-β} elements are perturbed by r(log N)/2n in the Hellinger distance, where n is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from N exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter β and the perturbation intensity parameter r. Specifically, we derive a region in the (β, r)-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense (N ≫ n) and sparse (N ≪ n) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
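
The procedure described in the abstract (one exact binomial test per element, then Higher Criticism over the resulting P-values) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the two samples have equal size, so each count is tested against Bin(m, 1/2), where m is the element's total count across both samples; elements with m = 0 are skipped; the HC statistic uses the common normalization by sqrt((i/N)(1 - i/N)); and the search is limited to the smallest fraction `gamma0` of P-values. The function names and the default `gamma0 = 0.2` are hypothetical choices made for illustration.

```python
import math


def binom_two_sided_pvalue(x, m, p=0.5):
    """Exact two-sided binomial P-value: P(|B - m*p| >= |x - m*p|), B ~ Bin(m, p).

    Direct summation; intended for small to moderate counts m only.
    """
    dev = abs(x - m * p)
    total = 0.0
    for k in range(m + 1):
        # Small tolerance so ties in the deviation are counted despite float error.
        if abs(k - m * p) >= dev - 1e-12:
            total += math.comb(m, k) * p**k * (1 - p) ** (m - k)
    return min(1.0, total)


def higher_criticism(pvalues, gamma0=0.2):
    """One common HC variant: max over the smallest gamma0 fraction of P-values of
    sqrt(N) * (i/N - p_(i)) / sqrt((i/N) * (1 - i/N))."""
    pv = sorted(pvalues)
    n = len(pv)
    if n < 2:
        raise ValueError("need at least two P-values")
    # Keep i <= n - 1 so the denominator never vanishes.
    limit = min(n - 1, max(1, int(gamma0 * n)))
    best = -math.inf
    for i in range(1, limit + 1):
        u = i / n
        z = math.sqrt(n) * (u - pv[i - 1]) / math.sqrt(u * (1 - u))
        best = max(best, z)
    return best


def two_sample_hc(counts1, counts2, gamma0=0.2):
    """HC statistic from per-element exact binomial tests (equal sample sizes assumed)."""
    pvalues = []
    for x, y in zip(counts1, counts2):
        m = x + y
        if m == 0:
            continue  # assumption: elements unseen in both samples carry no evidence
        pvalues.append(binom_two_sided_pvalue(x, m))
    return higher_criticism(pvalues, gamma0)


# Toy example: 50 elements, 3 of which differ strongly between the samples.
null_1, null_2 = [3] * 50, [3] * 50
alt_1, alt_2 = [20, 20, 20] + [3] * 47, [0, 0, 0] + [3] * 47
hc_null = two_sample_hc(null_1, null_2)
hc_alt = two_sample_hc(alt_1, alt_2)
```

In this toy example the perturbed tables yield a larger HC value than the identical ones. Turning the statistic into a test additionally requires a rejection threshold, which this sketch does not calibrate.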

research
03/06/2021

Log-Chisquared P-values under Rare and Weak Departures

Consider a multiple hypothesis testing setting in which only a small pro...
research
02/15/2021

On the Inability of the Higher Criticism to Detect Rare/Weak Departures

Consider a multiple hypothesis testing setting involving rare/weak featu...
research
09/22/2021

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...
research
12/17/2017

Hypothesis Testing for High-Dimensional Multinomials: A Selective Review

The statistical analysis of discrete data has been the subject of extens...
research
12/30/2015

Joint limiting laws for high-dimensional independence tests

Testing independence is of significant interest in many important areas ...
research
09/19/2019

Comparing distributions: ℓ_1 geometry improves kernel two-sample testing

Are two sets of observations drawn from the same distribution? This prob...
research
01/30/2023

Active Sequential Two-Sample Testing

Two-sample testing tests whether the distributions generating two sample...
