Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations

07/03/2020
by David L. Donoho, et al.

Given two samples from possibly different discrete distributions over a common set of size N, consider the problem of testing whether these distributions are identical, vs. the following rare/weak perturbation alternative: the frequencies of N^(1-β) elements are perturbed by r(log N)/2n in the Hellinger distance, where n is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from N exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter β and the perturbation intensity parameter r. Specifically, we derive a region in the (β, r)-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense (N ≫ n) and sparse (N ≪ n) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
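The procedure described in the abstract (one exact binomial test per element, then Higher Criticism over the resulting N P-values) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes equal sample sizes, so each binomial test uses success probability 1/2, and it uses one common form of the HC statistic in which the deviation i/N - p_(i) is standardized by sqrt((i/N)(1 - i/N)). The function names and the default fraction gamma0 = 0.4 are hypothetical choices made for the example.

```python
from math import comb, sqrt


def exact_binomial_pvalue(k, m):
    """Two-sided exact binomial P-value for k successes in m trials.

    Assumption: equal sample sizes, so the null success probability is 1/2.
    """
    if m == 0:
        return 1.0  # element unseen in both samples: no evidence either way
    # With p = 1/2 the pmf is symmetric around m/2, so "at least as extreme"
    # means "at least as far from m/2 as the observed k".
    dev = abs(2 * k - m)
    tail = sum(comb(m, j) for j in range(m + 1) if abs(2 * j - m) >= dev)
    return tail / 2 ** m


def higher_criticism(pvalues, gamma0=0.4):
    """HC statistic over the smallest gamma0-fraction of the P-values.

    Assumption: standardization by sqrt((i/N)(1 - i/N)); other variants exist.
    """
    n_tests = len(pvalues)
    if n_tests < 2:
        raise ValueError("need at least two P-values")
    p_sorted = sorted(pvalues)
    # Cap the search range below N so that 1 - i/N is never zero.
    i_max = max(1, min(n_tests - 1, int(gamma0 * n_tests)))
    best = float("-inf")
    for i in range(1, i_max + 1):
        frac = i / n_tests
        z = sqrt(n_tests) * (frac - p_sorted[i - 1]) / sqrt(frac * (1 - frac))
        best = max(best, z)
    return best


def two_sample_hc(counts1, counts2, gamma0=0.4):
    """HC statistic for two count vectors over the same N elements."""
    pvals = [exact_binomial_pvalue(x, x + y) for x, y in zip(counts1, counts2)]
    return higher_criticism(pvals, gamma0)
```

Calling `two_sample_hc(counts1, counts2)` on two count vectors returns a single statistic, with large values suggesting that a few elements differ between the samples. A rejection threshold would still have to be calibrated, for example by simulation under the null, which this sketch does not attempt.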
