Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations

07/03/2020
by David L. Donoho, et al.

Given two samples from possibly different discrete distributions over a common set of size N, consider the problem of testing whether these distributions are identical, vs. the following rare/weak perturbation alternative: the frequencies of N^(1-β) elements are perturbed by r(log N)/2n in the Hellinger distance, where n is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from N exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter β and the perturbation intensity parameter r. Specifically, we derive a region in the (β, r)-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense (N ≫ n) and sparse (N ≪ n) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
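
The procedure described in the abstract (one exact binomial test per category, followed by Higher Criticism over the resulting P-values) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the null success probability n_x/(n_x + n_y), the handling of empty categories, the HC normalization by i/N(1 - i/N), and the search fraction `gamma0` are all choices made here for illustration and may differ from the paper's definitions.

```python
import math


def binom_two_sided_pvalue(x, m, p=0.5):
    """Exact two-sided P-value P(|B - m*p| >= |x - m*p|) for B ~ Binomial(m, p).

    Assumes 0 < p < 1. The probability mass is computed in log space so that
    large counts do not overflow.
    """
    if m == 0:
        return 1.0  # no observations in this category: no evidence either way
    dev = abs(x - m * p)
    total = 0.0
    for k in range(m + 1):
        if abs(k - m * p) >= dev - 1e-12:  # small tolerance for rounding error
            log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1)
                       - math.lgamma(m - k + 1)
                       + k * math.log(p) + (m - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return min(1.0, total)


def higher_criticism(pvalues, gamma0=0.5):
    """Maximum of the standardized gap between i/N and the i-th smallest P-value.

    Assumption: the gap is normalized by sqrt(i/N * (1 - i/N)) and the maximum
    is taken over the smallest gamma0 * N P-values.
    """
    p_sorted = sorted(pvalues)
    N = len(p_sorted)
    i_max = max(1, int(gamma0 * N))
    best = -math.inf
    for i in range(1, i_max + 1):
        u = i / N
        if u >= 1.0:
            break  # avoid dividing by zero at the last order statistic
        z = math.sqrt(N) * (u - p_sorted[i - 1]) / math.sqrt(u * (1.0 - u))
        best = max(best, z)
    return best


def two_sample_hc(counts_x, counts_y, gamma0=0.5):
    """HC statistic for two vectors of category counts of equal length.

    Assumption: under the null, the count x_i given x_i + y_i is treated as
    Binomial(x_i + y_i, n_x / (n_x + n_y)); categories that are empty in both
    samples are skipped.
    """
    n_x, n_y = sum(counts_x), sum(counts_y)
    p0 = n_x / (n_x + n_y)
    pvals = [binom_two_sided_pvalue(x, x + y, p0)
             for x, y in zip(counts_x, counts_y) if x + y > 0]
    return higher_criticism(pvals, gamma0), pvals
```

A large value of the returned statistic suggests that a small fraction of categories differ between the two samples. The rejection threshold is not specified here and would in practice be calibrated, for example by simulation under the null.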

03/06/2021

Log-Chisquared P-values under Rare and Weak Departures

Consider a multiple hypothesis testing setting in which only a small pro...
02/15/2021

On the Inability of the Higher Criticism to Detect Rare/Weak Departures

Consider a multiple hypothesis testing setting involving rare/weak featu...
09/22/2021

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...
07/23/2019

Minimax rates in sparse, high-dimensional changepoint detection

We study the detection of a sparse change in a high-dimensional mean vec...
12/17/2017

Hypothesis Testing for High-Dimensional Multinomials: A Selective Review

The statistical analysis of discrete data has been the subject of extens...
12/30/2015

Joint limiting laws for high-dimensional independence tests

Testing independence is of significant interest in many important areas ...
11/06/2020

Local Two-Sample Testing over Graphs and Point-Clouds by Random-Walk Distributions

Two-sample testing is a fundamental tool for scientific discovery. Yet, ...