Sparse Uniformity Testing

09/22/2021
by   Bhaswar B. Bhattacharya, et al.

In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on n samples, whether a discrete distribution supported on d elements differs from the uniform distribution only in s (out of the d) coordinates and is ε-far (in total variation distance) from uniformity. Our results reveal various interesting phase transitions which depend on the interplay of the sample size n and the signal strength ε with the dimension d and the sparsity level s. For instance, if the sample size is less than a threshold (which depends on d and s), then all tests are asymptotically powerless, irrespective of the magnitude of the signal strength. On the other hand, if the sample size is above the threshold, then the detection boundary undergoes a further phase transition depending on the signal strength. Here, a χ^2-type test attains the detection boundary in the dense regime, whereas in the sparse regime a Bonferroni correction of two maximum-type tests and a version of the Higher Criticism test is optimal up to sharp constants. These results combined provide a complete description of the phase diagram for the sparse uniformity testing problem across all regimes of the parameters n, d, and s. One of the challenges in dealing with multinomials is that the parameters are always constrained to lie in the simplex. This results in the aforementioned two-layered phase transition, a new phenomenon which does not arise in classical high-dimensional sparse testing problems.
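To make the two families of tests mentioned above concrete, here is a minimal sketch in Python. It is not the authors' procedure. The centered χ^2-type statistic, the pair of maximum-type statistics, the Monte Carlo calibration, and every function name (`chi2_type_stat`, `max_type_stats`, `sparse_alternative`, `mc_pvalue`) are assumptions chosen for illustration. The Higher Criticism test and the sharp thresholds from the paper are not reproduced here.

```python
import numpy as np

def chi2_type_stat(counts, n):
    """Centered chi-square-type statistic (illustrative form, an assumption)."""
    expected = n / len(counts)
    return np.sum((counts - expected) ** 2 - counts) / expected

def max_type_stats(counts, n):
    """Largest upward and downward deviations of the counts from n/d."""
    expected = n / len(counts)
    return counts.max() - expected, expected - counts.min()

def sparse_alternative(d, s, eps):
    """Perturb s of the d coordinates so the result is eps-far in total variation.

    Half of the s coordinates go up by delta and half go down, so the vector
    stays in the simplex and TV = s * delta / 2 = eps.
    """
    delta = 2.0 * eps / s
    if s % 2 != 0 or s > d or delta > 1.0 / d:
        raise ValueError("need even s <= d and 2*eps/s <= 1/d")
    p = np.full(d, 1.0 / d)
    p[: s // 2] += delta
    p[s // 2 : s] -= delta
    return p

def mc_pvalue(observed, null_stats):
    """Monte Carlo p-value with the usual +1 correction."""
    return (1 + np.sum(null_stats >= observed)) / (1 + len(null_stats))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, s, eps, reps = 20000, 1000, 10, 0.004, 500

    # Simulate the null distribution of the chi-square-type statistic.
    null_counts = rng.multinomial(n, np.full(d, 1.0 / d), size=reps)
    null_chi2 = np.array([chi2_type_stat(c, n) for c in null_counts])

    # Draw one sample from a sparse alternative and compute its p-value.
    counts = rng.multinomial(n, sparse_alternative(d, s, eps))
    print("chi2-type p-value:", mc_pvalue(chi2_type_stat(counts, n), null_chi2))
    print("max-type statistics:", max_type_stats(counts, n))
```

The perturbation in `sparse_alternative` also illustrates the simplex constraint noted in the abstract: a coordinate cannot drop below zero, so the downward shift 2ε/s is capped at 1/d, which ties the admissible signal strength to d and s.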
