Sharp Constants in Uniformity Testing via the Huber Statistic

06/21/2022
by Shivam Gupta, et al.

Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on m elements from any ϵ-far distribution with 1-δ probability is n = Θ(√(m log (1/δ))/ϵ^2 + log (1/δ)/ϵ^2), which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to 0 or ∞. We explain this discrepancy by studying the constant factors required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of (1 + o(1))√(m log (1/δ))/ϵ^2 in the regime where this term is dominant, unlike all other existing testers.
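To make the statistics named in the abstract concrete, here is a minimal Python sketch of the collision count, the empirical TV distance to uniform, and a Huber-style statistic computed from per-element counts. The abstract does not define the Huber statistic, so the form used here (the standard Huber loss applied to each count's deviation from n/m, with a hypothetical threshold `beta`) is an assumption and may differ from the paper's definition in scaling or centering. All function names are illustrative, and `leading_term` simply evaluates √(m log (1/δ))/ϵ^2 without the (1 + o(1)) factor.

```python
import math
from collections import Counter


def counts(samples, m):
    """Occurrence count of each element 0..m-1 in the sample."""
    c = Counter(samples)
    return [c.get(j, 0) for j in range(m)]


def collision_statistic(samples, m):
    """Number of colliding pairs: sum over elements of C(count, 2)."""
    return sum(y * (y - 1) // 2 for y in counts(samples, m))


def empirical_tv_statistic(samples, m):
    """TV distance between the empirical distribution and uniform on m elements."""
    n = len(samples)
    return 0.5 * sum(abs(y / n - 1 / m) for y in counts(samples, m))


def huber_loss(x, beta):
    """Standard Huber loss: quadratic for |x| <= beta, linear beyond it."""
    ax = abs(x)
    return 0.5 * x * x if ax <= beta else beta * (ax - 0.5 * beta)


def huber_statistic(samples, m, beta):
    """ASSUMED form: Huber loss of each count's deviation from its mean n/m.

    The threshold beta is a hypothetical tuning parameter, not taken
    from the abstract.
    """
    n = len(samples)
    return sum(huber_loss(y - n / m, beta) for y in counts(samples, m))


def leading_term(m, eps, delta):
    """sqrt(m * log(1/delta)) / eps^2, ignoring the (1 + o(1)) factor."""
    return math.sqrt(m * math.log(1 / delta)) / eps ** 2


if __name__ == "__main__":
    samples = [0, 0, 1, 2]  # n = 4 draws from a domain of m = 4 elements
    print(collision_statistic(samples, 4))
    print(empirical_tv_statistic(samples, 4))
    print(huber_statistic(samples, 4, beta=0.5))
```

In this sketch a large `beta` makes the Huber-style statistic purely quadratic in the count deviations, while a small `beta` makes it grow linearly in the absolute deviations, which is the sense in which it interpolates between a collision-like statistic and a TV-like one.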

