The minimax risk in testing the histogram of discrete distributions for uniformity under missing ball alternatives

by Alon Kipnis, et al.

We consider the problem of testing the fit of a discrete sample of items from many categories to the uniform distribution over the categories. As a class of alternative hypotheses, we consider the removal of an ℓ_p ball of radius ϵ around the uniform rate sequence for p ≤ 2. We deliver a sharp characterization of the asymptotic minimax risk when ϵ → 0 as the number of samples and the number of dimensions go to infinity, for tests based on the histogram of occurrences (number of absent categories, singletons, collisions, ...). For example, for p = 1 and in the limit of a small expected number of samples n compared to the number of categories N (the "sub-linear" regime), the minimax risk R^*_ϵ asymptotes to 2 Φ̅(n ϵ^2/√(8N)), where Φ̅(x) is the normal survival function. Empirical studies over a range of problem parameters show that this estimate is accurate in finite samples, and that our test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis is based on the asymptotic normality of histogram ordinates, the equivalence between the minimax setting and a Bayesian one, and the reduction of a multi-dimensional optimization problem to a one-dimensional problem.
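As a rough illustration of the two quantities named in the abstract, the following Python sketch evaluates the stated limit 2 Φ̅(n ϵ^2/√(8N)) for p = 1 and builds the histogram of occurrences from a sample. The function names (`normal_sf`, `asymptotic_minimax_risk_l1`, `occurrence_histogram`) are hypothetical and chosen only for this example; the formula is taken from the abstract and is an asymptotic statement for the sub-linear regime, so the numbers it returns are approximations, not exact finite-sample risks. The sketch does not implement the authors' test itself.

```python
import math
from collections import Counter


def normal_sf(x):
    """Standard normal survival function, P(Z > x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def asymptotic_minimax_risk_l1(n, N, eps):
    """Asymptotic minimax risk quoted in the abstract for p = 1 (sub-linear regime).

    n   : expected number of samples
    N   : number of categories
    eps : radius of the removed l_1 ball
    Assumption: n is small relative to N, so the limit is only approximate here.
    """
    return 2.0 * normal_sf(n * eps ** 2 / math.sqrt(8.0 * N))


def occurrence_histogram(sample, N):
    """Map m -> number of categories that appear exactly m times in the sample.

    m = 0 counts absent categories, m = 1 singletons, m >= 2 collisions.
    Categories are assumed to be hashable labels drawn from N possible values.
    """
    counts = Counter(sample)            # category -> number of occurrences
    hist = Counter(counts.values())     # occurrence count -> number of categories
    hist[0] = N - len(counts)           # categories never observed
    return dict(hist)


if __name__ == "__main__":
    # Risk equals 1 when eps = 0 (the alternative is indistinguishable from the null).
    print(asymptotic_minimax_risk_l1(n=1000, N=10_000, eps=0.0))
    print(asymptotic_minimax_risk_l1(n=1000, N=10_000, eps=0.5))
    print(occurrence_histogram([0, 0, 1, 2, 2, 2], N=5))
```

The standard library's `math.erfc` is used for the survival function to keep the sketch dependency-free; a statistics library's normal survival function would serve equally well.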


