Testing Data Binnings

04/27/2020
by Clément L. Canonne, et al.

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) 𝐪 and samples from an unknown distribution 𝐩, both over [n]={1,2,…,n}, whether 𝐩 equals 𝐪, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution 𝐪 is over k ≪ n elements: the question is then whether there exists a suitable binning of the domain [n] into k intervals such that, once "binned," 𝐩 is equal to 𝐪. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.
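To make the notion of "identity up to binning" concrete, here is a minimal Python sketch of the definition only. It assumes 𝐩 is fully known and given as exact fractions, whereas the testing problem in the abstract only has sample access to 𝐩. The function names `bin_distribution` and `find_binning` are hypothetical, and allowing empty intervals is an assumption of this sketch. It relies on the observation that a binning into k consecutive intervals exists exactly when every prefix sum of 𝐪 also appears among the prefix sums of 𝐩.

```python
from fractions import Fraction
from itertools import accumulate


def bin_distribution(p, breakpoints):
    """Merge p (over [n]) into consecutive intervals.

    breakpoints are the exclusive right endpoints 0 <= b_1 <= ... <= b_k = n,
    so interval j covers indices b_{j-1}, ..., b_j - 1 (0-based).
    """
    binned, start = [], 0
    for end in breakpoints:
        binned.append(sum(p[start:end], Fraction(0)))
        start = end
    return binned


def find_binning(p, q):
    """Return breakpoints of an interval binning of p that equals q, or None.

    Assumes p and q are exact probability vectors (each sums to 1).
    """
    prefix = [Fraction(0)] + list(accumulate(p))  # prefix sums of p
    breakpoints, i = [], 0
    for target in accumulate(q):  # prefix sums of q
        # Advance to the first prefix sum of p that reaches the target.
        while i < len(prefix) and prefix[i] < target:
            i += 1
        if i == len(prefix) or prefix[i] != target:
            return None  # this prefix sum of q cannot be realized
        breakpoints.append(i)
    if breakpoints:
        # Any trailing elements carry zero mass; fold them into the last bin.
        breakpoints[-1] = len(p)
    return breakpoints


# Illustrative inputs (made up for this example, not from the paper).
F = Fraction
p = [F(1, 4), F(1, 4), F(1, 8), F(1, 8), F(1, 4)]
q_yes = [F(1, 2), F(1, 4), F(1, 4)]
q_no = [F(1, 3), F(2, 3)]

cuts = find_binning(p, q_yes)
print(cuts, bin_distribution(p, cuts))
print(find_binning(p, q_no))
```

With the inputs above, the first reference distribution is realized by merging the first two elements and the next two, while the second has no valid binning. The sample-based question studied in the paper is harder precisely because these prefix sums are not available exactly.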


research
08/15/2017

Generalized Uniformity Testing

In this work, we revisit the problem of uniformity testing of discrete p...
research
10/28/2017

Wasserstein Identity Testing

Uniformity testing and the more general identity testing are well studie...
research
05/05/2021

Identity testing under label mismatch

Testing whether the observed data conforms to a purported model (probabi...
research
04/10/2018

Testing Identity of Multidimensional Histograms

We investigate the problem of identity testing for multidimensional hist...
research
04/03/2023

Distribution Testing Under the Parity Trace

Distribution testing is a fundamental statistical task with many applica...
research
06/11/2019

Communication and Memory Efficient Testing of Discrete Distributions

We study distribution testing with communication and memory constraints ...
research
01/31/2019

Minimax Testing of Identity to a Reference Ergodic Markov Chain

We exhibit an efficient procedure for testing, based on a single long st...
