Testing Closeness With Unequal Sized Samples

04/17/2015
by Bhaswar B. Bhattacharya, et al.

We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of unequal sized samples drawn from each of them. Specifically, given a target error parameter $\varepsilon > 0$, $m_1$ independent draws from an unknown distribution $p$, and $m_2$ draws from an unknown distribution $q$, we describe a test for distinguishing the case that $p = q$ from the case that $\|p - q\|_1 \geq \varepsilon$. If $p$ and $q$ are supported on at most $n$ elements, then our test is successful with high probability provided $m_1 \geq n^{2/3}/\varepsilon^{4/3}$ and $m_2 = \Omega\left(\max\left\{ \frac{n}{\sqrt{m_1}\,\varepsilon^2}, \frac{\sqrt{n}}{\varepsilon^2} \right\}\right)$; we show that this tradeoff is optimal throughout this range, to constant factors. These results extend the recent work of Chan et al., who established the sample complexity when the two samples have equal sizes, and tighten the results of Acharya et al. by polynomial factors in both $n$ and $\varepsilon$. As a consequence, we obtain an algorithm for estimating the mixing time of a Markov chain on $n$ states up to a $\log n$ factor that uses $\tilde{O}(n^{3/2} \tau_{mix})$ queries to a "next node" oracle, improving upon the $\tilde{O}(n^{5/3} \tau_{mix})$ query algorithm of Batu et al. Finally, we note that the core of our testing algorithm is a relatively simple statistic that seems to perform well in practice, both on synthetic data and on natural language data.
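The abstract does not spell out the statistic, so the following is only a minimal sketch of one chi-square-style closeness statistic for unequal sample sizes, not necessarily the one used in the paper. The function name `closeness_statistic` and the per-element normalization by the combined count are assumptions made for illustration. The sketch relies on a standard fact: if the counts $X_i$ and $Y_i$ are independent Poisson variables with means $m_1 p_i$ and $m_2 q_i$, then $(m_2 X_i - m_1 Y_i)^2 - (m_2^2 X_i + m_1^2 Y_i)$ has expectation $m_1^2 m_2^2 (p_i - q_i)^2$, which is zero exactly when $p_i = q_i$.

```python
from collections import Counter


def closeness_statistic(sample_p, sample_q):
    """Illustrative closeness statistic for two samples of unequal size.

    Hypothetical helper, not taken from the paper. Values near or below
    zero are consistent with p = q; large positive values suggest the
    distributions differ. Any decision threshold is left to the caller.
    """
    m1, m2 = len(sample_p), len(sample_q)
    x, y = Counter(sample_p), Counter(sample_q)

    total = 0.0
    # Only elements seen in at least one sample contribute.
    for elem in x.keys() | y.keys():
        xi, yi = x[elem], y[elem]
        # Squared rescaled count difference, minus the term that cancels
        # its expected value when p_i = q_i (under Poisson counts).
        numerator = (m2 * xi - m1 * yi) ** 2 - (m2 ** 2 * xi + m1 ** 2 * yi)
        # Assumed chi-square-style normalization by the combined count.
        total += numerator / (xi + yi)
    return total
```

For example, `closeness_statistic([0] * 10, [1] * 20)` compares two samples with disjoint supports and returns a large positive value, while samples whose rescaled counts roughly agree give a value near or below zero.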

research
07/06/2022

Comments on "Testing Conditional Independence of Discrete Distributions"

In this short note, we identify and address an error in the proof of The...
research
09/14/2020

Optimal Testing of Discrete Distributions with High Probability

We study the problem of testing discrete distributions with a focus on t...
research
11/01/2017

Active Tolerant Testing

In this work, we give the first algorithms for tolerant testing of nontr...
research
11/23/2022

Perfect Sampling from Pairwise Comparisons

In this work, we study how to efficiently obtain perfect samples from a ...
research
12/02/2019

Improved Algorithm for Tolerant Junta Testing

In this paper, we consider the problem of tolerant junta testing for boo...
research
08/16/2017

Rapid Mixing of k-Class Biased Permutations

In this paper, we study a biased version of the nearest-neighbor transpo...
research
06/20/2019

Extensions of Self-Improving Sorters

Ailon et al. (SICOMP 2011) proposed a self-improving sorter that tunes i...
