Likelihood-free hypothesis testing

by Patrik Róbert Gerber, et al.

Consider the problem of testing Z ∼ℙ^⊗ m vs Z ∼ℚ^⊗ m from m samples. Generally, to achieve a small error rate it is necessary and sufficient to have m ≍ 1/ϵ^2, where ϵ measures the separation between ℙ and ℚ in total variation (𝖳𝖵). Achieving this, however, requires complete knowledge of the distributions ℙ and ℚ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to ℙ and ℚ (which are a priori only known to belong to a large non-parametric family 𝒫) is given through n iid samples from each. We demonstrate the existence of a fundamental trade-off between n and m given by nm ≍ n^2_𝖦𝗈𝖥(ϵ,𝒫), where n_𝖦𝗈𝖥 is the minimax sample complexity of testing between the hypotheses H_0: ℙ= ℚ vs H_1: 𝖳𝖵(ℙ,ℚ) ≥ϵ. We show this for three non-parametric families 𝒫: β-smooth densities over [0,1]^d, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions 𝒫 on a large alphabet [k] with pmfs bounded by c/k for fixed c. The test that we propose (based on the L^2-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when m≫ 1/ϵ^2 our test requires the number of simulation samples n to be orders of magnitude smaller than what is needed for density estimation with accuracy ≍ϵ (under 𝖳𝖵). This demonstrates the possibility of testing without fully estimating the distributions.
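To make the large-alphabet setting concrete, here is a minimal sketch of a likelihood-free test on the alphabet [k]. It is an assumption-laden simplification: the paper's statistic is an Ingster-type unbiased L^2 U-statistic, whereas this sketch uses plug-in empirical pmfs and simply assigns Z to whichever of ℙ, ℚ its empirical pmf is closer to in L^2. The function name and interface are illustrative, not from the paper.

```python
import numpy as np

def lf_test(samples_p, samples_q, samples_z, k):
    """Likelihood-free test sketch on the alphabet {0, ..., k-1}.

    samples_p, samples_q: n simulation samples from P and Q.
    samples_z: m observed samples from the unknown Z.
    Returns 0 if Z looks drawn from P, 1 if from Q, by comparing
    plug-in L2 distances between empirical pmfs (a simplification
    of the paper's unbiased L2 U-statistic).
    """
    p_hat = np.bincount(samples_p, minlength=k) / len(samples_p)
    q_hat = np.bincount(samples_q, minlength=k) / len(samples_q)
    z_hat = np.bincount(samples_z, minlength=k) / len(samples_z)
    # Decide by nearest empirical pmf in squared L2 distance.
    dist_p = np.sum((z_hat - p_hat) ** 2)
    dist_q = np.sum((z_hat - q_hat) ** 2)
    return int(dist_q < dist_p)
```

In this plug-in form the decision rule is symmetric in ℙ and ℚ; the paper's analysis shows that an appropriately debiased version of this L^2 comparison attains the trade-off nm ≍ n^2_𝖦𝗈𝖥(ϵ,𝒫) simultaneously across the regimes of n and m.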


