Likelihood-free hypothesis testing

11/02/2022
by Patrik Róbert Gerber, et al.

Consider the problem of testing Z ∼ ℙ^{⊗m} vs Z ∼ ℚ^{⊗m} from m samples. Generally, to achieve a small error rate it is necessary and sufficient to have m ≍ 1/ϵ^2, where ϵ measures the separation between ℙ and ℚ in total variation (𝖳𝖵). Achieving this, however, requires complete knowledge of the distributions ℙ and ℚ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to ℙ and ℚ (which are a priori only known to belong to a large non-parametric family 𝒫) is given through n iid samples from each. We demonstrate the existence of a fundamental trade-off between n and m given by nm ≍ n^2_𝖦𝗈𝖥(ϵ,𝒫), where n_𝖦𝗈𝖥 is the minimax sample complexity of testing between the hypotheses H_0: ℙ = ℚ vs H_1: 𝖳𝖵(ℙ,ℚ) ≥ ϵ. We show this for three non-parametric families 𝒫: β-smooth densities over [0,1]^d, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on a large alphabet [k] with pmfs bounded by c/k for fixed c. The test that we propose (based on the L^2-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when m ≫ 1/ϵ^2 our test requires the number of simulation samples n to be orders of magnitude smaller than what is needed for density estimation with accuracy ≍ ϵ (under 𝖳𝖵). This demonstrates the possibility of testing without fully estimating the distributions.
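To make the L^2-distance idea concrete for the large-alphabet family [k], here is a minimal sketch, assuming the test compares how far the empirical law of Z is (in squared L^2) from each simulation sample: it estimates ‖p_Z − ℙ‖² − ‖p_Z − ℚ‖² and declares ℚ when the estimate is positive. The exact statistic, any centering, and the threshold used in the paper may differ; the function names `lfht_statistic` and `lfht_decide` are hypothetical, and only the discrete setting is covered.

```python
import numpy as np


def _counts(samples, k):
    """Counts over the alphabet {0, ..., k-1} and the sample size."""
    return np.bincount(samples, minlength=k).astype(float), len(samples)


def _sq_norm_unbiased(counts, n):
    """Unbiased estimate of sum_i p_i^2 (U-statistic over distinct pairs)."""
    return float(np.sum(counts * (counts - 1)) / (n * (n - 1)))


def lfht_statistic(x, y, z, k):
    """Estimate ||p_Z - P||^2 - ||p_Z - Q||^2 from samples.

    x: n draws from P (simulator for the null), y: n draws from Q,
    z: m draws from the unknown distribution being tested.
    Expanding the squares, the ||p_Z||^2 terms cancel, leaving
    ||P||^2 - ||Q||^2 - 2<p_Z, P> + 2<p_Z, Q>; the cross terms use
    independent samples, so plain empirical inner products are unbiased.
    """
    cx, nx = _counts(x, k)
    cy, ny = _counts(y, k)
    cz, nz = _counts(z, k)
    cross_zx = float(cz @ cx) / (nz * nx)
    cross_zy = float(cz @ cy) / (nz * ny)
    return (_sq_norm_unbiased(cx, nx) - _sq_norm_unbiased(cy, ny)
            - 2 * cross_zx + 2 * cross_zy)


def lfht_decide(x, y, z, k):
    """Return "Q" if Z looks closer (in L^2) to the Q-sample, else "P"."""
    return "Q" if lfht_statistic(x, y, z, k) > 0 else "P"


if __name__ == "__main__":
    # Illustrative setup (assumed, not from the paper): uniform P on k = 50
    # letters vs a Q that puts 1.5/k on half the letters and 0.5/k on the rest,
    # so both pmfs are bounded by c/k with c = 1.5.
    rng = np.random.default_rng(0)
    k = 50
    P = np.full(k, 1 / k)
    Q = np.concatenate([np.full(k // 2, 1.5 / k), np.full(k // 2, 0.5 / k)])
    n, m = 5000, 5000
    x = rng.choice(k, size=n, p=P)
    y = rng.choice(k, size=n, p=Q)
    z = rng.choice(k, size=m, p=Q)
    print(lfht_decide(x, y, z, k), lfht_statistic(x, y, z, k))
```

Because only differences of squared distances enter, the statistic never estimates either density to 𝖳𝖵 accuracy ϵ. It only needs enough samples for the sign of the L^2 gap to be reliable, which is the kind of saving the abstract describes.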

research
06/19/2023

Minimax optimal testing by classification

This paper considers an ML inspired approach to hypothesis testing known...
research
08/17/2023

Kernel-Based Tests for Likelihood-Free Hypothesis Testing

Given n observations from two balanced classes, consider the task of lab...
research
11/06/2020

Local Two-Sample Testing over Graphs and Point-Clouds by Random-Walk Distributions

Two-sample testing is a fundamental tool for scientific discovery. Yet, ...
research
11/03/2020

Robust hypothesis testing and distribution estimation in Hellinger distance

We propose a simple robust hypothesis test that has the same sample comp...
research
07/19/2020

Information in additional observations of a non-parametric experiment that is not estimable

Given n independent and identically distributed observations and measuri...
research
04/19/2022

Independence Testing for Bounded Degree Bayesian Network

We study the following independence testing problem: given access to sam...
research
08/31/2015

Wald-Kernel: Learning to Aggregate Information for Sequential Inference

Sequential hypothesis testing is a desirable decision making strategy in...
