Kernel-Based Tests for Likelihood-Free Hypothesis Testing

08/17/2023
by Patrik Róbert Gerber, et al.

Given n observations from two balanced classes, consider the task of labeling an additional m inputs that are known to all belong to one of the two classes. Special cases of this problem are well known: with complete knowledge of the class distributions (n=∞) the problem is solved optimally by the likelihood-ratio test; when m=1 it corresponds to binary classification; and when m≈n it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. Recent work discovered a fundamental trade-off between m and n: increasing the size m of the data sample reduces the amount n of training/simulation data needed. In this work we (a) introduce a generalization in which the unlabeled samples come from a mixture of the two classes, a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under maximum mean discrepancy (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM-generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric m vs n trade-off.
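To make the setting concrete, the following is a minimal sketch of a kernel-based decision rule for the basic (non-mixture) problem: estimate the squared MMD between the unlabeled sample and each simulated class, and pick the closer class. It is an illustration under stated assumptions, not the test statistic analyzed in the paper: the fixed Gaussian kernel, the bandwidth, the biased MMD estimator, the toy Gaussian data, and the names `gaussian_kernel`, `mmd2_biased`, and `lfht_decide` are all hypothetical choices made here.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy


def lfht_decide(x_sim, y_sim, z_obs, bandwidth=1.0):
    """Return 0 if z_obs is closer in MMD to x_sim, and 1 otherwise."""
    d_x = mmd2_biased(z_obs, x_sim, bandwidth)
    d_y = mmd2_biased(z_obs, y_sim, bandwidth)
    return 0 if d_x <= d_y else 1


# Toy data (an assumption for illustration): two Gaussians with shifted means.
rng = np.random.default_rng(0)
n, m, dim = 500, 50, 2
x_sim = rng.normal(0.0, 1.0, size=(n, dim))  # n simulated samples, class 0
y_sim = rng.normal(1.0, 1.0, size=(n, dim))  # n simulated samples, class 1
z_obs = rng.normal(1.0, 1.0, size=(m, dim))  # m unlabeled samples, truly class 1

decision = lfht_decide(x_sim, y_sim, z_obs)
print("decided class:", decision)
```

Varying `n` and `m` in this sketch and recording how often the decision is correct is one simple way to probe the m vs n trade-off empirically; a learned, neural-network-parameterized kernel would replace `gaussian_kernel`.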


