Compress Then Test: Powerful Kernel Testing in Near-linear Time

01/14/2023
by Carles Domingo Enrich, et al.

Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on n sample points. However, existing kernel tests either run in n^2 time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each n-point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test, recovering the same optimal detection boundary, while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20–200× speed-ups over state-of-the-art approximate MMD tests with no loss of power.
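
The kind of quadratic-time test that CTT approximates can be illustrated with a standard kernel MMD permutation test. This is a minimal sketch, assuming a Gaussian kernel with a fixed bandwidth and the biased (V-statistic) estimate of MMD^2. The function names, bandwidth, and permutation count are illustrative choices and are not taken from the paper. Building the pooled Gram matrix requires on the order of n^2 kernel evaluations, which is the cost the compression step is meant to avoid.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_from_gram(K, n):
    """Biased MMD^2 estimate; the first n rows/columns of K belong to sample X."""
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, bandwidth=1.0, num_perms=200, alpha=0.05, seed=0):
    """Quadratic-time MMD two-sample permutation test.

    Returns (statistic, p_value, reject).
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    K = gaussian_kernel(Z, Z, bandwidth)  # O((len(X) + len(Y))^2) kernel evaluations
    stat = mmd2_from_gram(K, n)
    count = 0
    for _ in range(num_perms):
        idx = rng.permutation(len(Z))
        # Relabel the pooled sample by permuting rows and columns of K together.
        if mmd2_from_gram(K[np.ix_(idx, idx)], n) >= stat:
            count += 1
    p_value = (1 + count) / (1 + num_perms)
    return stat, p_value, p_value <= alpha
```

Each permutation reuses the same Gram matrix by relabeling its rows and columns, so the kernel is evaluated only once; the per-permutation cost is still quadratic in the pooled sample size.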

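The compress-then-test idea can be illustrated with a toy sketch: split each sample into bins, compress every bin to a small coreset, and run the permutation test over bin labels using only kernel evaluations between coreset points. This is a sketch under stated assumptions, not the paper's algorithm. The binning scheme and bin-level permutations are this sketch's choices, and greedy kernel herding (`herd_coreset`, a hypothetical helper) is swapped in as the compressor purely for illustration; it is not the paper's coreset construction. Because herding builds each bin's full Gram matrix, this version costs about n^2/s kernel evaluations for s bins per sample, not near-linear time.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def herd_coreset(P, m, bandwidth=1.0):
    """Greedy kernel herding, an illustrative stand-in compressor (assumption).

    Selects m points of P, possibly with repeats, whose kernel mean tracks
    the kernel mean of all of P.
    """
    K = gaussian_kernel(P, P, bandwidth)
    target = K.mean(axis=1)      # P's empirical kernel mean evaluated at each candidate
    running = np.zeros(len(P))   # sum of k(candidate, s) over points s chosen so far
    chosen = []
    for t in range(m):
        j = int(np.argmax(target - running / (t + 1)))
        chosen.append(j)
        running += K[:, j]
    return P[chosen]


def compress_then_test(X, Y, num_bins=8, coreset_size=8, bandwidth=1.0,
                       num_perms=200, alpha=0.05, seed=0):
    """Toy compress-then-test: bin, compress each bin, permute bin labels.

    Returns (statistic, p_value, reject).
    Assumes len(X) == len(Y) and both are divisible by num_bins.
    """
    rng = np.random.default_rng(seed)
    # Compress every bin of X, then every bin of Y, with the same procedure.
    coresets = [herd_coreset(B, coreset_size, bandwidth)
                for S in (X, Y) for B in np.array_split(S, num_bins)]
    C = np.vstack(coresets)  # 2 * num_bins * coreset_size points in total
    K = gaussian_kernel(C, C, bandwidth)
    nb = 2 * num_bins
    # H[a, b] = mean kernel value between coreset a and coreset b.
    H = K.reshape(nb, coreset_size, nb, coreset_size).mean(axis=(1, 3))

    def mmd2(in_x):
        """Biased MMD^2 between the coresets labeled X and the remaining coresets."""
        in_y = ~in_x
        return (H[np.ix_(in_x, in_x)].mean() + H[np.ix_(in_y, in_y)].mean()
                - 2.0 * H[np.ix_(in_x, in_y)].mean())

    labels = np.arange(nb) < num_bins  # the first num_bins coresets came from X
    observed = mmd2(labels)
    count = sum(int(mmd2(rng.permutation(labels)) >= observed) for _ in range(num_perms))
    p_value = (1 + count) / (1 + num_perms)
    return observed, p_value, p_value <= alpha
```

Permuting whole bins rather than individual points keeps the relabeled coresets exchangeable under the null, provided the data are i.i.d., the bins have equal size, and every bin is compressed by the same procedure. The permutation loop then touches only a (2s) × (2s) matrix of bin-level kernel means.
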
Related research
06/15/2015

Fast Two-Sample Testing with Analytic Representations of Probability Measures

We propose a class of nonparametric two-sample tests with a cost linear ...
06/14/2021

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Modern kernel-based two-sample tests have shown great success in disting...
09/19/2019

Comparing distributions: ℓ_1 geometry improves kernel two-sample testing

Are two sets of observations drawn from the same distribution? This prob...
06/20/2018

Random Feature Stein Discrepancies

Computable Stein discrepancies have been deployed for a variety of appli...
10/02/2022

A Kernel Measure of Dissimilarity between M Distributions

Given M ≥ 2 distributions defined on a general measurable space, we intr...
06/07/2022

A novel statistical approach for two-sample testing based on the overlap coefficient

Here we propose a new nonparametric framework for two-sample testing, na...
10/12/2017

New efficient algorithms for multiple change-point detection with kernels

Several statistical approaches based on reproducing kernels have been pr...
