Comparing distributions: ℓ_1 geometry improves kernel two-sample testing

09/19/2019
by   M. Scetbon, et al.

Are two sets of observations drawn from the same distribution? This problem is known as two-sample testing. Kernel methods lead to many appealing properties; indeed, state-of-the-art approaches use the L^2 distance between kernel-based distribution representatives to derive their test statistics. Here, we show that L^p distances (with p ≥ 1) between these distribution representatives give metrics on the space of distributions that are well suited to detecting differences between distributions, as they metrize weak convergence. Moreover, for analytic kernels, we show that the L^1 geometry gives improved testing power for scalable computational procedures. Specifically, we derive a finite-dimensional approximation of the metric given as the ℓ_1 norm of a vector that captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to maximize the differences between the distributions and give interpretable indications of how they differ. Using an ℓ_1 norm gives better detection because the differences between representatives are dense when analytic kernels are used (non-zero almost everywhere). The tests are consistent, while much faster than state-of-the-art quadratic-time kernel-based tests. Experiments on artificial and real-world problems demonstrate a better power/time tradeoff than the state of the art based on ℓ_2 norms, and in some cases, better outright power than even the most expensive quadratic-time tests.
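The finite-dimensional statistic described above can be sketched as follows: evaluate an analytic kernel at a small set of test locations, average over each sample to get two feature vectors, and take the ℓ_1 norm of their difference. This is a minimal illustrative sketch, not the authors' exact procedure; the Gaussian kernel, the choice of random locations, and the scaling by √n are assumptions for illustration (the paper additionally optimizes the locations and calibrates the test threshold).

```python
import numpy as np

def gaussian_features(X, locations, sigma):
    """Gaussian kernel evaluated between each sample and each test location.

    X: (n, d) sample matrix; locations: (J, d) test locations.
    Returns an (n, J) feature matrix.
    """
    sq_dists = ((X[:, None, :] - locations[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def l1_feature_statistic(X, Y, locations, sigma):
    """sqrt(n) times the ell_1 norm of the difference between empirical
    mean embeddings of X and Y, evaluated at J spatial features."""
    mean_x = gaussian_features(X, locations, sigma).mean(axis=0)
    mean_y = gaussian_features(Y, locations, sigma).mean(axis=0)
    n = min(len(X), len(Y))
    return np.sqrt(n) * np.abs(mean_x - mean_y).sum()

rng = np.random.default_rng(0)
n, d, J = 500, 2, 5
locations = rng.normal(size=(J, d))      # random test locations (assumption)
X = rng.normal(size=(n, d))              # P = N(0, I)
Y = rng.normal(size=(n, d))              # Q = P: null case
Z = rng.normal(loc=1.0, size=(n, d))     # Q shifted: alternative case

stat_null = l1_feature_statistic(X, Y, locations, sigma=1.0)
stat_alt = l1_feature_statistic(X, Z, locations, sigma=1.0)
```

Under the null the statistic stays small (each coordinate difference shrinks like 1/√n), while under a mean shift it grows, which is what the ℓ_1 aggregation over dense coordinate-wise differences exploits.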


Related research

06/15/2015 — Fast Two-Sample Testing with Analytic Representations of Probability Measures
We propose a class of nonparametric two-sample tests with a cost linear ...

05/22/2016 — Interpretable Distribution Features with Maximum Testing Power
Two semimetrics on probability distributions are proposed, given as the ...

02/21/2020 — Learning Deep Kernels for Non-Parametric Two-Sample Tests
We propose a class of kernel-based two-sample tests, which aim to determ...

01/14/2023 — Compress Then Test: Powerful Kernel Testing in Near-linear Time
Kernel two-sample testing provides a powerful framework for distinguishi...

10/15/2016 — An Adaptive Test of Independence with Analytic Kernel Embeddings
A new computationally efficient dependence measure, and an adaptive stat...

06/14/2023 — MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting
We propose novel statistics which maximise the power of a two-sample tes...

07/03/2020 — Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations
Given two samples from possibly different discrete distributions over a ...
