MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting

06/14/2023
by Felix Biggs, et al.

We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
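
For a finite kernel set, the idea of fusing normalised MMD values through a weighted soft maximum and calibrating with permutations can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian bandwidths, the uniform kernel weights, the biased MMD estimator, the root-mean-square normaliser and the default temperature `lam` are all assumptions chosen for illustration, and the function names are hypothetical. The one property the sketch tries to preserve from the abstract is that the kernel-dependent quantities are computed from the pooled sample only, so they do not change when the sample labels are permuted.

```python
import numpy as np

def gaussian_gram(z, bandwidth):
    """Gaussian kernel Gram matrix on the pooled sample z (rows are points)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(gram, idx_x, idx_y):
    """Biased (V-statistic) estimate of MMD^2 from a precomputed Gram matrix."""
    kxx = gram[np.ix_(idx_x, idx_x)].mean()
    kyy = gram[np.ix_(idx_y, idx_y)].mean()
    kxy = gram[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy

def soft_max_stat(grams, norms, weights, idx_x, idx_y, lam):
    """Weighted soft maximum (log-sum-exp) of normalised MMD^2 values."""
    vals = np.array([mmd2_biased(g, idx_x, idx_y) / s
                     for g, s in zip(grams, norms)])
    a = lam * vals
    m = a.max()  # subtract the maximum for numerical stability
    return (m + np.log(np.sum(weights * np.exp(a - m)))) / lam

def fused_permutation_test(x, y, bandwidths=(0.5, 1.0, 2.0),
                           lam=None, n_perms=200, seed=0):
    """Return (observed statistic, permutation p-value) for samples x and y."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n, total = len(x), len(x) + len(y)
    grams = [gaussian_gram(z, b) for b in bandwidths]
    # Assumed normaliser: RMS of off-diagonal Gram entries of the pooled
    # sample, which is unchanged by any relabelling of the points.
    off_diag = ~np.eye(total, dtype=bool)
    norms = [np.sqrt(np.mean(g[off_diag] ** 2)) for g in grams]
    weights = np.full(len(grams), 1.0 / len(grams))  # assumed uniform weights
    if lam is None:
        lam = float(np.sqrt(n * (total - n)))  # assumed temperature scaling
    idx = np.arange(total)
    observed = soft_max_stat(grams, norms, weights, idx[:n], idx[n:], lam)
    exceed = 0
    for _ in range(n_perms):
        perm = rng.permutation(total)
        stat = soft_max_stat(grams, norms, weights, perm[:n], perm[n:], lam)
        exceed += stat >= observed
    # Counting the observed statistic among the permutations keeps the
    # p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perms)
    return observed, p_value
```

A call such as `fused_permutation_test(x, y)` on two arrays of shape `(n, d)` and `(m, d)` returns the fused statistic and its p-value; rejecting when the p-value is at most the chosen level gives the test. A larger `lam` pushes the soft maximum towards the single largest normalised MMD value, while a smaller one moves it towards the weighted average.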

research
03/30/2020

Minimax optimality of permutation tests

Permutation tests are widely used in statistics, providing a finite-samp...
research
06/03/2020

Learning Kernel Tests Without Data Splitting

Modern large-scale kernel-based tests such as maximum mean discrepancy (...
research
10/07/2021

A Fast and Effective Large-Scale Two-Sample Test Based on Kernels

Kernel two-sample tests have been widely used and the development of eff...
research
10/28/2021

MMD Aggregated Two-Sample Test

We propose a novel nonparametric two-sample test based on the Maximum Me...
research
02/21/2020

Learning Deep Kernels for Non-Parametric Two-Sample Tests

We propose a class of kernel-based two-sample tests, which aim to determ...
research
10/28/2021

Kernel-based Partial Permutation Test for Detecting Heterogeneous Functional Relationship

We propose a kernel-based partial permutation test for checking the equa...
research
09/19/2019

Comparing distributions: ℓ_1 geometry improves kernel two-sample testing

Are two sets of observations drawn from the same distribution? This prob...
