On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

11/23/2014
by Aaditya Ramdas, et al.

Nonparametric two-sample testing deals with the question of consistently deciding whether two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those that are consistent without any assumptions about how the distributions may differ (general alternatives), and those that are designed specifically to test easier alternatives, such as a difference in means (mean-shift alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two-sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we derive the power of the linear-time Maximum Mean Discrepancy (MMD) statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test's power goes to one if the number of samples grows faster than the dimension. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.
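
For readers unfamiliar with the statistic discussed in the abstract, the following is a minimal sketch of a linear-time MMD two-sample test with a Gaussian kernel. It is not taken from the paper. It uses the standard construction that averages a kernel expression over disjoint pairs of samples and compares the studentized average to a normal quantile. The function names, the bandwidth choice of sqrt(d), and the simulated mean shift are illustrative assumptions.

```python
import math
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Row-wise Gaussian kernel exp(-||a_i - b_i||^2 / (2 * bandwidth^2))."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd_test(x, y, bandwidth):
    """Linear-time MMD test (hypothetical helper, illustrative only).

    x, y: arrays of shape (n, d). Returns (mmd_estimate, z_score, p_value),
    where p_value is the one-sided normal-approximation p-value.
    """
    # Use an even number of points so samples can be split into disjoint pairs.
    n = (min(len(x), len(y)) // 2) * 2
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]

    # One h-value per disjoint pair; each sample point is touched O(1) times.
    h = (gaussian_kernel(x1, x2, bandwidth)
         + gaussian_kernel(y1, y2, bandwidth)
         - gaussian_kernel(x1, y2, bandwidth)
         - gaussian_kernel(x2, y1, bandwidth))

    m = h.shape[0]
    mmd = h.mean()
    # Studentize: under the null, z is approximately standard normal.
    z = math.sqrt(m) * mmd / h.std(ddof=1)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z)
    return mmd, z, p_value


# Simulated mean-shift alternative (all settings below are assumptions).
rng = np.random.default_rng(0)
n, d = 4000, 5
x = rng.standard_normal((n, d))
y = rng.standard_normal((n, d)) + 1.0  # shift every coordinate by 1
mmd, z, p_value = linear_time_mmd_test(x, y, bandwidth=math.sqrt(d))
print(f"MMD estimate: {mmd:.4f}, z-score: {z:.2f}, p-value: {p_value:.3g}")
```

Rejecting when the p-value falls below a chosen level gives a test whose cost is linear in the sample size, because only n/2 kernel terms are evaluated rather than all pairwise ones.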


