The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

06/14/2018
by Cencheng Shen, et al.
Distance-based methods, also called "energy statistics", are leading methods for two-sample and independence testing in the statistics community. Kernel methods, developed from "kernel mean embeddings", are leading methods for the same two tests in the machine learning community. Previous works demonstrated the equivalence of distance and kernel methods only at the population level, separately for each kind of test, and required an embedding theory of kernels. We propose a simple, bijective transformation between semimetrics and nondegenerate kernels. We prove that, for finite samples, two-sample tests are special cases of independence tests, and that the distance-based statistic is equivalent to the kernel-based statistic in its biased, unbiased, and normalized versions. In other words, once the kernel and the metric are chosen as bijective transformations of each other, running any of the four algorithms (distance-based or kernel-based, two-sample or independence) yields exactly the same answer up to numerical precision. This deepens and unifies our understanding of methods based on interpoint comparisons.
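
The finite-sample claim can be checked numerically with a small sketch. The abstract does not spell out the transformation, so the code below assumes one candidate: the kernel is the largest pairwise distance minus the distance. It also assumes the biased statistic takes the usual doubly centered form, with Euclidean distances as the metric. The function names and the simulated data are illustrative only and are not taken from the paper.

```python
import numpy as np

def pairwise_euclidean(x):
    """Return the n x n matrix of Euclidean distances between rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def induced_kernel(dist):
    """Assumed transformation: kernel = (largest distance) - distance."""
    return dist.max() - dist

def centered_statistic(a, b):
    """Biased statistic: (1/n^2) * sum of entries of (H a H) * (H b H)."""
    n = a.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.sum((h @ a @ h) * (h @ b @ h)) / n**2

# Simulated dependent data (illustrative, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(50, 1))

dist_x, dist_y = pairwise_euclidean(x), pairwise_euclidean(y)
kern_x, kern_y = induced_kernel(dist_x), induced_kernel(dist_y)

dcov = centered_statistic(dist_x, dist_y)  # distance-based statistic
hsic = centered_statistic(kern_x, kern_y)  # kernel-based statistic
print(dcov, hsic)
```

Under these assumptions the two printed values agree up to floating-point error. Centering removes the constant shift, and the sign flip on each matrix cancels in the product. The transformation is also invertible on this sample: subtracting the kernel matrix from its own maximum recovers the distance matrix.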


research
10/20/2019

The Exact Equivalence of Independence Testing and Two-Sample Testing

Testing independence and testing equality of distributions are two tight...
research
02/11/2014

Equivalence of Kernel Machine Regression and Kernel Distance Covariance for Multidimensional Trait Association Studies

Associating genetic markers with a multidimensional phenotype is an impo...
research
05/02/2012

Hypothesis testing using pairwise distances and associated kernels (with Appendix)

We provide a unifying framework linking two classes of statistics used i...
research
07/25/2012

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

We provide a unifying framework linking two classes of statistics used i...
research
12/09/2019

Energy distance and kernel mean embeddings for two-sample survival testing

We study the comparison problem of distribution equality between two ran...
research
01/03/2019

Energy distance and kernel mean embedding for two sample survival test

In this article a new family of tests is proposed for the comparison pro...
research
02/21/2020

Learning Deep Kernels for Non-Parametric Two-Sample Tests

We propose a class of kernel-based two-sample tests, which aim to determ...
