Equivalence of distance-based and RKHS-based statistics in hypothesis testing

07/25/2012
by Dino Sejdinovic, et al.

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions into reproducing kernel Hilbert spaces (RKHS), as established in machine learning. When the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed a distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as an energy distance with respect to some semimetric of negative type. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: in particular, we show that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
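The equivalence described above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code. It uses Euclidean points, the semimetric ρ_q(z, z') = ‖z − z'‖^q (with 0 < q ≤ 2, where q = 1 gives the usual energy distance), a base point z0 chosen arbitrarily (here the first sample), and biased V-statistic estimators normalized by 1/n². Under these conventions the empirical energy distance equals 2·MMD² with the distance-induced kernel k(z, z') = ½[ρ(z, z0) + ρ(z', z0) − ρ(z, z')], and the squared distance covariance equals 4·HSIC with the corresponding kernels on each variable. All function names (`rho`, `distance_kernel`, `mmd_squared`, `hsic_biased`, and so on) are hypothetical helpers written for this example.

```python
import math
import random


def rho(a, b, q=1.0):
    """Semimetric ||a - b||^q; of negative type for 0 < q <= 2 (q = 1: usual energy distance)."""
    return math.dist(a, b) ** q


def distance_kernel(a, b, z0, q=1.0):
    """Distance-induced kernel k(a, b) = (rho(a, z0) + rho(b, z0) - rho(a, b)) / 2, base point z0."""
    return 0.5 * (rho(a, z0, q) + rho(b, z0, q) - rho(a, b, q))


def mean_pairwise(f, us, vs):
    """Average of f(u, v) over all pairs, diagonal included (V-statistic)."""
    return sum(f(u, v) for u in us for v in vs) / (len(us) * len(vs))


def energy_distance(xs, ys, q=1.0):
    """Empirical 2 E rho(X, Y) - E rho(X, X') - E rho(Y, Y')."""
    d = lambda a, b: rho(a, b, q)
    return 2 * mean_pairwise(d, xs, ys) - mean_pairwise(d, xs, xs) - mean_pairwise(d, ys, ys)


def mmd_squared(xs, ys, kernel):
    """Biased estimate of MMD^2 = E k(X, X') + E k(Y, Y') - 2 E k(X, Y)."""
    return (mean_pairwise(kernel, xs, xs) + mean_pairwise(kernel, ys, ys)
            - 2 * mean_pairwise(kernel, xs, ys))


def double_center(m):
    """Return H M H with H = I - (1/n) 11^T, for a symmetric n x n matrix M."""
    n = len(m)
    row = [sum(r) / n for r in m]  # for symmetric M, row means equal column means
    grand = sum(row) / n
    return [[m[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def centered_inner(f, g, us, vs):
    """(1/n^2) sum_ij (H F H)_ij (H G H)_ij with F_ij = f(u_i, u_j), G_ij = g(v_i, v_j)."""
    n = len(us)
    a = double_center([[f(u, u2) for u2 in us] for u in us])
    b = double_center([[g(v, v2) for v2 in vs] for v in vs])
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def distance_covariance_sq(xs, ys, q=1.0):
    """Empirical (V-statistic) squared distance covariance of paired samples."""
    return centered_inner(lambda a, b: rho(a, b, q), lambda a, b: rho(a, b, q), xs, ys)


def hsic_biased(xs, ys, kx, ky):
    """Biased HSIC estimate (1/n^2) tr(K H L H), computed via double-centred Gram matrices."""
    return centered_inner(kx, ky, xs, ys)


if __name__ == "__main__":
    random.seed(0)
    xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
    ys = [(random.gauss(0.5, 1), random.gauss(0, 1)) for _ in range(30)]  # mean-shifted sample
    ws = [(x[0] ** 2 + random.gauss(0, 0.1),) for x in xs]  # 1-D variable dependent on xs
    for q in (1.0, 0.5):
        kx = lambda a, b, q=q: distance_kernel(a, b, xs[0], q)  # base point: first x sample
        kw = lambda a, b, q=q: distance_kernel(a, b, ws[0], q)
        print(q, energy_distance(xs, ys, q), 2 * mmd_squared(xs, ys, kx))
        print(q, distance_covariance_sq(xs, ws, q), 4 * hsic_biased(xs, ws, kx, kw))
```

Running the file prints each distance-based statistic next to its kernel counterpart for q = 1 and q = 0.5; the two values in each row should agree up to floating-point rounding. The base point z0 only adds terms that cancel in MMD² and under double centering, which is why any fixed choice works.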
