
Generalized Kernel Two-Sample Tests

by Hoseung Song, et al.

Kernel two-sample tests are widely used to test whether two multivariate samples come from the same distribution. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space lose power against some common alternatives when the dimension of the data is moderate to high, owing to the curse of dimensionality. We propose a new test statistic that exploits an informative pattern arising under moderate and high dimensions and achieves substantial power gains over existing kernel two-sample tests for a wide range of alternatives. We also propose alternative testing procedures that retain high power at low computational cost, offering easy off-the-shelf tools for large datasets. We illustrate these new approaches through an analysis of the New York City taxi data.
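For context, the classical kernel two-sample test the abstract refers to compares the maximum mean discrepancy (MMD) between the two samples' kernel embeddings, calibrated by permutation. The sketch below is a minimal illustration of that standard baseline with a Gaussian kernel, not the generalized statistic the paper proposes; the function names and the fixed bandwidth are illustrative choices.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth):
    """Biased MMD^2 estimate: mean(Kxx) + mean(Kyy) - 2*mean(Kxy)."""
    return (gaussian_kernel(X, X, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_kernel(X, Y, bandwidth).mean())

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: X and Y are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(X, Y, bandwidth)
    pooled = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[perm[:n]], pooled[perm[n:]], bandwidth)
        count += int(stat >= observed)
    # Add-one correction keeps the p-value valid and strictly positive.
    return observed, (1 + count) / (1 + n_perm)
```

For example, with two 5-dimensional Gaussian samples whose means differ by 1 in every coordinate, the test rejects easily, while two samples from the same Gaussian yield a large p-value. The power loss the abstract describes concerns how such tests behave as the dimension grows for certain alternatives (e.g. scale differences), which this low-dimensional sketch does not exhibit.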



