A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

02/11/2023
by Kevin H. Huang, et al.

We prove a convergence theorem for U-statistics of degree two, where the data dimension d is allowed to scale with the sample size n. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and an asymmetric distribution. Our bounds are valid for any finite n and d, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. We apply our theory to two popular kernel-based distribution tests, the maximum mean discrepancy (MMD) and the kernel Stein discrepancy (KSD), whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with d and the bandwidth.
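For readers unfamiliar with the objects named in the abstract, the following is a minimal sketch of a degree-two U-statistic and of the standard unbiased MMD estimate written in that form. It is not the authors' code: the Gaussian kernel, the `bandwidth` parameter, the equal sample sizes, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def u_statistic(h_matrix):
    """Degree-two U-statistic: average of h(Z_i, Z_j) over all ordered pairs i != j."""
    n = h_matrix.shape[0]
    off_diag_sum = h_matrix.sum() - np.trace(h_matrix)
    return off_diag_sum / (n * (n - 1))

def mmd_u_statistic(x, y, bandwidth):
    """Unbiased squared-MMD estimate for two samples of equal size n.

    With Z_i = (x_i, y_i), the symmetric function is
    h(Z_i, Z_j) = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i).
    """
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    h = kxx + kyy - kxy - kxy.T
    return u_statistic(h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 50  # hypothetical sample size and dimension
    x = rng.normal(size=(n, d))
    y = rng.normal(loc=0.2, size=(n, d))
    # A bandwidth of order sqrt(d) is a common heuristic, used here only as an example.
    print(mmd_u_statistic(x, y, bandwidth=np.sqrt(d)))
```

Varying `d` and `bandwidth` in the usage example is one informal way to observe how the statistic's behaviour depends on dimension and bandwidth, the dependence the abstract says the theory predicts.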
