Randomized incomplete U-statistics in high dimensions

12/03/2017
by Xiaohui Chen, et al.

This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of Big Data, both the dimension d of the U-statistic and the sample size n of the observations tend to be large, and computing the U-statistic is prohibitively demanding. Data-dependent inferential procedures, such as the empirical bootstrap for U-statistics, are even more computationally expensive. To overcome this computational bottleneck, incomplete U-statistics, obtained by sampling fewer terms of the U-statistic, are attractive alternatives. In this paper, we introduce randomized incomplete U-statistics with sparse weights whose computational cost can be made independent of the order of the U-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete U-statistics in high dimensions, namely in cases where the dimension d is possibly much larger than the sample size n, for both non-degenerate and degenerate kernels. In addition, we propose novel and generic bootstrap methods for the incomplete U-statistics that are computationally much less demanding than existing bootstrap methods, and we establish the finite-sample validity of the proposed bootstrap methods. The proposed bootstrap methods are illustrated with an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those in the existing literature.
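To make the contrast between a complete and a randomized incomplete U-statistic concrete, here is a minimal sketch, assuming order r = 2 and Bernoulli sampling as one way to realize sparse random weights: each index pair is kept independently with probability p = N / C(n, 2), and the kernel is averaged over the kept pairs only. The kernel `kernel`, the function names, and the budget parameter `N` are hypothetical choices for illustration, not the authors' implementation. The chosen kernel h(x1, x2) = (x1 - x2)^2 / 2, applied coordinatewise, makes the complete U-statistic equal to the vector of unbiased sample variances, which is easy to check.

```python
import itertools

import numpy as np


def kernel(x1, x2):
    # Hypothetical illustrative kernel (an assumption of this sketch):
    # h(x1, x2) = (x1 - x2)**2 / 2, applied coordinatewise, so the
    # U-statistic is an unbiased estimator of the d coordinate variances.
    return 0.5 * (x1 - x2) ** 2


def complete_u_stat(X):
    # Average h over all C(n, 2) pairs i < j; returns a length-d vector.
    n = X.shape[0]
    pairs = itertools.combinations(range(n), 2)
    return np.mean([kernel(X[i], X[j]) for i, j in pairs], axis=0)


def bernoulli_incomplete_u_stat(X, N, rng):
    # Keep each pair independently with probability p = N / C(n, 2)
    # (Bernoulli sampling), then average h over the kept pairs,
    # normalizing by the random number of kept pairs.
    n, d = X.shape
    m = n * (n - 1) // 2
    p = min(1.0, N / m)
    total = np.zeros(d)
    kept = 0
    for i, j in itertools.combinations(range(n), 2):
        # Only pairs that survive the coin flip cost a kernel evaluation.
        if rng.random() < p:
            total += kernel(X[i], X[j])
            kept += 1
    if kept == 0:
        raise ValueError("no pairs were sampled; increase N")
    return total / kept


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # n = 200 observations, d = 50
full = complete_u_stat(X)
approx = bernoulli_incomplete_u_stat(X, N=2000, rng=rng)
print(np.max(np.abs(full - approx)))  # small sampling error
```

In this sketch the kernel is evaluated only about N times instead of C(n, 2) times, but the loop still flips one coin per pair. A faster equivalent draws the number of kept pairs from a Binomial(C(n, 2), p) distribution and then samples that many distinct pairs uniformly at random, which gives the same distribution as independent Bernoulli sampling.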

research
12/22/2019

Improved Central Limit Theorem and bootstrap approximations in high dimensions

This paper deals with the Gaussian and bootstrap approximations to the d...
research
01/04/2019

Approximating high-dimensional infinite-order U-statistics: statistical and computational guarantees

We study the problem of distributional approximations to high-dimensiona...
research
08/24/2022

Testing Many and Possibly Singular Polynomial Constraints

We consider the problem of testing a null hypothesis defined by polynomi...
research
08/10/2020

Design based incomplete U-statistics

U-statistics are widely used in fields such as economics, machine learni...
research
07/07/2022

Exponential finite sample bounds for incomplete U-statistics

Incomplete U-statistics have been proposed to accelerate computation. Th...
research
11/10/2020

Dimension-agnostic inference

Classical asymptotic theory for statistical inference usually involves c...
research
10/30/2020

Parametric bootstrap inference for stratified models with high-dimensional nuisance specifications

Inference about a scalar parameter of interest typically relies on the a...