Can we trust the bootstrap in high-dimension?

08/02/2016
by Noureddine El Karoui, et al.

We consider the performance of the bootstrap in high dimensions for the setting of linear regression, where p < n but p/n is not close to zero. We consider ordinary least squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of β, where β is the true regression vector? We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression, the residual bootstrap and the pairs bootstrap, give very poor inference on β as the ratio p/n grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as p/n grows. We also show that the jackknife resampling technique for estimating the variance of β̂ severely overestimates the variance in high dimensions. We contribute alternative bootstrap procedures, based on our theoretical results, that mitigate these problems. However, the corrections depend on assumptions about the underlying data-generating model, suggesting that in high dimensions it may be difficult to have universal, robust bootstrapping techniques.
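For reference, the two standard resampling schemes named above, plus the jackknife variance estimate, might be sketched as follows for a single coordinate j of the OLS estimate β̂. This is a minimal, stdlib-only sketch of the generic textbook versions, not the authors' corrected procedures. The Gaussian design, the sizes n = 80 and p = 8, the null β, B = 200 replicates, the centering of residuals, and the use of simple percentile intervals are all illustrative assumptions.

```python
import random
import statistics


def ols(X, y):
    """OLS coefficients from the normal equations (X^T X) b = X^T y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b


def predict(X, b):
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


def residual_bootstrap(X, y, j, B, rng):
    """Residual bootstrap: keep X fixed, refit on y* = X b_hat + e*,
    where e* resamples the (centered) OLS residuals with replacement."""
    b_hat = ols(X, y)
    fitted = predict(X, b_hat)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_r = statistics.fmean(resid)
    resid = [r - mean_r for r in resid]  # centering is an assumption here
    return [ols(X, [fi + rng.choice(resid) for fi in fitted])[j] for _ in range(B)]


def pairs_bootstrap(X, y, j, B, rng):
    """Pairs bootstrap: refit on n rows (x_i, y_i) drawn with replacement."""
    n = len(y)
    out = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        out.append(ols([X[i] for i in idx], [y[i] for i in idx])[j])
    return out


def jackknife_variance(X, y, j):
    """Jackknife variance of b_hat[j]: ((n - 1) / n) times the sum of squared
    deviations of the leave-one-out estimates from their mean."""
    n = len(y)
    loo = [ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])[j] for i in range(n)]
    m = statistics.fmean(loo)
    return (n - 1) / n * sum((v - m) ** 2 for v in loo)


def percentile_interval(draws, alpha=0.05):
    """Simple percentile interval taken directly from the bootstrap draws."""
    s = sorted(draws)
    lo = s[int(alpha / 2 * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 80, 8  # hypothetical sizes, p/n = 0.1
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    beta = [0.0] * p  # hypothetical true beta (null model)
    y = [sum(a * b for a, b in zip(row, beta)) + rng.gauss(0, 1) for row in X]
    print("residual:", percentile_interval(residual_bootstrap(X, y, 0, 200, rng)))
    print("pairs:   ", percentile_interval(pairs_bootstrap(X, y, 0, 200, rng)))
    print("jackknife variance:", jackknife_variance(X, y, 0))
```

Raising p toward n in this sketch is one way to compare interval widths across p/n. The abstract reports that, as p/n grows, residual-bootstrap intervals tend to be anti-conservative and pairs-bootstrap intervals very conservative, while the jackknife overestimates the variance of β̂.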

research
03/14/2019

On confidence intervals centered on bootstrap smoothed estimators

We assess the performance, in terms of coverage probability and expected...
research
02/22/2022

Resampling-free bootstrap inference for quantiles

Bootstrap inference is a powerful tool for obtaining robust inference fo...
research
10/20/2022

Finite-Sample Coverage Errors of the Cheap Bootstrap With Minimal Resampling Effort

The bootstrap is a popular data-driven method to quantify statistical un...
research
08/18/2022

An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

Accurate statistical inference in logistic regression models remains a c...
research
07/31/2020

Slightly Conservative Bootstrap for Maxima of Sums

We study the bootstrap for the maxima of the sums of independent random ...
research
08/14/2021

Equity-Directed Bootstrapping: Examples and Analysis

When faced with severely imbalanced binary classification problems, we o...
research
07/04/2016

Bootstrap Model Aggregation for Distributed Statistical Learning

In distributed, or privacy-preserving learning, we are often given a set...