
CLT for LSS of sample covariance matrices with unbounded dispersions
Under the high-dimensional setting that data dimension and sample size t...

Conditional predictive inference for high-dimensional stable algorithms
We investigate generically applicable and intuitively appealing predicti...

Risk bounds when learning infinitely many response functions by ordinary linear regression
Consider the problem of learning a large number of response functions si...

Sketching for Two-Stage Least Squares Estimation
When there is so much data that they become a computation burden, it is ...

Prediction in latent factor regression: Adaptive PCR and beyond
This work is devoted to the finite sample prediction risk analysis of a ...

A Note on High-Dimensional Confidence Regions
Recent advances in statistics introduced versions of the central limit t...

Minimum Description Length Principle in Supervised Learning with Application to Lasso
The minimum description length (MDL) principle in supervised learning is...
Provable More Data Hurt in High Dimensional Least Squares Estimator
This paper investigates the finite-sample prediction risk of the high-dimensional least squares estimator. We derive the central limit theorem for the prediction risk when both the sample size and the number of features tend to infinity. Furthermore, the finite-sample distribution and the confidence interval of the prediction risk are provided. Our theoretical results demonstrate the sample-wise non-monotonicity of the prediction risk and confirm the "more data hurt" phenomenon.
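The sample-wise non-monotonicity can be seen in a quick Monte Carlo sketch: holding the number of features fixed and growing the sample size, the prediction risk of the minimum-norm least squares fit peaks near the interpolation threshold n = p, so adding data can make the estimator worse. This is an illustrative simulation, not the paper's exact setting; the dimension p = 40, noise level, coefficient vector, and trial count are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
p, sigma = 40, 0.5                # assumed dimension and noise level
beta = np.ones(p) / np.sqrt(p)    # assumed unit-norm true coefficients

def risk(n, trials=200):
    """Monte Carlo estimate of E||beta_hat - beta||^2, which equals the
    excess prediction risk under isotropic Gaussian test features."""
    errs = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat = np.linalg.pinv(X) @ y   # minimum-norm least squares fit
        errs.append(float(np.sum((beta_hat - beta) ** 2)))
    return float(np.mean(errs))

# Risk spikes at the interpolation threshold n = p = 40: "more data hurt".
risks = {n: risk(n) for n in (20, 40, 80)}
```

In this sketch `risks[40]` exceeds both `risks[20]` and `risks[80]`: moving from n = 20 to n = 40 samples increases the risk, which is exactly the sample-wise non-monotonicity the abstract describes.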