Sketched Ridgeless Linear Regression: The Role of Downsampling

02/02/2023
by   Xin Chen, et al.

Overparametrization often helps improve generalization performance. This paper proposes a dual view of overparametrization, suggesting that downsampling may also help generalization. Motivated by this dual view, we characterize two out-of-sample prediction risks of the sketched ridgeless least squares estimator in the proportional regime m ≍ n ≍ p, where m is the sketching size, n the sample size, and p the feature dimensionality. Our results reveal the statistical role of downsampling. Specifically, downsampling does not always hurt generalization performance and may actually improve it in some cases. We identify the optimal sketching sizes that minimize the out-of-sample prediction risks, and find that the optimally sketched estimator has more stable risk curves that eliminate the peaks seen in those of the full-sample estimator. We then propose a practical procedure to empirically identify the optimal sketching size. Finally, we extend our results to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.
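To make the object of study concrete, the following is a minimal sketch of a sketched ridgeless least squares fit, not the paper's own procedure. It assumes an i.i.d. Gaussian sketching matrix, isotropic Gaussian features, and a well-specified linear model, none of which are stated in the abstract; the helper names `gaussian_sketch` and `sketched_ridgeless` and all numeric values are hypothetical. Under the isotropic assumption, the squared estimation error printed here coincides with the excess out-of-sample prediction risk.

```python
import numpy as np


def gaussian_sketch(m, n, rng):
    """Hypothetical helper: m x n sketching matrix with i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((m, n)) / np.sqrt(m)


def sketched_ridgeless(X, y, S):
    """Minimum-norm least squares solution fitted to the sketched data (S X, S y)."""
    # The pseudoinverse gives the least squares solution when S X has full
    # column rank, and the minimum l2-norm interpolator when it does not.
    return np.linalg.pinv(S @ X) @ (S @ y)


# Simulated data (assumed setup): n samples, p isotropic Gaussian features.
rng = np.random.default_rng(0)
n, p, noise_sd = 200, 150, 1.0
beta = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))
y = X @ beta + noise_sd * rng.standard_normal(n)

# Squared estimation error for a few sketching sizes m <= n.
errors = {}
for m in (50, 100, 180, 200):
    S = gaussian_sketch(m, n, rng)
    beta_hat = sketched_ridgeless(X, y, S)
    errors[m] = float(np.sum((beta_hat - beta) ** 2))
    print(m, errors[m])
```

Sweeping m over a finer grid and averaging over repeated draws would trace an empirical risk curve; a single draw like this one is too noisy to locate an optimal sketching size.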


research
09/06/2023

Ensemble linear interpolators: The role of ensembling

Interpolators are unstable. For example, the minimum ℓ_2 norm least squa...
research
08/14/2020

Provable More Data Hurt in High Dimensional Least Squares Estimator

This paper investigates the finite-sample prediction risk of the high-di...
research
01/23/2019

Optimal Uncertainty Size in Distributionally Robust Inverse Covariance Estimation

In a recent paper, Nguyen, Kuhn, and Esfahani (2018) built a distributio...
research
06/25/2021

Implementation of an alternative method for assessing competing risks: restricted mean time lost

In clinical and epidemiological studies, hazard ratios are often applied...
research
03/23/2018

Nonparametric inference on Lévy measures of Lévy-driven Ornstein-Uhlenbeck processes under discrete observations

In this paper, we study nonparametric inference for a stationary Lévy-dr...
research
07/03/2019

An Econometric View of Algorithmic Subsampling

Datasets that are terabytes in size are increasingly common, but compute...
research
07/03/2019

An Econometric Perspective of Algorithmic Sampling

Datasets that are terabytes in size are increasingly common, but compute...
