The distribution of Ridgeless least squares interpolators

07/05/2023
by Qiyang Han, et al.

The Ridgeless minimum ℓ_2-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent research reveals that its norm-minimizing property induces an 'implicit regularization' that helps prediction in spite of interpolation. This renders the Ridgeless interpolator a theoretically tractable proxy that offers useful insights into the mechanisms of modern machine learning methods. This paper takes a different perspective that aims at understanding the precise stochastic behavior of the Ridgeless interpolator as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions, in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, which plays the role of the prescribed implicit regularization in the context of prediction risk. Our distributional characterizations hold for general random designs and extend uniformly to positively regularized Ridge estimators. As a demonstration of the analytic power of these characterizations, we derive approximate formulae for a general class of weighted ℓ_q risks for Ridge(less) estimators that were previously available only for ℓ_2. Our theory also provides a further conceptual reconciliation with the conventional wisdom: given any data covariance, a certain amount of regularization in Ridge regression remains beneficial for 'most' signals across various statistical tasks including prediction, estimation and inference, as long as the noise level is non-trivial. Surprisingly, optimal tuning can be achieved simultaneously for all the designated statistical tasks by a single generalized or k-fold cross-validation scheme, even though such schemes are designed specifically for tuning prediction risk.
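
For orientation, the two estimators named in the abstract can be written out as follows. This is a sketch in assumed notation, not necessarily the paper's: X is the n × p design matrix, Y the response vector, η ≥ 0 the regularization level, and the 1/n scaling of the loss is one common convention chosen here for illustration.

```latex
% Assumed notation: X \in \mathbb{R}^{n \times p}, Y \in \mathbb{R}^n, \eta > 0.
% Ridge estimator with positive regularization \eta (one common scaling):
\hat{\mu}_\eta
  = \operatorname*{arg\,min}_{\mu \in \mathbb{R}^p}
    \Big\{ \tfrac{1}{2n}\,\lVert Y - X\mu \rVert_2^2
           + \tfrac{\eta}{2}\,\lVert \mu \rVert_2^2 \Big\}
  = \big( X^\top X + n\eta I_p \big)^{-1} X^\top Y .

% Ridgeless (minimum \ell_2-norm) interpolator, written for the case where
% X X^\top is invertible (this requires n \le p and X of full row rank):
\hat{\mu}_0
  = \operatorname*{arg\,min}_{\mu \in \mathbb{R}^p}
    \big\{ \lVert \mu \rVert_2 : X\mu = Y \big\}
  = X^\top \big( X X^\top \big)^{-1} Y
  = \lim_{\eta \downarrow 0} \hat{\mu}_\eta .
```

The last equality is the sense in which the interpolator is "Ridgeless": it is the limit of the Ridge solution as the explicit penalty vanishes, and among all vectors that fit the data exactly it is the one of smallest ℓ_2 norm.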


