Implicit ridge regularization provided by the minimum-norm least squares estimator when n ≪ p

05/28/2018
by Dmitry Kobak, et al.

Conventional wisdom in statistical learning holds that large models require strong regularization to prevent overfitting. This rule has recently been challenged by deep neural networks: despite being expressive enough to fit any training set perfectly, they still generalize well. Here we show that the same is true for linear regression in the under-determined n ≪ p situation, provided that one uses the minimum-norm estimator. The case of a linear model with least-squares loss allows a full and exact mathematical analysis. We prove that augmenting a model with many random covariates of small constant variance and using the minimum-norm estimator is asymptotically equivalent to adding a ridge penalty. Using toy simulations as well as real-life high-dimensional data sets, we demonstrate that an explicit ridge penalty often fails to provide any improvement over this implicit ridge regularization. In this regime, the minimum-norm estimator achieves zero training error but nevertheless has low expected error.
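To make the claimed equivalence concrete, here is a minimal numerical sketch (not code from the paper itself): appending q i.i.d. Gaussian covariates of variance sigma2 to the design and fitting the minimum-norm estimator should, for large q, recover nearly the same coefficients on the original p predictors as explicit ridge regression with penalty lambda = q * sigma2, since the Gram matrix of the appended block concentrates around q * sigma2 * I. The dimensions, the variance, and the exact constant lambda = q * sigma2 are illustrative assumptions; see the paper for the precise asymptotic statement.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20, 50, 20000      # n << p original design; q appended random covariates
sigma2 = 0.005               # per-covariate variance of the appended features

X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

def min_norm(A, y):
    # Minimum-norm least squares: beta_hat = A^T (A A^T)^{-1} y
    # (valid in the under-determined case, where A A^T is n x n and invertible)
    return A.T @ np.linalg.solve(A @ A.T, y)

# (a) Min-norm fit on the augmented design [X, Z];
#     keep only the coefficients of the original p predictors.
Z = np.sqrt(sigma2) * rng.standard_normal((n, q))
beta_aug = min_norm(np.hstack([X, Z]), y)[:p]

# (b) Explicit ridge on X alone with lambda = q * sigma2
#     (Z Z^T is approximately q * sigma2 * I for large q,
#     so the two coefficient vectors should nearly coincide).
lam = q * sigma2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("relative difference:", np.linalg.norm(beta_aug - beta_ridge) / np.linalg.norm(beta_ridge))
```

With q in the tens of thousands, the relative difference between the two coefficient vectors typically falls to a few percent and shrinks further as q grows, consistent with the concentration argument above.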


Related research

12/10/2019
Exact expressions for double descent and implicit regularization via surrogate random design
Double descent refers to the phase transition that is exhibited by the g...

07/05/2023
The distribution of Ridgeless least squares interpolators
The Ridgeless minimum ℓ_2-norm interpolator in overparametrized linear r...

08/10/2021
The Benefits of Implicit Regularization from SGD in Least Squares Problems
Stochastic gradient descent (SGD) exhibits strong algorithmic regulariza...

07/25/2020
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Modern neural networks are often operated in a strongly overparametrized...

07/19/2022
Lazy Estimation of Variable Importance for Large Neural Networks
As opaque predictive models increasingly impact many areas of modern lif...

03/11/2022
A geometrical viewpoint on the benign overfitting property of the minimum ℓ_2-norm interpolant estimator
Practitioners have observed that some deep learning models generalize we...

09/01/2017
Sparse Regularization in Marketing and Economics
Sparse alpha-norm regularization has many data-rich applications in mark...
