A geometrical viewpoint on the benign overfitting property of the minimum ℓ_2-norm interpolant estimator

03/11/2022
by   Guillaume Lecué, et al.

Practitioners have observed that some deep learning models generalize well even when they perfectly fit noisy training data [5,45,44]. Since then, many theoretical works have revealed facets of this phenomenon [4,2,1,8], known as benign overfitting. In particular, in the linear regression model, the minimum ℓ_2-norm interpolant estimator β̂ has received a lot of attention [1,39], since it was proved to be consistent, even though it perfectly fits noisy data, under some conditions on the covariance matrix Σ of the input vector. Motivated by this phenomenon, we study the generalization properties of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [39]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [2]: β̂ can be written as the sum of a ridge estimator β̂_{1:k} and an overfitting component β̂_{k+1:p}, following the decomposition of the feature space ℝ^p = V_{1:k} ⊕^⊥ V_{k+1:p} into the space V_{1:k} spanned by the top k eigenvectors of Σ and the space V_{k+1:p} spanned by the remaining p−k eigenvectors. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from [1,39] and is the key tool for handling the behavior of the design matrix restricted to the subspace V_{k+1:p}, where overfitting happens.
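To make the objects in the abstract concrete, here is a minimal numpy sketch, not taken from the paper: it draws a Gaussian design with a hypothetical spiked covariance, computes the minimum ℓ_2-norm interpolant β̂ via the Moore-Penrose pseudoinverse, splits it into β̂_{1:k} and β̂_{k+1:p} along the top-k eigenspace of Σ, and evaluates the effective rank r_k(Σ) = (∑_{i>k} λ_i)/λ_{k+1} used in [1,39]. All dimensions, the spectrum, and the noise level are illustrative choices, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overparameterized setup: n samples, p >> n features, and a
# "spiked" covariance: k large eigenvalues followed by a long flat tail.
n, p, k = 50, 500, 5
eigvals = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.05)])

# Gaussian design with covariance Sigma = diag(eigvals); the eigenvectors of
# Sigma are then the canonical basis vectors, which keeps projections trivial.
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)

# Ground truth supported on the top-k eigenspace, plus label noise.
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Minimum l2-norm interpolant: beta_hat = X^+ y. Since n < p, X has full row
# rank almost surely, so beta_hat fits the noisy labels exactly.
beta_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ beta_hat, y)

# Decomposition along R^p = V_{1:k} (+)^perp V_{k+1:p}: since Sigma is
# diagonal, projecting onto the top-k eigenspace keeps the first k coordinates.
beta_top = beta_hat.copy()
beta_top[k:] = 0.0                 # ridge-like component beta_hat_{1:k}
beta_tail = beta_hat - beta_top    # overfitting component beta_hat_{k+1:p}

# Effective rank of the tail spectrum, r_k(Sigma) = (sum_{i>k} lambda_i) /
# lambda_{k+1}; benign overfitting requires it to be large relative to n.
r_k = eigvals[k:].sum() / eigvals[k]
print(f"r_k(Sigma) = {r_k:.0f}, n = {n}")
print(f"error on V_1:k : {np.linalg.norm(beta_top - beta_star):.3f}")
print(f"tail norm      : {np.linalg.norm(beta_tail):.3f}")
```

In this regime r_k(Σ) ≫ n: the interpolation of the noise is spread over the many low-variance tail directions, so the overfitting component stays small, while the component on V_{1:k} estimates β* in the manner of a ridge estimator, as the self-induced regularization picture describes.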


Related research

- Benign Overfitting in Linear Regression (06/26/2019). The phenomenon of benign overfitting is one of the key mysteries uncover...
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression (03/23/2021). There is an increasing realization that algorithmic inductive biases are...
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks (08/25/2021). The recent success of neural network models has shone light on a rather...
- Implicit ridge regularization provided by the minimum-norm least squares estimator when n ≪ p (05/28/2018). A conventional wisdom in statistical learning is that large models requi...
- Relaxing the Feature Covariance Assumption: Time-Variant Bounds for Benign Overfitting in Linear Regression (02/12/2022). Benign overfitting demonstrates that overparameterized models can perfor...
- Benign overfitting in the large deviation regime (03/12/2020). We investigate the benign overfitting phenomenon in the large deviation...
- Foolish Crowds Support Benign Overfitting (10/06/2021). We prove a lower bound on the excess risk of sparse interpolating proced...
