
Benign overfitting in ridge regression
Classical learning theory suggests that strong regularization is needed to learn a class with large complexity. This intuition is in contrast with the modern practice of machine learning, in particular learning neural networks, where the number of parameters often exceeds the number of data points. It has been observed empirically that such overparametrized models can show good generalization performance even if trained with vanishing or negative regularization. The aim of this work is to understand theoretically how this effect can occur, by studying the setting of ridge regression. We provide non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and show that those bounds are tight for a range of regularization parameter values. To our knowledge, this is the first work that studies overparametrized ridge regression in such a general setting. We identify when small or negative regularization is sufficient for obtaining small generalization error. On the technical side, our bounds only require the data vectors to be i.i.d. sub-Gaussian, while most previous work assumes independence of the components of those vectors.
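The phenomenon the abstract describes can be illustrated numerically. The following is a minimal sketch (not the paper's construction): it uses an isotropic Gaussian design rather than the general covariance structures analyzed in the paper, and the dual (kernel) form of the ridge estimator, which is well defined when p > n and remains valid for mildly negative regularization as long as the regularized kernel matrix stays positive definite. The sweep over lambda, including a small negative value, corresponds to the vanishing/negative-regularization regime discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparametrized setting: more features (p) than samples (n).
n, p = 50, 500

# Illustrative ground truth and isotropic Gaussian design
# (the paper allows arbitrary covariance; this is a special case).
beta_star = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_star + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Dual-form ridge estimator, suited to p > n.

    For negative lam the estimator still exists as long as
    X X^T / n + lam * I is positive definite, which holds here
    because the smallest eigenvalue of X X^T / n is bounded
    away from zero when p >> n.
    """
    n = X.shape[0]
    K = X @ X.T / n + lam * np.eye(n)
    return X.T @ np.linalg.solve(K, y) / n

# Evaluate prediction error for small, near-zero, and negative lambda.
X_test = rng.normal(size=(1000, p))
y_test = X_test @ beta_star
for lam in [1.0, 0.1, 1e-6, -0.05]:
    beta_hat = ridge(X, y, lam)
    err = np.mean((X_test @ beta_hat - y_test) ** 2)
    print(f"lambda = {lam:+.2e}   test MSE = {err:.4f}")
```

The dual form inverts an n x n matrix instead of the p x p matrix of the usual primal formula, which is both cheaper when p > n and the natural place to see why slightly negative regularization can be harmless: it only shifts the kernel eigenvalues, which stay positive in this regime.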