
Optimal Regularization Can Mitigate Double Descent

by Preetum Nakkiran et al.

Recent empirical and theoretical studies have shown that many learning algorithms – from linear regression to neural networks – can have test performance that is non-monotonic in quantities such as the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised the question of whether we need to re-think our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned ℓ_2 regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned ℓ_2 regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.
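The core claim can be illustrated with a small simulation. The sketch below (my own construction, not the paper's code; the dimensions, noise level, and λ grid are illustrative assumptions) fits closed-form ridge regression on isotropic Gaussian data at the interpolation threshold n = d, where (near-)unregularized least squares exhibits the double-descent risk spike, and compares it with the best λ from a small grid:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def avg_test_risk(n, d, lam, noise=0.5, trials=10, seed=0):
    """Average squared test error of ridge on isotropic Gaussian data
    with a random linear ground truth plus label noise."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(trials):
        w_star = rng.standard_normal(d) / np.sqrt(d)
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise * rng.standard_normal(n)
        X_test = rng.standard_normal((2000, d))
        y_test = X_test @ w_star + noise * rng.standard_normal(2000)
        w = ridge_fit(X, y, lam)
        risks.append(np.mean((X_test @ w - y_test) ** 2))
    return float(np.mean(risks))

# At the interpolation threshold n = d, the nearly unregularized fit
# inverts an ill-conditioned Gram matrix and its test risk blows up;
# tuning lam over a small grid removes the spike.
d = 50
risk_unreg = avg_test_risk(n=d, d=d, lam=1e-8)
risk_tuned = min(avg_test_risk(n=d, d=d, lam=lam) for lam in [0.1, 1.0, 10.0])
print(f"unregularized: {risk_unreg:.2f}, tuned: {risk_tuned:.2f}")
```

Sweeping n (or d) instead of fixing n = d traces out the full risk curve: the unregularized curve is non-monotonic with a peak at n = d, while the optimally-tuned curve is monotone, matching the paper's theoretical result for isotropic data.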



