Optimal Regularization Can Mitigate Double Descent

03/04/2020
by Preetum Nakkiran, et al.

Recent empirical and theoretical studies have shown that many learning algorithms – from linear regression to neural networks – can have test performance that is non-monotonic in quantities such as the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised the question of whether we need to rethink our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distributions, optimally-tuned ℓ_2 regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned ℓ_2 regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.
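The following is a minimal sketch, not the paper's exact experimental setup, of the effect described in the abstract: in linear regression with isotropic Gaussian features, the unregularized (minimum-norm) solution shows a test-error peak near the interpolation threshold, while ridge regression with a well-tuned ℓ_2 penalty behaves much more smoothly. All quantities here (sample sizes, feature counts, noise level, the lambda grid, and the grid-search tuning) are illustrative assumptions.

```python
# Sketch: minimum-norm least squares vs. tuned ridge across model sizes.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, noise = 50, 1000, 0.5

def make_data(n, d, w):
    X = rng.standard_normal((n, d))            # isotropic Gaussian features
    y = X @ w + noise * rng.standard_normal(n)
    return X, y

def ridge_fit(X, y, lam):
    d = X.shape[1]
    # Closed-form ridge solution; as lam -> 0 this approaches the
    # least-squares / minimum-norm solution.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for d in [10, 25, 50, 75, 200]:                # d = n_train is the double-descent peak
    w_true = rng.standard_normal(d) / np.sqrt(d)
    X_tr, y_tr = make_data(n_train, d, w_true)
    X_te, y_te = make_data(n_test, d, w_true)

    w_mn = np.linalg.pinv(X_tr) @ y_tr         # unregularized min-norm solution
    err_mn = np.mean((X_te @ w_mn - y_te) ** 2)

    # "Optimally-tuned" lambda approximated by a small grid search on held-out
    # data, purely to illustrate the benefit of well-tuned l2 regularization.
    errs = [np.mean((X_te @ ridge_fit(X_tr, y_tr, lam) - y_te) ** 2)
            for lam in np.logspace(-4, 2, 25)]
    print(f"d={d:4d}  unregularized={err_mn:.3f}  tuned ridge={min(errs):.3f}")
```

Running this, the unregularized error typically spikes around d = n_train and then descends again, whereas the tuned-ridge curve stays close to monotone, which is the qualitative behavior the paper proves for isotropic linear regression.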


Related research:

- Dropout Drops Double Descent (05/25/2023): In this paper, we find and analyze that we can easily drop the double de...
- Avoiding The Double Descent Phenomenon of Random Feature Models Using Hybrid Regularization (12/11/2020): We demonstrate the ability of hybrid regularization methods to automatic...
- Risk-Monotonicity in Statistical Learning (11/28/2020): Acquisition of data is a difficult task in many applications of machine ...
- Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes (10/14/2022): The quality of many modern machine learning models improves as model com...
- A Geometric Look at Double Descent Risk: Volumes, Singularities, and Distinguishabilities (06/08/2020): The appearance of the double-descent risk phenomenon has received growin...
- An Econometric View of Algorithmic Subsampling (07/03/2019): Datasets that are terabytes in size are increasingly common, but compute...
- An Econometric Perspective of Algorithmic Sampling (07/03/2019): Datasets that are terabytes in size are increasingly common, but compute...
