
Regularization-wise double descent: Why it occurs and how to eliminate it

06/03/2022
by Fatih Furkan Yilmaz, et al.

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and that this behavior can be explained as a superposition of bias-variance tradeoffs. In this paper, we show that the risk of explicitly L2-regularized models can exhibit double descent as a function of the regularization strength, both in theory and in practice. We find that for linear regression, a double-descent-shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the regularization strengths of the first and second layers. Lastly, we study a 5-layer CNN and ResNet-18 trained on CIFAR-10 with label noise, and on CIFAR-100 without label noise, and demonstrate that all exhibit double descent as a function of the regularization strength.
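The linear-regression finding, that the risk is a superposition of bias-variance tradeoffs over different parts of the model and can be fixed by penalizing each part separately, can be illustrated with a generalized ridge sketch in NumPy. This is a minimal illustration, not the paper's experimental setup: the feature-group sizes, noise level, and sweep range below are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 50, 20, 80  # samples, and sizes of two "parts" of the model
X = rng.standard_normal((n, d1 + d2))
# The two feature groups contribute at different scales to the target.
w_true = np.concatenate([rng.standard_normal(d1), 0.1 * rng.standard_normal(d2)])
y = X @ w_true + 0.5 * rng.standard_normal(n)

def ridge_risk(lam1, lam2, n_test=2000):
    """Generalized ridge: a separate L2 penalty per feature group.

    Solves w = argmin ||Xw - y||^2 + lam1*||w_1||^2 + lam2*||w_2||^2
    and returns the test risk (mean squared error on fresh noiseless data).
    """
    Lam = np.diag(np.concatenate([np.full(d1, lam1), np.full(d2, lam2)]))
    w = np.linalg.solve(X.T @ X + Lam, X.T @ y)
    Xt = rng.standard_normal((n_test, d1 + d2))
    return float(np.mean((Xt @ w - Xt @ w_true) ** 2))

# Sweep a single shared strength: the risk curve over lam can be
# non-monotone, i.e. double-descent shaped, as in the paper's setting.
lams = np.logspace(-3, 2, 20)
risks = [ridge_risk(l, l) for l in lams]

# Scaling the strengths separately per part is the proposed mitigation;
# the particular values here are illustrative, not tuned.
risk_split = ridge_risk(0.1, 10.0)
```

The key design point is that `Lam` is a diagonal matrix rather than a scalar times the identity, so each part of the model gets its own effective regularization strength, which is what the paper exploits to remove the double descent.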


07/20/2020
Early Stopping in Deep Networks: Double Descent and How to Eliminate it
Over-parameterized models, in particular deep networks, often exhibit a ...

10/06/2020
Kernel regression in high dimension: Refined analysis beyond double descent
In this paper, we provide a precise characterization of generalization prope...

03/04/2020
Optimal Regularization Can Mitigate Double Descent
Recent empirical and theoretical studies have shown that many learning a...

08/17/2022
Superior generalization of smaller models in the presence of significant label noise
The benefits of over-parameterization in achieving superior generalizati...

02/28/2021
Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks
We investigate the asymptotic risk of a general class of overparameteriz...

10/07/2021
Double Descent in Adversarial Training: An Implicit Label Noise Perspective
Here, we show that the robust overfitting shall be viewed as the early p...

07/02/2021
Mitigating deep double descent by concatenating inputs
The double descent curve is one of the most intriguing properties of dee...