Dropout Drops Double Descent

05/25/2023
by Tian-Le Yang, et al.

In this paper, we find and analyze that double descent can be dropped simply by adding a single dropout layer before the fully-connected linear layer. The surprising double-descent phenomenon, in which the prediction error rises and then falls again as either the sample size or the model size increases, has drawn considerable attention in recent years. We show, both theoretically and empirically, that optimal dropout alleviates this phenomenon in the linear regression model y = Xβ^0 + ϵ with X ∈ ℝ^{n×p} and in nonlinear random feature regression. We obtain the optimal dropout hyperparameter by estimating the ground truth β^0 with the generalized ridge-type estimator β̂ = (X^TX + α·diag(X^TX))^{-1}X^Ty. Moreover, we empirically show that optimal dropout yields a monotonic test error curve in nonlinear neural networks on Fashion-MNIST and CIFAR-10. Our results suggest considering dropout for scaling the risk curve when the peak phenomenon appears. In addition, we explain why earlier deep learning models often did not exhibit double descent: a standard regularization technique such as dropout was already applied. To the best of our knowledge, this paper is the first to analyze the relationship between dropout and double descent.
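
As a concrete illustration of the estimator above, here is a minimal NumPy sketch (not the authors' code) that forms β̂ = (X^TX + α·diag(X^TX))^{-1}X^Ty on synthetic data. The correspondence between α and a dropout rate mentioned in the comments is an assumption stated for illustration, not a result quoted from the paper.

    import numpy as np

    # Minimal sketch on synthetic data (assumption: not the paper's experiments).
    # Generalized ridge-type estimator from the abstract:
    #   beta_hat = (X^T X + alpha * diag(X^T X))^{-1} X^T y
    rng = np.random.default_rng(0)
    n, p = 100, 120                           # over-parameterized regime, p > n
    X = rng.standard_normal((n, p))
    beta0 = rng.standard_normal(p)            # ground-truth beta^0
    y = X @ beta0 + 0.5 * rng.standard_normal(n)

    def dropout_ridge(X, y, alpha):
        # Penalize with alpha * diag(X^T X) instead of the usual alpha * I.
        XtX = X.T @ X
        return np.linalg.solve(XtX + alpha * np.diag(np.diag(XtX)), X.T @ y)

    # In the dropout-as-regularization literature, alpha is often related to the
    # keep probability q via alpha = (1 - q) / q; treat that as an assumption here.
    beta_hat = dropout_ridge(X, y, alpha=1.0)
    print(beta_hat[:5])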


Related research

03/04/2020 · Optimal Regularization Can Mitigate Double Descent
Recent empirical and theoretical studies have shown that many learning a...

03/24/2023 · Double Descent Demystified: Identifying, Interpreting and Ablating the Sources of a Deep Learning Puzzle
Double descent is a surprising phenomenon in machine learning, in which ...

02/26/2023 · Can we avoid Double Descent in Deep Neural Networks?
Finding the optimal size of deep learning models is very actual and of b...

06/18/2023 · Dropout Regularization Versus ℓ_2-Penalization in the Linear Model
We investigate the statistical behavior of gradient descent iterates wit...

07/15/2023 · Does Double Descent Occur in Self-Supervised Learning?
Most investigations into double descent have focused on supervised model...

08/31/2023 · The Quest of Finding the Antidote to Sparse Double Descent
In energy-efficient schemes, finding the optimal size of deep learning m...

03/06/2015 · To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout
Training deep belief networks (DBNs) requires optimizing a non-convex fu...
