Deep Double Descent: Where Bigger Models and More Data Hurt

by Preetum Nakkiran et al.
Harvard University

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
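The model-size double descent described above can be reproduced in a toy setting. The sketch below is not the paper's setup; it is a minimal, hypothetical illustration using min-norm (ridgeless) random-feature regression on noisy data, sweeping the feature count past the interpolation threshold (`n_features == n_train`), where test error typically spikes before descending again.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's experiments):
# ridgeless random-feature regression with label noise.
rng = np.random.default_rng(0)
n_train, n_test, d = 40, 200, 10

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)  # noisy labels
y_test = X_test @ w_true

def random_feature_error(n_features):
    """Test MSE of min-norm least squares on random ReLU features."""
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    phi_tr = np.maximum(X_train @ W, 0.0)
    phi_te = np.maximum(X_test @ W, 0.0)
    # Pseudoinverse gives the minimum-norm interpolating solution
    # once n_features >= n_train.
    coef = np.linalg.pinv(phi_tr) @ y_train
    return np.mean((phi_te @ coef - y_test) ** 2)

# Sweep model size through the interpolation threshold (n_train = 40).
sizes = [5, 20, 40, 80, 320]
errors = [random_feature_error(p) for p in sizes]
```

Plotting `errors` against `sizes` typically shows the characteristic peak near the interpolation threshold followed by a second descent in the overparameterized regime; exact curves depend on the noise level and random seed.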




Related Research

Mitigating deep double descent by concatenating inputs

The double descent curve is one of the most intriguing properties of dee...

Sparse Double Descent: Where Network Pruning Aggravates Overfitting

People usually believe that network pruning not only reduces the computa...

Learning Capacity: A Measure of the Effective Dimensionality of a Model

We exploit a formal correspondence between thermodynamics and inference,...

Unifying Grokking and Double Descent

A principled understanding of generalization in deep learning may requir...

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

Neural networks appear to have mysterious generalization properties when...

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Deep networks are typically trained with many more parameters than the s...

Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

The quality of many modern machine learning models improves as model com...

Code Repositories


Understanding user comments via natural language processing with TensorFlow and Scikit-Learn


This project is based on the homonymous paper


We investigate double descent more deeply and try to precisely characterize the phenomenon under different settings. Specifically, we focus on the impact of label noise and regularization on double descent. No existing work considers these aspects in detail, and we hypothesize that they play an integral role in double descent.

