Deep Double Descent: Where Bigger Models and More Data Hurt

12/04/2019
by Preetum Nakkiran et al.
Harvard University

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
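The effective model complexity referred to in the abstract is, informally, the largest number of training samples on which a given training procedure reaches roughly zero training error. The following is a minimal, hypothetical sketch (Python with NumPy, not the authors' code) of the kind of capacity sweep that exposes model-wise double descent: a random-feature linear model whose width is swept through the interpolation threshold while train and test error are recorded. The data generator, feature map, and width schedule are all illustrative assumptions.

# Hypothetical sketch: sweep model capacity through the interpolation
# threshold and watch test error rise and then fall again (double descent).
# A random-feature linear model stands in for the deep networks in the paper.
import numpy as np

rng = np.random.default_rng(0)
d = 20                                   # input dimension (assumed)
w_star = rng.normal(size=d)              # ground-truth linear target

def make_data(n, noise=0.5):
    """Synthetic regression data: y = <x, w_star> + Gaussian noise."""
    X = rng.normal(size=(n, d))
    y = X @ w_star + noise * rng.normal(size=n)
    return X, y

def fit_random_features(X_tr, y_tr, X_te, y_te, width):
    """Min-norm least squares on `width` random ReLU features."""
    W = rng.normal(size=(d, width)) / np.sqrt(d)    # fixed random projection
    phi = lambda X: np.maximum(X @ W, 0.0)          # ReLU feature map
    beta = np.linalg.pinv(phi(X_tr)) @ y_tr         # minimum-norm solution
    mse = lambda X, y: float(np.mean((phi(X) @ beta - y) ** 2))
    return mse(X_tr, y_tr), mse(X_te, y_te)

n_train = 100
X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(2000)

# Widths below, at, and above the interpolation threshold (width ~ n_train).
for width in (10, 50, 90, 100, 110, 200, 500, 2000):
    train_mse, test_mse = fit_random_features(X_tr, y_tr, X_te, y_te, width)
    print(f"width={width:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")

With these assumed settings, the test error typically peaks when the width is near the number of training samples and decreases again in the heavily overparameterized regime, mirroring the model-wise curve the abstract describes.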


Related Research

Mitigating deep double descent by concatenating inputs (07/02/2021)
The double descent curve is one of the most intriguing properties of dee...

Sparse Double Descent: Where Network Pruning Aggravates Overfitting (06/17/2022)
People usually believe that network pruning not only reduces the computa...

Learning Capacity: A Measure of the Effective Dimensionality of a Model (05/27/2023)
We exploit a formal correspondence between thermodynamics and inference,...

Unifying Grokking and Double Descent (03/10/2023)
A principled understanding of generalization in deep learning may requir...

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited (03/04/2020)
Neural networks appear to have mysterious generalization properties when...

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks (12/16/2020)
Deep networks are typically trained with many more parameters than the s...

Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes (10/14/2022)
The quality of many modern machine learning models improves as model com...

Code Repositories

nlp-sentiment-analysis

Understanding user comments via natural language processing with TensorFlow and Scikit-Learn



Deep-Double-Descent-Where-Bigger-Models-and-More-Data-Hurts

This project is based on the paper of the same name.



double-descent

We investigate double descent more deeply and try to characterize the phenomenon precisely under different settings. Specifically, we focus on the impact of label noise and regularization on double descent (a minimal sketch of label-noise injection follows below). None of the existing works consider these aspects in detail, and we hypothesize that they play an integral role in double descent.


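As a companion to the repository description above, here is a hypothetical sketch of how label noise is commonly injected in double-descent experiments: a fraction p of the training labels is replaced with uniformly drawn class labels before training. The function name, parameters, and the 10-class example are illustrative assumptions, not this repository's actual code.

import numpy as np

def add_label_noise(labels, num_classes, p=0.2, seed=0):
    """Return a copy of `labels` in which a fraction `p` of the entries is
    replaced by uniformly random class labels (some replacements may land
    on the original class by chance)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(noisy.shape[0]) < p
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy

# Example: corrupt roughly 20% of labels for a 10-class problem.
clean = np.random.default_rng(1).integers(0, 10, size=50_000)
noisy = add_label_noise(clean, num_classes=10, p=0.2)
print("fraction of labels changed:", float(np.mean(noisy != clean)))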
