Multi-scale Feature Learning Dynamics: Insights for Double Descent

12/06/2021
by Mohammad Pezeshki et al.

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, which arise from high-dimensional interactions among a large number of parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent, where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent, in which the test error undergoes two non-monotonic transitions, or descents, as training time increases. Leveraging tools from statistical physics, we study a linear teacher-student setup that exhibits epoch-wise double descent similar to that observed in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of the generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate these findings through numerical experiments in which the theory accurately predicts the empirical results and remains consistent with observations in deep neural networks.

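To make the described mechanism concrete, below is a minimal numerical sketch of a linear teacher-student setup with two groups of features of different scales, trained by full-batch gradient descent on noisy labels. This is not the paper's exact configuration or its analytical derivation; the feature counts, scales, noise level, learning rate, and seed are all illustrative assumptions. Tracking the test error over training shows how the large-scale (fast) features fit first and can overfit the label noise, while the small-scale (slow) features keep fitting the remaining signal.

```python
# Minimal sketch (not the paper's exact setup): a linear teacher-student
# model with two feature groups learned at different speeds under
# gradient descent on noisy labels. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test = 50, 1000
d_fast, d_slow = 30, 30        # feature counts per group (assumed)
s_fast, s_slow = 3.0, 0.3      # feature scales: larger scale => faster learning
noise_std = 0.5                # label-noise level (assumed)

def sample_inputs(n):
    # i.i.d. Gaussian features, then rescaled so the two groups have
    # different variances (hence different learning speeds under GD).
    x = rng.standard_normal((n, d_fast + d_slow))
    x[:, :d_fast] *= s_fast
    x[:, d_fast:] *= s_slow
    return x

# Teacher weights: signal carried by both fast and slow features.
w_star = rng.standard_normal(d_fast + d_slow) / np.sqrt(d_fast + d_slow)

X_tr = sample_inputs(n_train)
y_tr = X_tr @ w_star + noise_std * rng.standard_normal(n_train)
X_te = sample_inputs(n_test)
y_te = X_te @ w_star           # noiseless test targets

w = np.zeros(d_fast + d_slow)  # student initialized at zero
lr = 1e-3
test_err = []
for step in range(20_000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train  # gradient of the MSE
    w -= lr * grad
    if step % 100 == 0:
        test_err.append(np.mean((X_te @ w - y_te) ** 2))

print(f"min test error: {min(test_err):.3f}, final: {test_err[-1]:.3f}")
```

With these (assumed) constants, plotting test_err against the training step typically shows the qualitative epoch-wise shape: an initial descent as the fast features fit the signal, a rise as the label noise is interpolated along the fast directions, and a second descent as the slow features fit what remains.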

Related research

08/26/2021 · When and how epochwise double descent happens
Deep neural networks are known to exhibit a `double descent' behavior as...

03/10/2023 · Unifying Grokking and Double Descent
A principled understanding of generalization in deep learning may requir...

05/25/2023 · Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon
In this paper, we studied two identically-trained neural networks (i.e. ...

10/22/2021 · Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model
Recent evidence has shown the existence of a so-called double-descent an...

01/30/2020 · Analytic Study of Double Descent in Binary Classification: The Impact of Loss
Extensive empirical evidence reveals that, for a wide range of different...

10/19/2020 · Do Deeper Convolutional Networks Perform Better?
Over-parameterization is a recent topic of much interest in the machine ...

03/14/2022 · Phenomenology of Double Descent in Finite-Width Neural Networks
`Double descent' delineates the generalization behaviour of models depen...
