Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

10/22/2021
by Antoine Bodin, et al.

Recent evidence has shown that the generalization error of deep-learning models can exhibit so-called double-descent, and even triple-descent, behavior. This important phenomenon commonly appears in implemented neural network architectures and also seems to emerge in epoch-wise curves during training. A recent line of research has highlighted that random matrix tools can be used to obtain precise analytical asymptotics of the generalization (and training) errors of the random feature model. In this contribution, we analyze the whole temporal behavior of the generalization and training errors under gradient flow for the random feature model. We show that, in the asymptotic limit of large system size, the full time-evolution path of both errors can be calculated analytically. This allows us to observe how the double and triple descents develop over time, to determine if and when early stopping is an option, and to observe time-wise descent structures. Our techniques are based on Cauchy complex integral representations of the errors together with recent random matrix methods based on linear pencils.
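
As a rough finite-size illustration of the setting described in the abstract, the sketch below tracks training and test errors along gradient flow for a random feature model. It is not the asymptotic calculation of the paper (no Cauchy integrals or linear pencils are involved); it only uses the fact that, for a model linear in its trained weights, gradient flow on a ridge-regularized squared loss is a linear ODE with a closed-form solution. The linear teacher, the tanh activation, the normalizations, the zero initialization, and the names `flow_weights` and `random_feature_flow` are all assumptions made for this example.

```python
import numpy as np

def flow_weights(A, b, t):
    """Solve da/dt = -(A a - b) with a(0) = 0 for symmetric positive definite A.

    Closed form: a(t) = (I - exp(-t A)) A^{-1} b, evaluated in the eigenbasis of A.
    """
    s, U = np.linalg.eigh(A)
    # (1 - exp(-t s)) / s, written with expm1 for accuracy when t * s is small
    filt = -np.expm1(-t * s) / s
    return U @ (filt * (U.T @ b))

def random_feature_flow(n=200, d=50, N=100, lam=1e-3, noise=0.1,
                        times=(0.0, 1.0, 10.0, 100.0, 1e4),
                        n_test=500, seed=0):
    """Return (t, train_error, test_error) along gradient flow (illustrative setup)."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)        # hypothetical linear teacher
    W = rng.standard_normal((d, N))      # fixed random first-layer weights

    def features(X):
        # Random features: tanh is an assumed choice of activation
        return np.tanh(X @ W / np.sqrt(d))

    X = rng.standard_normal((n, d))
    y = X @ beta / np.sqrt(d) + noise * rng.standard_normal(n)
    X_te = rng.standard_normal((n_test, d))
    y_te = X_te @ beta / np.sqrt(d)      # noiseless test targets

    Z, Z_te = features(X), features(X_te)
    # Loss: (1/2n) ||y - Z a||^2 + (lam/2) ||a||^2, whose gradient is A a - b
    A = Z.T @ Z / n + lam * np.eye(N)
    b = Z.T @ y / n

    results = []
    for t in times:
        a_t = flow_weights(A, b, t)
        train_err = np.mean((y - Z @ a_t) ** 2)
        test_err = np.mean((y_te - Z_te @ a_t) ** 2)
        results.append((t, train_err, test_err))
    return results

if __name__ == "__main__":
    for t, tr, te in random_feature_flow():
        print(f"t={t:g}  train={tr:.4f}  test={te:.4f}")
```

Sweeping `N` or `n` at a fixed time gives model-wise or sample-wise curves, while sweeping `times` at fixed sizes gives the time-wise curve; whether descent structures are visible at these small sizes depends on the assumed noise level and regularization.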


12/06/2021

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is...
08/13/2020

The Slow Deterioration of the Generalization Error of the Random Feature Model

The random feature model exhibits a kind of resonance behavior when the ...
07/20/2020

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Over-parameterized models, in particular deep networks, often exhibit a ...
08/26/2021

When and how epochwise double descent happens

Deep neural networks are known to exhibit a `double descent' behavior as...
03/01/2022

Contrasting random and learned features in deep Bayesian linear regression

Understanding how feature learning affects generalization is among the f...
08/27/2020

A Precise Performance Analysis of Learning with Random Features

We study the problem of learning an unknown function using random featur...
07/27/2021

Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation

State-of-the-art neural network architectures continue to scale in size ...