Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

03/02/2020
by Stéphane d'Ascoli, et al.

Deep neural networks can achieve remarkable generalization performance while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a double descent, a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following Geiger et al., we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensembling the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.
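The setup described above, random features regression ensembled over K independent initializations, can be illustrated numerically. The sketch below is not the paper's code: the linear teacher, the ReLU feature map, the min-norm (ridgeless) least-squares readout, and all sizes are illustrative assumptions, chosen only to show how the test error behaves as the number of random features P crosses the interpolation threshold P = n, and how averaging over K feature draws tames the peak.

```python
# Minimal numerical sketch (not the paper's code) of random features regression
# ensembled over K independent feature initializations. The teacher, activation,
# and all sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, noise = 50, 200, 2000, 0.5

# Hypothetical teacher: noisy linear function of the inputs.
beta = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, X @ beta + noise * rng.normal(size=n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

def rf_ensemble_predict(P, K):
    """Average the test predictions of K random-features regressors, P features each."""
    preds = np.zeros(n_test)
    for _ in range(K):
        # Random (fixed, "lazy") first-layer weights; only the readout is trained.
        W = rng.normal(size=(P, d)) / np.sqrt(d)
        Z_tr = np.maximum(X_tr @ W.T, 0.0)   # ReLU random features, train
        Z_te = np.maximum(X_te @ W.T, 0.0)   # ReLU random features, test
        # Min-norm least-squares readout (interpolates once P >= n_train).
        a = np.linalg.lstsq(Z_tr, y_tr, rcond=None)[0]
        preds += Z_te @ a
    return preds / K

# Sweep the number of features across the interpolation threshold P = n_train.
for P in [20, 100, 200, 400, 1000]:
    errs = [np.mean((rf_ensemble_predict(P, K) - y_te) ** 2) for K in (1, 10)]
    print(f"P={P:5d}   K=1: {errs[0]:.3f}   K=10: {errs[1]:.3f}")
```

With K = 1 such a sweep typically shows the overfitting spike near P = n_train followed by a second descent, while K = 10 already flattens much of the peak, in line with the initialization-variance suppression discussed in the abstract.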


research
08/14/2019

The generalization error of random features regression: Precise asymptotics and double descent curve

Deep learning methods operate in regimes that defy the traditional stati...
research
01/31/2022

Fluctuations, Bias, Variance &amp; Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

From the sampling of data to the initialisation of parameters, randomnes...
research
06/05/2020

Triple descent and the two kinds of overfitting: Where &amp; why do they appear?

A recent line of research has highlighted the existence of a double desc...
research
02/21/2020

Generalisation error in learning with random features and the hidden manifold model

We study generalised linear regression and classification for a syntheti...
research
03/14/2022

Phenomenology of Double Descent in Finite-Width Neural Networks

'Double descent' delineates the generalization behaviour of models depen...
research
05/31/2022

VC Theoretical Explanation of Double Descent

There has been growing interest in generalization performance of large m...
research
10/11/2020

What causes the test error? Going beyond bias-variance via ANOVA

Modern machine learning methods are often overparametrized, allowing ada...
