
Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
Deep neural networks can achieve remarkable generalization performance while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a double descent, a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following Geiger et al., we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensembling the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling, and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.
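The setup the abstract describes can be sketched in a small toy experiment: fit ridge regression on ReLU random features of a random linear teacher, and average the predictions of K independently initialized feature maps. This is a minimal illustration with hypothetical parameters (teacher model, dimensions, noise level, ridge strength), not the paper's exact asymptotic analysis; the interpolation threshold sits roughly where the number of features equals the number of training samples.

```python
import numpy as np

def random_features_test_error(n_train=100, n_test=500, d=20, width=50,
                               noise=0.1, ridge=1e-6, K=1, seed=0):
    """Test MSE of random features ridge regression, optionally ensembled
    over K independently drawn (i.e. independently initialized) feature maps.
    Toy setup: random linear teacher with Gaussian inputs and label noise."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)            # teacher weights
    Xtr = rng.normal(size=(n_train, d))
    Xte = rng.normal(size=(n_test, d))
    ytr = Xtr @ beta + noise * rng.normal(size=n_train)
    yte = Xte @ beta                                   # noiseless targets
    preds = np.zeros(n_test)
    for _ in range(K):                                 # ensemble over K inits
        W = rng.normal(size=(d, width)) / np.sqrt(d)   # random first layer
        Ftr = np.maximum(Xtr @ W, 0.0)                 # ReLU random features
        Fte = np.maximum(Xte @ W, 0.0)
        a = np.linalg.solve(Ftr.T @ Ftr + ridge * np.eye(width), Ftr.T @ ytr)
        preds += Fte @ a / K                           # average the K outputs
    return float(np.mean((preds - yte) ** 2))
```

Sweeping `width` from well below to well above `n_train` at small `ridge` typically traces the double descent curve, with the overfitting peak near `width == n_train`; increasing `K` damps the initialization-variance contribution to that peak, in the spirit of the ensembling result above.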