The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

by Ben Adlam, et al.

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has non-monotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.
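The double descent curve described above can be reproduced in a minimal simulation. The sketch below (an illustrative assumption, not code from the paper) fits ridgeless random-feature regression — a frozen random first layer with ReLU activations, followed by a minimum-norm least-squares readout — and measures test error as the number of random features p sweeps through the interpolation threshold p = n. The peak near p = n and the subsequent descent in the overparameterized regime are the behavior the abstract refers to; the specific data model and hyperparameters here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear teacher: y = x . beta + noise (illustrative choice)
n_train, n_test, d = 100, 500, 20
beta = rng.standard_normal(d) / np.sqrt(d)

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ beta

def rf_test_error(p):
    """Test MSE of ridgeless regression on p random ReLU features."""
    # Frozen random first layer (the "random features" model)
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    F_train = np.maximum(X_train @ W, 0.0)
    F_test = np.maximum(X_test @ W, 0.0)
    # Minimum-norm least-squares readout; interpolates when p >= n
    coef, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)
    return float(np.mean((F_test @ coef - y_test) ** 2))

# Sweep widths through the interpolation threshold p = n_train = 100
widths = [10, 50, 100, 200, 1000]
errors = [rf_test_error(p) for p in widths]
```

Plotting `errors` against `widths` typically shows the characteristic shape: error falls, spikes near p = n where the feature matrix becomes ill-conditioned, then falls again as p grows. The paper's contribution is an exact asymptotic version of this analysis for the Neural Tangent Kernel rather than unstructured random features.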




