The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

by Ben Adlam, et al.

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has non-monotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.
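The double descent curve described above can be reproduced in a minimal simulation. The sketch below (an illustrative assumption, not code from the paper) fits ridgeless random-feature regression — a frozen random first layer with ReLU activations, followed by a minimum-norm least-squares readout — and measures test error as the number of random features p sweeps through the interpolation threshold p = n. The peak near p = n and the subsequent descent in the overparameterized regime are the behavior the abstract refers to; the specific data model and hyperparameters here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear teacher: y = x . beta + noise (illustrative choice)
n_train, n_test, d = 100, 500, 20
beta = rng.standard_normal(d) / np.sqrt(d)

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ beta

def rf_test_error(p):
    """Test MSE of ridgeless regression on p random ReLU features."""
    # Frozen random first layer (the "random features" model)
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    F_train = np.maximum(X_train @ W, 0.0)
    F_test = np.maximum(X_test @ W, 0.0)
    # Minimum-norm least-squares readout; interpolates when p >= n
    coef, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)
    return float(np.mean((F_test @ coef - y_test) ** 2))

# Sweep widths through the interpolation threshold p = n_train = 100
widths = [10, 50, 100, 200, 1000]
errors = [rf_test_error(p) for p in widths]
```

Plotting `errors` against `widths` typically shows the characteristic shape: error falls, spikes near p = n where the feature matrix becomes ill-conditioned, then falls again as p grows. The paper's contribution is an exact asymptotic version of this analysis for the Neural Tangent Kernel rather than unstructured random features.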




