The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

by Andrea Montanari et al.

Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not induce overfitting. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic feature vectors in d dimensions, and N hidden neurons. Under the assumption N ≤ Cd (for a constant C), we show that the network can exactly interpolate the data as soon as the number of parameters is significantly larger than the number of samples: Nd ≫ n. Under these assumptions, we show that the empirical NT kernel has minimum eigenvalue bounded away from zero, and we characterize the generalization error of min-ℓ_2 norm interpolants when the target function is linear. In particular, we show that the network approximately performs ridge regression in the raw features, with a strictly positive 'self-induced' regularization.
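The setting in the abstract can be illustrated with a small numerical sketch. The snippet below (an illustration, not the paper's code; the dimensions, seed, and helper `nt_features` are choices made here) builds the neural-tangent feature map of a two-layer ReLU network with frozen first-layer weights, and fits the minimum-ℓ_2-norm interpolant on isotropic data with a linear target. With Nd ≫ n, the feature matrix has full row rank almost surely, so the training data are interpolated exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 50, 20, 40          # samples, input dim, hidden neurons; N*d = 800 >> n

# Isotropic feature vectors and a linear target function, as in the data model
X = rng.standard_normal((n, d)) / np.sqrt(d)
beta = rng.standard_normal(d)
y = X @ beta

# First-layer weights frozen at random initialization (lazy / NT regime)
W = rng.standard_normal((N, d))

def nt_features(X):
    """NT feature map of a two-layer ReLU net w.r.t. first-layer weights:
    phi(x)_j = relu'(<w_j, x>) * x, stacked over the N neurons."""
    act = (X @ W.T > 0).astype(float)                 # (n, N): ReLU derivatives
    feats = act[:, :, None] * X[:, None, :]           # (n, N, d) per-neuron blocks
    return feats.reshape(X.shape[0], N * d) / np.sqrt(N)

Phi = nt_features(X)                                  # (n, N*d) feature matrix
theta = np.linalg.pinv(Phi) @ y                       # min-l2-norm interpolant

train_err = np.max(np.abs(Phi @ theta - y))           # ~0: exact interpolation
```

The paper's result that the empirical NT kernel `Phi @ Phi.T` has minimum eigenvalue bounded away from zero is what makes this interpolation numerically stable.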




