Surprises in High-Dimensional Ridgeless Least Squares Interpolation

03/19/2019
by   Trevor Hastie, et al.
0

Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the art neural networks appear to be models of this type. In this paper, we study minimum ℓ_2-norm interpolation in high-dimensional linear regression. Motivated by the connection with overparametrized neural networks, we consider the case of random features. We study two distinct models for the features' distribution: a linear model in which the feature vectors x_i∈ R^p are obtained by applying a linear transform to vectors of i.i.d. entries, x_i = Σ^1/2z_i (with z_i∈ R^p); a nonlinear model, in which the features are obtained by passing the input through a random one-layer neural network x_i = φ(Wz_i) (with z_i∈ R^d, and φ an activation function acting independently on the coordinates of Wz_i). We recover -- in a precise quantitative way -- several phenomena that have been observed in large scale neural networks and kernel machines, including the `double descent' behavior of the generalization error and the potential benefit of overparametrization.

READ FULL TEXT
research
08/14/2019

The generalization error of random features regression: Precise asymptotics and double descent curve

Deep learning methods operate in regimes that defy the traditional stati...
research
08/15/2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than req...
research
11/05/2019

The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

Modern machine learning models are often so complex that they achieve va...
research
12/16/2020

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Deep networks are typically trained with many more parameters than the s...
research
02/21/2020

Generalisation error in learning with random features and the hidden manifold model

We study generalised linear regression and classification for a syntheti...
research
06/05/2020

Triple descent and the two kinds of overfitting: Where why do they appear?

A recent line of research has highlighted the existence of a double desc...
research
12/02/2022

Vector Symbolic Finite State Machines in Attractor Neural Networks

Hopfield attractor networks are robust distributed models of human memor...

Please sign up or login with your details

Forgot password? Click here to reset