Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ_2-norm interpolation in high-dimensional linear regression. Motivated by the connection with overparametrized neural networks, we consider the case of random features. We study two distinct models for the features' distribution: a linear model, in which the feature vectors x_i ∈ R^p are obtained by applying a linear transform to vectors of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ R^p); and a nonlinear model, in which the features are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ R^d, W a random weight matrix, and φ an activation function acting independently on the coordinates of W z_i). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the 'double descent' behavior of the generalization error and the potential benefit of overparametrization.
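The abstract contains no code; the following is a minimal sketch of the nonlinear random-features setup it describes, assuming Gaussian inputs, a tanh activation, a linear ground-truth response y = ⟨z, β*⟩ + noise, and illustrative dimensions (none of these specific choices come from the abstract). It fits the minimum ℓ_2-norm interpolator via the pseudoinverse and sweeps the number of features p through the interpolation threshold p = n, where the test risk typically spikes before decreasing again in the overparametrized regime, the 'double descent' shape mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_norm_interpolator(X, y):
    """Minimum ell_2-norm solution of X @ beta = y when p > n;
    the ordinary (ridgeless) least-squares solution when n >= p."""
    return np.linalg.pinv(X) @ y

n, d = 100, 80                                  # training samples; input dimension (illustrative)
beta_star = rng.standard_normal(d) / np.sqrt(d) # assumed ground-truth signal
sigma = 0.5                                     # assumed noise level

test_risks = []
widths = np.arange(10, 400, 10)                 # number of random features p, sweeping past p = n
for p in widths:
    W = rng.standard_normal((p, d)) / np.sqrt(d)  # random first-layer weights
    phi = np.tanh                                 # activation; the paper allows general phi
    Z = rng.standard_normal((n, d))               # inputs z_i
    X = phi(Z @ W.T)                              # features x_i = phi(W z_i)
    y = Z @ beta_star + sigma * rng.standard_normal(n)
    beta_hat = min_norm_interpolator(X, y)
    # evaluate on a fresh test set drawn from the same model
    Z_test = rng.standard_normal((2000, d))
    X_test = phi(Z_test @ W.T)
    y_test = Z_test @ beta_star                   # noiseless targets: excess risk of beta_hat
    test_risks.append(np.mean((X_test @ beta_hat - y_test) ** 2))
    # plotting test_risks against widths typically shows a spike near p = n
    # followed by a second descent for p >> n ("double descent")
```

The linear feature model x_i = Σ^{1/2} z_i corresponds to taking φ the identity and W = Σ^{1/2} with p = d in this sketch.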