The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

11/05/2019
by Andrea Montanari et al.

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that achieve zero training error (on linearly separable data). Despite this simplicity, their high-dimensional behavior is not yet completely understood. We assume we are given i.i.d. data (y_i, x_i), i ≤ n, with x_i ∼ N(0, Σ) a p-dimensional Gaussian feature vector and y_i ∈ {+1, -1} a label whose distribution depends on a linear combination of the covariates, ⟨θ_*, x_i⟩. We consider the proportional asymptotics n, p → ∞ with p/n → ψ, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when n, p are of the order of a few hundred. We explore several choices for the pair (θ_*, Σ) and show that the resulting generalization curve (test error as a function of the overparametrization ratio ψ = p/n) differs qualitatively depending on this choice. In particular, we consider a specific structure of (θ_*, Σ) that captures the behavior of nonlinear random-feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we observe that the test error is monotonically decreasing in the number of parameters. This finding agrees with the recently developed `double descent' phenomenology for overparametrized models.
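As a rough illustration of this setup, the following minimal Python sketch (not the authors' code) simulates the data model under the illustrative choices Σ = I_p and a logistic link for the labels, fits an approximately max-margin linear classifier (a hinge-loss SVM with a large penalty C), and estimates the test error at several overparametrization ratios ψ = p/n. The signal strength kappa, the value C = 1e4, and the helper names are assumptions made for this sketch, not quantities from the paper.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def test_error(psi, n=300, kappa=2.0, n_test=5000):
    # The overparametrization ratio psi = p/n fixes the dimension p.
    p = max(int(psi * n), 1)
    # Ground-truth direction theta_* (unit norm); features x_i ~ N(0, I_p).
    # Sigma = I_p is an illustrative choice; the paper allows general Sigma.
    theta = rng.standard_normal(p)
    theta /= np.linalg.norm(theta)
    X = rng.standard_normal((n, p))
    X_test = rng.standard_normal((n_test, p))
    # Illustrative label model: P(y = +1 | x) = sigmoid(kappa * <theta_*, x>).
    def labels(Z):
        prob = 1.0 / (1.0 + np.exp(-kappa * Z @ theta))
        return np.where(rng.random(Z.shape[0]) < prob, 1, -1)
    y, y_test = labels(X), labels(X_test)
    # A hinge-loss SVM with a large C approximates the max-margin classifier
    # whenever the training data is linearly separable.
    clf = LinearSVC(C=1e4, loss="hinge", fit_intercept=False,
                    max_iter=200_000).fit(X, y)
    return np.mean(clf.predict(X_test) != y_test)

for psi in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"psi = {psi:4.1f}: test error ~ {test_error(psi):.3f}")

In the overparametrized regime p > n the training data is linearly separable with high probability, so the large-C hinge-loss solution coincides with the max-margin classifier; for small ψ the data may fail to be separable, in which case the fit is only a soft-margin approximation.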


