The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

by Andrea Montanari et al.

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that achieve zero training error (on linearly separable data). Despite this simplicity, their high-dimensional behavior is not yet completely understood. We assume we are given i.i.d. data (y_i, x_i), i ≤ n, with x_i ∼ N(0, Σ) a p-dimensional Gaussian feature vector, and y_i ∈ {+1, -1} a label whose distribution depends on a linear combination of the covariates 〈θ_*, x_i〉. We consider the proportional asymptotics n, p → ∞ with p/n → ψ, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when n, p are of the order of a few hundreds. We explore several choices for the pair (θ_*, Σ), and show that the resulting generalization curve (test error as a function of the overparametrization ratio ψ = p/n) differs qualitatively depending on this choice. In particular, we consider a specific structure of (θ_*, Σ) that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we observe that the test error is monotone decreasing in the number of parameters. This finding agrees with the recently developed `double descent' phenomenology for overparametrized models.
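The data model in the abstract is easy to simulate. The sketch below, under illustrative assumptions not taken from the paper (Σ = identity, a logistic label model with signal strength 5, and n = 200, p = 400 so that ψ = 2), draws training data and fits a linear classifier by gradient descent on the logistic loss; on separable data such iterates are known to converge in direction to the max-margin classifier, so this serves as a simple stand-in for the max-margin solution when estimating the test error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 400                      # overparametrization ratio psi = p/n = 2
theta_star = rng.standard_normal(p)
theta_star /= np.linalg.norm(theta_star)

def sample(m):
    """Draw m points from the abstract's model with Sigma = I (assumption)."""
    X = rng.standard_normal((m, p))
    p_plus = 1.0 / (1.0 + np.exp(-5.0 * (X @ theta_star)))  # logistic link (assumption)
    y = np.where(rng.random(m) < p_plus, 1.0, -1.0)
    return X, y

X_train, y_train = sample(n)

# Gradient descent on the logistic loss; with p > n the Gaussian data is
# linearly separable almost surely, and the normalized iterates approach
# the max-margin direction (the "implicit bias" of gradient descent).
w = np.zeros(p)
lr = 0.5
for _ in range(5000):
    margins = y_train * (X_train @ w)
    grad = -(X_train * (y_train / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

X_test, y_test = sample(4000)
err = np.mean(np.sign(X_test @ w) != y_test)
print(f"psi = {p / n:.1f}, estimated test error = {err:.3f}")
```

Sweeping p while holding n fixed and re-running this experiment traces out an empirical generalization curve of the kind the paper characterizes analytically.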




