The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

11/05/2019
by Andrea Montanari, et al.

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite this simplicity, their high-dimensional behavior is not yet completely understood. We assume we are given i.i.d. data (y_i, x_i), i ≤ n, with x_i ∼ N(0, Σ) a p-dimensional Gaussian feature vector, and y_i ∈ {+1,-1} a label whose distribution depends on a linear combination of the covariates 〈θ_*, x_i〉. We consider the proportional asymptotics n, p → ∞ with p/n → ψ, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when n, p are of the order of a few hundred. We explore several choices for the pair (θ_*, Σ), and show that the resulting generalization curve (test error as a function of the overparametrization ratio ψ = p/n) is qualitatively different depending on this choice. In particular, we consider a specific structure of (θ_*, Σ) that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we observe that the test error is monotone decreasing in the number of parameters. This finding agrees with the recently developed `double descent' phenomenology for overparametrized models.
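The following is a minimal simulation sketch (not from the paper) of the setup described above: Gaussian features x_i ∼ N(0, Σ), labels generated from a linear combination 〈θ_*, x_i〉, and the test error of a (near) max-margin linear classifier as the overparametrization ratio ψ = p/n varies. The specific choices Σ = I_p, a logistic link for the label distribution, and LinearSVC with a large C as a stand-in for the hard-margin solution are illustrative assumptions, not the paper's exact construction.

```python
# Sketch: estimate the test error of a near max-margin linear classifier
# on the Gaussian data model, for several overparametrization ratios psi = p/n.
# Assumptions (not from the paper): Sigma = I_p, logistic link for P(y=+1 | <theta_*, x>).
import numpy as np
from sklearn.svm import LinearSVC

def simulate_test_error(n=200, psi=2.0, n_test=10000, seed=0):
    rng = np.random.default_rng(seed)
    p = int(psi * n)
    theta_star = rng.standard_normal(p)
    theta_star /= np.linalg.norm(theta_star)          # unit-norm signal direction

    def sample(m):
        X = rng.standard_normal((m, p))               # x_i ~ N(0, I_p)
        prob = 1.0 / (1.0 + np.exp(-X @ theta_star))  # assumed logistic link
        y = np.where(rng.random(m) < prob, 1, -1)
        return X, y

    X_train, y_train = sample(n)
    X_test, y_test = sample(n_test)

    # A very large C approximates the hard-margin (max-margin) linear classifier
    clf = LinearSVC(C=1e6, fit_intercept=False, max_iter=50000)
    clf.fit(X_train, y_train)
    return np.mean(clf.predict(X_test) != y_test)

if __name__ == "__main__":
    for psi in (2.0, 4.0, 8.0):
        print(f"psi = {psi:4.1f}   test error ~ {simulate_test_error(psi=psi):.3f}")
```

Averaging this estimate over several random seeds, and sweeping ψ on a finer grid, yields an empirical generalization curve that can be compared against the paper's asymptotic predictions.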
