Tractability from overparametrization: The example of the negative perceptron

by Andrea Montanari, et al.

In the negative perceptron problem we are given n data points (x_i, y_i), where x_i is a d-dimensional vector and y_i ∈ {+1, -1} is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector θ that maximizes min_{i≤n} y_i⟨θ, x_i⟩. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which n, d → ∞ with n/d → δ, and prove upper and lower bounds on the maximum margin κ_s(δ) or, equivalently, on its inverse function δ_s(κ). In other words, δ_s(κ) is the overparametrization threshold: for n/d ≤ δ_s(κ) - ε a classifier achieving vanishing training error exists with high probability, while for n/d ≥ δ_s(κ) + ε it does not. Our bounds on δ_s(κ) match to the leading order as κ → -∞. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold δ_lin(κ). We observe a gap between the interpolation threshold δ_s(κ) and the linear programming threshold δ_lin(κ), raising the question of the behavior of other algorithms.
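
To make the objective concrete, the following minimal Python sketch evaluates the margin min_{i≤n} y_i⟨θ, x_i⟩ of a candidate direction θ after rescaling it to unit norm. It is an illustration only and not the algorithm analyzed in the paper. The function names `margin` and `random_instance` are hypothetical, and the data model used here (independent standard Gaussian features with uniformly random labels) is an assumption made for the example; the abstract does not specify the two random models it studies.

```python
import math
import random


def margin(theta, xs, ys):
    """Return min_i y_i * <theta / ||theta||, x_i> for a nonzero theta."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm == 0.0:
        raise ValueError("theta must be nonzero")
    # Normalizing theta makes the result invariant to positive rescaling.
    return min(
        y * sum(t * xj for t, xj in zip(theta, x)) / norm
        for x, y in zip(xs, ys)
    )


def random_instance(n, d, seed=0):
    """Draw n points in dimension d (assumed model: Gaussian x_i, random labels)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [rng.choice((-1, 1)) for _ in range(n)]
    return xs, ys
```

For example, `xs, ys = random_instance(200, 50)` followed by `margin(theta, xs, ys)` for any nonzero `theta` of length 50 gives the margin achieved by that direction. With random labels and n/d large, this value is typically negative, which is the regime the abstract describes.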


Tight bounds for maximum ℓ_1-margin classifiers

Popular iterative algorithms such as boosting methods and coordinate des...

Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming

In this paper, we study the predict-then-optimize problem where the outp...

The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

Modern machine learning models are often so complex that they achieve va...

On Accelerated Perceptrons and Beyond

The classical Perceptron algorithm of Rosenblatt can be used to find a l...

Error constant estimation under the maximum norm for linear Lagrange interpolation

For the Lagrange interpolation over a triangular domain, we propose an e...

Tropical Support Vector Machine and its Applications to Phylogenomics

Most data in genome-wide phylogenetic analysis (phylogenomics) is essent...

Critical Window of The Symmetric Perceptron

We study the critical window of the symmetric binary perceptron, or equi...
