Minimum ℓ_1-norm interpolators: Precise asymptotics and multiple descent

10/18/2021
by Yue Li, et al.

A growing line of machine learning research observes empirical evidence suggesting that interpolating estimators – those achieving zero training error – are not necessarily harmful. This paper pursues a theoretical understanding of an important type of interpolator: the minimum ℓ_1-norm interpolator, motivated by the observation that several learning algorithms favor low-ℓ_1-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size). We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon: the generalization risk of the minimum ℓ_1-norm interpolator undergoes multiple (possibly more than two) phases of descent and ascent as the model capacity increases. This phenomenon stems from the special structure of the minimum ℓ_1-norm interpolator and the delicate interplay between the over-parameterization ratio and the sparsity level, thereby unveiling a fundamental geometric distinction from the minimum ℓ_2-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations in two unknowns.
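To make the object of study concrete, below is a minimal sketch (not the paper's code) of the minimum ℓ_1-norm interpolator in the noisy sparse Gaussian-design setting described above, computed via the standard linear-programming split β = u − v with u, v ≥ 0. The dimensions `n`, `p`, the sparsity level `k`, and the noise scale `sigma` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: minimum ell_1-norm interpolation under a noisy sparse
# Gaussian-design model. All problem sizes here are illustrative.
import numpy as np
from scipy.optimize import linprog

def min_l1_interpolator(X, y):
    """Solve min ||b||_1 subject to X b = y via the LP split b = u - v, u, v >= 0."""
    n, p = X.shape
    c = np.ones(2 * p)               # objective: sum(u) + sum(v) = ||b||_1
    A_eq = np.hstack([X, -X])        # equality constraints: X u - X v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(0)
n, p, k, sigma = 100, 400, 20, 0.5   # hypothetical sample size, features, sparsity, noise
beta = np.zeros(p)
beta[:k] = rng.normal(size=k)        # k-sparse signal
X = rng.normal(size=(n, p)) / np.sqrt(n)   # Gaussian design
y = X @ beta + sigma * rng.normal(size=n)

b_hat = min_l1_interpolator(X, y)
risk = np.sum((b_hat - beta) ** 2)   # estimation error; under isotropic Gaussian
                                     # design the prediction risk scales with this
print(f"training residual: {np.linalg.norm(X @ b_hat - y):.2e}, risk: {risk:.3f}")
```

Sweeping `p` while holding `n` and `k` fixed (i.e., varying the over-parameterization ratio) and plotting `risk` against `p / n` would trace out the risk curve whose multiple descents and ascents the paper characterizes exactly.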

