Minimum ℓ_1-norm interpolators: Precise asymptotics and multiple descent

10/18/2021
by Yue Li, et al.

An evolving line of machine learning work observes empirical evidence suggesting that interpolating estimators – those that achieve zero training error – are not necessarily harmful. This paper pursues a theoretical understanding of an important type of interpolator: the minimum ℓ_1-norm interpolator, motivated by the observation that several learning algorithms favor low-ℓ_1-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size). We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon: the generalization risk of the minimum ℓ_1-norm interpolator undergoes multiple (possibly more than two) phases of descent and ascent as the model capacity increases. This phenomenon stems from the special structure of the minimum ℓ_1-norm interpolator and the delicate interplay between the over-parameterization ratio and the sparsity, thus unveiling a fundamental geometric distinction from the minimum ℓ_2-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
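To make the object of study concrete: the minimum ℓ_1-norm interpolator is the solution of argmin ||b||_1 subject to Xb = y, which can be computed exactly as a linear program (basis pursuit). The sketch below is illustrative only and is not code from the paper; the helper name `min_l1_interpolator` and the toy problem sizes are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_interpolator(X, y):
    """Minimum l1-norm interpolator: argmin ||b||_1 s.t. X b = y.
    Cast as an LP over z = [b, t] with -t <= b <= t, minimizing sum(t).
    (Hypothetical helper for illustration, not from the paper.)"""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(p)])   # minimize sum of t
    # Inequalities encoding |b_i| <= t_i:  b - t <= 0  and  -b - t <= 0.
    A_ub = np.block([[np.eye(p), -np.eye(p)],
                     [-np.eye(p), -np.eye(p)]])
    b_ub = np.zeros(2 * p)
    # Equality constraint: exact interpolation of the training data.
    A_eq = np.hstack([X, np.zeros((n, p))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * p + [(0, None)] * p)
    return res.x[:p]

# Over-parameterized noiseless toy example: n < p, sparse ground truth.
rng = np.random.default_rng(0)
n, p, k = 25, 40, 3
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:k] = [3.0, -2.0, 1.5]
y = X @ beta_star
beta_hat = min_l1_interpolator(X, y)
```

In the over-parameterized regime (n < p) infinitely many vectors interpolate the data; the LP picks the one of smallest ℓ_1 norm, whose polytope geometry (as opposed to the sphere geometry underlying the minimum ℓ_2-norm interpolator) is what drives the multiple-descent behavior analyzed in the paper.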


