Gradient descent follows the regularization path for general losses

06/19/2020
by Ziwei Ji et al.

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with arbitrary convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.
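
To make the abstract's claim concrete, the sketch below writes out the two paths being compared, in standard notation for linear predictors. It is only an illustration: the norm-ball parametrization of the regularization path, the constant step size \eta, and the symbols \mathcal{R} and B are notational assumptions here, not a restatement of the paper's exact theorem conditions.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Empirical risk over linear predictors w, with a convex, strictly decreasing loss \ell
\[
  \mathcal{R}(w) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i \langle w, x_i \rangle\bigr).
\]

% Gradient-descent path: iterates with a constant step size \eta (assumed small enough)
\[
  w_{t+1} = w_t - \eta \nabla \mathcal{R}(w_t).
\]

% Algorithm-independent regularization path: constrained minimizers indexed by a norm budget B
\[
  \bar{w}(B) \in \operatorname*{arg\,min}_{\|w\| \le B} \mathcal{R}(w).
\]

% The claim: when \inf_w \mathcal{R}(w) is not attained, the two paths share a limiting
% direction whenever either limit exists.
\[
  \lim_{t \to \infty} \frac{w_t}{\|w_t\|}
  \;=\; \lim_{B \to \infty} \frac{\bar{w}(B)}{\|\bar{w}(B)\|}.
\]

\end{document}
```

In this reading, the justification for exponentially-tailed losses is that the shared limit direction is the maximum-margin direction, whereas for polynomially-tailed losses the shared limit direction may have a poor margin.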

Related research

Gradient descent aligns the layers of deep linear networks (10/04/2018)
This paper establishes risk convergence and asymptotic weight matrix ali...

General Loss Functions Lead to (Approximate) Interpolation in High Dimensions (03/13/2023)
We provide a unified framework, applicable to a general family of convex...

Is Importance Weighting Incompatible with Interpolating Classifiers? (12/24/2021)
Importance weighting is a classic technique to handle distribution shift...

Gradient Methods Never Overfit On Separable Data (06/30/2020)
A line of recent works established that when training linear predictors ...

Bias-Variance Decompositions for Margin Losses (04/26/2022)
We introduce a novel bias-variance decomposition for a range of strictly...

An algorithmic view of ℓ_2 regularization and some path-following algorithms (07/07/2021)
We establish an equivalence between the ℓ_2-regularized solution path fo...

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability (05/19/2023)
Recent research has observed that in machine learning optimization, grad...