Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

05/19/2023
by Jingfeng Wu, et al.

Recent research has observed that gradient descent (GD) in machine learning optimization often operates at the edge of stability (EoS) [Cohen, et al., 2021], where the stepsize is large and the loss along the GD iterates evolves non-monotonically. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite local oscillations, we prove that the logistic loss is minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected onto the max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected onto the orthogonal complement of the max-margin direction. In contrast, we show that in the EoS regime, the GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD, which apply only when the stepsize is sufficiently small.
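
To make the setup concrete, here is a minimal numerical sketch (not the authors' code; the dataset, stepsize, and iteration count are illustrative assumptions): constant-stepsize GD on the logistic loss over a toy linearly separable dataset, with a stepsize far above the classical 2/smoothness stability threshold, so the run lands in the large-stepsize regime the abstract describes.

    # Minimal sketch, assuming a toy 2-D separable dataset and an
    # illustrative stepsize; it is not the paper's experimental code.
    import numpy as np

    rng = np.random.default_rng(0)

    # Labels folded into the features (z_i = y_i * x_i), so separability means
    # some w has z_i . w > 0 for all i, and the logistic loss is
    # L(w) = (1/n) * sum_i log(1 + exp(-z_i . w)).
    n, d = 20, 2
    z = 0.5 * rng.normal(size=(n, d)) + np.array([3.0, 0.0])

    def loss(w):
        # Numerically stable log(1 + exp(-margin)) via logaddexp.
        return np.mean(np.logaddexp(0.0, -z @ w))

    def grad(w):
        # d/dw log(1 + exp(-z.w)) = -z * sigmoid(-z.w), computed stably.
        s = np.exp(-np.logaddexp(0.0, z @ w))
        return -(z * s[:, None]).mean(axis=0)

    eta = 20.0          # constant stepsize, well above 2/smoothness for this data
    w = np.zeros(d)
    losses = [loss(w)]
    for _ in range(2000):
        w = w - eta * grad(w)
        losses.append(loss(w))

    increases = sum(b > a for a, b in zip(losses, losses[1:]))
    print(f"final loss: {losses[-1]:.2e}")
    # The paper proves the iterates diverge in norm along the max-margin direction.
    print(f"||w_T||: {np.linalg.norm(w):.1f}")
    print(f"steps where the loss increased: {increases} of {len(losses) - 1}")

Depending on the data and stepsize, the loss may rise on some steps before it eventually decreases, consistent with the local oscillations described above; what the paper guarantees for the logistic loss is the long-run decrease and the divergence of the iterates along the max-margin direction.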
