The Implicit Bias of Gradient Descent on Separable Data

10/27/2017
by Daniel Soudry, et al.

We show that gradient descent on an unregularized logistic regression problem with separable data converges to the max-margin solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, and we discuss a multi-class generalization to the cross-entropy loss. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
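The headline result is easy to observe numerically. Below is a minimal sketch (not the authors' code; the dataset, step size, and iteration schedule are illustrative assumptions): full-batch gradient descent on the unregularized logistic loss over a synthetic separable dataset, with the normalized iterate w/||w|| compared against a hard-margin SVM direction, approximated here by scikit-learn's SVC with a very large C.

```python
# A minimal sketch, assuming a synthetic two-blob dataset, step size 0.1,
# and a logarithmically spaced printing schedule. Not the authors' code.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs with labels in {-1, +1}; with this
# separation the sample is linearly separable with overwhelming probability.
n = 100
X = np.vstack([rng.normal(loc=+3.0, size=(n, 2)),
               rng.normal(loc=-3.0, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def grad(w):
    """Gradient of the empirical logistic loss (1/n) sum_i log(1 + exp(-y_i <x_i, w>))."""
    margins = np.clip(y * (X @ w), -500.0, 500.0)  # clip to avoid exp overflow
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

w = np.zeros(2)
eta = 0.1
for t in range(1, 100_001):
    w -= eta * grad(w)
    if t in (10, 100, 1_000, 10_000, 100_000):
        # The loss drops quickly, but the *direction* w/||w|| drifts toward
        # the max-margin separator only at a roughly 1/log(t) rate.
        print(f"t={t:>6}  direction={w / np.linalg.norm(w)}")

# Hard-margin SVM direction for comparison (a very large C serves as a
# stand-in for the hard-margin limit).
w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()
print("max-margin direction:", w_svm / np.linalg.norm(w_svm))
```

If the paper's analysis applies, the printed directions creep toward the SVM direction: the training loss is already tiny after a few hundred steps, yet the direction keeps improving across the remaining iterations, consistent with the logarithmically slow convergence the abstract describes.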


Related research

10/04/2018  Gradient descent aligns the layers of deep linear networks
This paper establishes risk convergence and asymptotic weight matrix ali...

05/19/2023  Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Recent research has observed that in machine learning optimization, grad...

06/13/2019  Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Recent works on implicit regularization have shown that gradient descent...

03/29/2019  A proof of convergence of multi-class logistic regression network
This paper revisits the special type of a neural network known under two...

08/24/2023  Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy
Common explanations for shortcut learning assume that the shortcut impro...

09/15/2022  Decentralized Learning with Separable Data: Generalization and Fast Algorithms
Decentralized learning offers privacy and communication efficiency when ...

06/09/2019  The Implicit Bias of AdaGrad on Separable Data
We study the implicit bias of AdaGrad on separable linear classification...
