The Implicit Bias of AdaGrad on Separable Data

06/09/2019
by Qian Qian, et al.

We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges in direction to the solution of a quadratic optimization problem whose feasible set is the same as that of the hard-margin SVM problem. We also discuss how different choices of AdaGrad's hyperparameters can affect this direction. These results offer a deeper understanding of why adaptive methods often do not generalize as well as gradient descent in practice.
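The characterization above can be illustrated empirically. The following sketch is not from the paper: the dataset, learning rate, accumulator initialization, and iteration budget are illustrative assumptions. It runs diagonal AdaGrad on the logistic loss over a tiny separable dataset and compares the normalized iterate with an approximate hard-margin SVM direction obtained from scikit-learn's LinearSVC with a large penalty parameter.

```python
# Minimal empirical sketch (assumed setup, not the paper's experiments):
# diagonal AdaGrad on logistic loss over a toy separable dataset, compared
# against an approximate hard-margin SVM direction.
import numpy as np
from sklearn.svm import LinearSVC

# Linearly separable toy data with labels in {-1, +1}.
X = np.array([[2.0, 0.5], [1.5, 1.0], [-1.0, -2.0], [-0.5, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_loss_grad(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)).
    margins = np.clip(y * (X @ w), -50.0, 50.0)  # clip for numerical stability
    coeff = -y / (1.0 + np.exp(margins))
    return (coeff[:, None] * X).mean(axis=0)

# Diagonal AdaGrad: per-coordinate step eta / sqrt(eps + accumulated squared grads).
w = np.zeros(2)
G = np.zeros(2)          # accumulated squared gradients
eta, eps = 0.5, 1e-8     # hyperparameters (illustrative choices)
for _ in range(100_000):
    g = logistic_loss_grad(w)
    G += g * g
    w -= eta * g / np.sqrt(G + eps)

adagrad_dir = w / np.linalg.norm(w)
print("AdaGrad direction:       ", adagrad_dir)

# Approximate hard-margin SVM direction via a large C (soft-margin -> hard-margin).
svm = LinearSVC(C=1e6, loss="hinge", tol=1e-8, max_iter=100_000).fit(X, y)
svm_dir = svm.coef_.ravel() / np.linalg.norm(svm.coef_)
print("Hard-margin SVM direction:", svm_dir)

# On a given problem the two directions may be close but need not coincide:
# the paper characterizes the AdaGrad limit as the minimizer of a different
# quadratic objective over the same feasible set as the hard SVM problem.
```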

Related research

02/22/2018  Characterizing Implicit Bias in Terms of Optimization Geometry
We study the bias of generic optimization methods, including Mirror Desc...

02/27/2022  Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond
An influential line of recent work has focused on the generalization pro...

06/24/2023  A Unified Approach to Controlling Implicit Regularization via Mirror Descent
Inspired by the remarkable success of deep neural networks, there has be...

10/27/2017  The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...

05/19/2023  Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Recent research has observed that in machine learning optimization, grad...

05/25/2022  Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
Driven by the empirical success and wide use of deep neural networks, un...

05/27/2023  Faster Margin Maximization Rates for Generic Optimization Methods
First-order optimization methods tend to inherently favor certain soluti...
