Momentum Doesn't Change the Implicit Bias

10/08/2021
by Bohan Wang, et al.

Momentum acceleration is widely adopted in optimization algorithms, yet how momentum affects the generalization performance of these algorithms is still poorly understood. In this paper, we address this question by analyzing the implicit bias of momentum-based optimization. We prove that, for exponential-tailed losses, both SGD with momentum and Adam converge to the L_2 max-margin solution, the same solution reached by vanilla gradient descent. In other words, adding momentum does not change which model the optimizer selects: it still converges to a low-complexity model, which provides guarantees on generalization. Technically, to overcome the difficulty posed by error accumulation in analyzing momentum, we construct new Lyapunov functions as a tool to bound the gap between the model parameter and the max-margin solution.
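The claimed implicit bias can be observed numerically. The following sketch (not code from the paper) runs heavy-ball momentum updates on the exponential loss over a toy linearly separable dataset; the data is chosen symmetric about the origin so that the L_2 max-margin direction is known in closed form. The step size and momentum coefficient are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Toy linearly separable data, symmetric about the origin so the
# L2 max-margin direction is (1, 1) / sqrt(2) in closed form.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def grad(w):
    """Gradient of the exponential loss sum_i exp(-y_i <w, x_i>)."""
    margins = y * (X @ w)
    return -((np.exp(-margins) * y)[:, None] * X).sum(axis=0)

# Heavy-ball (gradient descent with momentum) iterations.
# lr and beta are illustrative hyperparameters.
w = np.zeros(2)
v = np.zeros(2)
lr, beta = 0.1, 0.9
for _ in range(5000):
    v = beta * v + grad(w)
    w = w - lr * v

# The parameter norm diverges, but the *direction* converges
# to the L2 max-margin direction, as the paper's result predicts.
direction = w / np.linalg.norm(w)
max_margin_dir = np.array([1.0, 1.0]) / np.sqrt(2)
print(direction, max_margin_dir)
```

Note that `w` itself never converges (the exponential loss has no finite minimizer on separable data); only its direction does, which is why the implicit-bias statement is about the normalized iterate.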


