Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent

02/02/2023
by Avrajit Ghosh, et al.

It is well known that the finite step size (h) in Gradient Descent (GD) implicitly regularizes solutions toward flatter minima. A natural question to ask is "Does the momentum parameter β play a role in implicit regularization in Heavy-ball (HB) momentum accelerated gradient descent (GD+M)?". To answer this question, we first show that the discrete HB momentum update (GD+M) follows a continuous trajectory induced by a modified loss, which consists of the original loss and an implicit regularizer. We then show that this implicit regularizer for (GD+M) is stronger than that of (GD) by a factor of (1+β)/(1−β), which explains why (GD+M) shows better generalization performance and higher test accuracy than (GD). Furthermore, we extend our analysis to the stochastic version of gradient descent with momentum (SGD+M) and characterize the continuous trajectory of the (SGD+M) update in a pointwise sense. We explore the implicit regularization in (SGD+M) and (GD+M) through a series of experiments validating our theory.
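For concreteness, the heavy-ball update and the claimed scaling of the implicit penalty can be sketched as follows. This is a minimal numpy illustration, not code from the paper: the toy quadratic, the step size, and the assumption that the plain-GD penalty takes the commonly cited (h/4)·‖∇L‖² form are all illustrative; the abstract only states that the (GD+M) regularizer is stronger than the (GD) one by the factor (1+β)/(1−β).

```python
import numpy as np

# Toy ill-conditioned quadratic loss L(w) = 0.5 * w^T A w (illustrative only).
A = np.diag([10.0, 1.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

def heavy_ball(w0, h=0.01, beta=0.9, steps=1000):
    """Heavy-ball momentum GD: w_{k+1} = w_k - h * grad(w_k) + beta * (w_k - w_{k-1})."""
    w_prev = w0.copy()
    w = w0.copy()
    for _ in range(steps):
        w_next = w - h * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

def modified_loss(w, h, beta):
    """Assumed form of the modified loss suggested by the abstract: the original loss
    plus the GD penalty (h/4) * ||grad L||^2, scaled up by (1 + beta) / (1 - beta)."""
    strength = (h / 4.0) * (1.0 + beta) / (1.0 - beta)
    return loss(w) + strength * np.sum(grad(w) ** 2)

w0 = np.array([1.0, 1.0])
w_gd = heavy_ball(w0, beta=0.0)    # plain GD is the beta = 0 special case
w_gdm = heavy_ball(w0, beta=0.9)   # GD with heavy-ball momentum
print("GD   iterate:", w_gd, " modified loss:", modified_loss(w_gd, 0.01, 0.0))
print("GD+M iterate:", w_gdm, " modified loss:", modified_loss(w_gdm, 0.01, 0.9))
print("Penalty ratio (GD+M / GD):", (1 + 0.9) / (1 - 0.9))  # 19x stronger at beta = 0.9
```

On a convex quadratic both runs reach the same minimizer; the gradient-norm penalty is meant to matter when the loss has many minima of varying flatness, where a stronger implicit regularizer biases the trajectory toward flatter ones.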

Related research:

- Implicit Gradient Regularization (09/23/2020): Gradient descent can be surprisingly good at optimizing deep neural netw...
- Implicit Regularization of Accelerated Methods in Hilbert Spaces (05/30/2019): We study learning properties of accelerated gradient descent methods for...
- On the insufficiency of existing momentum schemes for Stochastic Optimization (03/15/2018): Momentum based stochastic gradient methods such as heavy ball (HB) and N...
- Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization (01/20/2022): Acceleration and momentum are the de facto standard in modern applicatio...
- Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization (06/10/2019): Most modern learning problems are highly overparameterized, meaning that...
- On the Implicit Bias of Adam (08/31/2023): In previous literature, backward error analysis was used to find ordinar...
- Momentum Doesn't Change the Implicit Bias (10/08/2021): The momentum acceleration technique is widely adopted in many optimizati...