Analysis Of Momentum Methods

06/10/2019
by   Nikola B. Kovachki, et al.
4

Gradient decent-based optimization methods underpin the parameter training which results in the impressive results now found when testing neural networks. Introducing stochasticity is key to their success in practical problems, and there is some understanding of the role of stochastic gradient decent in this context. Momentum modifications of gradient decent such as Polyak's Heavy Ball method (HB) and Nesterov's method of accelerated gradients (NAG), are widely adopted. In this work, our focus is on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm; to expose the ideas simply we work in the deterministic setting. We show that, contrary to popular belief, standard implementations of fixed momentum methods do no more than act to rescale the learning rate. We achieve this by showing that the momentum method converges to a gradient flow, with a momentum-dependent time-rescaling, using the method of modified equations from numerical analysis. Further we show that the momentum method admits an exponentially attractive invariant manifold on which the dynamic reduces to a gradient flow with respect to a modified loss function, equal to the original one plus a small perturbation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/30/2019

Understanding the Role of Momentum in Stochastic Gradient Methods

The use of momentum in stochastic gradient methods has become a widespre...
research
04/12/2016

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

Recently, stochastic momentum methods have been widely adopted in train...
research
12/20/2017

ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent

Two major momentum-based techniques that have achieved tremendous succes...
research
09/12/2018

On the Stability and Convergence of Stochastic Gradient Descent with Momentum

While momentum-based methods, in conjunction with the stochastic gradien...
research
06/04/2021

A Discrete Variational Derivation of Accelerated Methods in Optimization

Many of the new developments in machine learning are connected with grad...
research
04/14/2023

Who breaks early, looses: goal oriented training of deep neural networks based on port Hamiltonian dynamics

The highly structured energy landscape of the loss as a function of para...

Please sign up or login with your details

Forgot password? Click here to reset