ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent

12/20/2017
by Vishwak Srinivasan, et al.

Two major momentum-based techniques that have achieved tremendous success in optimization are Polyak's heavy ball method and Nesterov's accelerated gradient method. A crucial step in all momentum-based methods is the choice of the momentum parameter m, which is conventionally set to a value less than 1. Although the choice m < 1 is justified only under very strong theoretical assumptions, it works well in practice even when those assumptions do not hold. In this paper, we propose ADINE, a new momentum-based method that relaxes the constraint m < 1 and allows the learning algorithm to use an adaptive, higher momentum. We motivate our hypothesis on m by experimentally verifying that a higher momentum (> 1) can help escape saddle points much faster. Building on this observation, ADINE weighs previous updates more heavily by setting the momentum parameter above 1. We evaluate the proposed algorithm on deep neural networks and show that ADINE helps the learning algorithm converge much faster without compromising the generalization error.
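To make the setup concrete, the following is a minimal sketch of heavy-ball momentum SGD in which the momentum parameter is supplied as a schedule, so it may temporarily exceed 1. The adaptive rule shown (a brief m = 1.2 phase at the start, then the usual m = 0.9) is a hypothetical illustration of the "higher momentum" idea, not the actual ADINE update rule from the paper.

```python
def sgd_momentum(grad, w, lr, steps, momentum):
    """Heavy-ball SGD on a scalar parameter.

    `momentum` is a function of the step index, so a schedule may
    return values above 1 for some steps (the relaxation ADINE studies).
    """
    v = 0.0
    for t in range(steps):
        v = momentum(t) * v - lr * grad(w)  # heavy-ball velocity update
        w = w + v
    return w


# Toy objective f(w) = (w - 3)^2, gradient 2(w - 3).
grad = lambda w: 2 * (w - 3)

# Baseline: constant momentum m = 0.9 < 1.
w_const = sgd_momentum(grad, w=0.0, lr=0.1, steps=300,
                       momentum=lambda t: 0.9)

# Hypothetical high-momentum phase: m = 1.2 for the first 5 steps
# (e.g. to push through a flat region), then back to 0.9.
w_adaptive = sgd_momentum(grad, w=0.0, lr=0.1, steps=300,
                          momentum=lambda t: 1.2 if t < 5 else 0.9)
```

Both runs settle at the minimizer w = 3; the schedule argument is the only change needed to experiment with transient momentum values above 1.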


