Rapidly Adapting Moment Estimation

02/24/2019
by Guoqiang Zhang et al.

Adaptive gradient methods such as Adam have been shown to be very effective for training deep neural networks (DNNs) by tracking the second moment of gradients to compute individual learning rates. Unlike existing methods, we use the most recent first moment of gradients to compute the individual learning rates at each iteration. The motivation is that the dynamic variation of the first moment of gradients may provide useful information for obtaining the learning rates. We refer to the new method as rapidly adapting moment estimation (RAME). The theoretical convergence of deterministic RAME is studied using an analysis similar to the one used in [1] for Adam. Experimental results for training a number of DNNs show promising performance of RAME in terms of convergence speed and generalization compared to the stochastic heavy-ball (SHB) method, Adam, and RMSprop.
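The abstract states the idea but not the concrete update rule, so the following is only a minimal sketch of the general principle under stated assumptions: a first moment tracked as in Adam, with the per-parameter stepsize driven by how rapidly that moment varies. The class name, the moment-variation tracker d, and all hyperparameter values are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

class RAMELikeOptimizer:
    """Illustrative sketch: an Adam-style first moment whose recent
    variation (rather than the gradient's second moment) sets the
    per-parameter stepsize. A hypothetical reading of the abstract,
    not the published RAME update."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr = lr          # base stepsize
        self.beta1 = beta1    # decay for the first moment (as in Adam)
        self.beta2 = beta2    # decay for the moment-variation tracker
        self.eps = eps        # numerical floor for the denominator
        self.m = None         # first moment of gradients
        self.d = None         # smoothed squared variation of the first moment

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.d = np.zeros_like(w)
        m_prev = self.m
        self.m = self.beta1 * m_prev + (1.0 - self.beta1) * grad
        # Track, per parameter, how fast the first moment is changing.
        self.d = self.beta2 * self.d + (1.0 - self.beta2) * (self.m - m_prev) ** 2
        # Per-parameter learning rate derived from the moment's dynamics:
        # parameters whose momentum moves fast get smaller effective steps.
        return w - self.lr * self.m / (np.sqrt(self.d) + self.eps)


# Toy usage on a quadratic (the gradient of 0.5 * ||w||^2 is w).
opt = RAMELikeOptimizer(lr=1e-3)
w = np.array([1.0, -2.0, 3.0])
for _ in range(100):
    w = opt.step(w, grad=w)
```

As in Adam, dividing by a root-mean-square quantity makes the update roughly scale-invariant; the difference in this sketch is that the denominator measures the dynamics of the momentum itself rather than the second moment of the raw gradients.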


Related research

Double Adaptive Stochastic Gradient Optimization (11/06/2018)
Adaptive moment methods have been remarkably successful in deep learning...

On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks (03/24/2022)
Adam and AdaBelief compute and make use of elementwise adaptive stepsize...

An Adaptive and Momental Bound Method for Stochastic Learning (10/27/2019)
Training deep neural networks requires intricate initialization and care...

Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization (06/22/2021)
Adaptive gradient methods, such as Adam, have achieved tremendous succes...

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization (06/12/2020)
Due to its simplicity and outstanding ability to generalize, stochastic ...

Rethinking the Structure of Stochastic Gradients: Empirical and Statistical Evidence (12/05/2022)
Stochastic gradients closely relate to both optimization and generalizat...

Accelerating Deep Learning by Focusing on the Biggest Losers (10/02/2019)
This paper introduces Selective-Backprop, a technique that accelerates t...
