Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate

05/19/2018
by Haiwen Huang, et al.

First-order optimization methods play a prominent role in deep learning. Algorithms such as RMSProp and Adam are popular choices for training deep neural networks on large datasets. Recently, Reddi et al. discovered a flaw in the proof of convergence of Adam and proposed an alternative algorithm, AMSGrad, which has guaranteed convergence under certain conditions. In this paper, we propose a new algorithm, called Nostalgic Adam (NosAdam), which places bigger weights on the past gradients than on the recent gradients when designing the adaptive learning rate. This is a new design choice, motivated by a mathematical analysis of the algorithm. We also show that the estimate of the second moment of the gradient in NosAdam vanishes more slowly than in Adam, which may account for the faster convergence of NosAdam. We analyze the convergence of NosAdam and show that it achieves the best known rate of O(1/√T) for general convex online learning problems. Empirically, NosAdam outperforms AMSGrad and Adam on some common machine learning problems.
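The core idea in the abstract, weighting past gradients more heavily than recent ones when building the adaptive learning rate, can be illustrated with a short sketch. The specific weighting b_k = k^(-gamma), the running-sum update for the second-moment estimate, and all hyperparameter names and values below are assumptions chosen for illustration, not a verbatim transcription of the paper's pseudocode.

import numpy as np


def nosadam_minimize(grad_fn, x0, lr=0.05, beta1=0.9, gamma=0.1,
                     eps=1e-8, steps=2000):
    """Gradient descent with a NosAdam-style adaptive learning rate (sketch)."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)      # first-moment estimate, as in Adam
    v = np.zeros_like(x)      # "nostalgic" second-moment estimate
    B = 0.0                   # running sum of the weights b_k
    for t in range(1, steps + 1):
        g = np.asarray(grad_fn(x), dtype=float)
        b_t = t ** (-gamma)   # assumed weight for the newest squared gradient
        B_prev, B = B, B + b_t
        m = beta1 * m + (1.0 - beta1) * g
        # Unlike Adam's fixed beta2, this average lets older squared
        # gradients retain comparatively more weight as t grows.
        v = (B_prev / B) * v + (b_t / B) * g * g
        x = x - lr * m / (np.sqrt(v) + eps)
    return x


if __name__ == "__main__":
    # Toy convex problem: minimize ||x - 3||^2, whose gradient is 2 * (x - 3).
    print(nosadam_minimize(lambda x: 2.0 * (x - 3.0), x0=[0.0, 10.0]))
    # Both coordinates should end up close to 3.

Because the weights on new squared gradients shrink relative to the accumulated sum, the second-moment estimate decays more slowly near a minimum, which is the behavior the abstract attributes to NosAdam's faster convergence.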


