On the convergence proof of AMSGrad and a new version

04/07/2019
by Tran Thi Phuong et al.

The adaptive moment estimation algorithm Adam (Kingma and Ba, ICLR 2015) is a popular optimizer in the training of deep neural networks. However, Reddi et al. (ICLR 2018) have recently shown that the convergence proof of Adam is problematic and proposed a variant of Adam called AMSGrad as a fix. In this paper, we show that the convergence proof of AMSGrad is also problematic, and we present various fixes for it, which include a new version of AMSGrad.
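
For reference, one step of the original AMSGrad of Reddi et al. (ICLR 2018) can be sketched as follows. This is a minimal NumPy sketch of the standard algorithm (Adam whose second-moment estimate is replaced by its element-wise running maximum), written with the usual constant hyperparameters and without bias correction; the function and argument names are illustrative, and this is not the new version proposed in the paper.

    import numpy as np

    def amsgrad_step(x, g, m, v, v_hat, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One AMSGrad update: Adam, except the denominator uses the running
        maximum v_hat of the second-moment estimate, so the effective
        per-coordinate step size never increases."""
        m = beta1 * m + (1.0 - beta1) * g          # moving average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g      # moving average of squared gradients
        v_hat = np.maximum(v_hat, v)               # the AMSGrad fix: element-wise running maximum
        x = x - lr * m / (np.sqrt(v_hat) + eps)    # parameter update
        return x, m, v, v_hat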

Related research

04/27/2018 - An improvement of the convergence proof of the ADAM-Optimizer
A common way to train neural networks is the Backpropagation. This algor...

05/19/2018 - Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate
First-order optimization methods have been playing a prominent role in d...

04/27/2023 - Convergence of Adam Under Relaxed Assumptions
In this paper, we provide a rigorous proof of convergence of the Adaptiv...

03/04/2019 - Optimistic Adaptive Acceleration for Optimization
We consider a new variant of AMSGrad. AMSGrad RKK18 is a popular adaptiv...

04/12/2017 - A Proof of Orthogonal Double Machine Learning with Z-Estimators
We consider two stage estimation with a non-parametric first stage and a...

03/09/2020 - Communication-Efficient Distributed SGD with Error-Feedback, Revisited
We show that the convergence proof of a recent algorithm called dist-EF-...

11/16/2021 - On Bock's Conjecture Regarding the Adam Optimizer
In 2014, Kingma and Ba published their Adam optimizer algorithm, togethe...
