On the convergence proof of AMSGrad and a new version

by Tran Thi Phuong, et al.

The adaptive moment estimation algorithm Adam (Kingma and Ba, ICLR 2015) is a popular optimizer in the training of deep neural networks. However, Reddi et al. (ICLR 2018) have recently shown that the convergence proof of Adam is problematic and proposed a variant of Adam called AMSGrad as a fix. In this paper, we show that the convergence proof of AMSGrad is also problematic, and we present various fixes for it, which include a new version of AMSGrad.
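
For orientation, the sketch below shows the AMSGrad update as it is commonly stated following Reddi et al.: it is the Adam update with one change, namely that the second-moment estimate used in the denominator is replaced by its running maximum. This is a minimal scalar illustration, not the new version proposed in this paper. The function names, the constant step size, the hyperparameter defaults, and the test function f(x) = x^2 are all assumptions made for the example. Bias correction is omitted, and the convergence analyses typically use a decaying step size rather than a constant one.

```python
import math


def amsgrad_step(x, g, m, v, v_hat, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar AMSGrad update (hypothetical helper, no bias correction)."""
    # Exponential moving averages of the gradient and the squared gradient,
    # exactly as in Adam.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # The AMSGrad change: keep the running maximum of v, so the effective
    # per-coordinate step size lr / sqrt(v_hat) never increases.
    v_hat = max(v_hat, v)
    x = x - lr * m / (math.sqrt(v_hat) + eps)
    return x, m, v, v_hat


def minimize_quadratic(steps=2000, x0=5.0):
    """Run the sketch on f(x) = x^2 (an assumed toy objective)."""
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(steps):
        g = 2.0 * x  # gradient of f(x) = x^2
        x, m, v, v_hat = amsgrad_step(x, g, m, v, v_hat)
    return x
```

Plain Adam would divide by `math.sqrt(v)` instead of `math.sqrt(v_hat)`; the `max` is the only difference in this sketch.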

