
## 1 Introduction

Stochastic gradient descent (SGD) (Robbins and Monro, 1951) and its adaptive variants such as AdaGrad (Duchi et al., 2011; McMahan and Streeter, 2010) are widely used for training modern machine learning models. Throughout this paper, we denote by $\sqrt{\mathbf{v}}$ the element-wise square root of the vector $\mathbf{v}$, by $\mathbf{u}/\mathbf{v}$ the element-wise division between $\mathbf{u}$ and $\mathbf{v}$, and by $\max(\mathbf{u},\mathbf{v})$ the element-wise maximum between $\mathbf{u}$ and $\mathbf{v}$. With this notation, the update rule of AMSGrad (Reddi et al., 2018) can be written as

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha_t\frac{\mathbf{m}_t}{\sqrt{\hat{\mathbf{v}}_t}}, \quad \text{with } \hat{\mathbf{v}}_t = \max(\hat{\mathbf{v}}_{t-1}, \mathbf{v}_t),$$

where $\alpha_t$ is the step size, $\mathbf{x}_t$ is the iterate in the $t$-th iteration, and $\mathbf{m}_t$, $\mathbf{v}_t$ are the exponential moving averages of the gradient and the squared gradient at the $t$-th iteration respectively. More specifically, $\mathbf{m}_t$ and $\mathbf{v}_t$ are defined as follows (here $\mathbf{g}_t^2$ denotes the element-wise square of the vector $\mathbf{g}_t$):

$$\mathbf{m}_t = \beta_1\mathbf{m}_{t-1} + (1-\beta_1)\mathbf{g}_t, \qquad \mathbf{v}_t = \beta_2\mathbf{v}_{t-1} + (1-\beta_2)\mathbf{g}_t^2,$$

where $\beta_1, \beta_2 \in [0,1)$ are hyperparameters of the algorithm, and $\mathbf{g}_t$ is the stochastic gradient at the $t$-th iteration. Padam (Chen and Gu, 2018) replaces the square root in the AMSGrad update with a partially adaptive power $p$:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha_t\frac{\mathbf{m}_t}{\hat{\mathbf{v}}_t^{p}}, \quad \text{with } \hat{\mathbf{v}}_t = \max(\hat{\mathbf{v}}_{t-1}, \mathbf{v}_t).$$

Evidently, when $p = 1/2$, Padam reduces to AMSGrad. Padam also reduces to the corrected version of RMSProp (Reddi et al., 2018) when $p = 1/2$ and $\beta_1 = 0$.
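As a concrete illustration of the updates above, the following one-dimensional Python sketch implements a single Padam iteration. The function name and the `eps` stability constant are our own additions, not part of the paper's Algorithm 1; setting `p=0.5` recovers AMSGrad, and `p=0.5` with `beta1=0` recovers the corrected RMSProp.

```python
def padam_step(x, m, v, v_hat, grad, alpha, beta1=0.9, beta2=0.999,
               p=0.125, eps=1e-8):
    """One Padam iteration on a single coordinate (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad          # m_t = beta1*m_{t-1} + (1-beta1)*g_t
    v = beta2 * v + (1 - beta2) * grad * grad   # v_t = beta2*v_{t-1} + (1-beta2)*g_t^2
    v_hat = max(v_hat, v)                       # v_hat_t = max(v_hat_{t-1}, v_t)
    x = x - alpha * m / (v_hat ** p + eps)      # x_{t+1} = x_t - alpha*m_t / v_hat_t^p
    return x, m, v, v_hat

# Minimize f(x) = x^2 / 2 (gradient g_t = x_t) starting from x = 1.0.
x, m, v, v_hat = 1.0, 0.0, 0.0, 0.0
for _ in range(500):
    x, m, v, v_hat = padam_step(x, m, v, v_hat, grad=x, alpha=0.05)
```

Note that `v_hat` is nondecreasing over iterations, which is exactly what makes the effective step size $\alpha_t/\hat{v}_t^p$ non-increasing coordinate-wise.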

In this paper, we pay particular attention to the dependence on both the dimension $d$ and the number of iterations $T$ in the convergence rate. This is motivated by the fact that modern machine learning methods, especially the training of deep neural networks, usually require solving very high-dimensional nonconvex optimization problems. The dimension $d$ is often comparable to, or even larger than, the total number of iterations $T$. Take training the convolutional neural network DenseNet-BC (Huang et al., 2017) on CIFAR-10 (Krizhevsky, 2009) as an example: according to Huang et al. (2017), both the total number of training iterations and the number of parameters in the network are on the order of millions. This example shows that $d$ can indeed be of the same order as $T$ in practice. Therefore, we argue that it is very important to show the precise dependence on both $d$ and $T$ in the convergence analysis of adaptive gradient methods for modern machine learning.

While preparing this manuscript, we noticed a paper (Chen et al., 2018), released on arXiv on August 8, 2018, which analyzes the convergence of a class of Adam-type algorithms, including AMSGrad and AdaGrad, for nonconvex optimization. Our work is independent, and our derived convergence rate for AMSGrad is faster than theirs.

### 1.1 Our Contributions

The main contributions of our work are summarized as follows:

• We prove that the convergence rate of Padam to a stationary point for stochastic nonconvex optimization is

$$O\left(\frac{\big(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\big)^{1/2}}{T^{3/4}} + \frac{d}{T}\right), \tag{1}$$

where $\mathbf{g}_1, \dots, \mathbf{g}_T$ are the stochastic gradients and $\mathbf{g}_{1:T,i} = (g_{1,i}, g_{2,i}, \dots, g_{T,i})^\top$. When the stochastic gradients are $G_\infty$-bounded, (1) matches the convergence rate of vanilla SGD in terms of its dependence on $T$.

• Our result implies the convergence rate for AMSGrad is

$$O\left(\sqrt{\frac{d}{T}} + \frac{d}{T}\right),$$

which has a better dependence on the dimension $d$ and the number of iterations $T$ than the convergence rate proved in Chen et al. (2018), i.e.,

$$O\left(\frac{\log T + d^2}{\sqrt{T}}\right).$$
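The step from the general bound (1) to the $\sqrt{d/T}$-type bound above is a one-line calculation; the following derivation is our own elaboration, using only the $G_\infty$-bound on stochastic gradients from Section 3:

```latex
\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2 \le d\,G_\infty\sqrt{T}
\quad\Longrightarrow\quad
\frac{\big(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\big)^{1/2}}{T^{3/4}}
\le \frac{\big(d\,G_\infty\sqrt{T}\big)^{1/2}}{T^{3/4}}
= \sqrt{\frac{d\,G_\infty}{T}},
```

since each coordinate satisfies $|g_{t,i}| \le G_\infty$ and hence $\|\mathbf{g}_{1:T,i}\|_2 \le G_\infty\sqrt{T}$.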

### 1.2 Additional Related Work

Here we briefly review other related work on nonconvex stochastic optimization.

Ghadimi and Lan (2013) proposed a randomized stochastic gradient (RSG) method, and proved its $O(1/\sqrt{T})$ convergence rate to a stationary point. Ghadimi and Lan (2016) proposed a randomized stochastic accelerated gradient (RSAG) method, which achieves an improved convergence rate, where $\sigma^2$ is an upper bound on the variance of the stochastic gradient. Motivated by the success of stochastic momentum methods in deep learning (Sutskever et al., 2013), Yang et al. (2016) provided a unified convergence analysis for both the stochastic heavy-ball method and the stochastic variant of Nesterov's accelerated gradient method, and proved an $O(1/\sqrt{T})$ convergence rate to a stationary point for smooth nonconvex functions. Reddi et al. (2016); Allen-Zhu and Hazan (2016) proposed variants of the stochastic variance-reduced gradient (SVRG) method (Johnson and Zhang, 2013) that are provably faster than gradient descent in the nonconvex finite-sum setting. Lei et al. (2017) proposed the stochastically controlled stochastic gradient (SCSG) method, which further improves the convergence rate of SVRG for finite-sum smooth nonconvex optimization. Very recently, Zhou et al. (2018) proposed a new algorithm called stochastic nested variance-reduced gradient (SNVRG), which achieves strictly better gradient complexity than both SVRG and SCSG for finite-sum and stochastic smooth nonconvex optimization.

There is another line of research in stochastic smooth nonconvex optimization, which makes use of the $\sigma$-nonconvexity of a nonconvex function (i.e., $\nabla^2 f(\mathbf{x}) \succeq -\sigma\mathbf{I}$). More specifically, Natasha 1 (Allen-Zhu, 2017b) and Natasha 1.5 (Allen-Zhu, 2017a) solve a modified regularized problem and achieve faster convergence rates to first-order stationary points than SVRG and SCSG in the finite-sum and stochastic settings respectively. In addition, Allen-Zhu (2018) proposed the SGD4 algorithm, which optimizes a series of regularized problems and achieves a faster convergence rate than SGD.

### 1.3 Organization and Notation

The remainder of this paper is organized as follows: We present the problem setup and review the algorithms in Section 2. We provide the convergence guarantee of Padam for stochastic smooth nonconvex optimization in Section 3. Finally, we conclude our paper in Section 4.

Notation. Scalars are denoted by lower case letters, vectors by lower case bold face letters, and matrices by upper case bold face letters. For a vector $\mathbf{x} \in \mathbb{R}^d$, we denote the $\ell_2$ norm of $\mathbf{x}$ by $\|\mathbf{x}\|_2$ and the $\ell_\infty$ norm of $\mathbf{x}$ by $\|\mathbf{x}\|_\infty$. For a sequence of vectors $\{\mathbf{g}_t\}_{t=1}^T$, we denote by $g_{t,i}$ the $i$-th element of $\mathbf{g}_t$. We also denote $\mathbf{g}_{1:T,i} = (g_{1,i}, g_{2,i}, \dots, g_{T,i})^\top$. With slight abuse of notation, for any two vectors $\mathbf{a}$ and $\mathbf{b}$, we denote $\mathbf{a}^2$ as the element-wise square, $\mathbf{a}^p$ as the element-wise power operation, $\mathbf{a}/\mathbf{b}$ as the element-wise division, and $\max(\mathbf{a}, \mathbf{b})$ as the element-wise maximum. For a matrix $\mathbf{A} = [A_{ij}]$, we define $\|\mathbf{A}\|_{1,1} = \sum_{i,j}|A_{ij}|$. Given two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = O(b_n)$ if there exists a constant $0 < C < +\infty$ such that $a_n \le C\, b_n$. We use the notation $\widetilde{O}(\cdot)$ to hide logarithmic factors.

## 2 Problem Setup and Algorithms

In this section, we first introduce the preliminary definitions used in this paper, followed by the problem setup of stochastic nonconvex optimization. Then we review the state-of-the-art adaptive gradient method, i.e., Padam (Chen and Gu, 2018), along with AMSGrad (the corrected version of Adam) (Reddi et al., 2018) and the corrected version of RMSProp (Tieleman and Hinton, 2012; Reddi et al., 2018).

### 2.1 Problem Setup

We study the following stochastic nonconvex optimization problem

$$\min_{\mathbf{x}\in\mathbb{R}^d} f(\mathbf{x}) := \mathbb{E}_{\xi}\big[f(\mathbf{x};\xi)\big],$$

where $\xi$ is a random variable following some fixed but unknown distribution, and $f(\mathbf{x})$ is an $L$-smooth nonconvex function. In the stochastic setting, one cannot directly access the full gradient of $f(\mathbf{x})$. Instead, one can only obtain unbiased estimators of the gradient of $f(\mathbf{x})$, namely the stochastic gradients $\nabla f(\mathbf{x};\xi)$. This setting has been studied in Ghadimi and Lan (2013, 2016).

### 2.2 Algorithms

In this section we introduce the algorithms we study in this paper. We mainly consider three algorithms: Padam (Chen and Gu, 2018), AMSGrad (Reddi et al., 2018) and a corrected version of RMSProp (Tieleman and Hinton, 2012; Reddi et al., 2018).

## 3 Main Theory

In this section we present our main theoretical results. We first introduce the following assumptions.

[Bounded Gradient] $f(\mathbf{x}) = \mathbb{E}_\xi f(\mathbf{x};\xi)$ has $G_\infty$-bounded stochastic gradient. That is, for any $\xi$, we assume that

$$\|\nabla f(\mathbf{x};\xi)\|_\infty \le G_\infty.$$

It is worth mentioning that Assumption 3 is slightly weaker than the $G_2$-boundedness assumption $\|\nabla f(\mathbf{x};\xi)\|_2 \le G_2$ used in Reddi et al. (2016); Chen et al. (2018). Since $\|\nabla f(\mathbf{x};\xi)\|_\infty \le \|\nabla f(\mathbf{x};\xi)\|_2$, the $G_2$-boundedness assumption implies Assumption 3 with $G_\infty = G_2$. Meanwhile, $G_\infty$ can be tighter than $G_2$ by a factor of $\sqrt{d}$ when the coordinates of $\nabla f(\mathbf{x};\xi)$ are almost equal to each other.
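The two norm relations in this remark can be sanity-checked numerically; the helper functions below are our own illustration:

```python
import math

def linf(v):
    # ||v||_inf = max_i |v_i|
    return max(abs(c) for c in v)

def l2(v):
    # ||v||_2 = sqrt(sum_i v_i^2)
    return math.sqrt(sum(c * c for c in v))

d = 4
g = [3.0, -1.0, 0.5, 2.0]
# For any g in R^d: ||g||_inf <= ||g||_2 <= sqrt(d) * ||g||_inf.
assert linf(g) <= l2(g) <= math.sqrt(d) * linf(g)

# When all coordinates are equal, the sqrt(d) gap between the two norms is tight.
u = [1.0] * d
ratio = l2(u) / linf(u)  # equals sqrt(d)
```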

[$L$-smooth] $f$ is $L$-smooth: for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, we have

$$\big|f(\mathbf{x}) - f(\mathbf{y}) - \langle\nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y}\rangle\big| \le \frac{L}{2}\|\mathbf{x} - \mathbf{y}\|_2^2.$$

Assumption 3 is a standard assumption frequently used in the analysis of gradient-based algorithms. It is equivalent to the $L$-gradient Lipschitz condition, which is often written as $\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\|_2 \le L\|\mathbf{x} - \mathbf{y}\|_2$.
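For the one-dimensional quadratic $f(x) = x^2$, which is $L$-smooth with $L = 2$, the smoothness inequality holds with equality, making it easy to verify numerically (our own illustration):

```python
def f(x):
    return x * x        # L-smooth with L = 2

def grad(x):
    return 2.0 * x

L = 2.0
for x, y in [(0.3, -1.2), (2.0, 0.5), (-0.7, 0.1)]:
    # |f(x) - f(y) - <grad f(y), x - y>| <= (L/2) * |x - y|^2
    gap = abs(f(x) - f(y) - grad(y) * (x - y))
    assert gap <= L / 2 * (x - y) ** 2 + 1e-12
```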

We are now ready to present our main result.

[Padam] In Algorithm 1, suppose that $\beta_1 < \beta_2^{2p}$ and $\alpha_t = \alpha$ for $t = 1, \dots, T$. Then under Assumptions 3 and 3, for any $q \in [0, 1]$, the output $\mathbf{x}_{\text{out}}$ of Algorithm 1 satisfies

$$\mathbb{E}\big[\|\nabla f(\mathbf{x}_{\text{out}})\|_2^2\big] \le \frac{M_1}{T\alpha} + \frac{M_2\, d}{T} + \frac{M_3\, d^q \alpha}{T^{(1-q)/2}}\,\mathbb{E}\left(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\right)^{1-q}, \tag{2}$$

where

$$M_1 = 2G_\infty^{2p}\Delta_f, \qquad M_2 = \frac{4G_\infty^{2+2p}\,\mathbb{E}\big\|\hat{\mathbf{v}}_1^{-p}\big\|_1}{d(1-\beta_1)} + 4G_\infty^2,$$

$$M_3 = \frac{4LG_\infty^{1+q-2p}}{(1-\beta_2)^{2p}} + \frac{8LG_\infty^{1+q-2p}}{(1-\beta_1)(1-\beta_2)^{2p}\big(1-\beta_1/\beta_2^{2p}\big)}\left(\frac{\beta_1}{1-\beta_1}\right)^2,$$

and $\Delta_f = f(\mathbf{x}_1) - \inf_{\mathbf{x}} f(\mathbf{x})$. From Theorem 3, we can see that $M_1$ and $M_3$ are independent of the number of iterations $T$ and the dimension $d$. In addition, if $\mathbb{E}\|\hat{\mathbf{v}}_1^{-p}\|_1 = O(d)$, it is easy to see that $M_2$ also has an upper bound that is independent of $T$ and $d$. The following corollary is a special case of Theorem 3 with $q = 0$. Under the same conditions of Theorem 3, the output of Padam satisfies

$$\mathbb{E}\big[\|\nabla f(\mathbf{x}_{\text{out}})\|_2^2\big] \le \frac{M_1}{T\alpha} + \frac{M_2\, d}{T} + \frac{M_3'\, \alpha}{\sqrt{T}}\,\mathbb{E}\left(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\right), \tag{3}$$

where $M_1$ and $M_2$ are the same as in Theorem 3, and $M_3'$ is defined as follows:

$$M_3' = \frac{4LG_\infty^{1-2p}}{(1-\beta_2)^{2p}} + \frac{8LG_\infty^{1-2p}}{(1-\beta_1)(1-\beta_2)^{2p}\big(1-\beta_1/\beta_2^{2p}\big)}\left(\frac{\beta_1}{1-\beta_1}\right)^2.$$

Corollary 3 simplifies the result of Theorem 3 by choosing $q = 0$. We remark that this choice of $q$ is optimal in an important special case studied in Duchi et al. (2011); Reddi et al. (2018): when the gradient vectors are sparse, we assume that $\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2 \ll d\sqrt{T}$. Then for $q \in (0, 1]$, it follows that

$$\frac{\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2}{T} \ll \frac{d^q\big(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\big)^{1-q}}{T^{1-q/2}}. \tag{4}$$

(4) implies that the upper bound provided by (3) is strictly better than (2) with $q \in (0, 1]$. Therefore, when the gradient vectors are sparse, Padam achieves its fastest convergence with the choice $q = 0$. We now show the convergence rate under different choices of the step size $\alpha$. If

$$\alpha = \Theta\left(\Big[T^{1/4}\Big(\textstyle\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\Big)^{1/2}\Big]^{-1}\right),$$

then by (3), we have

$$\mathbb{E}\big[\|\nabla f(\mathbf{x}_{\text{out}})\|_2^2\big] = O\left(\frac{\mathbb{E}\big(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\big)^{1/2}}{T^{3/4}} + \frac{d}{T}\right). \tag{5}$$

Note that the convergence rate given by (5) depends on the sum of gradient norms $\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2$. As mentioned in Remark 3, when the stochastic gradients $\mathbf{g}_t$, $t = 1, \dots, T$, are sparse, we follow the assumption given by Duchi et al. (2011) that $\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2 \ll d\sqrt{T}$. Under this sparsity assumption, the bound in (5) improves over the general case and matches the rate achieved by nonconvex SGD (Ghadimi and Lan, 2016) in terms of the dependence on $T$.

If we instead set a step size $\alpha$ that does not depend on the gradient norms $\|\mathbf{g}_{1:T,i}\|_2$, then (3) suggests that

 (6)

When the gradients are sparse in the sense of Duchi et al. (2011); Reddi et al. (2018), (6) again yields a bound that matches the convergence result of nonconvex SGD (Ghadimi and Lan, 2016) in terms of the dependence on $T$.

Next we present the convergence analysis of two popular algorithms: AMSGrad and RMSProp. Since AMSGrad and RMSProp can be seen as two specific instances of Padam, we can apply Theorem 3 with the corresponding parameter choices and obtain the following two corollaries.

[AMSGrad] Under the same conditions of Theorem 3, for AMSGrad in Algorithm 2, if $\alpha_t = \alpha$ for $t = 1, \dots, T$, then the output satisfies

 (7)

where $M_1^A$, $M_2^A$ and $M_3^A$ are defined as follows:

$$M_1^A = 2G_\infty\Delta_f, \qquad M_2^A = \frac{4G_\infty^3\,\mathbb{E}\big\|\hat{\mathbf{v}}_1^{-1/2}\big\|_1}{d(1-\beta_1)} + 4G_\infty^2,$$

$$M_3^A = \frac{4LG_\infty}{1-\beta_2} + \frac{8LG_\infty}{(1-\beta_1)(1-\beta_2)\big(1-\beta_1/\beta_2\big)}\left(\frac{\beta_1}{1-\beta_1}\right)^2.$$

As illustrated in Theorem 3, $M_1^A$, $M_2^A$ and $M_3^A$ are independent of $T$ and essentially independent of $d$. Thus, (7) implies that AMSGrad achieves an

$$O\left(\sqrt{\frac{d}{T}} + \frac{d}{T}\right)$$

convergence rate, which matches the convergence rate of nonconvex SGD (Ghadimi and Lan, 2016). Chen et al. (2018) also provided a similar bound for AMSGrad, namely $O\big((\log T + d^2)/\sqrt{T}\big)$. The dependence on $d$ in their bound is quadratic, which is worse than the dependence implied by (7). Moreover, by the corollaries above and (4), it is easy to see that Padam with $p < 1/2$ can be faster than AMSGrad (which corresponds to $p = 1/2$), which backs up the experimental results in Chen and Gu (2018).

[Corrected version of RMSProp] Under the same conditions of Theorem 3, for the corrected version of RMSProp in Algorithm 3, if $\alpha_t = \alpha$ for $t = 1, \dots, T$, then the output satisfies

 (8)

where $M_1^R$, $M_2^R$ and $M_3^R$ are defined as follows:

$$M_1^R = 2G_\infty\Delta_f, \qquad M_2^R = \frac{4G_\infty^3\,\mathbb{E}\big\|\hat{\mathbf{v}}_1^{-1/2}\big\|_1}{d} + 4G_\infty^2, \qquad M_3^R = \frac{4LG_\infty}{1-\beta_2}.$$

$M_1^R$, $M_2^R$ and $M_3^R$ are independent of $T$ and essentially independent of $d$. Thus, (8) implies that the corrected version of RMSProp achieves an $O\big(\sqrt{d/T} + d/T\big)$ convergence rate, which matches the convergence rate of nonconvex SGD given by Ghadimi and Lan (2016).

## 4 Conclusions

In this paper, we provided a sharp analysis of the state-of-the-art adaptive gradient method Padam (Chen and Gu, 2018) and proved its convergence rate for smooth nonconvex optimization. Our results directly imply the convergence rates of AMSGrad and the corrected version of RMSProp for smooth nonconvex optimization. In terms of the number of iterations $T$, the derived convergence rates in this paper match the rate achieved by SGD; in terms of the dimension $d$, our results give a better rate than existing work. Our results also offer some insights into the choice of the partially adaptive parameter $p$ in the Padam algorithm: when the gradients are sparse, a properly chosen $p$ achieves the fastest convergence rate. This theoretically backs up the experimental results in existing work (Chen and Gu, 2018).

## Acknowledgement

We would like to thank Jinghui Chen for discussion on this work.

## Appendix A Proof of the Main Theory

Here we provide the detailed proof of the main theorem.

### A.1 Proof of Theorem 3

Let $\Delta_f = f(\mathbf{x}_1) - \inf_{\mathbf{x}} f(\mathbf{x})$. To prove Theorem 3, we need the following lemmas:

[Restatement of Lemma] Let $\mathbf{m}_t$ and $\hat{\mathbf{v}}_t$ be as defined in Algorithm 1. Then under Assumption 3, we have $\|\mathbf{m}_t\|_\infty \le G_\infty$ and $\|\hat{\mathbf{v}}_t\|_\infty \le G_\infty^2$.

Suppose that $f$ has $G_\infty$-bounded stochastic gradient. Let $\beta_1, \beta_2$ be the weight parameters and $\alpha_t = \alpha$, $t = 1, \dots, T$, be the step sizes in Algorithm 1. We denote $\gamma = \beta_1/\beta_2^{2p}$. Suppose that $\gamma \le 1$ and $q \in [0, 1]$, then under Assumption 3, we have the following two results:

$$\mathbb{E}\left[\sum_{t=1}^T \alpha_t^2\big\|\hat{\mathbf{V}}_t^{-p}\mathbf{m}_t\big\|_2^2\right] \le \frac{T^{(1+q)/2}\, d^q \alpha^2 (1-\beta_1)\, G_\infty^{1+q-4p}}{(1-\beta_2)^{2p}(1-\gamma)}\,\mathbb{E}\left(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\right)^{1-q},$$

and

$$\mathbb{E}\left[\sum_{t=1}^T \alpha_t^2\big\|\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2^2\right] \le \frac{T^{(1+q)/2}\, d^q \alpha^2\, G_\infty^{1+q-4p}}{(1-\beta_2)^{2p}}\,\mathbb{E}\left(\sum_{i=1}^d \|\mathbf{g}_{1:T,i}\|_2\right)^{1-q}.$$

To deal with the stochastic momentum $\mathbf{m}_t$ and the stochastic weight $\hat{\mathbf{v}}_t$, following Yang et al. (2016), we define an auxiliary sequence $\{\mathbf{z}_t\}$ as follows: let $\mathbf{x}_0 = \mathbf{x}_1$, and for each $t \ge 1$,

$$\mathbf{z}_t = \mathbf{x}_t + \frac{\beta_1}{1-\beta_1}(\mathbf{x}_t - \mathbf{x}_{t-1}) = \frac{1}{1-\beta_1}\mathbf{x}_t - \frac{\beta_1}{1-\beta_1}\mathbf{x}_{t-1}. \tag{9}$$
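The two expressions in (9) agree by the identity $1 + \beta_1/(1-\beta_1) = 1/(1-\beta_1)$, which the following snippet (our own sketch) verifies on a single coordinate:

```python
beta1 = 0.9
c = beta1 / (1.0 - beta1)

x_prev, x_curr = 0.4, -1.3
# First form of (9):  z_t = x_t + beta1/(1-beta1) * (x_t - x_{t-1})
z_a = x_curr + c * (x_curr - x_prev)
# Second form of (9): z_t = x_t/(1-beta1) - beta1/(1-beta1) * x_{t-1}
z_b = x_curr / (1.0 - beta1) - c * x_prev
assert abs(z_a - z_b) < 1e-9
```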

Lemma A.1 shows that $\mathbf{z}_{t+1} - \mathbf{z}_t$ can be represented in two different ways.

Let $\mathbf{z}_t$ be defined in (9). For $t \ge 2$, we have

$$\mathbf{z}_{t+1} - \mathbf{z}_t = \frac{\beta_1}{1-\beta_1}\Big[\mathbf{I} - \big(\alpha_t\hat{\mathbf{V}}_t^{-p}\big)\big(\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\big)^{-1}\Big](\mathbf{x}_{t-1} - \mathbf{x}_t) - \alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t, \tag{10}$$

and

$$\mathbf{z}_{t+1} - \mathbf{z}_t = \frac{\beta_1}{1-\beta_1}\big(\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p} - \alpha_t\hat{\mathbf{V}}_t^{-p}\big)\mathbf{m}_{t-1} - \alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t. \tag{11}$$

For $t = 1$, we have

$$\mathbf{z}_2 - \mathbf{z}_1 = -\alpha_1\hat{\mathbf{V}}_1^{-p}\mathbf{g}_1. \tag{12}$$

By Lemma A.1, we connect $\mathbf{z}_{t+1} - \mathbf{z}_t$ with $\mathbf{x}_t - \mathbf{x}_{t-1}$ and $\mathbf{g}_t$. The following two lemmas give bounds on $\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2$ and $\|\nabla f(\mathbf{z}_t) - \nabla f(\mathbf{x}_t)\|_2$, which play important roles in our proof.

Let $\mathbf{z}_t$ be defined in (9). For $t \ge 2$, we have

$$\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2 \le \big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2 + \frac{\beta_1}{1-\beta_1}\|\mathbf{x}_{t-1} - \mathbf{x}_t\|_2.$$

Let $\mathbf{z}_t$ be defined in (9). For $t \ge 2$, we have

$$\|\nabla f(\mathbf{z}_t) - \nabla f(\mathbf{x}_t)\|_2 \le L\left(\frac{\beta_1}{1-\beta_1}\right)\|\mathbf{x}_t - \mathbf{x}_{t-1}\|_2.$$

Now we are ready to prove Theorem 3.

###### Proof of Theorem 3.

Since $f$ is $L$-smooth, we have

$$f(\mathbf{z}_{t+1}) \le f(\mathbf{z}_t) + \nabla f(\mathbf{z}_t)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t) + \frac{L}{2}\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2^2$$

$$= f(\mathbf{z}_t) + \underbrace{\nabla f(\mathbf{x}_t)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t)}_{I_1} + \underbrace{\big(\nabla f(\mathbf{z}_t) - \nabla f(\mathbf{x}_t)\big)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t)}_{I_2} + \underbrace{\frac{L}{2}\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2^2}_{I_3}. \tag{13}$$

In the following, we bound $I_1$, $I_2$ and $I_3$ separately.

Bounding term $I_1$: When $t = 1$, we have

$$\nabla f(\mathbf{x}_1)^\top(\mathbf{z}_2 - \mathbf{z}_1) = -\nabla f(\mathbf{x}_1)^\top\alpha_1\hat{\mathbf{V}}_1^{-p}\mathbf{g}_1. \tag{14}$$

For $t \ge 2$, we have

$$\nabla f(\mathbf{x}_t)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t) = \nabla f(\mathbf{x}_t)^\top\Big[\frac{\beta_1}{1-\beta_1}\big(\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p} - \alpha_t\hat{\mathbf{V}}_t^{-p}\big)\mathbf{m}_{t-1} - \alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\Big], \tag{15}$$

where the equality holds due to (11) in Lemma A.1. For the first term on the right-hand side of (15), we have

$$\begin{aligned}
\nabla f(\mathbf{x}_t)^\top\big(\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p} - \alpha_t\hat{\mathbf{V}}_t^{-p}\big)\mathbf{m}_{t-1} &\le \|\nabla f(\mathbf{x}_t)\|_\infty \cdot \big\|\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p} - \alpha_t\hat{\mathbf{V}}_t^{-p}\big\|_{1,1} \cdot \|\mathbf{m}_{t-1}\|_\infty \\
&\le G_\infty^2\Big[\big\|\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\big\|_{1,1} - \big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\big\|_{1,1}\Big] \\
&= G_\infty^2\Big[\big\|\alpha_{t-1}\hat{\mathbf{v}}_{t-1}^{-p}\big\|_1 - \big\|\alpha_t\hat{\mathbf{v}}_t^{-p}\big\|_1\Big].
\end{aligned} \tag{16}$$

The first inequality holds because for a positive diagonal matrix $\mathbf{A}$, we have $\mathbf{a}^\top\mathbf{A}\mathbf{b} \le \|\mathbf{a}\|_\infty \cdot \|\mathbf{A}\|_{1,1} \cdot \|\mathbf{b}\|_\infty$. The second inequality holds due to $\|\nabla f(\mathbf{x}_t)\|_\infty \le G_\infty$, $\|\mathbf{m}_{t-1}\|_\infty \le G_\infty$ and the fact that $\alpha_t\hat{\mathbf{v}}_t^{-p} \le \alpha_{t-1}\hat{\mathbf{v}}_{t-1}^{-p}$ element-wise. Next we bound the second term on the right-hand side of (15). We have

$$\begin{aligned}
-\nabla f(\mathbf{x}_t)^\top\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t &= -\nabla f(\mathbf{x}_t)^\top\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\mathbf{g}_t - \nabla f(\mathbf{x}_t)^\top\big(\alpha_t\hat{\mathbf{V}}_t^{-p} - \alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\big)\mathbf{g}_t \\
&\le -\nabla f(\mathbf{x}_t)^\top\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\mathbf{g}_t + \|\nabla f(\mathbf{x}_t)\|_\infty \cdot \big\|\alpha_t\hat{\mathbf{V}}_t^{-p} - \alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\big\|_{1,1} \cdot \|\mathbf{g}_t\|_\infty \\
&\le -\nabla f(\mathbf{x}_t)^\top\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\mathbf{g}_t + G_\infty^2\Big(\big\|\alpha_{t-1}\hat{\mathbf{v}}_{t-1}^{-p}\big\|_1 - \big\|\alpha_t\hat{\mathbf{v}}_t^{-p}\big\|_1\Big).
\end{aligned} \tag{17}$$

The first inequality holds because for a positive diagonal matrix $\mathbf{A}$, we have $\mathbf{a}^\top\mathbf{A}\mathbf{b} \le \|\mathbf{a}\|_\infty \cdot \|\mathbf{A}\|_{1,1} \cdot \|\mathbf{b}\|_\infty$. The second inequality holds due to $\|\nabla f(\mathbf{x}_t)\|_\infty \le G_\infty$ and $\|\mathbf{g}_t\|_\infty \le G_\infty$. Substituting (16) and (17) into (15), we have

$$\nabla f(\mathbf{x}_t)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t) \le -\nabla f(\mathbf{x}_t)^\top\alpha_{t-1}\hat{\mathbf{V}}_{t-1}^{-p}\mathbf{g}_t + \frac{G_\infty^2}{1-\beta_1}\Big(\big\|\alpha_{t-1}\hat{\mathbf{v}}_{t-1}^{-p}\big\|_1 - \big\|\alpha_t\hat{\mathbf{v}}_t^{-p}\big\|_1\Big). \tag{18}$$

Bounding term $I_2$: For $t \ge 2$, we have

$$\begin{aligned}
\big(\nabla f(\mathbf{z}_t) - \nabla f(\mathbf{x}_t)\big)^\top(\mathbf{z}_{t+1} - \mathbf{z}_t) &\le \frac{L\beta_1}{1-\beta_1}\big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2 \cdot \|\mathbf{x}_t - \mathbf{x}_{t-1}\|_2 + L\left(\frac{\beta_1}{1-\beta_1}\right)^2\|\mathbf{x}_t - \mathbf{x}_{t-1}\|_2^2 \\
&\le L\big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2^2 + 2L\left(\frac{\beta_1}{1-\beta_1}\right)^2\|\mathbf{x}_t - \mathbf{x}_{t-1}\|_2^2,
\end{aligned} \tag{19}$$

where the first inequality holds by the Cauchy–Schwarz inequality together with Lemma A.1 and Lemma A.1, and the second inequality holds due to Young's inequality.

Bounding term $I_3$: For $t \ge 2$, we have

$$\begin{aligned}
\frac{L}{2}\|\mathbf{z}_{t+1} - \mathbf{z}_t\|_2^2 &\le \frac{L}{2}\Big[\big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2 + \frac{\beta_1}{1-\beta_1}\|\mathbf{x}_{t-1} - \mathbf{x}_t\|_2\Big]^2 \\
&\le L\big\|\alpha_t\hat{\mathbf{V}}_t^{-p}\mathbf{g}_t\big\|_2^2 + 2L\left(\frac{\beta_1}{1-\beta_1}\right)^2\|\mathbf{x}_{t-1} - \mathbf{x}_t\|_2^2.
\end{aligned} \tag{20}$$

The first inequality is obtained by applying Lemma A.1, and the second follows from $(a+b)^2 \le 2a^2 + 2b^2$.

For $t = 1$, substituting (14), (19) and (20) into (13), taking expectation and rearranging terms, we have

$$\mathbb{E}\big[f(\mathbf{z}_2) - f(\mathbf{z}_1)\big] \le \mathbb{E}\Big[-\nabla f(\mathbf{x}_1)^\top\alpha_1\hat{\mathbf{V}}_1^{-p}\mathbf{g}_1 + 2L\big\|\alpha_1\hat{\mathbf{V}}_1^{-p}\mathbf{g}_1\big\|_2^2\Big]$$