# On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks


## 1 Introduction

In this work we consider the following general non-convex stochastic optimization problem:

$$\min_{x\in\mathbb{R}^d} \; f(x) := \mathbb{E}_z\big[F(x,z)\big], \qquad (1)$$

where $\mathbb{E}_z$ denotes the expectation with respect to the random variable $z$. We assume that $f$ is bounded from below, i.e., $f^* := \inf_x f(x) > -\infty$, and that its gradient $\nabla f$ is $L$-Lipschitz continuous.

Problem (1) arises from many statistical learning models (e.g., logistic regression, AUC maximization) and deep learning models goodfellow2016deep ; lecun2015deep , as the expectation in problem (1) often can only be approximated by a finite sum. One of the most popular algorithms for solving problem (1) is Stochastic Gradient Descent (SGD) robbins1985stochastic ; bottou2018optimization :

$$x_{t+1} := x_t - \eta_t g_t, \qquad (2)$$

where $\eta_t$ is the learning rate and $g_t$ is the noisy gradient estimate of $\nabla f(x_t)$ at the $t$-th iteration. Its convergence rates in both the convex and non-convex settings have been established bottou1998online ; ghadimi2013stochastic .

However, vanilla SGD suffers from slow convergence, and its performance is sensitive to the learning rate, which is tricky to tune. Many techniques have been introduced to improve the convergence speed and robustness of SGD, such as variance reduction (e.g., SVRG, SAGA, SARAH) and adaptive learning rates. In particular, AdaGrad duchi2011adaptive accumulates past gradients to set the learning rate as

$$\eta_t = \frac{\eta}{\sqrt{\sum_{i=1}^{t} g_i^2} + \epsilon}, \qquad (3)$$

where $\eta > 0$ and $\epsilon > 0$ are fixed parameters. On the other hand, Heavy Ball (HB) polyak1964some ; ghadimi2015global and Nesterov Accelerated Gradient (NAG) nesterov1983method ; nesterov2013introductory are the two most popular momentum acceleration techniques, whose stochastic variants have been extensively studied ghadimi2016accelerated ; yan2018unified ; levy2018online :

$$\text{(SHB):}\;\begin{cases} m_t = \mu m_{t-1} - \eta_t g_t,\\ x_{t+1} = x_t + m_t,\end{cases} \qquad \text{(SNAG):}\;\begin{cases} y_{t+1} = x_t - \eta_t g_t,\\ x_{t+1} = y_{t+1} + \mu(y_{t+1} - y_t),\end{cases} \qquad (4)$$

where $m_0 = 0$, $y_1 = x_1$, and $\mu \in [0,1)$ is the momentum factor.
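For concreteness, the following minimal sketch (ours, with illustrative names `shb_step`, `snag_step`, `lr`, `mu`) performs one SHB and one SNAG step as in Eq. (4); it is a sketch of the standard updates, not code from the paper.

```python
def shb_step(x, m_prev, grad, lr, mu):
    """One Stochastic Heavy Ball step: m_t = mu*m_{t-1} - lr*g_t, x_{t+1} = x_t + m_t."""
    m = mu * m_prev - lr * grad
    return x + m, m            # returns (x_{t+1}, m_t)

def snag_step(x, y_prev, grad, lr, mu):
    """One Stochastic NAG step: y_{t+1} = x_t - lr*g_t, x_{t+1} = y_{t+1} + mu*(y_{t+1} - y_prev)."""
    y = x - lr * grad
    return y + mu * (y - y_prev), y   # returns (x_{t+1}, y_{t+1}); pass y back in as y_prev next step
```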

Both the adaptive learning rate and momentum techniques have been individually investigated and shown to be effective in practice, and they are independently and widely applied in tasks such as training deep networks krizhevsky2012imagenet ; sutskever2013importance ; kingma2014adam ; reddi2018convergence . A natural question arises: can we effectively incorporate both techniques at the same time, so as to inherit their advantages, and moreover develop a convergence theory for this combination, especially in the more challenging non-convex stochastic setting? To the best of our knowledge, Levy et al. levy2018online were the first to combine the adaptive learning rate with NAG momentum, which yields the AccAdaGrad algorithm. However, its convergence analysis is limited to the convex stochastic setting. Yan et al. yan2018unified unified SHB and SNAG into a three-step iterate without considering the adaptive learning rate in Eq. (3). Our main contributions are summarized as follows:

• We develop a new weighted gradient accumulation technique to estimate the adaptive learning rate, and propose a novel unified stochastic momentum scheme that covers SHB and SNAG. We then integrate the weighted coordinate-wise AdaGrad with this unified momentum mechanism, yielding a novel adaptive stochastic momentum algorithm, dubbed AdaUSM.

• We establish an $O\big(\log(\sum_{t=1}^T a_t)/\sqrt{T}\big)$ non-asymptotic convergence rate for AdaUSM in the general non-convex stochastic setting, under natural and mild assumptions.

• We show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp.

## 2 Preliminaries

#### Notations.

$T$ denotes the maximum number of iterations. The noisy gradient of $\nabla f(x_t)$ at the $t$-th iteration is denoted by $g_t$ for all $1 \le t \le T$. We use $\mathbb{E}$ to denote the total expectation as usual, and $\mathbb{E}_t$ the conditional expectation with respect to $g_t$ conditioned on the past random variables $g_1, \ldots, g_{t-1}$.

In this paper we allow different learning rates across coordinates, so the learning rate $\eta_t$ is a vector in $\mathbb{R}^d$. Given a vector $x \in \mathbb{R}^d$ we denote its $k$-th coordinate by $x_k$. The $k$-th coordinate of the gradient $\nabla f(x)$ is denoted by $\nabla_k f(x)$. Given two vectors $x, y \in \mathbb{R}^d$, their inner product is denoted by $\langle x, y\rangle$. We also heavily use the coordinate-wise product between $x$ and $y$, denoted $x \odot y$, with $(x \odot y)_k = x_k y_k$; division by a vector is defined similarly. Given a vector $w$ with positive entries, we define the weighted norm $\|x\|_w^2 := \sum_{k=1}^d w_k x_k^2$. A norm without any subscript is the Euclidean norm. Finally, we write $\bar{a}_t := \frac{1}{t}\sum_{i=1}^t a_i$ for the average of the weights $a_1, \ldots, a_t$ used in Subsection 3.2.

#### Assumptions.

We assume that the noisy gradients $g_1, g_2, \ldots, g_T$ are independent of each other. Moreover,

• (A1) $g_t$ is an unbiased estimator of $\nabla f(x_t)$, i.e., $\mathbb{E}_t[g_t] = \nabla f(x_t)$;

• (A2) $\mathbb{E}\|g_t\|^2 \le \sigma^2$, i.e., the second-order moment of $g_t$ is bounded.

Notice that condition (A2) is slightly weaker than that of chen2018convergence , which assumes that the stochastic gradient estimate is uniformly bounded, i.e., $\|g_t\| \le \sigma$ almost surely.

## 3 AdaUSM and Its Convergence Analysis

We describe the two main ingredients of AdaUSM: the unified stochastic momentum formulation of SHB and SNAG (Subsection 3.1), and the weighted adaptive learning rate (Subsection 3.2).

### 3.1 Unified Stochastic Momentum (USM)

By introducing $m_t := y_{t+1} - y_t$ with $m_0 = 0$, the iterate of SNAG can be equivalently written as

$$\text{(SNAG):}\quad m_t = \mu m_{t-1} - \eta_t g_t, \qquad x_{t+1} = x_t + m_t + \mu(m_t - m_{t-1}).$$

Comparing SHB with the above form of SNAG, the difference lies in that SNAG puts more weight on the current momentum $m_t$. Hence, we can rewrite SHB and SNAG in the following unified form:

$$\text{(USM):}\quad\begin{cases} m_t = \mu m_{t-1} - \eta_t g_t,\\ x_{t+1} = x_t + m_t + \lambda\mu(m_t - m_{t-1}),\end{cases} \qquad (5)$$

where $\lambda \ge 0$ is a constant. When $\lambda = 0$, USM reduces to SHB; when $\lambda = 1$, it reduces to SNAG. We call $\lambda$ the interpolation factor. For any $\mu \in (0,1)$, $\lambda$ can be chosen from a range containing $[0, 1]$, so both SHB and SNAG are covered.
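A minimal sketch of the USM update in Eq. (5) is given below (ours, with illustrative names); setting `lam = 0` recovers the SHB step and `lam = 1` recovers the SNAG step.

```python
def usm_step(x, m_prev, grad, lr, mu, lam):
    """Unified Stochastic Momentum step, Eq. (5):
       m_t = mu*m_{t-1} - lr*g_t,  x_{t+1} = x_t + m_t + lam*mu*(m_t - m_{t-1})."""
    m = mu * m_prev - lr * grad
    x_next = x + m + lam * mu * (m - m_prev)
    return x_next, m           # lam = 0: SHB, lam = 1: SNAG
```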

###### Remark 1.

Yan et al. yan2018unified unified SHB and SNAG as a three-step iterate scheme as follows:

$$y_{t+1} = x_t - \eta_t g_t, \qquad y^{s}_{t+1} = x_t - s\,\eta_t g_t, \qquad x_{t+1} = y_{t+1} + \mu\big(y^{s}_{t+1} - y^{s}_t\big), \qquad (6)$$

where $s \ge 0$ is a parameter. Its convergence rate has been established in yan2018unified . Notably, USM is slightly simpler than Eq. (6), and the learning rate in USM is determined adaptively.

### 3.2 Weighted Adaptive Learning Rate

We generalize the learning rate in Eq. (3) by assigning different weights to the past stochastic gradients accumulated. It is defined as follows:

$$\eta_{t,k} = \frac{\eta}{\sqrt{\sum_{i=1}^t a_i g_{i,k}^2/\bar{a}_t} + \epsilon} = \frac{\eta/\sqrt{t}}{\sqrt{\sum_{i=1}^t a_i g_{i,k}^2/\big(\sum_{i=1}^t a_i\big)} + \epsilon/\sqrt{t}}, \qquad (7)$$

for $k = 1, \ldots, d$, where $a_i > 0$ are the weights and $\bar{a}_t = \frac{1}{t}\sum_{i=1}^t a_i$. Here, $\eta/\sqrt{t}$ can be understood as the base learning rate. The classical AdaGrad corresponds to taking $a_i = 1$ for all $i$ in Eq. (7), i.e., uniform weights. However, recent gradients tend to carry more information about the local geometry than remote ones. Hence, it is natural to assign larger weights to the recent gradients. A typical choice is $a_i = i^{\alpha}$ for some $\alpha > 0$, which grows at a polynomial rate. For instance, in AccAdaGrad levy2018online the weights are constant for the first few iterations and grow linearly thereafter.
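The following sketch (ours; the function name `weighted_lr` and the default values of `eta` and `eps` are illustrative) computes the coordinate-wise weighted learning rate of Eq. (7) for a given weight sequence. Uniform weights recover the classical AdaGrad rate, while polynomially growing weights emphasize recent gradients.

```python
import numpy as np

def weighted_lr(grads, weights, eta=0.01, eps=1e-8):
    """Weighted AdaGrad learning rate (Eq. 7) at step t = len(grads).
    grads:   list of stochastic gradients g_1, ..., g_t (each a d-vector)
    weights: positive weights a_1, ..., a_t
    Returns the per-coordinate learning rate eta_{t,k}."""
    G = np.stack(grads)                 # shape (t, d)
    a = np.asarray(weights)[:, None]    # shape (t, 1)
    a_bar = a.mean()                    # \bar{a}_t = (1/t) * sum_i a_i
    acc = (a * G**2).sum(axis=0) / a_bar
    return eta / (np.sqrt(acc) + eps)
```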

### 3.3 AdaUSM and Its Convergence Rate

In this subsection, we present the AdaUSM algorithm, which integrates the weighted adaptive learning rate in Eq. (7) with the USM scheme in Eq. (5), and establish its convergence rate.
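Putting the two ingredients together, a minimal sketch of the AdaUSM loop might look as follows; the interface (`grad_fn` returning a stochastic gradient, polynomial weights `a_t = t**alpha`, and the default hyperparameters) is our illustrative assumption rather than the paper's pseudocode.

```python
import numpy as np

def adausm(grad_fn, x0, T, eta=0.01, mu=0.9, lam=1.0, alpha=1.0, eps=1e-8):
    """Sketch of AdaUSM: weighted AdaGrad learning rate (Eq. 7) + USM momentum (Eq. 5).
    grad_fn(x, t) should return a stochastic gradient of f at x."""
    x = x0.copy()
    m = np.zeros_like(x)
    acc = np.zeros_like(x)          # running sum_i a_i * g_i^2 (per coordinate)
    a_sum = 0.0                     # running sum_i a_i
    for t in range(1, T + 1):
        g = grad_fn(x, t)
        a_t = float(t) ** alpha               # polynomially growing weights
        acc += a_t * g**2
        a_sum += a_t
        a_bar = a_sum / t                     # \bar{a}_t
        lr = eta / (np.sqrt(acc / a_bar) + eps)   # coordinate-wise eta_{t,k}, Eq. (7)
        m_prev = m
        m = mu * m_prev - lr * g                  # USM momentum
        x = x + m + lam * mu * (m - m_prev)       # lam = 0: AdaHB, lam = 1: AdaNAG
    return x
```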

###### Theorem 1.

Let $\{x_t\}$ be the sequence generated by AdaUSM. Assume that the noisy gradients satisfy assumptions (A1)-(A2), and that the sequence of weights $\{a_t\}$ is non-decreasing in $t$. Let $\tau$ be uniformly randomly drawn from $\{1, \ldots, T\}$. Then

$$\big(\mathbb{E}\,\|\nabla f(x_\tau)\|^{4/3}\big)^{2/3} \le \mathrm{Bound}(T) = O\!\left(\log\Big(\sum_{t=1}^{T} a_t\Big)\Big/\sqrt{T}\right), \qquad (8)$$

where the precise expression of $\mathrm{Bound}(T)$ and its constants are given in the full proof.

###### Sketch of proof.

Our starting point is the following inequality, which follows from the $L$-Lipschitz continuity of $\nabla f$ and the descent lemma in nesterov2013introductory :

$$f(x_{t+1}) \le f(x_t) + \big\langle\nabla f(x_t),\, m_t + \lambda\mu(m_t - m_{t-1})\big\rangle + \frac{L}{2}\,\big\|m_t + \lambda\mu(m_t - m_{t-1})\big\|^2.$$

The key point is to estimate the term $\langle\nabla f(x_t), m_t\rangle$, which involves the momentum. However, $m_t$ is far from an unbiased estimate of the true gradient $\nabla f(x_t)$. This difficulty is resolved in Lemma 3, where we establish an estimate of this term via the iteration

$$\langle\nabla f(x_t), m_t\rangle \le (1+2\lambda)L\sum_{i=1}^{t-1}\|m_i\|^2\mu^{t-i} - \sum_{i=1}^{t}\big\langle\nabla f(x_i),\, \eta_i\odot g_i\big\rangle\,\mu^{t-i}.$$

Furthermore, each $\|m_i\|^2$ can be bounded in terms of $\|\eta_i\odot g_i\|^2$ (by Lemma 2). For the term $\langle\nabla f(x_i), \eta_i\odot g_i\rangle$, taking expectation is tricky since $\eta_i$ is correlated with $g_i$. We follow the idea of ward2018adagrad and consider the term $\langle\nabla f(x_i), \hat{\eta}_i\odot g_i\rangle$, where $\hat{\eta}_i$ is an approximation of $\eta_i$ that is independent of $g_i$. Hence its conditional expectation gives rise to the desired term $\|\nabla f(x_i)\|^2_{\hat{\eta}_i}$. With a suitable choice of $\hat{\eta}_i$, we establish the corresponding estimate in Lemma 4. Summarizing the estimates above leads to the bound in Lemma 7:

$$\mathbb{E}\Big[\sum_{t=1}^{T} \|\nabla f(x_t)\|^2_{\hat{\eta}_t}\Big] \le C_1 + C_2\,\mathbb{E}\Big[\sum_{t=1}^{T} (a_t/\bar{a}_t)\,\|\eta_t\odot g_t\|^2\Big].$$

With the specific adaptive learning rate in Eq. (7), we can further show that the term $\mathbb{E}\big[\sum_{t=1}^{T}(a_t/\bar{a}_t)\|\eta_t\odot g_t\|^2\big]$ is bounded by $O\big(\log(\sum_{t=1}^{T} a_t)\big)$ (by Lemma 5), while the weighted gradient term on the left-hand side is converted into the bound on $\big(\mathbb{E}\|\nabla f(x_\tau)\|^{4/3}\big)^{2/3}$ via Hölder's inequality (by Lemma 6). This immediately gives rise to our theorem. ∎

###### Remark 2.

When we take $a_t = t^{\alpha}$ for some constant power $\alpha \ge 0$, then $\log\big(\sum_{t=1}^{T} a_t\big) = O(\log T)$ and the conditions of Theorem 1 are satisfied. Hence, AdaUSM with such weights attains the $O(\log T/\sqrt{T})$ convergence rate. In fact, AdaUSM is convergent as long as $\log\big(\sum_{t=1}^{T} a_t\big) = o(\sqrt{T})$.

When the interpolation factor $\lambda = 1$ (i.e., NAG-type momentum) and the weights are chosen as in AccAdaGrad, AdaUSM reduces to a coordinate-wise variant of AccAdaGrad levy2017online . In this case, $\log\big(\sum_{t=1}^{T} a_t\big) = O(\log T)$. Thus, we have the following corollary for the convergence rate of AccAdaGrad in the non-convex stochastic setting.

###### Corollary 1.

Assume the same setting as in Theorem 1. Let $\tau$ be randomly selected from $\{1, \ldots, T\}$ with equal probability $1/T$. Then

$$\big(\mathbb{E}\big[\|\nabla f(x_\tau)\|^{4/3}\big]\big)^{2/3} \le \mathrm{Bound}(T) = O\big(\log T/\sqrt{T}\big), \qquad (9)$$

where the precise expression of $\mathrm{Bound}(T)$ and its constants are given in the full proof.

###### Remark 3.

The non-asymptotic convergence rate of AccAdaGrad measured by the objective value has already been established in levy2018online in the convex stochastic setting. Corollary 1 provides the convergence rate of coordinate-wise AccAdaGrad measured by the gradient residual, which supplements the results of levy2018online in the non-convex stochastic setting.

## 4 Relationships with Adam and RMSProp

In this section, we show that the exponential moving average (EMA) technique used to estimate the adaptive learning rates in Adam kingma2014adam and RMSProp hinton2012lecture is a special case of the weighted adaptive learning rate in Eq. (7): their adaptive learning rates correspond to taking exponentially growing weights in AdaUSM, which provides a new angle for understanding Adam and RMSProp.

### 4.1 Adam

For better comparison, we first write the $t$-th iterate of Adam kingma2014adam as follows:

$$\begin{cases} \hat{m}_{t,k} = \beta_1\hat{m}_{t-1,k} + (1-\beta_1)g_{t,k}, & m_{t,k} = \hat{m}_{t,k}/(1-\beta_1^t),\\ \hat{v}_{t,k} = \beta_2\hat{v}_{t-1,k} + (1-\beta_2)g_{t,k}^2, & v_{t,k} = \hat{v}_{t,k}/(1-\beta_2^t),\\ x_{t+1,k} = x_{t,k} - \eta\, m_{t,k}\big/\big(\sqrt{t\,v_{t,k}} + \sqrt{t}\,\epsilon\big), \end{cases}$$

for $k = 1, \ldots, d$, where $\beta_1, \beta_2 \in [0,1)$ are constants and $\epsilon > 0$ is a sufficiently small constant. Denoting $\eta_{t,k} := \eta/\big(\sqrt{t\,v_{t,k}} + \sqrt{t}\,\epsilon\big)$, we can simplify the iteration of Adam as

$$\begin{cases} \hat{m}_{t,k} = \beta_1\hat{m}_{t-1,k} + (1-\beta_1)g_{t,k}, \quad m_{t,k} = \hat{m}_{t,k}/(1-\beta_1^t), & \text{(EMA step)}\\ x_{t+1,k} = x_{t,k} - \eta_{t,k}\, m_{t,k}. \end{cases}$$

Below, we show that AdaUSM and Adam differ in two aspects: the momentum estimation $m_{t,k}$ and the coordinate-wise adaptive learning rate $\eta_{t,k}$.

Momentum estimation.  The EMA technique is widely used in the momentum estimation step of Adam kingma2014adam and AMSGrad reddi2018convergence . Without loss of generality, we consider the simplified EMA step (without the bias correction $1/(1-\beta_1^t)$)

$$\begin{cases} m_{t,k} = \beta_1 m_{t-1,k} + (1-\beta_1)g_{t,k},\\ x_{t+1,k} = x_{t,k} - \eta_{t,k}\,m_{t,k}. \end{cases} \qquad (10)$$

To show the difference clearly, we compare only the HB momentum with the EMA momentum. Let $\tilde{m}_{t,k} := -\eta_{t,k}m_{t,k}$, so that $x_{t+1,k} = x_{t,k} + \tilde{m}_{t,k}$ matches the HB form. By the first equality in Eq. (10), we have

$$\tilde{m}_{t,k} = -\beta_1\eta_{t,k}m_{t-1,k} - (1-\beta_1)\eta_{t,k}g_{t,k} = \beta_1\tilde{m}_{t-1,k} - (1-\beta_1)\eta_{t,k}g_{t,k} + \beta_1(\eta_{t-1,k}-\eta_{t,k})m_{t-1,k}.$$

Comparing with HB (see Eq. (4)), EMA has an extra error term, which vanishes if $\eta_{t,k} = \eta_{t-1,k}$ for all $t$, i.e., if the step-size is constant (since the learning rates in AdaUSM and Adam are determined adaptively, we do not have $\eta_{t,k} = \eta_{t-1,k}$). In addition, EMA takes a much smaller step along the current stochastic gradient when the momentum factor $\beta_1$ is close to $1$. More precisely, writing the iterates in terms of the stochastic gradients and eliminating $m_{t,k}$, we obtain

$$x_{t+1,k} = x_{t,k} - \eta_{t,k}\sum_{i=1}^{t}(1-\beta_1)\beta_1^{t-i}g_{i,k} \quad\text{(EMA)}, \qquad x_{t+1,k} = x_{t,k} - \sum_{i=1}^{t}\mu^{t-i}\eta_{i,k}\,g_{i,k} \quad\text{(HB)}.$$

One can see that HB (and hence AdaUSM) uses the past step-sizes, whereas EMA uses only the current one in the exponential moving average. Moreover, when the momentum factor $\beta_1$ is very close to $1$, the update of $x_{t,k}$ via EMA could stagnate since $1-\beta_1 \approx 0$. This dilemma does not appear in AdaUSM.
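A tiny numeric illustration of this stagnation point (ours, purely for illustration, using the same momentum factor for both schemes and a constant step-size and scalar gradient): the EMA update of Eq. (10) scales the gradient by $(1-\beta_1)$, so for $\beta_1$ close to $1$ the first iterates barely move, while the HB update in Eq. (4) does not shrink the gradient term.

```python
lr, g = 0.1, 1.0                    # constant step-size and scalar gradient
for beta1 in (0.9, 0.999):
    x_ema, m_ema = 0.0, 0.0
    x_hb,  m_hb  = 0.0, 0.0
    for _ in range(10):
        m_ema = beta1 * m_ema + (1 - beta1) * g   # EMA momentum, Eq. (10)
        x_ema -= lr * m_ema
        m_hb = beta1 * m_hb - lr * g              # HB momentum, Eq. (4)
        x_hb += m_hb
    print(f"beta1={beta1}: |x_ema|={abs(x_ema):.4f}, |x_hb|={abs(x_hb):.4f}")
    # for beta1 = 0.999 the EMA iterate has barely moved after 10 steps
```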

Adaptive learning rate.  Note that $\sum_{i=1}^{t}\beta_2^{t-i} = (1-\beta_2^t)/(1-\beta_2)$. We have $\hat{v}_{t,k} = (1-\beta_2)\sum_{i=1}^{t}\beta_2^{t-i}g_{i,k}^2 + \beta_2^t\hat{v}_{0,k}$. Without loss of generality, we set $\hat{v}_{0,k} = 0$. Hence, it holds that

$$v_{t,k} = \frac{\hat{v}_{t,k}}{1-\beta_2^t} = \sum_{i=1}^{t}\frac{1-\beta_2}{1-\beta_2^t}\,\beta_2^{t-i}\,g_{i,k}^2,$$

$$\eta_{t,k} = \frac{\eta}{\sqrt{t\,v_{t,k}} + \sqrt{t}\,\epsilon} = \frac{\eta}{\sqrt{t\sum_{i=1}^{t}\frac{1-\beta_2}{1-\beta_2^t}\beta_2^{t-i}g_{i,k}^2} + \sqrt{t}\,\epsilon} = \frac{\eta}{\sqrt{t\sum_{i=1}^{t}\frac{(1-\beta_2)\beta_2^t}{1-\beta_2^t}\beta_2^{-i}g_{i,k}^2} + \sqrt{t}\,\epsilon}. \qquad (11)$$

Let $a_i := \beta_2^{-i}$. Note that $\sum_{i=1}^{t} a_i = \frac{\beta_2^{-t}-1}{1-\beta_2}$, so that $\frac{(1-\beta_2)\beta_2^t}{1-\beta_2^t} = \frac{1}{\sum_{i=1}^{t} a_i}$. Hence, Eq. (11) can be further reformulated as

$$\eta_{t,k} = \frac{\eta}{\sqrt{\frac{t}{\sum_{i=1}^{t} a_i}\sum_{i=1}^{t} a_i\,g_{i,k}^2} + \sqrt{t}\,\epsilon}. \qquad (12)$$

Hence, the adaptive learning rate of Adam is equivalent to that of AdaUSM with the exponentially growing weights $a_i = \beta_2^{-i}$, provided $\epsilon$ is sufficiently small. For the default parameter setting $\beta_2 = 0.999$ in Adam, this corresponds to $a_i = (1/0.999)^i$. Thus, we gain insight into the convergence of Adam from the convergence results for AdaUSM in Theorem 1.
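A quick numerical check (our own, not from the paper) that the bias-corrected second moment of Adam coincides with the weighted accumulation under the exponentially growing weights $a_i = \beta_2^{-i}$, as used in Eq. (12):

```python
import numpy as np

rng = np.random.default_rng(0)
beta2, T = 0.999, 50
g = rng.normal(size=T)                 # a stream of scalar stochastic gradients

v_hat = 0.0
for t in range(1, T + 1):
    v_hat = beta2 * v_hat + (1 - beta2) * g[t - 1] ** 2
v_adam = v_hat / (1 - beta2 ** T)      # bias-corrected second moment v_T

a = beta2 ** (-np.arange(1, T + 1))    # exponentially growing weights a_i = beta2^{-i}
v_weighted = (a * g ** 2).sum() / a.sum()

print(np.isclose(v_adam, v_weighted))  # True: the two accumulations coincide
```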

### 4.2 RMSProp

Coordinate-wise RMSProp is another efficient solver for training DNNs hinton2012lecture ; mukkamala2017variants , whose $t$-th iterate can be written as

$$v_{t,k} = \beta\,v_{t-1,k} + (1-\beta)\,g_{t,k}^2, \qquad x_{t+1,k} = x_{t,k} - \frac{\eta\,g_{t,k}}{\sqrt{t\,v_{t,k}} + \epsilon},$$

with $v_{0,k} = 0$. Define $a_i := \beta^{-i}$. The adaptive learning rate of RMSProp, denoted $\eta^{\mathrm{RMSProp}}_{t,k}$, can be rewritten as

$$\eta^{\mathrm{RMSProp}}_{t,k} = \frac{\eta}{\sqrt{t\,v_{t,k}} + \epsilon} = \frac{\eta}{\sqrt{t\sum_{i=1}^{t}(1-\beta)\beta^{t-i}g_{i,k}^2} + \epsilon} = \frac{\eta/\sqrt{1-\beta^t}}{\sqrt{\frac{t}{\sum_{i=1}^{t} a_i}\sum_{i=1}^{t} a_i\,g_{i,k}^2} + \frac{\epsilon}{\sqrt{1-\beta^t}}}.$$

When $\epsilon$ is a sufficiently small constant and $t$ is sufficiently large (so that $1-\beta^t$ is close to $1$), $\eta^{\mathrm{RMSProp}}_{t,k}$ clearly has a structure similar to $\eta_{t,k}$ in Eq. (12). Based on the above analysis, AdaUSM can be interpreted as a generalized RMSProp with HB and NAG momentum.

## 5 Experiments

In this section, we conduct experiments to validate the efficacy and theory of AdaHB (AdaUSM with $\lambda = 0$) and AdaNAG (AdaUSM with $\lambda = 1$) by applying them to train DNNs, including LeNet lecun1998gradient , GoogLeNet szegedy2015going , ResNet he2016deep , and DenseNet huang2017densely , on various datasets, including MNIST lecun1998gradient , CIFAR10/100 krizhevsky2009learning , and Tiny-ImageNet deng2009imagenet . The efficacy of AdaHB and AdaNAG is evaluated via the training loss, test accuracy, and test loss versus epochs, respectively. In the experiments, the batch size and the weight decay parameter are kept fixed across all methods.

## References

• (1) L. Bottou. Online learning and stochastic approximations. On-line learning in neural networks, 17(9):142, 1998.
• (2) L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
• (3) X. Chen, S. Liu, R. Sun, and M. Hong. On the convergence of a class of adam-type algorithms for non-convex optimization. In International Conference on Learning Representations, 2019.
• (4) A. Defazio, F. Bach, and S. Lacoste-Julien. Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in neural information processing systems, pages 1646–1654, 2014.
• (5) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
• (6) J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
• (7) E. Ghadimi, H. R. Feyzmahdavian, and M. Johansson. Global convergence of the heavy-ball method for convex optimization. In Control Conference (ECC), 2015 European, pages 310–315. IEEE, 2015.
• (8) S. Ghadimi and G. Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
• (9) S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1-2):59–99, 2016.
• (10) I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep learning, volume 1. MIT press Cambridge, 2016.
• (11) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
• (12) G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Cited on, page 14, 2012.
• (13) G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269. IEEE, 2017.
• (14) R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems, pages 315–323, 2013.
• (15) D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
• (16) A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, 2009.
• (17) A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
• (18) Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. nature, 521(7553):436, 2015.
• (19) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
• (20) K. Levy. Online to offline conversions, universality and adaptive minibatch sizes. In Advances in Neural Information Processing Systems, pages 1613–1622, 2017.
• (21) Y. K. Levy, A. Yurtsever, and V. Cevher. Online adaptive methods, universality and acceleration. In Advances in Neural Information Processing Systems, pages 6500–6509, 2018.
• (22) X. Li and F. Orabona. On the convergence of stochastic gradient descent with adaptive stepsizes. In International Conference on Artificial Intelligence and Statistics, pages 983–992, 2019.
• (23) H. B. McMahan and M. Streeter. Adaptive bound optimization for online convex optimization. COLT 2010, page 244, 2010.
• (24) M. C. Mukkamala and M. Hein. Variants of rmsprop and adagrad with logarithmic regret bounds. In International Conference on Machine Learning, pages 2545–2553, 2017.
• (25) Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
• (26) Y. E. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). In Dokl. Akad. Nauk SSSR, volume 269, pages 543–547, 1983.
• (27) L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč. Sarah: A novel method for machine learning problems using stochastic recursive gradient. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2613–2621. JMLR. org, 2017.
• (28) B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
• (29) S. J. Reddi, S. Kale, and S. Kumar. On the convergence of adam and beyond. In International Conference on Learning Representations, 2018.
• (30) H. Robbins and S. Monro. A stochastic approximation method. In Herbert Robbins Selected Papers, pages 102–109. Springer, 1985.
• (31) I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. In International conference on machine learning, pages 1139–1147, 2013.
• (32) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
• (33) R. Ward, X. Wu, and L. Bottou. Adagrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization. arXiv preprint arXiv:1806.01811, 2018.
• (34) X. Wu, R. Ward, and L. Bottou. Wngrad: Learn the learning rate in gradient descent. arXiv preprint arXiv:1803.02865, 2018.
• (35) Y. Yan, T. Yang, Z. Li, Q. Lin, and Y. Yang. A unified analysis of stochastic momentum methods for deep learning. In IJCAI, pages 2955–2961, 2018.
• (36) F. Zou, L. Shen, Z. Jie, W. Zhang, and W. Liu. A sufficient condition for convergences of adam and rmsprop. arXiv preprint arXiv:1811.09358, 2018.

## Appendix B Preliminary Lemmas

In this section we provide preliminary lemmas that will be used to prove our main theorem. Readers may skip this part on a first reading and come back whenever the lemmas are needed.

###### Lemma 1.

Let $S_t = S_0 + \sum_{i=1}^{t} b_i$, where $\{b_i\}$ is a non-negative sequence and $S_0 > 0$. Then we have

$$\sum_{t=1}^{T}\frac{b_t}{S_t} \le \log(S_T) - \log(S_0).$$

###### Proof.

Since $b_t = S_t - S_{t-1}$, the finite sum can be interpreted as a lower Riemann sum of $\int_{S_0}^{S_T}\frac{1}{x}\,dx$. Since $1/x$ is decreasing on the interval $[S_0, S_T]$, we have

$$\sum_{t=1}^{T}\frac{1}{S_t}\,(S_t - S_{t-1}) \le \int_{S_0}^{S_T}\frac{1}{x}\,dx = \log(S_T) - \log(S_0).$$

The proof is finished. ∎
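As a sanity check of Lemma 1 (our own illustration, using the statement as reconstructed above), the following snippet verifies the inequality on a random non-negative sequence:

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.random(1000)            # non-negative sequence b_1, ..., b_T
S0 = 1.0
S = S0 + np.cumsum(b)           # S_t = S_0 + sum_{i<=t} b_i

lhs = (b / S).sum()             # sum_t (S_t - S_{t-1}) / S_t
rhs = np.log(S[-1]) - np.log(S0)
print(lhs <= rhs)               # True, as guaranteed by Lemma 1
```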

The following lemma is a direct result of the momentum updating rule.

###### Lemma 2.

Suppose $m_t = \mu m_{t-1} - \eta_t\odot g_t$ with $m_0 = 0$ and $\mu \in [0,1)$. We have the following estimate

$$\sum_{t=1}^{T}\|m_t\|^2 \le \frac{1}{(1-\mu)^2}\sum_{t=1}^{T}\|\eta_t\odot g_t\|^2. \qquad (13)$$
###### Proof.

First, we have the following inequality due to the convexity of $\|\cdot\|^2$:

$$\|m_t\|^2 = \Big\|\mu m_{t-1} + (1-\mu)\Big(\tfrac{-\eta_t\odot g_t}{1-\mu}\Big)\Big\|^2 \le \mu\|m_{t-1}\|^2 + (1-\mu)\Big\|\tfrac{\eta_t\odot g_t}{1-\mu}\Big\|^2 = \mu\|m_{t-1}\|^2 + \frac{\|\eta_t\odot g_t\|^2}{1-\mu}. \qquad (14)$$

Summing Eq. (14) over $t$ from $1$ to $T$ and using $m_0 = 0$, we have

$$\sum_{t=1}^{T}\|m_t\|^2 \le \mu\sum_{t=1}^{T-1}\|m_t\|^2 + \frac{1}{1-\mu}\sum_{t=1}^{T}\|\eta_t\odot g_t\|^2 \le \mu\sum_{t=1}^{T}\|m_t\|^2 + \frac{1}{1-\mu}\sum_{t=1}^{T}\|\eta_t\odot g_t\|^2. \qquad (15)$$

Hence,

$$\sum_{t=1}^{T}\|m_t\|^2 \le \frac{1}{(1-\mu)^2}\sum_{t=1}^{T}\|\eta_t\odot g_t\|^2. \qquad (16)$$

The proof is finished. ∎

The following lemma is a result of the USM formulation for any general adaptive learning rate.

###### Lemma 3.

Let $\{x_t\}$ and $\{m_t\}$ be the sequences generated by the following general SGD with USM momentum: starting from the initial values $x_1$ and $m_0 = 0$, the iterates are updated via the USM scheme in Eq. (5), where $\mu \in [0,1)$ is the momentum factor and $\lambda \ge 0$ is the interpolation factor. Suppose that the function $f$ is $L$-smooth. Then for any $t \ge 1$ we have the following estimate

$$\langle\nabla f(x_t), m_t\rangle \le \mu\,\langle\nabla f(x_{t-1}), m_{t-1}\rangle + \Big(1+\tfrac{3}{2}\lambda\mu\Big)\mu L\,\|m_{t-1}\|^2 + \tfrac{1}{2}\lambda\mu^2 L\,\|m_{t-2}\|^2 - \big\langle\nabla f(x_t),\,\eta_t\odot g_t\big\rangle. \qquad (17)$$

In particular, the following estimate holds

$$\langle\nabla f(x_t), m_t\rangle \le (1+2\lambda)L\sum_{i=1}^{t-1}\|m_i\|^2\mu^{t-i} - \sum_{i=1}^{t}\big\langle\nabla f(x_i),\,\eta_i\odot g_i\big\rangle\,\mu^{t-i}.$$