On the Convergence of Weighted AdaGrad with Momentum for Training Deep Neural Networks

08/10/2018
by Fangyu Zou, et al.

Adaptive stochastic gradient descent methods such as AdaGrad, RMSProp, Adam, and AMSGrad have proven effective for non-convex stochastic optimization, in particular for training deep neural networks. However, their convergence rates in the non-convex stochastic setting have remained largely unexplored, apart from recent breakthrough results on AdaGrad, perturbed AdaGrad, and AMSGrad. In this paper, we propose two new adaptive stochastic gradient methods, AdaHB and AdaNAG, which integrate a novel weighted coordinate-wise AdaGrad with heavy-ball momentum and Nesterov accelerated gradient momentum, respectively. We jointly establish O(log T/√T) non-asymptotic convergence rates for AdaHB and AdaNAG in the non-convex stochastic setting by leveraging a newly developed unified formulation of these two momentum mechanisms. Moreover, we compare AdaHB and AdaNAG with Adam and RMSProp, which to a certain extent explains why Adam and RMSProp can diverge. In particular, when the momentum term vanishes, we obtain the convergence rate of coordinate-wise AdaGrad in the non-convex stochastic setting as a byproduct.
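The abstract does not reproduce the paper's precise update rules, so the sketch below only illustrates the general idea behind an AdaHB-style step: a weighted coordinate-wise AdaGrad accumulator combined with heavy-ball momentum. The function name adahb_step, the scalar weight parameter, and all hyperparameter values are illustrative assumptions rather than the authors' exact algorithm.

import numpy as np

def adahb_step(w, grad, state, lr=0.01, beta=0.9, eps=1e-8, weight=1.0):
    """One illustrative update of weighted coordinate-wise AdaGrad with
    heavy-ball momentum (a sketch, not the paper's exact method).

    state: dict with
        'G' -- running weighted sum of squared per-coordinate gradients
        'm' -- heavy-ball momentum buffer
    """
    # Weighted coordinate-wise accumulation of squared gradients (AdaGrad part).
    state['G'] += weight * grad ** 2
    # Per-coordinate adaptive step size.
    adaptive_lr = lr / (np.sqrt(state['G']) + eps)
    # Heavy-ball momentum: blend the previous update direction with the
    # current adaptive gradient step.
    state['m'] = beta * state['m'] + adaptive_lr * grad
    return w - state['m']

# Usage on a toy quadratic objective f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
state = {'G': np.zeros_like(w), 'm': np.zeros_like(w)}
for _ in range(100):
    w = adahb_step(w, w.copy(), state)

An AdaNAG-style variant would replace the heavy-ball buffer with a Nesterov accelerated gradient correction; under the paper's unified momentum formulation both variants are analyzed with the same machinery.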

