Weighted AdaGrad with Unified Momentum

08/10/2018
by Fangyu Zou, et al.

Integrating adaptive learning rates and momentum techniques into SGD leads to a large class of efficient accelerated adaptive stochastic algorithms, such as Nadam and AccAdaGrad. Despite their effectiveness in practice, a large gap remains in their convergence theory, especially in the difficult non-convex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that (1) it incorporates a unified momentum scheme that covers both heavy-ball momentum and Nesterov accelerated gradient momentum, and (2) it adopts a novel weighted adaptive learning rate that unifies the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when polynomially growing weights are taken in AdaUSM, we obtain an O(log T/√T) convergence rate in the non-convex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, which thereby provides a new perspective for understanding Adam and RMSProp. Lastly, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also provided.
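The abstract does not spell out the update equations, but the general shape of such an algorithm can be sketched. Below is a minimal NumPy sketch that combines a weighted AdaGrad-style step size with a unified momentum scheme in the style of Yan et al., where an interpolation parameter s = 0 recovers the heavy-ball update and s = 1 a Nesterov-type update. The accumulator v_t = v_{t-1} + w_t g_t², the normalization by the cumulative weight, the weight schedule w_t = t, the eps term, and the function name adausm_sketch are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def adausm_sketch(grad_fn, x0, T=1000, lr=0.1, beta=0.9, s=1.0,
                  weight_fn=lambda t: t, eps=1e-8):
    """Illustrative weighted-AdaGrad step with unified momentum (a sketch).

    Assumptions (not taken from the paper's pseudocode):
      * weighted second moment: v_t = v_{t-1} + w_t * g_t**2, normalized
        by the cumulative weight W_t = sum_{k<=t} w_k;
      * unified momentum with parameter s: s = 0 gives a heavy-ball-style
        update, s = 1 a Nesterov-style update;
      * weight_fn(t) = t mimics polynomially growing weights; exponentially
        growing weights would instead mimic the Adam/RMSProp learning rate.
    """
    x = x0.copy()          # iterate at which gradients are evaluated
    y_s = x0.copy()        # s-scaled auxiliary iterate of the momentum scheme
    v = np.zeros_like(x0)  # weighted sum of squared gradients
    W = 0.0                # cumulative weight
    for t in range(1, T + 1):
        g = grad_fn(x)
        w = weight_fn(t)
        W += w
        v += w * g * g
        # per-coordinate adaptive step size (weighted AdaGrad style)
        eta = lr / (np.sqrt(v / W) + eps)
        y_new = x - eta * g
        y_s_new = x - s * eta * g
        x = y_new + beta * (y_s_new - y_s)
        y_s = y_s_new
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x_final = adausm_sketch(lambda x: x, np.ones(5), T=500)
```

In this sketch, taking weight_fn(t) = r**t with r > 1 (exponentially growing weights) makes v / W behave like the exponential moving average used by Adam and RMSProp, which is the correspondence the abstract alludes to.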

