SGEM: Stochastic Gradient with Energy and Momentum

08/03/2022
by Hailiang Liu, et al.

In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general nonconvex stochastic optimization problems, building on the AEGD method introduced in [AEGD: Adaptive Gradient Descent with Energy, arXiv:2010.05109]. SGEM incorporates energy and momentum simultaneously, so as to inherit the advantages of both. We show that SGEM features an unconditional energy stability property, and we derive energy-dependent convergence rates in the general nonconvex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better than, or at least as well as, SGDM in training some deep neural networks.
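Since the abstract only names the ingredients, the following is a minimal NumPy sketch of how an AEGD-style energy variable can be combined with heavy-ball momentum. The elementwise energy update follows the AEGD recipe from arXiv:2010.05109 for F(theta) = sqrt(f(theta) + c); the point where momentum enters (replacing the raw gradient in the update of F) is an assumption of this sketch, as are the names sgem_step, lr, beta, and c. This is not necessarily the authors' exact SGEM scheme.

```python
import numpy as np

def sgem_step(theta, grad, f_val, m, r, lr=0.1, beta=0.9, c=1.0):
    """One step of an AEGD-style energy update combined with momentum.

    Sketch only: the energy update mirrors AEGD (arXiv:2010.05109);
    how momentum enters is an assumption, not the paper's exact scheme.
    """
    # Exponential moving average of the stochastic gradient (momentum).
    m = beta * m + (1.0 - beta) * grad
    # Gradient of F(theta) = sqrt(f(theta) + c), with the momentum
    # estimate standing in for the raw gradient (assumption).
    dF = m / (2.0 * np.sqrt(f_val + c))
    # Elementwise energy update: r stays nonnegative and nonincreasing
    # for any step size, the source of unconditional energy stability.
    r = r / (1.0 + 2.0 * lr * dF * dF)
    # Energy-scaled parameter update.
    theta = theta - 2.0 * lr * r * dF
    return theta, m, r

# Toy usage on the quadratic f(theta) = 0.5 * ||theta||^2 with exact gradients.
theta = np.array([2.0, -1.5])
m = np.zeros_like(theta)
f0 = 0.5 * float(theta @ theta)
r = np.full_like(theta, np.sqrt(f0 + 1.0))   # r_0 = F(theta_0), as in AEGD
for _ in range(500):
    f_val = 0.5 * float(theta @ theta)
    grad = theta
    theta, m, r = sgem_step(theta, grad, f_val, m, r)
```

Because r is divided by a factor of at least 1 on every step, the energy variable decreases monotonically and bounds the effective step size, which is what makes the stability property hold unconditionally in the learning rate; the lower threshold on r mentioned in the abstract is what keeps this scaling from stalling progress.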


Related research

08/10/2018 · On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks
Adaptive stochastic gradient descent methods, such as AdaGrad, Adam, Ada...

03/23/2022 · An Adaptive Gradient Method with Energy and Momentum
We introduce a novel algorithm for gradient-based optimization of stocha...

10/01/2021 · Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum
Momentum method has been used extensively in optimizers for deep learnin...

02/12/2022 · From Online Optimization to PID Controllers: Mirror Descent with Momentum
We study a family of first-order methods with momentum based on mirror d...

10/05/2020 · Improved Analysis of Clipping Algorithms for Non-convex Optimization
Gradient clipping is commonly used in training deep neural networks part...

02/14/2018 · Toward Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations
Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely ap...

06/04/2018 · Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD...
