Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

10/01/2021
by Guojing Cong, et al.

Momentum methods have been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many nice properties. We propose a momentum method for such model-averaging approaches. At the level of each individual learner, traditional stochastic gradient descent is applied. At the meta level (the global learner), a single momentum term is applied; we call it block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
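
Since the abstract only sketches the algorithm, below is a minimal, self-contained sketch of one plausible reading of it: each of P learners runs K steps of plain SGD from the current global model, the averaged displacement over that block forms the global update, and a momentum buffer is applied to that update at the meta level. The toy sine-regression objective, the hyperparameters, and the precise form of the momentum update are assumptions made for illustration; they are not taken from the paper.

    # Minimal sketch of K-step averaging with a momentum term applied only at the
    # global (meta) level, i.e. "block momentum".  The toy objective, the
    # hyperparameters, and the exact update rule are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 10                        # parameter dimension
    P, K, T = 4, 8, 51            # learners, local SGD steps per block, global rounds
    lr, beta = 0.05, 0.9          # local step size, block-momentum coefficient
    w_true = rng.normal(size=d)   # ground truth for the synthetic regression task

    def sample_batch(n=32):
        # Synthetic data for the nonconvex loss  mean((sin(x.w) - y)^2).
        X = rng.normal(size=(n, d))
        y = np.sin(X @ w_true) + 0.1 * rng.normal(size=n)
        return X, y

    def stochastic_grad(w, batch):
        X, y = batch
        z = X @ w
        return (2.0 / len(y)) * (X.T @ ((np.sin(z) - y) * np.cos(z)))

    def loss(w, batch):
        X, y = batch
        return float(np.mean((np.sin(X @ w) - y) ** 2))

    w_global = rng.normal(size=d)
    momentum = np.zeros(d)        # momentum buffer lives only on the global learner

    for t in range(T):
        local_models = []
        for p in range(P):                       # each individual learner
            w = w_global.copy()
            for k in range(K):                   # traditional SGD locally
                w -= lr * stochastic_grad(w, sample_batch())
            local_models.append(w)
        # One "block" = averaged displacement accumulated over K local steps.
        block_update = np.mean(local_models, axis=0) - w_global
        momentum = beta * momentum + block_update     # block momentum
        w_global = w_global + momentum
        if t % 10 == 0:
            print(f"round {t:3d}  loss {loss(w_global, sample_batch(512)):.4f}")

Under this reading, communication happens only once per block of K local steps, and the momentum buffer is kept solely by the global learner, which is what would distinguish block momentum from applying momentum inside each worker's local SGD loop.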

Related research

08/03/2022 · SGEM: stochastic gradient with energy and momentum
In this paper, we propose SGEM, Stochastic Gradient with Energy and Mome...

04/07/2023 · Echo disappears: momentum term structure and cyclic information in turnover
We extract cyclic information in turnover and find it can explain the mo...

10/16/2018 · Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is wide...

07/05/2020 · Momentum Accelerates Evolutionary Dynamics
We combine momentum from machine learning with evolutionary dynamics, wh...

06/04/2018 · Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD...

07/26/2019 · Taming Momentum in a Distributed Asynchronous Environment
Although distributed computing can significantly reduce the training tim...

10/21/2019 · Momentum in Reinforcement Learning
We adapt the optimization's concept of momentum to reinforcement learnin...
