Gradient Energy Matching for Distributed Asynchronous Gradient Descent

05/22/2018
by Joeri Hermans, et al.

Distributed asynchronous SGD has become widely used for deep learning in large-scale systems, but remains notorious for its instability when the number of workers increases. In this work, we study the dynamics of distributed asynchronous SGD through the lens of Lagrangian mechanics. Using this description, we introduce the concept of energy to describe the optimization process and derive a sufficient condition for stability: the optimization remains stable as long as the collective energy induced by the active workers stays below the energy of a target synchronous process. Building on this criterion, we derive GEM, a stable distributed asynchronous optimization procedure that estimates the energy of the asynchronous system and keeps it at or below the energy of sequential SGD with momentum. Experimental results highlight the stability and speedup of GEM compared to existing schemes, even when scaling to one hundred asynchronous workers. Results also indicate better generalization than the target SGD-with-momentum baseline.
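To make the energy-matching idea concrete, the sketch below shows one way a worker could rescale its update so that a squared-norm proxy of its kinetic energy stays at or below that of a target SGD-with-momentum step, in the spirit of the criterion described in the abstract. The function names, the choice of energy proxy, and the capping rule are illustrative assumptions for exposition, not the procedure published in the paper.

    import numpy as np

    def gem_style_rescale(gradient, momentum_buffer, beta=0.9, eps=1e-12):
        """Illustrative energy-matching sketch (an assumption, not the paper's exact rule)."""
        # Update that a sequential SGD-with-momentum process would apply.
        target_update = beta * momentum_buffer + gradient
        # Squared-norm proxies for the kinetic energy of both updates.
        target_energy = float(np.sum(target_update ** 2))
        worker_energy = float(np.sum(gradient ** 2)) + eps
        # Rescale so the worker's applied energy stays at or below the target.
        scale = min(1.0, float(np.sqrt(target_energy / worker_energy)))
        return scale * gradient

    # Hypothetical worker loop (names are illustrative):
    # grad = compute_stochastic_gradient(params, batch)
    # update = gem_style_rescale(grad, momentum_buffer, beta=0.9)
    # momentum_buffer = 0.9 * momentum_buffer + grad
    # params = params - learning_rate * update

Capping the scale at one is a deliberately conservative reading of the stability criterion, which only requires the energy of the asynchronous system to remain at or below that of the momentum target.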

Related research

06/08/2019  Making Asynchronous Stochastic Gradient Descent Work for Transformers
Asynchronous stochastic gradient descent (SGD) is attractive from a spee...

02/06/2021  Distributed and Asynchronous Operational Optimization of Networked Microgrids
Smart programmable microgrids (SPM) is an emerging technology for making...

09/24/2019  Gap Aware Mitigation of Gradient Staleness
Cloud computing is becoming increasingly popular as a platform for distr...

04/04/2016  Revisiting Distributed Synchronous SGD
Distributed training of deep learning models on large-scale training dat...

02/21/2019  Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training
Distributed asynchronous offline training has received widespread attent...

05/31/2016  Asynchrony begets Momentum, with an Application to Deep Learning
Asynchronous methods are widely used in deep learning, but have limited ...

12/20/2014  Deep learning with Elastic Averaging SGD
We study the problem of stochastic optimization for deep learning in the...
