Revisiting Distributed Synchronous SGD

04/04/2016
by Jianmin Chen, et al.

Distributed training of deep learning models on large-scale data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced by asynchrony. In contrast, the synchronous approach is often thought to be impractical because of the idle time wasted waiting for straggling workers. We revisit these conventional beliefs in this paper and examine the weaknesses of both approaches. We demonstrate that a third approach, synchronous optimization with backup workers, avoids asynchronous noise while mitigating the impact of the worst stragglers. Our approach is empirically validated and shown to converge faster and to reach better test accuracies.
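The core idea of synchronous optimization with backup workers is simple: launch N + b workers per step, but apply the update as soon as the first N gradients arrive, discarding the b slowest. The sketch below illustrates this on a toy quadratic objective with simulated worker latencies; it is a minimal illustration of the aggregation rule, not the authors' actual distributed implementation, and all function names and constants are my own.

```python
import random

def straggler_resilient_step(w, num_workers, num_backups, lr, rng):
    """One synchronous update: launch N + b workers, use the first N gradients.

    Toy objective f(w) = (w - 3)^2; each simulated worker returns a noisy
    gradient together with a random compute time (illustrative sketch only).
    """
    total = num_workers + num_backups
    results = []
    for _ in range(total):
        delay = rng.expovariate(1.0)                     # simulated worker latency
        grad = 2.0 * (w - 3.0) + rng.gauss(0.0, 0.1)     # noisy gradient of f
        results.append((delay, grad))
    results.sort()                                       # fastest workers first
    fastest = [g for _, g in results[:num_workers]]      # drop the b stragglers
    return w - lr * sum(fastest) / num_workers           # averaged SGD update

def train(steps=200, num_workers=8, num_backups=2, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w = straggler_resilient_step(w, num_workers, num_backups, lr, rng)
    return w

if __name__ == "__main__":
    print(train())  # converges near the optimum w* = 3
```

Because the update waits only for the fastest N of N + b workers, the step time is bounded by the (N)-th order statistic of the latency distribution rather than the maximum, while the gradient average stays unbiased up to the slight staleness-free noise of dropping b samples.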


Related research

06/13/2019
Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training
Stochastic Gradient Descent (SGD) is the most popular algorithm for trai...

06/18/2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Background: Distributed training is essential for large scale training o...

05/22/2018
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Distributed asynchronous SGD has become widely used for deep learning in...

11/27/2021
DSAG: A mixed synchronous-asynchronous iterative method for straggler-resilient learning
We consider straggler-resilient learning. In many previous works, e.g., ...

12/20/2014
Deep learning with Elastic Averaging SGD
We study the problem of stochastic optimization for deep learning in the...

11/08/2018
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Distributed training of deep nets is an important technique to address s...

06/14/2023
A^2CiD^2: Accelerating Asynchronous Communication in Decentralized Deep Learning
Distributed training of Deep Learning models has been critical to many r...
