Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

06/13/2019
by Kwangmin Yu, et al.

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets lead to longer training times, training on distributed systems is common, and distributed SGD variants, mainly asynchronous and synchronous SGD, are widely used. Asynchronous SGD is communication efficient but suffers from accuracy degradation due to delayed parameter updates. Synchronous SGD, despite its accuracy advantage, becomes communication intensive as the number of nodes increases. To address these issues, we introduce Layered SGD (LSGD), a new decentralized synchronous SGD algorithm. LSGD partitions computing resources into subgroups, each consisting of a communication layer (communicator) and a computation layer (workers). Each subgroup performs centralized communication for parameter updates, while communication between subgroups is handled by the communicators. As a result, communication time is overlapped with the I/O latency of the workers. The efficiency of the algorithm is tested by training a deep network on the ImageNet classification task.
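The implementation details are in the full text; as a rough illustration only, below is a minimal sketch of a two-layer gradient exchange in the spirit of the abstract, written with mpi4py. The subgroup size, the rank-0-as-communicator convention, the Reduce/Allreduce/Bcast pattern, and the averaging by world size are assumptions made for this sketch, not the authors' implementation, and the overlap of communication with the workers' I/O latency is not modeled here.

# Minimal sketch of a layered gradient exchange inspired by the LSGD
# abstract, using mpi4py. Group size, rank roles, and collective choices
# are illustrative assumptions, not the paper's implementation.
from mpi4py import MPI
import numpy as np

GROUP_SIZE = 4  # assumed number of ranks per subgroup (hypothetical)

world = MPI.COMM_WORLD
rank = world.Get_rank()

# Partition ranks into subgroups; rank 0 of each subgroup plays the role
# of its "communicator", the remaining ranks act as "workers".
group_id = rank // GROUP_SIZE
subgroup = world.Split(color=group_id, key=rank)
is_communicator = (subgroup.Get_rank() == 0)

# The communicators of all subgroups form their own group for the
# inter-subgroup exchange; worker ranks are excluded via MPI.UNDEFINED.
inter = world.Split(color=0 if is_communicator else MPI.UNDEFINED, key=rank)

def layered_allreduce(local_grad: np.ndarray) -> np.ndarray:
    """Average gradients in two layers: first within each subgroup,
    then across subgroups via the communicator ranks only."""
    summed = np.empty_like(local_grad)
    # Layer 1: centralized reduction inside the subgroup (to rank 0).
    subgroup.Reduce(local_grad, summed, op=MPI.SUM, root=0)
    if is_communicator:
        # Layer 2: all-reduce among the communicators of all subgroups.
        total = np.empty_like(summed)
        inter.Allreduce(summed, total, op=MPI.SUM)
        summed = total / world.Get_size()
    # Broadcast the averaged gradient back to the subgroup's workers.
    subgroup.Bcast(summed, root=0)
    return summed

if __name__ == "__main__":
    grad = np.full(8, float(rank))      # stand-in for a local gradient
    avg = layered_allreduce(grad)
    if rank == 0:
        print("averaged gradient:", avg)

Launched under mpiexec with, say, 8 ranks, this forms two subgroups whose rank-0 communicators exchange gradients on behalf of their workers, so workers only ever communicate within their own subgroup.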

Related research

11/06/2019
DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training
Data parallelism has become the de facto standard for training Deep Neur...

03/24/2022
Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning
Distributed training algorithms of deep neural networks show impressive ...

04/04/2016
Revisiting Distributed Synchronous SGD
Distributed training of deep learning models on large-scale training dat...

11/08/2018
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Distributed training of deep nets is an important technique to address s...

06/08/2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Deep learning thrives with large neural networks and large datasets. How...

12/20/2014
Deep learning with Elastic Averaging SGD
We study the problem of stochastic optimization for deep learning in the...

07/01/2020
Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training
As a crucial scheme to accelerate the deep neural network (DNN) training...
