Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning

03/24/2022
by Tomer Avidor, et al.

Distributed training algorithms for deep neural networks show impressive convergence speedups on very large problems. However, they inherently suffer from communication-related slowdowns, which makes the communication topology a crucial design choice. The common approaches supported by most machine learning frameworks are: 1) synchronous decentralized algorithms relying on a peer-to-peer All-Reduce topology, which is sensitive to stragglers and communication delays; and 2) asynchronous centralized algorithms with a server-based topology, which is prone to communication bottlenecks. Researchers have also proposed asynchronous decentralized algorithms designed to avoid the bottleneck and speed up training; however, these commonly use inexact sparse averaging, which may degrade accuracy. In this paper, we propose Local Asynchronous SGD (LASGD), an asynchronous decentralized algorithm that relies on All-Reduce for model synchronization. We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset. Our experiments demonstrate that LASGD accelerates training compared to SGD and state-of-the-art gossip-based approaches.
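The core pattern the abstract describes, local SGD steps interleaved with exact All-Reduce averaging rather than inexact gossip averaging, can be illustrated with a short simulation. The sketch below is not the authors' LASGD implementation: it replaces the asynchronous, non-blocking synchronization with a plain synchronous average, and the toy quadratic objective and all parameters (num_workers, local_steps, lr, rounds) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of local updates + All-Reduce averaging, assuming a toy
# quadratic objective per worker. Asynchrony is deliberately elided.
rng = np.random.default_rng(0)
dim, num_workers, local_steps, rounds, lr = 10, 4, 8, 50, 0.05

# Worker i minimizes f_i(w) = 0.5 * ||w - t_i||^2 with noisy gradients;
# the global optimum is the mean of the targets t_i.
targets = rng.normal(size=(num_workers, dim))
models = np.zeros((num_workers, dim))

def stochastic_grad(w, target):
    return (w - target) + 0.1 * rng.normal(size=w.shape)

for _ in range(rounds):
    # Local phase: each worker runs SGD independently, no communication.
    for i in range(num_workers):
        for _ in range(local_steps):
            models[i] -= lr * stochastic_grad(models[i], targets[i])
    # Synchronization phase: All-Reduce, here an exact global average,
    # in contrast to gossip methods that average inexactly over sparse graphs.
    models[:] = models.mean(axis=0)

print("distance to optimum:", np.linalg.norm(models[0] - targets.mean(axis=0)))
```

In LASGD itself, per the abstract, the synchronization is asynchronous and decentralized, so workers would not block on the averaging step the way they do in this simplified loop; the exact All-Reduce average is what avoids the accuracy loss attributed to inexact sparse averaging.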


Related research

06/13/2019  Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training
Stochastic Gradient Descent (SGD) is the most popular algorithm for trai...

05/23/2019  MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
The trade-off between convergence error and communication delays in dece...

04/10/2020  Asynchronous Decentralized Learning of a Neural Network
In this work, we exploit an asynchronous computing framework namely ARoc...

03/14/2021  CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
Distributed deep learning workloads include throughput-intensive trainin...

07/14/2023  DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates
Two widely considered decentralized learning algorithms are Gossip and r...

12/20/2014  Deep learning with Elastic Averaging SGD
We study the problem of stochastic optimization for deep learning in the...

11/08/2021  BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning
Decentralized algorithm is a form of computation that achieves a global ...
