SparDL: Distributed Deep Learning Training with Efficient Sparse Communication

04/03/2023
by Minjun Zhao, et al.

Top-k sparsification has recently been widely used to reduce communication volume in distributed deep learning; however, its performance is still limited by the Gradient Accumulation (GA) dilemma. Several methods have been proposed to handle the GA dilemma, but they have two drawbacks: (1) they suffer from high communication complexity because they introduce a large amount of extra transmission; (2) they are not flexible when the number of workers is not a power of two. To solve these two problems, we propose a flexible and efficient sparse communication framework, dubbed SparDL. SparDL uses the Spar-Reduce-Scatter algorithm to solve the GA dilemma without additional communication operations and works with any number of workers. Moreover, to further reduce the communication complexity and to adjust the proportion of latency and bandwidth cost within it, we propose the Spar-All-Gather algorithm as part of SparDL. Extensive experiments validate the superiority of SparDL.
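For context, the sketch below illustrates the top-k gradient sparsification that such frameworks build on: each worker transmits only the k largest-magnitude gradient entries and keeps the rest in a local residual for later rounds. The function name, the 1% compression ratio, and the residual-feedback detail are illustrative assumptions for exposition, not SparDL's actual implementation.

# A minimal sketch of top-k gradient sparsification with local residual
# accumulation; names and the 1% ratio are assumptions, not SparDL itself.
import numpy as np

def topk_sparsify(grad: np.ndarray, residual: np.ndarray, ratio: float = 0.01):
    """Select the k largest-magnitude entries of (grad + residual).

    Returns the indices/values a worker would transmit and the updated
    residual that accumulates the entries left behind.
    """
    acc = grad + residual                        # fold in previously dropped gradients
    k = max(1, int(ratio * acc.size))
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # indices of the k largest magnitudes
    values = acc[idx]
    new_residual = acc.copy()
    new_residual[idx] = 0.0                      # entries that were sent are cleared locally
    return idx, values, new_residual

# Example: only (idx, values) are communicated instead of the dense gradient.
grad = np.random.randn(10_000).astype(np.float32)
residual = np.zeros_like(grad)
idx, values, residual = topk_sparsify(grad, residual, ratio=0.01)
print(idx.shape, values.shape)  # (100,) (100,)

In a sparse reduce-scatter/all-gather pipeline such as SparDL's, these (idx, values) pairs are what each worker exchanges and aggregates, which is why handling the accumulated residual efficiently matters for the overall communication complexity.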


