Distributed Sparse SGD with Majority Voting

11/12/2020
by Kerem Ozfatura, et al.

Distributed learning, in particular variants of distributed stochastic gradient descent (DSGD), is widely employed to speed up training by leveraging the computational resources of several workers. In practice, however, communication delay becomes a bottleneck due to the significant amount of information that must be exchanged between the workers and the parameter server. One of the most efficient strategies to mitigate the communication bottleneck is top-K sparsification. However, top-K sparsification requires additional communication load to convey the sparsity pattern, and the mismatch between the sparsity patterns of different workers prevents the use of efficient communication protocols. To address these issues, we introduce a novel majority-voting-based sparse communication strategy, in which the workers first seek a consensus on the structure of the sparse representation. This strategy significantly reduces the communication load and allows the same sparsity level to be used in both communication directions. Through extensive simulations on the CIFAR-10 dataset, we show that it is possible to achieve up to a 4000× compression without any loss in test accuracy.
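The sketch below illustrates the majority-voting idea described in the abstract, not the paper's actual implementation: each worker votes for its local top-K coordinates, the parameter server keeps the K coordinates with the most votes as the consensus sparsity pattern, and only the values at those agreed indices are then averaged. All names here (e.g. majority_vote_topk, sparse_aggregate) are illustrative assumptions.

```python
import numpy as np

def local_topk_mask(grad: np.ndarray, k: int) -> np.ndarray:
    """Return a 0/1 vote vector marking the k largest-magnitude coordinates."""
    mask = np.zeros_like(grad, dtype=np.int64)
    topk_idx = np.argpartition(np.abs(grad), -k)[-k:]
    mask[topk_idx] = 1
    return mask

def majority_vote_topk(worker_grads: list, k: int) -> np.ndarray:
    """Aggregate workers' votes and return the indices of the consensus pattern."""
    votes = sum(local_topk_mask(g, k) for g in worker_grads)
    # Keep the k coordinates that received the most votes across workers.
    return np.argpartition(votes, -k)[-k:]

def sparse_aggregate(worker_grads: list, k: int) -> np.ndarray:
    """Average worker gradients restricted to the consensus sparsity pattern."""
    idx = majority_vote_topk(worker_grads, k)
    agg = np.zeros_like(worker_grads[0])
    # Only the entries at `idx` would actually be communicated by each worker,
    # and the same k indices are reused for the downlink, so both directions
    # share one sparsity level.
    agg[idx] = np.mean([g[idx] for g in worker_grads], axis=0)
    return agg

# Toy usage (hypothetical sizes): 4 workers, a 10-dimensional "gradient", k = 3.
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(4)]
print(sparse_aggregate(grads, k=3))
```

Because every worker ends up sending values at the same consensus indices, the server does not need per-worker index lists, which is where the reduction in communication load comes from under these assumptions.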

