Sparse Communication for Distributed Gradient Descent

04/17/2017
by Alham Fikri Aji, et al.

We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed, as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero and then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas different configurations reduce convergence rate on the more complex translation task. Our experiments show that we can achieve up to 49% speed up on MNIST and 22% on NMT without damaging the final accuracy or BLEU.
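To make the mechanics concrete, here is a minimal NumPy sketch of the gradient-dropping idea described above: each worker zeroes roughly the 99% smallest entries of its gradient by absolute value and exchanges only the surviving values. The function name, the drop_ratio parameter, and the local residual accumulation (keeping dropped values and folding them into the next step) are illustrative assumptions of this sketch, not details quoted from the abstract.

import numpy as np

def drop_gradients(grad, residual, drop_ratio=0.99):
    # Fold previously dropped values back in before selecting what to send.
    # (Local residual accumulation is an assumption of this sketch, not a
    # detail stated in the abstract above.)
    full = grad + residual
    flat = np.abs(full).ravel()
    k = int(flat.size * drop_ratio)
    # Magnitude threshold: everything at or below it is dropped locally.
    threshold = np.partition(flat, k - 1)[k - 1] if k > 0 else -np.inf
    mask = np.abs(full) > threshold
    sparse_update = np.where(mask, full, 0.0)   # exchanged between workers (sparse)
    new_residual = np.where(mask, 0.0, full)    # kept locally for the next step
    return sparse_update, new_residual

# Example: roughly 1% of entries survive and would be exchanged.
g = np.random.randn(1000, 1000).astype(np.float32)
r = np.zeros_like(g)
sparse_g, r = drop_gradients(g, r)
print("non-zero fraction:", np.count_nonzero(sparse_g) / sparse_g.size)

In a real distributed setup only the indices and values of the surviving entries would be serialized, which is where the communication savings come from; quantizing those surviving values, as the abstract notes, compresses the exchange further.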


Related research

08/27/2018
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
In order to extract the best possible performance from asynchronous stoc...

09/09/2019
Communication-Censored Distributed Stochastic Gradient Descent
This paper develops a communication-efficient algorithm to solve the sto...

09/30/2022
Sparse Random Networks for Communication-Efficient Federated Learning
One main challenge in federated learning is the large communication cost...

11/05/2015
Symmetry-invariant optimization in deep networks
Recent works have highlighted scale invariance or symmetry that is prese...

12/04/2020
A Variant of Gradient Descent Algorithm Based on Gradient Averaging
In this work, we study an optimizer, Grad-Avg to optimize error function...

02/05/2019
Exponentiated Gradient Meets Gradient Descent
The (stochastic) gradient descent and the multiplicative update method a...

07/25/2022
On the benefits of non-linear weight updates
Recent work has suggested that the generalisation performance of a DNN i...
