Quantization for Distributed Optimization

09/26/2021
by Vineeth S, et al.

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a way to tackle this problem. However, the performance of distributed systems does not scale linearly with the number of workers because of the high network communication cost of synchronizing gradients and parameters. Researchers have proposed techniques such as quantization and sparsification that alleviate this problem by compressing the gradients. Most compression schemes, however, produce compressed gradients that cannot be aggregated directly with efficient protocols such as all-reduce. In this paper, we present a set of all-reduce compatible gradient compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla SGD. We report the results of our experiments on the CIFAR10 dataset and the observations derived during the process. Our compression methods perform better than the in-built methods currently offered by deep learning frameworks. Code is available at the repository: <https://github.com/vineeths96/Gradient-Compression>.
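To make the all-reduce compatibility requirement concrete: sparsification methods such as top-k leave each worker with a different set of nonzero indices, so the compressed payloads cannot simply be summed element-wise, whereas a compressor that maps every worker's gradient into the same fixed-width layout can be aggregated in a single all-reduce. The sketch below is not the schemes proposed in this paper; it only illustrates the general pattern, using the kind of half-precision compression hook that deep learning frameworks ship in-built, and it assumes an already initialized torch.distributed process group.

```python
# Minimal sketch of an all-reduce compatible gradient compression step
# (half-precision compression, similar in spirit to the in-built framework
# hooks; NOT the quantization schemes proposed in the paper).
import torch
import torch.distributed as dist


def allreduce_compressed(grad: torch.Tensor) -> torch.Tensor:
    """Average a gradient across workers while communicating in fp16.

    Assumes dist.init_process_group(...) has already been called on every worker.
    """
    world_size = dist.get_world_size()
    compressed = grad.to(torch.float16)                 # compress: 2 bytes per element on the wire
    dist.all_reduce(compressed, op=dist.ReduceOp.SUM)   # compressed tensors share one layout, so they sum directly
    return compressed.to(grad.dtype) / world_size       # decompress and average


# Usage inside a training step (hypothetical model/optimizer names):
# for p in model.parameters():
#     p.grad = allreduce_compressed(p.grad)
# optimizer.step()
```

Because every worker produces a tensor of identical shape and dtype, the aggregation stays a single all-reduce call; index-dependent schemes instead require all-gather or custom aggregation, which is the overhead the paper's schemes are designed to avoid.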


Related research:

- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization (05/31/2019)
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training (11/08/2018)
- Adaptive Gradient Quantization for Data-Parallel SGD (10/23/2020)
- Pufferfish: Communication-efficient Models At No Extra Cost (03/05/2021)
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning (05/22/2017)
- On the Utility of Gradient Compression in Distributed Training Systems (02/28/2021)
- A flexible, extensible software framework for model compression based on the LC algorithm (05/15/2020)
