Sparse Binary Compression: Towards Distributed Deep Learning with Minimal Communication

05/22/2018
by Felix Sattler, et al.

Currently, progressively larger deep neural networks are trained on ever-growing data corpora. As this trend is set to continue, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes, or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development, we propose Sparse Binary Compression (SBC), a compression framework that allows for a drastic reduction of communication cost in distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. In doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet to the baseline accuracy in the same number of iterations using 3531× fewer bits, or train it to a 1% lower accuracy using 37208× fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
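To make the two core ingredients named in the abstract concrete, the following is a minimal NumPy sketch of top-k gradient sparsification followed by a simplified sign/mean binarization of the surviving entries. It is an illustration under stated assumptions, not the paper's implementation: the function names (sbc_compress, sbc_decompress) are hypothetical, and the paper's residual accumulation, communication delay, exact treatment of positive and negative subsets, and Golomb position encoding are omitted.

import numpy as np

def sbc_compress(gradient: np.ndarray, sparsity: float = 0.001):
    """Return positions, a shared magnitude, and signs of the largest entries.

    Hypothetical sketch: keeps only a fraction `sparsity` of the gradient
    entries and binarizes them to +/- one shared magnitude.
    """
    flat = gradient.ravel()
    k = max(1, int(sparsity * flat.size))

    # Gradient sparsification: keep the k entries with the largest magnitude.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    values = flat[idx]

    # Simplified binarization: every surviving entry is replaced by the mean
    # absolute value, so only its sign and position need to be transmitted.
    magnitude = float(np.mean(np.abs(values)))
    signs = np.sign(values).astype(np.int8)
    return idx, magnitude, signs

def sbc_decompress(idx, magnitude, signs, shape):
    """Rebuild a dense weight update from the compressed representation."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = magnitude * signs
    return flat.reshape(shape)

# Usage: compress a synthetic gradient and reconstruct the sparse, binarized update.
g = np.random.randn(1024, 1024).astype(np.float32)
idx, mag, signs = sbc_compress(g, sparsity=0.001)
g_hat = sbc_decompress(idx, mag, signs, g.shape)

In this simplified form, each transmitted entry costs one sign bit plus its position; the paper pushes this further by encoding the positions near-optimally and by combining the scheme with communication delay (accumulating updates over several iterations before sending).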
