Decentralization Meets Quantization

03/17/2018
by Hanlin Tang, et al.

Optimizing distributed learning systems is an art of balancing computation and communication. Two lines of research address slow networks: quantization for low-bandwidth networks, and decentralization for high-latency networks. In this paper, we explore a natural question: can the combination of decentralization and quantization lead to a system that is robust to both limited bandwidth and high latency? Although the systems implication of such a combination is straightforward, the underlying theoretical principle and algorithm design are challenging: naively quantizing the data sent in a decentralized training algorithm accumulates quantization error over iterations. We develop a framework for quantized, decentralized training and propose two strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove that both converge at the rate O(1/√(nT)), where n is the number of workers and T is the number of iterations, matching the convergence rate of full-precision, centralized training. We evaluate our algorithms on training deep learning models and find that the proposed algorithms significantly outperform the better of the merely decentralized and merely quantized baselines on networks with both high latency and low bandwidth.
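The abstract only names the two strategies; as a rough illustration of the difference-compression idea (each worker quantizes and transmits only the change in its model since the last communication, so the error held by its neighbors stays bounded instead of accumulating), here is a minimal NumPy sketch. The quantize helper, the simulation structure, and all parameter names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def quantize(v, levels=16):
    # Hypothetical uniform quantizer (illustrative only): scale by the max
    # magnitude, round to a small number of levels, then rescale.
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

def quantized_decentralized_sgd(grads, x0, W, lr=0.1, steps=100):
    """Simulate difference-compressed decentralized SGD.

    grads : list of callables, grads[i](x) -> stochastic gradient at worker i
    x0    : initial model parameters shared by all workers (np.ndarray)
    W     : (n, n) doubly stochastic mixing matrix of the communication graph
    """
    n = len(grads)
    x = [x0.copy() for _ in range(n)]      # each worker's local model
    sent = [x0.copy() for _ in range(n)]   # the copy neighbors currently hold
    for _ in range(steps):
        # Quantize only the *difference* between the current model and the
        # copy the neighbors already hold; the residual stays small, so the
        # quantization error does not pile up across iterations.
        for i in range(n):
            sent[i] = sent[i] + quantize(x[i] - sent[i])
        # Gossip averaging over the shared (compressed) copies, followed by
        # a local stochastic gradient step on each worker.
        x = [sum(W[i, j] * sent[j] for j in range(n)) - lr * grads[i](x[i])
             for i in range(n)]
    return sum(x) / n  # average of the local models
```

With W chosen as, say, a ring-graph averaging matrix, each worker transmits only a coarsely quantized difference per step, which is the property that makes the combination of decentralization and quantization attractive for networks that are both high-latency and low-bandwidth.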


Related Research

07/22/2019
Decentralized Deep Learning with Arbitrary Communication Compression
Decentralized training of deep learning models is a key element for enab...

07/17/2019
DeepSqueeze: Decentralization Meets Error-Compensated Compression
Communication is a key bottleneck in distributed training. Recently, an ...

02/22/2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Distributed learning techniques such as federated learning have enabled ...

07/17/2019
DeepSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
Communication is a key bottleneck in distributed training. Recently, an ...

02/26/2020
Moniqua: Modulo Quantized Communication in Decentralized SGD
Running Stochastic Gradient Descent (SGD) in a decentralized fashion has...

10/28/2019
Decentralized Parallel Algorithm for Training Generative Adversarial Nets
Generative Adversarial Networks (GANs) are powerful class of generative ...

08/04/2020
PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
Lossy gradient compression has become a practical tool to overcome the c...
