A Double Residual Compression Algorithm for Efficient Distributed Learning

10/16/2019
by Xiaorui Liu, et al.

Large-scale machine learning models are often trained by parallel stochastic gradient descent algorithms. However, the communication cost of gradient aggregation and model synchronization between the master and worker nodes becomes the major obstacle to efficient learning as the number of workers and the dimension of the model increase. In this paper, we propose DORE, a DOuble REsidual compression stochastic gradient descent algorithm, which reduces the overall communication by over 95% and thereby largely removes this obstacle. Our theoretical analysis shows that the proposed strategy has superior convergence properties for both strongly convex and nonconvex objective functions. The experimental results validate that DORE achieves the best communication efficiency while maintaining model accuracy and convergence speed comparable to state-of-the-art baselines.
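To make the double-compression idea concrete, below is a minimal single-process sketch of the scheme the abstract describes: each worker compresses its gradient residual (error feedback) before sending it to the master, and the master compresses the residual of the model update before broadcasting it back. This is an illustrative sketch, not the paper's implementation; the top-k sparsifier and the `Worker`/`Master`/`topk_compress` names are assumptions made for the example, and DORE itself is defined for a general class of compression operators with specific convergence guarantees.

```python
import numpy as np


def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest.

    Stand-in compressor for this sketch; the paper's analysis covers a
    broader class of compression operators, not this particular choice.
    """
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out


class Worker:
    """Holds the gradient-residual (error feedback) state of one worker."""

    def __init__(self, dim, k):
        self.err = np.zeros(dim)
        self.k = k

    def compress_gradient(self, grad):
        # Compress the gradient plus the accumulated residual; keep what was lost.
        corrected = grad + self.err
        msg = topk_compress(corrected, self.k)
        self.err = corrected - msg
        return msg


class Master:
    """Aggregates worker messages and broadcasts a compressed model update."""

    def __init__(self, dim, k, lr):
        self.x = np.zeros(dim)
        self.err = np.zeros(dim)
        self.k = k
        self.lr = lr

    def step(self, worker_msgs):
        avg_grad = np.mean(worker_msgs, axis=0)
        update = -self.lr * avg_grad + self.err   # correct with the model-update residual
        msg = topk_compress(update, self.k)
        self.err = update - msg
        self.x += msg
        return msg                                # broadcast; workers apply the same update


if __name__ == "__main__":
    # Toy distributed least-squares problem simulated in a single process.
    rng = np.random.default_rng(0)
    dim, n_workers, n_samples, k = 100, 4, 50, 10
    A = rng.normal(size=(n_workers, n_samples, dim))
    b = rng.normal(size=(n_workers, n_samples))

    master = Master(dim, k, lr=0.01)
    workers = [Worker(dim, k) for _ in range(n_workers)]
    x = np.zeros(dim)                             # model replica held by the workers

    for _ in range(500):
        msgs = [w.compress_gradient(A[i].T @ (A[i] @ x - b[i]) / n_samples)
                for i, w in enumerate(workers)]
        x = x + master.step(np.stack(msgs))       # same compressed update everywhere

    loss = sum(np.sum((A[i] @ x - b[i]) ** 2)
               for i in range(n_workers)) / (2 * n_workers * n_samples)
    print(f"final loss: {loss:.4f}")
```

The key design choice illustrated here is that both links keep their own residual: whatever the compressor discards is stored locally and added back to the next message, so aggressive compression on both the worker-to-master and master-to-worker directions can still drive the iterates toward the solution, which is the behavior the paper's analysis formalizes.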


Related research

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression (05/15/2019)
Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization (11/01/2021)
Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression (04/14/2022)
Rate distortion comparison of a few gradient quantizers (08/23/2021)
Block stochastic gradient descent for large-scale tomographic reconstruction in a parallel network (03/28/2019)
CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning (02/06/2019)
A Novel Sequential Coreset Method for Gradient Descent Algorithms (12/05/2021)
