ErrorCompensatedX: error compensation for variance reduced algorithms

08/04/2021
by Hanlin Tang et al.

Communication cost is a major bottleneck for the scalability of distributed learning. One approach to reducing this cost is to compress the gradient during communication. However, directly compressing the gradient slows convergence, and the resulting algorithm may diverge when the compression is biased. Recent work addressed this problem for stochastic gradient descent by adding back the compression error from the previous step. This idea was further extended to a class of variance reduced algorithms, in which the variance of the stochastic gradient is reduced by taking a moving average over all historical gradients. However, our analysis shows that simply adding back the previous step's compression error, as done in existing work, does not fully compensate for the compression error. We therefore propose ErrorCompensatedX, which uses the compression error from the previous two steps. We show that ErrorCompensatedX achieves the same asymptotic convergence rate as training without compression. Moreover, we provide a unified theoretical analysis framework for this class of variance reduced algorithms, with and without error compensation.
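To make the error-compensation mechanism described above concrete, here is a minimal Python sketch, assuming a toy quadratic objective and a top-k compressor. The names top_k_compress, ef_sgd_step, and ecx_style_step are illustrative, not from the paper, and the plain sum of the two stored errors in the two-step variant is a simplification: the actual ErrorCompensatedX update combines the two error terms in a way tied to the underlying variance-reduction scheme, which this sketch does not reproduce.

import numpy as np

def top_k_compress(v, k):
    # Biased compressor: keep only the k largest-magnitude entries.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_step(x, grad, memory, lr=0.1, k=2):
    # Classic error feedback: add back the previous step's compression error.
    corrected = grad + memory          # compensate with last step's error
    sent = top_k_compress(corrected, k)
    memory = corrected - sent          # store what was lost for the next step
    return x - lr * sent, memory

def ecx_style_step(x, grad, err_prev, err_prev2, lr=0.1, k=2):
    # Two-step compensation in the spirit of ErrorCompensatedX (simplified):
    # re-inject the compression errors from steps t-1 and t-2.
    corrected = grad + err_prev + err_prev2
    sent = top_k_compress(corrected, k)
    new_err = corrected - sent
    return x - lr * sent, new_err, err_prev  # shift the error history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    mem = np.zeros(5)
    for _ in range(100):
        grad = 2 * x + 0.01 * rng.normal(size=5)  # noisy gradient of ||x||^2
        x, mem = ef_sgd_step(x, grad, mem)
    print("EF-SGD solution norm:", np.linalg.norm(x))

The point the sketch conveys is that the information discarded by the biased compressor is stored and re-injected on later steps, so the compression error is compensated over time rather than accumulating.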

