Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

11/01/2021
by Yujia Wang, et al.

Due to the explosion in the size of training datasets, distributed learning has received growing interest in recent years. One of its major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven successful in reducing communication costs with stochastic gradient descent (SGD), there have been far fewer attempts to build communication-efficient adaptive gradient methods with provable guarantees, even though such methods are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization that is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to a first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.
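The abstract names two ingredients: an error-feedback gradient compression step and a worker-side AMSGrad-style model update. As a rough illustration of that general recipe, and not the paper's exact algorithm (whose details are not given here), the sketch below combines top-k compression with error feedback and an AMSGrad step. The class and function names, the choice of a top-k compressor, the single-worker loop, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np

def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

class ErrorFeedbackAMSGradWorker:
    """Hypothetical worker that compresses its stochastic gradient with
    error feedback and then applies an AMSGrad-style update locally.
    This is a generic sketch of the technique, not the paper's method."""

    def __init__(self, dim, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.k = k
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.e = np.zeros(dim)      # error-feedback memory (dropped residual)
        self.m = np.zeros(dim)      # first moment estimate
        self.v = np.zeros(dim)      # second moment estimate
        self.v_hat = np.zeros(dim)  # running max of second moments (AMSGrad)

    def compress_gradient(self, grad):
        # Add back the previously dropped residual, compress, store the new residual.
        corrected = grad + self.e
        compressed = topk_compress(corrected, self.k)
        self.e = corrected - compressed
        return compressed  # this is what would be communicated to the server

    def apply_update(self, x, aggregated_grad):
        # AMSGrad-style moment updates followed by a worker-side model step.
        self.m = self.beta1 * self.m + (1 - self.beta1) * aggregated_grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * aggregated_grad ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)
        return x - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)

# Tiny usage example on the quadratic loss f(x) = 0.5 * ||x||^2,
# with a single worker so the "aggregated" gradient is just its own.
rng = np.random.default_rng(0)
dim = 100
worker = ErrorFeedbackAMSGradWorker(dim, k=10, lr=0.05)
x = rng.standard_normal(dim)
for _ in range(200):
    grad = x + 0.01 * rng.standard_normal(dim)  # noisy gradient of 0.5 * ||x||^2
    sent = worker.compress_gradient(grad)       # compressed message to the server
    x = worker.apply_update(x, sent)
print(round(float(np.linalg.norm(x)), 4))       # norm shrinks toward a stationary point
```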


