Efficient-Adam: Communication-Efficient Distributed Adam with Complexity Analysis

05/28/2022
by   Congliang Chen, et al.

Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity for finding ε-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed Efficient-Adam. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and the server. Simultaneously, we adopt a two-way error feedback strategy to reduce the biases caused by the two-way quantization on the server and workers, respectively. In addition, we establish the iteration complexity of the proposed Efficient-Adam with a class of quantization operators, and further characterize its communication complexity between the server and workers when an ε-stationary point is reached. Finally, we apply Efficient-Adam to solve a toy stochastic convex optimization problem and to train deep learning models on real-world vision and language tasks. Extensive experiments, together with the theoretical guarantees, justify the merits of Efficient-Adam.
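To make the two-way quantization and two-way error feedback concrete, below is a minimal NumPy sketch of a parameter-server loop built around an Adam-style server update, run on a toy stochastic quadratic objective. It is an illustration under simplified assumptions, not the paper's exact algorithm: the scaled-sign compressor, the function names (`compress`, `stochastic_grad`, `run`), and all hyperparameter choices are ours.

```python
import numpy as np

def compress(v):
    """Scaled-sign compressor: a simple biased quantizer commonly paired
    with error feedback (an illustrative choice, not the paper's operator)."""
    return (np.linalg.norm(v, 1) / v.size) * np.sign(v)

def stochastic_grad(x, rng):
    """Noisy gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return x + 0.1 * rng.standard_normal(x.shape)

def run(num_workers=4, dim=10, steps=200, lr=0.1,
        beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)             # model parameters (replicated everywhere)
    m = np.zeros(dim)                         # Adam first moment (kept on the server)
    v = np.zeros(dim)                         # Adam second moment (kept on the server)
    e_workers = [np.zeros(dim) for _ in range(num_workers)]  # worker-side error memory
    e_server = np.zeros(dim)                  # server-side error memory

    for t in range(1, steps + 1):
        # Workers: compress (gradient + accumulated error) and send to the server.
        msgs = []
        for i in range(num_workers):
            g = stochastic_grad(x, rng)
            p = g + e_workers[i]              # error feedback on the worker side
            q = compress(p)
            e_workers[i] = p - q              # remember what the compressor dropped
            msgs.append(q)

        # Server: Adam-style update computed from the aggregated compressed gradients.
        g_hat = np.mean(msgs, axis=0)
        m = beta1 * m + (1 - beta1) * g_hat
        v = beta2 * v + (1 - beta2) * g_hat ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        delta = lr * m_hat / (np.sqrt(v_hat) + eps)

        # Server: compress (update + accumulated error) and broadcast to workers.
        p = delta + e_server                  # error feedback on the server side
        q = compress(p)
        e_server = p - q

        # Workers: apply the compressed update to their local copy of the model.
        x = x - q

    return x

if __name__ == "__main__":
    x_final = run()
    print("final ||x|| =", np.linalg.norm(x_final))
```

The key point the sketch tries to convey is that compression is applied in both directions (worker-to-server gradients and server-to-worker updates), and each side keeps its own error accumulator so that the information discarded by the quantizer is re-injected at the next round rather than lost.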


