CADA: Communication-Adaptive Distributed Adam

12/31/2020
by Tianyi Chen, et al.

Stochastic gradient descent (SGD) has taken the stage as the primary workhorse for large-scale machine learning. It is often used with its adaptive variants such as AdaGrad, Adam, and AMSGrad. This paper proposes an adaptive stochastic gradient descent method for distributed machine learning that can be viewed as the communication-adaptive counterpart of the celebrated Adam method, which justifies its name CADA. The key components of CADA are a set of new rules tailored to adaptive stochastic gradients that can be implemented to reduce communication uploads. The new algorithms adaptively reuse stale Adam gradients, thus saving communication, while retaining convergence rates comparable to that of the original Adam. In numerical experiments, CADA achieves impressive empirical performance in reducing the total number of communication rounds.
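As a rough illustration of the idea described above (workers re-upload their gradients only when they have changed enough, while the server runs Adam on the latest available copies), here is a minimal Python sketch. The skip rule, the constants c and the 10-step window, and the quadratic local losses are illustrative assumptions, not the paper's exact CADA rules.

```python
import numpy as np

# Hypothetical local quadratic losses, one per worker, only to make the sketch runnable.
rng = np.random.default_rng(0)
DIM, WORKERS = 10, 5
A, b = [], []
for _ in range(WORKERS):
    M = rng.standard_normal((DIM, DIM))
    A.append(M.T @ M / DIM + np.eye(DIM))   # symmetric positive definite
    b.append(rng.standard_normal(DIM))

def local_grad(i, x):
    """Gradient of worker i's local loss 0.5 * x^T A_i x - b_i^T x."""
    return A[i] @ x - b[i]

def cada_like_adam(steps=200, lr=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, c=0.5, window=10):
    x = np.zeros(DIM)
    m = np.zeros(DIM)                       # Adam first moment
    v = np.zeros(DIM)                       # Adam second moment
    stale = [local_grad(i, x) for i in range(WORKERS)]  # first round: all workers upload
    moves = []                              # recent model increments, used by the skip rule
    uploads = WORKERS

    for k in range(1, steps + 1):
        agg = np.zeros(DIM)
        for i in range(WORKERS):
            g_new = local_grad(i, x)
            drift = sum(np.linalg.norm(d) ** 2 for d in moves[-window:])
            # Upload only if the gradient changed more than the model recently moved;
            # otherwise the server keeps reusing this worker's stale gradient.
            if np.linalg.norm(g_new - stale[i]) ** 2 > c * drift + 1e-12:
                stale[i] = g_new
                uploads += 1
            agg += stale[i]
        agg /= WORKERS

        # Standard Adam step on the (partially stale) aggregated gradient.
        m = beta1 * m + (1 - beta1) * agg
        v = beta2 * v + (1 - beta2) * agg ** 2
        m_hat = m / (1 - beta1 ** k)
        v_hat = v / (1 - beta2 ** k)
        step = lr * m_hat / (np.sqrt(v_hat) + eps)
        x -= step
        moves.append(step)

    return x, uploads

x_final, uploads = cada_like_adam()
print(f"gradient uploads used: {uploads} (an always-communicate baseline would use {200 * WORKERS})")
```

Running this sketch should report noticeably fewer uploads than the steps-times-workers count of a vanilla distributed Adam baseline, mirroring the communication-round reduction the abstract describes.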


Related research

02/26/2020
LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning
This paper targets solving distributed machine learning problems such as...

11/20/2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
Recent years have witnessed the growth of large-scale distributed machin...

09/09/2019
Communication-Censored Distributed Stochastic Gradient Descent
This paper develops a communication-efficient algorithm to solve the sto...

05/25/2018
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
This paper presents a new class of gradient methods for distributed mach...

02/20/2020
Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets
We study distributed optimization algorithms for minimizing the average ...

04/21/2016
Stabilized Sparse Online Learning for Sparse Data
Stochastic gradient descent (SGD) is commonly used for optimization in l...

10/16/2020
Improved Communication Lower Bounds for Distributed Optimisation
Motivated by the interest in communication-efficient methods for distrib...
