LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

05/25/2018
by Tianyi Chen, et al.

This paper presents a new class of gradient methods for distributed machine learning that adaptively skip gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly varying gradients and therefore trigger the reuse of outdated gradients. The resulting gradient-based algorithms are termed Lazily Aggregated Gradient, justifying the acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate matches that of batch gradient descent in the strongly convex, convex, and nonconvex smooth cases; and ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the number of communication rounds needed to reach a target accuracy is reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
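
To make the gradient-skipping idea concrete, below is a minimal NumPy sketch on a toy distributed least-squares problem. The setup, the names (local_grad, lag_threshold), and the simplified trigger, which compares the change in a worker's local gradient against the most recent change in the iterate, are illustrative assumptions rather than the paper's exact LAG-WK/LAG-PS conditions.

```python
# Minimal sketch of a LAG-style skipping rule on a toy least-squares problem.
# All names and constants here are illustrative, not the paper's exact choices.
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_samples, dim = 5, 20, 10

# Synthetic heterogeneous local datasets: worker m holds (A[m], b[m]).
A = [rng.normal(size=(n_samples, dim)) * (m + 1) for m in range(n_workers)]
b = [rng.normal(size=n_samples) for _ in range(n_workers)]

def local_grad(m, theta):
    """Gradient of worker m's least-squares loss 0.5/n * ||A_m theta - b_m||^2."""
    return A[m].T @ (A[m] @ theta - b[m]) / n_samples

# Step size from the largest local smoothness constant (safe for plain GD as well).
L = max(np.linalg.eigvalsh(A[m].T @ A[m] / n_samples).max() for m in range(n_workers))
lr = 1.0 / L
lag_threshold = 0.5   # illustrative trigger constant (assumption)

theta = np.zeros(dim)
last_sent = [local_grad(m, theta) for m in range(n_workers)]  # round 0: everyone uploads
uploads = n_workers
recent_step = np.zeros(dim)  # theta^k - theta^{k-1}, used by the skipping rule

for k in range(200):
    agg = np.zeros(dim)
    for m in range(n_workers):
        fresh = local_grad(m, theta)
        # Simplified LAG rule: upload only if the local gradient moved by more
        # than a fraction of the most recent change in the iterate; otherwise
        # the server keeps aggregating the stale (lagged) copy.
        if np.sum((fresh - last_sent[m]) ** 2) > lag_threshold * np.sum(recent_step ** 2):
            last_sent[m] = fresh
            uploads += 1
        agg += last_sent[m]
    new_theta = theta - lr * agg / n_workers
    recent_step, theta = new_theta - theta, new_theta

print(f"gradient uploads: {uploads} (batch gradient descent would use {201 * n_workers})")
```

In the paper, the trigger compares the gradient change against a weighted sum of the last several iterate differences, and it can be evaluated either at the workers (LAG-WK) or at the parameter server (LAG-PS); the sketch above collapses this to a single recent step purely for brevity.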

Related research

09/17/2019 · Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients
    The present paper develops a novel aggregated gradient approach for dist...

12/31/2020 · CADA: Communication-Adaptive Distributed Adam
    Stochastic gradient descent (SGD) has taken the stage as the primary wor...

02/26/2020 · LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning
    This paper targets solving distributed machine learning problems such as...

09/24/2022 · Communication-Efficient Federated Learning Using Censored Heavy Ball Descent
    Distributed machine learning enables scalability and computational offlo...

12/07/2018 · Communication-Efficient Distributed Reinforcement Learning
    This paper studies the distributed reinforcement learning (DRL) problem ...

01/09/2019 · The Lingering of Gradients: How to Reuse Gradients over Time
    Classically, the time complexity of a first-order method is estimated by...

10/31/2022 · Adaptive Compression for Communication-Efficient Distributed Training
    We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optim...
