Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates

11/20/2019
by Cong Xie, et al.

Recent years have witnessed the growth of large-scale distributed machine learning algorithms, designed specifically to accelerate model training by distributing computation across multiple machines. When scaling distributed training in this way, the communication overhead is often the bottleneck. In this paper, we study the local distributed Stochastic Gradient Descent (SGD) algorithm, which reduces the communication overhead by decreasing the frequency of synchronization. While SGD with adaptive learning rates is a widely adopted strategy for training neural networks, it remains unknown how to implement adaptive learning rates in local SGD. To this end, we propose a novel SGD variant that combines reduced communication with adaptive learning rates and has provable convergence guarantees. Empirical results show that the proposed algorithm converges quickly and effectively reduces the communication overhead.
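For intuition, the sketch below shows one way local SGD can be combined with an AdaGrad-style per-coordinate learning rate: each worker updates its model and its squared-gradient accumulator locally, and only every H steps do the workers average their states, so communication happens once per H steps rather than every step. This is an illustrative sketch only, not the paper's Local AdaAlter algorithm; the function grad_fn, the decision to average the accumulators, and the hyperparameters lr, eps, and H are assumptions made for exposition.

```python
import numpy as np

def local_adaptive_sgd(workers, grad_fn, lr=0.01, eps=1e-8, H=8, steps=100):
    """Illustrative local SGD with AdaGrad-style adaptive learning rates.

    workers : list of parameter vectors (numpy arrays), one per worker
    grad_fn : grad_fn(w, worker_id) -> stochastic gradient for that worker
    H       : synchronization period (workers average every H local steps)

    Note: a sketch of the general idea, not the Local AdaAlter algorithm.
    """
    # One squared-gradient accumulator per worker (AdaGrad-style state).
    accums = [np.zeros_like(w) for w in workers]

    for t in range(1, steps + 1):
        for i, w in enumerate(workers):
            g = grad_fn(w, i)                          # local stochastic gradient
            accums[i] += g * g                         # accumulate squared gradients
            w -= lr * g / (np.sqrt(accums[i]) + eps)   # per-coordinate adaptive step

        if t % H == 0:
            # Periodic synchronization: average the models (and, in this sketch,
            # the accumulators) instead of communicating at every iteration.
            w_avg = np.mean(workers, axis=0)
            a_avg = np.mean(accums, axis=0)
            for i in range(len(workers)):
                workers[i][:] = w_avg
                accums[i][:] = a_avg

    return workers
```

With synchronization period H, each worker communicates roughly 1/H as often as fully synchronous SGD, which is the source of the communication savings discussed in the abstract.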

Related research:
- CADA: Communication-Adaptive Distributed Adam (12/31/2020)
- CSER: Communication-efficient SGD with Error Reset (07/26/2020)
- EventGraD: Event-Triggered Communication in Parallel Machine Learning (03/12/2021)
- Performance Optimization on Model Synchronization in Parallel Stochastic Gradient Descent Based SVM (05/03/2019)
- Optimizing Pipelined Computation and Communication for Latency-Constrained Edge Learning (06/11/2019)
- Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD (02/21/2020)
- Stochastic Distributed Optimization for Machine Learning from Decentralized Features (12/16/2018)
