Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

08/22/2018
by Jianyu Wang, et al.

State-of-the-art distributed machine learning suffers from significant delays due to frequent communication and synchronization between worker nodes. Emerging communication-efficient SGD algorithms that limit synchronization between locally trained models have been shown to be effective in speeding up distributed SGD. However, a rigorous convergence analysis and a comparative study of different communication-reduction strategies remain largely open problems. This paper presents a new framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as federated averaging, elastic averaging, and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for these existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence.
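
To make the shared pattern concrete: the algorithms unified by Cooperative SGD alternate a number of local SGD steps on each worker with a periodic mixing of the local models. Below is a minimal sketch, not the authors' code, of the local-SGD/federated-averaging special case in which mixing is a full average; the quadratic objectives, noise model, and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

num_workers = 4   # number of worker nodes
tau = 10          # local SGD steps between averaging rounds (communication period)
rounds = 50       # number of communication rounds
lr = 0.05         # SGD step size
dim = 5           # model dimension

# Each worker holds a different quadratic loss f_i(x) = 0.5 * ||x - c_i||^2,
# so the global minimizer is the mean of the centers c_i.
centers = rng.normal(size=(num_workers, dim))
x_star = centers.mean(axis=0)

def stoch_grad(x, c):
    # Gradient of 0.5 * ||x - c||^2 plus Gaussian noise, i.e. a stochastic gradient.
    return (x - c) + 0.1 * rng.normal(size=x.shape)

# All workers start from the same initial model.
models = np.tile(rng.normal(size=dim), (num_workers, 1))

for r in range(rounds):
    # Local phase: tau independent SGD steps on each worker, no communication.
    for _ in range(tau):
        for i in range(num_workers):
            models[i] -= lr * stoch_grad(models[i], centers[i])
    # Communication phase: synchronize by averaging all local models.
    models[:] = models.mean(axis=0)

print("distance to optimum:", np.linalg.norm(models[0] - x_star))

The communication period tau makes the trade-off in the abstract explicit: a larger tau means fewer synchronization rounds, but lets the local models drift further apart between averages, which slows error convergence.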

Related research

04/25/2019
Communication trade-offs for synchronized distributed SGD with large step size
Synchronous mini-batch SGD is state-of-the-art for large-scale distribut...

12/06/2018
Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols
Distributing Neural Network training is of particular interest for sever...

10/30/2019
Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Communication overhead is one of the key challenges that hinders the sca...

06/12/2020
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Large neural network models present a hefty communication challenge to d...

04/30/2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing ...

03/09/2023
Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training
Geo-distributed ML training can benefit many emerging ML scenarios (e.g....

10/26/2021
Exponential Graph is Provably Efficient for Decentralized Deep Training
Decentralized SGD is an emerging training method for deep learning known... (see the mixing-step sketch after this list)
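
The decentralized entries above, like the decentralized SGD case subsumed by Cooperative SGD, replace the full average with neighbor-wise mixing. Below is a minimal sketch of one such mixing step, again my own illustration rather than code from any of these papers; the ring topology and weights are assumptions chosen for simplicity.

import numpy as np

num_workers, dim = 4, 5
models = np.random.default_rng(1).normal(size=(num_workers, dim))

# Ring topology: each worker mixes its model with its two neighbors using a
# doubly stochastic mixing matrix W (rows and columns sum to 1).
W = np.zeros((num_workers, num_workers))
for i in range(num_workers):
    W[i, i] = 1 / 3
    W[i, (i - 1) % num_workers] = 1 / 3
    W[i, (i + 1) % num_workers] = 1 / 3

# One gossip/mixing step: x_i <- sum_j W[i, j] * x_j.
models = W @ models

# Because W is doubly stochastic, repeated mixing drives the rows of `models`
# toward consensus without any worker ever computing a global average.
print("row sums of W:", W.sum(axis=1))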
