Local SGD Converges Fast and Communicates Little

05/24/2018
by Sebastian U. Stich, et al.

Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale parallel machine learning, but its scalability is limited by a communication bottleneck. Recent work proposed local SGD, i.e., running SGD independently in parallel on different workers and averaging the iterates only once in a while. This scheme shows promising results in practice but has eluded thorough theoretical analysis. We prove concise convergence rates for local SGD on convex problems and show that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients, that is, the scheme achieves a linear speed-up in the number of workers and the mini-batch size. Moreover, the number of communication rounds can be reduced by up to a factor of T^{1/2}, where T denotes the total number of steps, compared to mini-batch SGD.
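The scheme described in the abstract is simple to state concretely. The sketch below simulates K local-SGD workers on a synthetic least-squares problem and averages their iterates every H steps; it is a minimal illustration, not the paper's reference code, and the problem instance, step size, K, H, and T are illustrative assumptions.

```python
# Minimal sketch of local SGD on a synthetic convex (least-squares) problem.
# Illustrative only: objective, step size, K, H, and T are assumptions,
# not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic problem: f(x) = (1/2n) * ||A x - b||^2
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

def stochastic_gradient(x, batch):
    """Mini-batch gradient of the least-squares objective on the given indices."""
    Ab = A[batch]
    return Ab.T @ (Ab @ x - b[batch]) / len(batch)

def local_sgd(K=8, H=10, T=2000, lr=0.05, batch_size=1):
    """Simulate K workers running SGD locally, averaging iterates every H steps."""
    x = [np.zeros(d) for _ in range(K)]        # one iterate per worker
    for t in range(T):
        for k in range(K):                     # each worker takes a local step
            batch = rng.integers(0, n, size=batch_size)
            x[k] = x[k] - lr * stochastic_gradient(x[k], batch)
        if (t + 1) % H == 0:                   # communication round: average models
            avg = np.mean(x, axis=0)
            x = [avg.copy() for _ in range(K)]
    return np.mean(x, axis=0)

x_hat = local_sgd()
print("distance to x*:", np.linalg.norm(x_hat - x_star))
```

With H = 1 every step is synchronized, which behaves like mini-batch SGD with an effective batch of K times the local batch size; larger H means fewer communication rounds at the cost of local drift between workers. The abstract's claim is that, on convex problems, the number of communication rounds can be cut by up to a factor of T^{1/2} while retaining the linear speed-up.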

Related research

07/17/2018
Parallel Restarted SGD for Non-Convex Optimization with Faster Convergence and Less Communication
For large scale non-convex stochastic optimization, parallel mini-batch ...

04/25/2019
Communication trade-offs for synchronized distributed SGD with large step size
Synchronous mini-batch SGD is state-of-the-art for large-scale distribut...

12/30/2020
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Distributed deep learning is an effective way to reduce the training tim...

05/31/2020
DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging
The state-of-the-art deep learning algorithms rely on distributed traini...

08/22/2018
Don't Use Large Mini-Batches, Use Local SGD
Mini-batch stochastic gradient methods are the current state of the art ...

01/19/2019
Fitting ReLUs via SGD and Quantized SGD
In this paper we focus on the problem of finding the optimal weights of ...

05/18/2015
An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization
Mini-batch optimization has proven to be a powerful paradigm for large-s...
