Better Communication Complexity for Local SGD

09/10/2019
by   Ahmed Khaled, et al.
0

We revisit the local Stochastic Gradient Descent (local SGD) method and prove new convergence rates. We close the gap in the theory by showing that it works under unbounded gradients and extend its convergence to weakly convex functions. Furthermore, by changing the assumptions, we manage to get new bounds that explain in what regimes local SGD is faster that its non-local version. For instance, if the objective is strongly convex, we show that, up to constants, it is sufficient to synchronize M times in total, where M is the number of nodes. This improves upon the known requirement of Stich (2018) of √(TM) synchronization times in total, where T is the total number of iterations, which helps to explain the empirical success of local SGD.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/11/2020

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Distributed parallel stochastic gradient descent algorithms are workhors...
research
06/26/2018

Random Shuffling Beats SGD after Finite Epochs

A long-standing problem in the theory of stochastic gradient descent (SG...
research
03/14/2022

The Role of Local Steps in Local SGD

We consider the distributed stochastic optimization problem where n agen...
research
01/30/2022

Faster Convergence of Local SGD for Over-Parameterized Models

Modern machine learning architectures are often highly expressive. They ...
research
04/19/2013

Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections

We consider stochastic strongly convex optimization with a complex inequ...
research
10/30/2019

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Communication overhead is one of the key challenges that hinders the sca...
research
01/27/2019

99

It is well known that many optimization methods, including SGD, SAGA, an...

Please sign up or login with your details

Forgot password? Click here to reset