The Role of Local Steps in Local SGD

03/14/2022
by Tiancheng Qin, et al.

We consider the distributed stochastic optimization problem in which n agents seek to minimize a global function given by the sum of the agents' local functions, and we focus on the heterogeneous setting in which the agents' local functions are defined over non-i.i.d. data sets. We study the Local SGD method, in which agents perform a number of local stochastic gradient steps and occasionally communicate with a central node to improve their local optimization tasks. We analyze the effect of local steps on the convergence rate and the communication complexity of Local SGD. In particular, instead of assuming a fixed number of local steps across all communication rounds, we allow the number of local steps during the i-th communication round, H_i, to be arbitrary and possibly different across rounds. Our main contribution is to characterize the convergence rate of Local SGD as a function of {H_i}_{i=1}^R under various settings of strongly convex, convex, and nonconvex local functions, where R is the total number of communication rounds. Based on this characterization, we provide sufficient conditions on the sequence {H_i}_{i=1}^R under which Local SGD achieves linear speed-up with respect to the number of workers. Furthermore, we propose a new communication strategy with increasing local steps that is superior to existing communication strategies for strongly convex local functions. For convex and nonconvex local functions, on the other hand, we argue that a fixed number of local steps is the best communication strategy for Local SGD, and we recover state-of-the-art convergence rate results. Finally, we justify our theoretical results through extensive numerical experiments.
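To make the setting concrete, the following is a minimal NumPy sketch of Local SGD with a per-round schedule of local steps {H_i}_{i=1}^R on synthetic heterogeneous least-squares problems. The quadratic objective, the names local_sgd and H_schedule, and the step size and schedules shown are illustrative assumptions for exposition, not details taken from the paper.

# Minimal Local SGD sketch with a per-round schedule of local steps {H_i}.
# The quadratic local objectives and all hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_agents, dim = 8, 5
# Heterogeneous (non-i.i.d.) local least-squares problems: agent m holds (A_m, b_m).
A = [rng.normal(size=(20, dim)) for _ in range(n_agents)]
b = [rng.normal(size=20) + m for m in range(n_agents)]  # per-agent shift creates heterogeneity


def local_sgd(H_schedule, lr=0.01, batch=4):
    """Round i: every agent takes H_i local stochastic gradient steps,
    then the central node averages the local iterates."""
    x = np.zeros(dim)  # global model held at the central node
    for H_i in H_schedule:
        local_models = []
        for m in range(n_agents):
            x_m = x.copy()  # each round starts from the averaged model
            for _ in range(H_i):
                idx = rng.choice(len(b[m]), size=batch, replace=False)
                grad = A[m][idx].T @ (A[m][idx] @ x_m - b[m][idx]) / batch
                x_m -= lr * grad
            local_models.append(x_m)
        x = np.mean(local_models, axis=0)  # communication round: average at the server
    return x


# Fixed local steps versus an (illustrative) increasing schedule over R rounds.
R = 50
x_fixed = local_sgd([10] * R)
x_increasing = local_sgd([i + 1 for i in range(R)])

The H_schedule argument reflects the paper's setup: the number of local steps need not be constant across rounds, and the analysis characterizes convergence as a function of this sequence.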


