Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers

10/06/2018
by Nuwan Ferdinand et al.

In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) in which data is farmed out to a set of workers whose results, after a number of updates, are combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings. One cause is slow workers, termed stragglers, which can cause the fusion step at the master node to stall, greatly slowing convergence. In many straggler-mitigation approaches, the work completed by these nodes, while only partial, is discarded completely. We propose an approach to parallelizing synchronous SGD that exploits the work completed by all workers. The central idea is to fix the computation time of each worker and then to combine the distinct contributions of all workers. We provide a convergence analysis and optimize the combination function. Our numerical results demonstrate a several-fold improvement over existing methods.
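To make the fixed-computation-time idea concrete, the following is a minimal Python sketch, not the paper's method: it assumes a synthetic least-squares objective, simulated per-step worker times, and a master that weights each worker's result in proportion to the number of minibatch steps it completed within the time budget. The names (sgd_steps, anytime_round) and the proportional weighting rule are illustrative assumptions; the paper instead derives and optimizes the combination function.

# Sketch of "every worker computes for the same fixed time, the master
# combines all contributions" -- an illustration, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data shared by all workers (illustrative only).
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd_steps(w, num_steps, lr=0.01, batch=8):
    """Run num_steps minibatch SGD steps on the least-squares loss."""
    for _ in range(num_steps):
        idx = rng.integers(0, n, size=batch)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
    return w

def anytime_round(w_master, step_times, time_budget):
    """One synchronous round: each worker computes for the same fixed
    wall-clock budget, so slow workers return fewer (but not zero) updates."""
    results, counts = [], []
    for t_per_step in step_times:              # one entry per worker
        k = int(time_budget // t_per_step)     # steps that fit in the budget
        results.append(sgd_steps(w_master.copy(), k))
        counts.append(k)
    # Combine all contributions; weights proportional to work completed
    # (an assumed stand-in for the optimized combination function).
    weights = np.array(counts, dtype=float)
    weights /= weights.sum()
    return sum(wgt * res for wgt, res in zip(weights, results))

# Five workers with heterogeneous per-step times; the last one is a straggler.
step_times = [1.0, 1.1, 1.3, 1.2, 4.0]
w = np.zeros(d)
for _ in range(20):                            # 20 synchronous rounds
    w = anytime_round(w, step_times, time_budget=50.0)

print("distance to w_true:", np.linalg.norm(w - w_true))

In this sketch the straggler still contributes a partial result each round instead of being discarded or forcing the master to wait, which is the behavior the abstract describes.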


Related research

Zeno: Byzantine-suspicious stochastic gradient descent (05/25/2018)
We propose Zeno, a new robust aggregation rule, for distributed synchron...

Distributed Stochastic Gradient Descent Using LDGM Codes (01/15/2019)
We consider a distributed learning problem in which the computation is c...

Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning (08/04/2022)
We consider the setting where a master wants to run a distributed stocha...

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers (02/25/2020)
We consider the setting where a master wants to run a distributed stocha...

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent (05/21/2016)
Asynchronous parallel optimization algorithms for solving large-scale ma...

STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization (10/06/2022)
Synchronous local stochastic gradient descent (local SGD) suffers from s...

Local SGD With a Communication Overhead Depending Only on the Number of Workers (06/03/2020)
We consider speeding up stochastic gradient descent (SGD) by parallelizi...
