Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

06/15/2022
by Konstantin Mishchenko et al.

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead only on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than those given by existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.
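
To make the setting concrete: in asynchronous SGD, the server applies stochastic gradients that each worker computed at a stale copy of the iterate, so every applied gradient carries some delay. The snippet below is a minimal single-process simulation of that setting with a delay-adaptive stepsize that shrinks the step when the applied gradient is stale. The objective in quadratic_grad, the delay model, and the rule base_lr / (1 + delay) are illustrative assumptions for the sketch, not the paper's exact recursion or constants.

```python
import numpy as np

def quadratic_grad(x, rng):
    # Stochastic gradient of f(x) = 0.5 * ||x||^2 with Gaussian noise
    # (illustrative objective, not taken from the paper).
    return x + 0.1 * rng.standard_normal(x.shape)

def async_sgd(dim=10, num_workers=8, steps=500, base_lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)

    # Each worker stores the iterate it last read and the step at which it read it,
    # so the gradient it eventually returns is evaluated at an old (delayed) point.
    snapshots = [(x.copy(), 0) for _ in range(num_workers)]

    for t in range(1, steps + 1):
        # A random worker finishes next, so delays are arbitrary and uneven.
        w = int(rng.integers(num_workers))
        x_old, t_read = snapshots[w]
        delay = t - t_read

        g = quadratic_grad(x_old, rng)

        # Delay-adaptive stepsize: staler gradients get smaller steps
        # (assumed rule for illustration; the paper derives its own schedule).
        lr = base_lr / (1 + delay)
        x -= lr * g

        # The worker reads the fresh iterate and starts its next computation.
        snapshots[w] = (x.copy(), t)

    return x

if __name__ == "__main__":
    x_final = async_sgd()
    print("final squared norm:", float(np.dot(x_final, x_final)))
```

In this toy run, the staleness of any individual gradient is bounded only by how long a worker happens to sit idle, while the number of outstanding stale copies is fixed at num_workers, which mirrors the abstract's claim that the guarantees should depend on the number of parallel devices rather than on the size of the delays.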


Related research

05/24/2018
Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning
Understanding the convergence performance of asynchronous stochastic gra...

08/18/2023
Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
Stochastic gradient descent (SGD) performed in an asynchronous manner pl...

04/05/2020
On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems
In the realm of big data and machine learning, data-parallel, distribute...

11/17/2022
Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent
Large-scale non-convex optimization problems are expensive to solve due ...

03/03/2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous ...

03/03/2021
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates
It has been experimentally observed that the efficiency of distributed t...

03/23/2020
Slow and Stale Gradients Can Win the Race
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous ...
