A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent

01/24/2020
by Yuhua Zhu, et al.

We give a sharp convergence rate for the asynchronous stochastic gradient descent (ASGD) algorithm when the loss function is a perturbed quadratic, using the stochastic modified equations introduced in [An et al., Stochastic modified equations for the asynchronous stochastic gradient descent, arXiv:1805.08244]. We prove that when the number of local workers is larger than the expected staleness, ASGD is more efficient than stochastic gradient descent (SGD). Our theoretical results also suggest that longer delays lead to a slower convergence rate. Moreover, the learning rate cannot be smaller than a threshold inversely proportional to the expected staleness.
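The setting in the abstract can be made concrete with a small simulation. Below is a minimal sketch of ASGD on a noisy quadratic loss, where the update x_{k+1} = x_k - eta * grad f(x_{k - tau_k}) uses a gradient evaluated at a stale iterate. This is the standard delayed-gradient model; the constants (learning rate, noise level) and the Poisson staleness distribution are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
A = np.diag(rng.uniform(0.5, 2.0, size=dim))  # positive-definite quadratic

def grad(x):
    # Stochastic gradient of f(x) = 0.5 * x^T A x, plus Gaussian noise.
    return A @ x + 0.1 * rng.standard_normal(dim)

eta = 0.05            # learning rate (illustrative, not from the paper)
mean_staleness = 4    # expected delay E[tau] (illustrative)
steps = 2000

x = rng.standard_normal(dim)
history = [x.copy()]
for k in range(steps):
    # Asynchrony: the gradient arrives with a random staleness tau_k,
    # so the update uses the delayed iterate x_{k - tau_k}.
    tau = min(rng.poisson(mean_staleness), k)
    x = x - eta * grad(history[k - tau])
    history.append(x.copy())

print("final loss:", 0.5 * x @ A @ x)
```

Increasing mean_staleness in this sketch slows convergence, consistent with the paper's conclusion that longer delays yield a slower rate.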


Related research

03/16/2017 · Conditional Accelerated Lazy Stochastic Gradient Descent
In this work we introduce a conditional accelerated lazy stochastic grad...

05/21/2018 · Stochastic modified equations for the asynchronous stochastic gradient descent
We propose a stochastic modified equations (SME) for modeling the asynch...

06/26/2018 · A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
We provide tight finite-time convergence bounds for gradient descent and...

03/19/2018 · D^2: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of ...

11/03/2017 · Analysis of Approximate Stochastic Gradient Using Quadratic Constraints and Sequential Semidefinite Programs
We present convergence rate analysis for the approximate stochastic grad...

09/26/2019 · At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Background: Recent developments have made it possible to accelerate neur...

11/12/2021 · Solving A System Of Linear Equations By Randomized Orthogonal Projections
Randomization has shown catalyzing effects in linear algebra with promis...
