Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD

10/21/2019
by Rosa Candela, et al.

Large-scale machine learning increasingly relies on distributed optimization, whereby several machines contribute to the training process of a statistical model. While there exists a large literature on stochastic gradient descent (SGD) and its variants, the study of countermeasures to the problems arising in asynchronous distributed settings is still in its infancy. The key question of this work is whether sparsification, a technique predominantly used to reduce communication overhead, can also mitigate the staleness problem that affects asynchronous SGD. We study the role of sparsification both theoretically and empirically. Our theory indicates that, in an asynchronous, non-convex setting, the ergodic convergence rate of sparsified SGD matches the known O(1/√T) rate of non-convex SGD. We then carry out an empirical study to complement our theory and show that, in practice, sparsification consistently improves over vanilla SGD and over current alternatives for mitigating the effects of staleness.
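For concreteness, a common instantiation of gradient sparsification is the top-k operator, where each worker transmits only the k largest-magnitude gradient coordinates and zeros out the rest. The sketch below illustrates this idea in Python; the exact sparsification operator and update rule used in the paper may differ, and the function names here are illustrative, not the authors' code.

```python
# Minimal sketch of top-k gradient sparsification (illustrative; assumes
# top-k, one possible form of the sparsification discussed in the abstract).
import numpy as np

def sparsify_top_k(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude entries of the gradient; zero the rest."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

def apply_sparse_update(params: np.ndarray, grad: np.ndarray,
                        lr: float, k: int) -> np.ndarray:
    """Asynchronous-SGD flavour: the server applies each worker's sparsified
    (and possibly stale) gradient as soon as it arrives."""
    return params - lr * sparsify_top_k(grad, k)
```

In the asynchronous setting, the gradient a worker sends may have been computed on an outdated copy of the parameters; the paper's question is whether restricting updates to a few large coordinates, as above, dampens the effect of that staleness.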


