Asynchronous Decentralized Parallel Stochastic Gradient Descent

10/18/2017
by Xiangru Lian et al.

Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and empirically. Asynchronous parallelism is a powerful technique for improving the efficiency of distributed machine learning platforms and is widely used in popular software and solvers built on centralized parallel protocols, such as TensorFlow; however, it remains unclear how to apply asynchronous parallelism to improve the efficiency of decentralized parallel algorithms. This paper proposes an asynchronous decentralized parallel stochastic gradient descent (AD-PSGD) algorithm that brings asynchronous parallelism to decentralized algorithms. Our theoretical analysis provides the convergence rate, or equivalently the computational complexity, which is consistent with known special cases and shows that the algorithm achieves linear speedup as the number of nodes or the batch size increases. Extensive experiments in deep learning validate the proposed algorithm.
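For intuition, the sketch below simulates this kind of asynchronous gossip-plus-SGD update in a single process: each thread plays the role of a node that repeatedly averages its model with a randomly chosen ring neighbor and then applies a stochastic gradient step computed on a possibly stale snapshot of its local model. This is a minimal illustrative sketch, not the authors' implementation; the toy least-squares objective, the ring topology, and helper names such as worker are assumptions made for the example.

```python
# Minimal sketch of an asynchronous, decentralized SGD-style update
# (illustrative only; toy least-squares objective, ring topology,
#  threads standing in for worker nodes).
import threading
import numpy as np

DIM, N_WORKERS, STEPS, LR, BATCH = 10, 4, 500, 0.05, 16

# Shared toy dataset for the objective 0.5 * ||A x - b||^2.
data_rng = np.random.default_rng(0)
A = data_rng.standard_normal((200, DIM))
b = data_rng.standard_normal(200)

# One local model and one lock per node; there is no central parameter server.
models = [data_rng.standard_normal(DIM) for _ in range(N_WORKERS)]
locks = [threading.Lock() for _ in range(N_WORKERS)]

def worker(i):
    rng = np.random.default_rng(100 + i)            # per-thread RNG
    for _ in range(STEPS):
        # 1) Gossip step: atomically average with a random ring neighbor.
        j = (i + rng.choice([-1, 1])) % N_WORKERS
        first, second = sorted((i, j))              # fixed lock order avoids deadlock
        with locks[first], locks[second]:
            avg = 0.5 * (models[i] + models[j])
            models[i], models[j] = avg.copy(), avg.copy()

        # 2) Local SGD step on a (possibly stale) snapshot of the local model;
        #    there is no global barrier, so nodes proceed at their own pace.
        x_hat = models[i].copy()
        idx = rng.integers(0, A.shape[0], size=BATCH)
        grad = A[idx].T @ (A[idx] @ x_hat - b[idx]) / BATCH
        with locks[i]:
            models[i] = models[i] - LR * grad

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

x_bar = np.mean(models, axis=0)                     # consensus average of all nodes
print("mean squared residual:", np.mean((A @ x_bar - b) ** 2))
```

Sorting the two lock indices before acquiring them is what keeps concurrent pairwise averages deadlock-free in this simulation; in a real multi-machine deployment that role would be played by the communication protocol rather than thread locks.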


Related research

05/25/2017 · Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Most distributed machine learning systems nowadays, including TensorFlow...

12/16/2018 · Stochastic Distributed Optimization for Machine Learning from Decentralized Features
Distributed machine learning has been widely studied in the literature t...

05/14/2020 · MixML: A Unified Analysis of Weakly Consistent Parallel Learning
Parallelism is a ubiquitous method for accelerating machine learning alg...

11/18/2022 · TensAIR: Online Learning from Data Streams via Asynchronous Iterative Routing
Online learning (OL) from data streams is an emerging area of research t...

04/12/2018 · Asynch-SGBDT: Asynchronous Parallel Stochastic Gradient Boosting Decision Tree based on Parameters Server
Gradient Boosting Decision Tree, i.e. GBDT, becomes one of the most impo...

04/04/2018 · GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange
We address the issue of speeding up the training of convolutional neural...

02/18/2019 · A parallel Fortran framework for neural networks and deep learning
This paper describes neural-fortran, a parallel Fortran framework for ne...
