Guided parallelized stochastic gradient descent for delay compensation

01/17/2021
by Anuraganand Sharma, et al.

The stochastic gradient descent (SGD) algorithm and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because it optimizes the error function sequentially. This has motivated parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), for training deep neural networks. These methods, however, introduce high variance due to the delay in parameter (weight) updates. We address this delay in our proposed algorithm and try to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of A/SSGD, although some additional (parallel) processing is required to compensate for the delay. The experimental results demonstrate that our approach mitigates the impact of the delay on classification accuracy. The guided approach with SSGD clearly outperforms the unguided parallel variants and, for some benchmark datasets, even achieves accuracy close to that of sequential SGD.
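The core difficulty the abstract refers to, gradients computed from stale parameters during parallel training, can be reproduced in a few lines of NumPy. The sketch below is an illustration only and is not the authors' gSGD: it runs SGD on a toy least-squares problem while forcing each applied gradient to be several steps old (mimicking a delayed worker in ASGD), and applies a generic staleness-based damping factor, which is an assumption made for illustration, to show the kind of correction that delay-compensation methods aim for.

```python
# Illustrative sketch only: shows how delayed (stale) gradients arise in
# asynchronous-style SGD and one generic way to damp their impact.
# This is NOT the authors' gSGD algorithm; the staleness-aware scaling
# rule below is an assumption made purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize ||Xw - y||^2 over w.
n, d = 512, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, idx):
    """Mini-batch gradient of the mean squared error on rows `idx`."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def delayed_sgd(steps=2000, lr=0.01, delay=8, compensate=False):
    """SGD where each applied gradient was computed `delay` steps ago,
    mimicking workers that read stale parameters in parallel training."""
    w = np.zeros(d)
    history = [w.copy()]              # past parameter snapshots
    for _ in range(steps):
        idx = rng.integers(0, n, size=32)
        stale_w = history[max(0, len(history) - 1 - delay)]
        g = grad(stale_w, idx)        # gradient evaluated at stale parameters
        if compensate:
            # Generic damping: shrink the step as staleness grows
            # (a common heuristic, not the paper's guided correction).
            g = g / (1.0 + delay)
        w = w - lr * g
        history.append(w.copy())
    return np.linalg.norm(w - w_true)

print("distance to w_true, no compensation:", delayed_sgd(compensate=False))
print("distance to w_true, with damping   :", delayed_sgd(compensate=True))
```

Increasing `delay` in this sketch makes the uncompensated run noticeably noisier, which is the variance effect the abstract attributes to delayed weight updates; the paper's contribution is a guided, example-driven compensation rather than the simple damping shown here.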
