Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

08/27/2018
by Nikolay Bogoychev, et al.

In order to extract the best possible performance from asynchronous stochastic gradient descent (SGD), one must increase the mini-batch size and scale the learning rate accordingly. To achieve further speedup, we introduce a technique that delays gradient updates, effectively increasing the mini-batch size. Unfortunately, increasing the mini-batch size worsens the stale gradient problem in asynchronous SGD, which degrades model convergence. We introduce local optimizers that mitigate the stale gradient problem and, together with fine-tuning of momentum, allow us to train a shallow machine translation system 27% faster than an optimized baseline with negligible penalty in BLEU.
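The abstract combines two ideas: delaying (accumulating) gradients so each communicated update corresponds to a larger effective mini-batch, and running a local optimizer on each worker so the less frequent, larger updates do not aggravate gradient staleness. The following is a minimal sketch of that combination, not the authors' Marian implementation; the class name DelayedLocalWorker, the parameters delay and local_momentum, and the push_to_server callback are illustrative assumptions.

```python
import numpy as np

class DelayedLocalWorker:
    """Illustrative asynchronous-SGD worker (sketch, not the paper's code):
    accumulates gradients for `delay` steps, which behaves like a `delay`-times
    larger mini-batch, and smooths the accumulated gradient with a local
    momentum buffer before pushing a single update to the shared parameters."""

    def __init__(self, dim, delay=4, base_lr=1e-3, local_momentum=0.9):
        self.delay = delay
        # Linear learning-rate scaling: a `delay`-times larger effective
        # mini-batch is paired with a `delay`-times larger learning rate.
        self.lr = base_lr * delay
        self.local_momentum = local_momentum
        self.accum = np.zeros(dim)      # delayed (accumulated) gradient
        self.velocity = np.zeros(dim)   # local optimizer state
        self.steps = 0

    def step(self, grad, push_to_server):
        """Accumulate one mini-batch gradient; communicate only every `delay` steps."""
        self.accum += grad
        self.steps += 1
        if self.steps % self.delay == 0:
            # Average the delayed gradients, apply the local momentum
            # optimizer, then send one larger update to the server.
            avg_grad = self.accum / self.delay
            self.velocity = self.local_momentum * self.velocity + avg_grad
            push_to_server(-self.lr * self.velocity)
            self.accum[:] = 0.0
```

With delay=4, four forward/backward passes produce a single communication round, which is where the speedup over per-batch pushes would come from; the local momentum buffer is the piece intended to counteract the staleness that larger, less frequent asynchronous updates would otherwise worsen.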

research
06/14/2022

MBGDT: Robust Mini-Batch Gradient Descent

In high dimensions, most machine learning method perform fragile even th...
research
04/17/2017

Sparse Communication for Distributed Gradient Descent

We make distributed stochastic gradient descent faster by exchanging spa...
research
05/25/2018

Gradient Coding via the Stochastic Block Model

Gradient descent and its many variants, including mini-batch stochastic ...
research
11/15/2019

Optimal Mini-Batch Size Selection for Fast Gradient Descent

This paper presents a methodology for selecting the mini-batch size that...
research
06/28/2015

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

Stochastic gradient descent (SGD) holds as a classical method to build l...
research
09/29/2022

Convergence of the mini-batch SIHT algorithm

The Iterative Hard Thresholding (IHT) algorithm has been considered exte...
research
09/14/2015

Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study

This paper presents Rudra, a parameter server based distributed computin...
