Accumulated Gradient Normalization

10/06/2017
by Joeri Hermans, et al.

This work addresses the instability of asynchronous data-parallel optimization. It introduces a novel distributed optimizer that can efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. The resulting worker delta has a smaller magnitude than an accumulated gradient, yet provides a better direction towards a minimum than a single first-order gradient. This also forces possible implicit momentum fluctuations to be more aligned, under the assumption that all workers contribute towards a single minimum. Because staleness in asynchrony induces (implicit) momentum, our approach mitigates the parameter staleness problem more effectively, and it achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD, which we show empirically.
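The abstract does not spell out the update rule, but the mechanism it describes can be sketched in a few lines. The following is a minimal Python sketch, assuming each worker takes a fixed number of local first-order steps, sums the gradients it used, and normalizes that sum by the number of steps before committing it to the parameter server; the function names, the grad_fn callback, and the learning rates are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def agn_worker_delta(weights, batches, grad_fn, lr=0.05):
        """One worker round under accumulated gradient normalization.

        Takes len(batches) local first-order steps, accumulates the
        gradients, and normalizes the sum before it is committed.
        """
        local = weights.copy()             # local replica of the central model
        accumulated = np.zeros_like(weights)
        for batch in batches:              # local exploration steps
            g = grad_fn(local, batch)      # first-order gradient on this batch
            accumulated += g
            local -= lr * g                # step the local replica
        # Normalizing keeps the delta's magnitude close to that of a single
        # gradient, while its direction averages all local steps.
        return accumulated / len(batches)

    def server_commit(center, delta, lr=0.05):
        """Parameter server side: apply a worker delta to the central model.

        In the asynchronous setting this runs whenever any worker commits.
        """
        center -= lr * delta
        return center

Because the committed delta has roughly the magnitude of a single gradient but a direction informed by several local steps, a stale commit perturbs the central model less, which is the staleness-mitigation argument made above.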


Related research

10/24/2019
Gradient Sparsification for Asynchronous Distributed Training
Modern large scale machine learning applications require stochastic opti...

06/16/2022
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
We study the asynchronous stochastic gradient descent algorithm for dist...

10/11/2021
Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer w...

12/29/2018
SPI-Optimizer: an integral-Separated PI Controller for Stochastic Optimization
To overcome the oscillation problem in the classical momentum-based opti...

03/05/2020
On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...

12/15/2020
Anytime Minibatch with Delayed Gradients
Distributed optimization is widely deployed in practice to solve a broad...

05/13/2020
The effect of Target Normalization and Momentum on Dying ReLU
Optimizing parameters with momentum, normalizing data values, and using ...
