
Accumulated Gradient Normalization

10/06/2017
by Joeri Hermans, et al.
Université de Liège
Maastricht University

This work addresses the instability of asynchronous data-parallel optimization. It does so by introducing a novel distributed optimizer that efficiently optimizes a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. As a result, the magnitude of a worker delta is smaller than that of a raw accumulated gradient, and its direction toward a minimum is better than that of a single first-order gradient. Under the assumption that all workers contribute toward a single minimum, this also forces any implicit momentum fluctuations to be more aligned. Since staleness in asynchrony induces (implicit) momentum, our approach mitigates the parameter-staleness problem more effectively, and we show empirically that it achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD.
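To make the mechanism concrete, here is a minimal NumPy sketch of a single worker commit under such a scheme: the worker accumulates a sequence of first-order gradients locally and normalizes the sum by the number of local steps before pushing the resulting delta to the parameter server. The helpers `grad_fn` and `sample_batch`, and the hyperparameter values, are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def agn_worker_delta(theta, grad_fn, sample_batch,
                     num_local_steps=15, learning_rate=0.05):
    """One worker commit under accumulated gradient normalization.

    `grad_fn(params, batch)` and `sample_batch()` are hypothetical
    helpers; `num_local_steps` and `learning_rate` are illustrative.
    """
    params = np.array(theta, copy=True)
    accumulated = np.zeros_like(params)
    for _ in range(num_local_steps):
        # First-order mini-batch gradient at the worker's local parameters.
        g = grad_fn(params, sample_batch())
        accumulated += g
        # Step locally so the accumulated sequence follows a descent path.
        params -= learning_rate * g
    # Normalizing the accumulated sequence by the number of local steps
    # keeps the worker delta's magnitude small while its direction
    # averages the local first-order gradients.
    return -(learning_rate / num_local_steps) * accumulated
```

In this sketch, the parameter server would apply each (possibly stale) commit as `theta += delta`, so the normalization directly bounds how far a stale worker can pull the central model.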


Related research:

10/24/2019 · Gradient Sparsification for Asynchronous Distributed Training
Modern large scale machine learning applications require stochastic opti...

06/16/2022 · Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
We study the asynchronous stochastic gradient descent algorithm for dist...

10/11/2021 · Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer w...

12/29/2018 · SPI-Optimizer: an integral-Separated PI Controller for Stochastic Optimization
To overcome the oscillation problem in the classical momentum-based opti...

03/05/2020 · On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...

12/15/2020 · Anytime Minibatch with Delayed Gradients
Distributed optimization is widely deployed in practice to solve a broad...

05/13/2020 · The effect of Target Normalization and Momentum on Dying ReLU
Optimizing parameters with momentum, normalizing data values, and using ...