Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

06/04/2018
by   Tianyi Liu, et al.
Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limitations, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. We therefore propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding Async-MSGD and to gain new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, the momentum must be reduced (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt to understand Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
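
As a concrete illustration of the setting described above, here is a minimal Python sketch of momentum SGD for streaming PCA with simulated gradient staleness. This is not the paper's algorithm or code: the update rule, the delay model, and the hyperparameters (step size eta, momentum mu, staleness tau) are illustrative assumptions. Consistent with the stated tradeoff, a larger delay tau typically requires a smaller momentum mu for stable convergence.

```python
# Illustrative sketch (not the paper's implementation): heavy-ball momentum SGD
# for streaming PCA, where gradients are computed at a tau-step-old iterate to
# mimic asynchrony-induced staleness.
import numpy as np


def async_msgd_streaming_pca(Sigma, eta=0.01, mu=0.5, tau=4, n_steps=20000, seed=0):
    """Track the top eigenvector of covariance Sigma from streaming samples.

    eta : step size, mu : momentum, tau : simulated gradient delay (staleness).
    """
    rng = np.random.default_rng(seed)
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)          # to draw x ~ N(0, Sigma)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    v = np.zeros(d)                        # momentum buffer
    history = [w.copy()]                   # recent iterates, oldest first
    for _ in range(n_steps):
        x = L @ rng.standard_normal(d)     # one streaming sample
        w_stale = history[0]               # gradient computed tau steps ago
        g = (x @ w_stale) * x              # stochastic Oja-style gradient of w^T Sigma w
        v = mu * v + eta * g               # heavy-ball momentum update
        w = w + v
        w /= np.linalg.norm(w)             # project back onto the unit sphere
        history.append(w.copy())
        if len(history) > tau + 1:
            history.pop(0)
    return w


if __name__ == "__main__":
    # Synthetic covariance with a known top eigenvector.
    d = 20
    eigvals = np.linspace(1.0, 2.0, d)
    Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((d, d)))
    Sigma = Q @ np.diag(eigvals) @ Q.T
    top = Q[:, np.argmax(eigvals)]
    w = async_msgd_streaming_pca(Sigma)
    print("alignment |<w, v1>| =", abs(w @ top))
```

In this toy setup, one can probe the tradeoff empirically by increasing tau and observing that the run diverges or stalls unless mu is decreased, mirroring the qualitative conclusion of the paper.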

