A Unified Analysis of Stochastic Momentum Methods for Deep Learning

08/30/2018
by   Yan Yan, et al.

Stochastic momentum methods have been widely adopted in training deep neural networks. However, theoretical analysis of both their convergence on the training objective and their generalization error for prediction remains under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and two widely used stochastic momentum methods, namely the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive convergence rates for the gradient norm in the non-convex optimization setting, and analyze generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve generalization performance. These theoretical insights are consistent with common wisdom and are corroborated by our empirical results on deep learning.
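The paper's exact unified formulation is not reproduced in this abstract. As an illustration only, the sketch below uses one common parameterisation of a unified stochastic momentum update in which a single scalar s interpolates between the heavy-ball (s = 0) and Nesterov-style (s = 1) updates, while beta = 0 reduces to plain SG; the function name, parameters, and toy objective are assumptions for illustration, not the authors' method.

```python
import numpy as np

def unified_momentum_step(x, y_s_prev, grad, alpha, beta, s):
    """One step of a unified stochastic momentum update (illustrative sketch).

    Assumed parameterisation: s = 0 gives a stochastic heavy-ball (SHB)
    style update, s = 1 a stochastic Nesterov (SNAG) style update, and
    beta = 0 recovers plain stochastic gradient (SG).
    """
    y_next = x - alpha * grad                        # plain SG step
    y_s_next = x - s * alpha * grad                  # auxiliary sequence controlled by s
    x_next = y_next + beta * (y_s_next - y_s_prev)   # momentum correction term
    return x_next, y_s_next

# Toy usage on f(x) = 0.5 * ||x||^2 with noisy gradients (hypothetical setup).
rng = np.random.default_rng(0)
x = np.ones(10)
y_s = x.copy()
alpha, beta, s = 0.1, 0.9, 1.0   # s = 1.0 -> SNAG-style update
for _ in range(100):
    grad = x + 0.01 * rng.standard_normal(x.shape)  # stochastic gradient of the toy objective
    x, y_s = unified_momentum_step(x, y_s, grad, alpha, beta, s)
print(float(np.linalg.norm(x)))
```

Setting s = 0 makes the correction term beta * (x_t - x_{t-1}), the familiar heavy-ball form, which is why a single scalar suffices to move between the variants in this kind of sketch.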


Related research

04/12/2016, Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization. Recently, stochastic momentum methods have been widely adopted in train...

08/10/2018, On the Convergence of Weighted AdaGrad with Momentum for Training Deep Neural Networks. Adaptive stochastic gradient descent methods, such as AdaGrad, RMSProp, ...

09/12/2018, On the Stability and Convergence of Stochastic Gradient Descent with Momentum. While momentum-based methods, in conjunction with the stochastic gradien...

05/10/2021, A Bregman Learning Framework for Sparse Neural Networks. We propose a learning framework based on stochastic Bregman iterations t...

10/05/2020, Improved Analysis of Clipping Algorithms for Non-convex Optimization. Gradient clipping is commonly used in training deep neural networks part...

06/01/2020, Factorial Powers for Stochastic Optimization. The convergence rates for convex and non-convex optimization methods dep...
