The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

02/15/2021
by Wei Tao, et al.

The adaptive stochastic gradient descent (SGD) method with momentum has been widely adopted in deep learning as well as convex optimization. In practice, the last iterate is commonly used as the final solution for making decisions. However, the available regret analysis and the setting of constant momentum parameters only guarantee the optimal convergence of the averaged solution. In this paper, we fill this theory-practice gap by investigating the convergence of the last iterate (referred to as individual convergence), which is a more difficult task than the convergence analysis of the averaged solution. Specifically, in the constrained convex case, we prove that the adaptive Polyak's Heavy-ball (HB) method, in which only the step size is updated using an exponential moving average strategy, attains the optimal individual convergence rate of O(1/√t), as opposed to the optimal O(log t/√t) rate of SGD, where t is the number of iterations. Our new analysis not only shows how the HB momentum and its time-varying weight help to achieve acceleration in convex optimization but also offers valuable hints on how momentum parameters should be scheduled in deep learning. Empirical results on optimizing convex functions and training deep networks validate the correctness of our convergence analysis and demonstrate the improved performance of the adaptive HB methods.
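To make the described update concrete, below is a minimal sketch of a projected heavy-ball step with an adaptive step size. It assumes an RMSProp-style exponential moving average of squared gradients for the step-size scaling (consistent with the abstract's "only the step size is updated" description) and a time-varying momentum weight β_t = t/(t+2); this particular schedule, and all function and parameter names, are illustrative assumptions rather than the paper's verbatim algorithm.

    import numpy as np

    def adaptive_hb(grad, project, x0, n_steps, base_lr=0.1, ema_decay=0.9, eps=1e-8):
        # x_prev, x: previous and current iterates; v: EMA of squared gradients.
        x_prev = x0.copy()
        x = x0.copy()
        v = np.zeros_like(x0)
        for t in range(1, n_steps + 1):
            g = grad(x)
            # Adaptive step size via an exponential moving average of g^2.
            v = ema_decay * v + (1.0 - ema_decay) * g**2
            step = base_lr / (np.sqrt(v) + eps)
            # Assumed time-varying momentum weight beta_t = t / (t + 2).
            beta_t = t / (t + 2)
            # Heavy-ball update, then projection onto the constraint set.
            x_next = project(x + beta_t * (x - x_prev) - step * g)
            x_prev, x = x, x_next
        # The last iterate is returned, matching the individual-convergence setting.
        return x

    # Usage sketch: minimize ||x - c||^2 over the unit L2 ball.
    c = np.array([2.0, -1.0])
    grad = lambda x: 2.0 * (x - c)
    project = lambda x: x / max(1.0, np.linalg.norm(x))
    x_star = adaptive_hb(grad, project, x0=np.zeros(2), n_steps=500)

Note that the sketch returns the last iterate rather than an average of the iterates, which is exactly the practice whose convergence the paper analyzes.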


Related research

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization (04/12/2016)
Recently, stochastic momentum methods have been widely adopted in train...

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (08/13/2022)
Adaptive gradient algorithms borrow the moving average idea of heavy bal...

On the Convergence of AdaBound and its Connection to SGD (08/13/2019)
Adaptive gradient methods such as Adam have gained extreme popularity du...

On the convergence of the Stochastic Heavy Ball Method (06/14/2020)
We provide a comprehensive analysis of the Stochastic Heavy Ball (SHB) m...

From Online Optimization to PID Controllers: Mirror Descent with Momentum (02/12/2022)
We study a family of first-order methods with momentum based on mirror d...

Improved Analysis of Clipping Algorithms for Non-convex Optimization (10/05/2020)
Gradient clipping is commonly used in training deep neural networks part...

A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data (03/01/2023)
Emerging distributed applications recently boosted the development of de...
