On the Convergence of Step Decay Step-Size for Stochastic Optimization

02/18/2021 ∙ by Xiaoyu Wang, et al.
The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an 𝒪(ln T/√T) rate. We also provide convergence guarantees for general (possibly non-smooth) convex problems, ensuring an 𝒪(ln T/√T) convergence rate. Finally, in the strongly convex case, we establish an 𝒪(ln T/T) rate for smooth problems, which we also prove to be tight, and an 𝒪(ln^2 T/T) rate without the smoothness assumption. We illustrate the practical efficiency of the step decay step-size in several large-scale deep neural network training tasks.
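As a concrete illustration of the "constant and then cut" schedule studied here, the sketch below runs plain SGD with a generic step decay step-size. The function names (step_decay, sgd_step_decay) and the hyper-parameters eta0, gamma, and cut_every are illustrative assumptions, not the constants or stage lengths used in the paper's analysis, where the number and length of stages are tied to the horizon T.

```python
import numpy as np

def step_decay(eta0, gamma, cut_every, t):
    """Step decay step-size: hold the step-size constant within a stage,
    then cut it by a factor gamma at the start of each new stage.
    eta0, gamma, cut_every are illustrative hyper-parameters."""
    stage = t // cut_every
    return eta0 * (gamma ** stage)

def sgd_step_decay(grad_fn, x0, T, eta0=1.0, gamma=0.5, cut_every=100):
    """Run T iterations of SGD with a step decay schedule.
    grad_fn(x) should return a stochastic gradient at x."""
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        eta = step_decay(eta0, gamma, cut_every, t)
        x = x - eta * grad_fn(x)
    return x

if __name__ == "__main__":
    # Toy example: minimize 0.5*||x||^2 from noisy gradient estimates.
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
    print(sgd_step_decay(noisy_grad, x0=np.ones(5), T=1000))
```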

