The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure

There is a stark disparity between the step size schedules used in practical large scale machine learning and those that are considered optimal by the theory of stochastic approximation. In theory, most results utilize polynomially decaying learning rate schedules, while, in practice, the "Step Decay" schedule is among the most popular schedules, where the learning rate is cut every constant number of epochs (i.e. this is a geometrically decaying schedule). This work examines the step-decay schedule for the stochastic optimization problem of streaming least squares regression (both in the non-strongly convex and strongly convex case), where we show that a sharp theoretical characterization of an optimal learning rate schedule is far more nuanced than suggested by previous work. We focus specifically on the rate that is achievable when using the final iterate of stochastic gradient descent, as is commonly done in practice. Our main result provably shows that a properly tuned geometrically decaying learning rate schedule provides an exponential improvement (in terms of the condition number) over any polynomially decaying learning rate schedule. We also provide experimental support for wider applicability of these results, including for training modern deep neural networks.


1 Introduction

Large scale machine learning and deep learning rely almost exclusively on stochastic optimization methods, primarily SGD (Robbins and Monro, 1951) and variants. Such methods are heavily tuned to the problem at hand (often with parallelized hyper-parameter searches (Li et al., 2017)). There are two predominant approaches in stochastic optimization: methods that decay learning rate schedules to achieve the best performance (Krizhevsky et al., 2012; Sutskever et al., 2013; Kidambi et al., 2018), and methods that rely on various forms of approximate preconditioning (Duchi et al., 2011; Tieleman and Hinton, 2012; Kingma and Ba, 2014) to obtain reasonably accurate results on classes of problem instances (often) with minimal hyper-parameter tuning. This work examines the former class of methods, where our goal is to present a more refined characterization of optimal learning rate schedules, through both sharp theoretical analysis (on the special case of convex quadratics) and empirical studies.

In this work, we will restrict our attention to only the SGD algorithm where we are concerned with the behavior of the final iterate (i.e. the last point when we choose to terminate the algorithm). While the majority of (minimax optimal) theoretical results for SGD focus on iterate averaging techniques (e.g. Polyak and Juditsky (1992)), practical implementations of SGD predominantly return the final iterate of the SGD procedure. Thus, it is of importance (both from theoretical and practical perspectives) to quantify what is achievable with the final iterate of an SGD procedure.

In theory, it is known that the final iterate of SGD (Robbins and Monro, 1951) will (asymptotically) converge to the (local) minimizer only if the learning rates are not summable but are square summable (the former condition ensures that the initial conditions are forgotten, and the latter ensures that the error due to the noise goes to zero) (Kushner and Clark, 1978; Kushner and Yin, 2003). In particular, most of the theoretically studied learning rate schedules are of the form η_t = a/(b + t)^α for some a, b > 0 and α ∈ (1/2, 1] (Robbins and Monro, 1951; Polyak and Juditsky, 1992); we refer to these schedules as polynomial decay schemes, and they are convergent precisely because they are not summable but are square summable. Furthermore, it is known that such polynomial decay schemes can yield near-minimax optimal rates (up to log factors) on the final iterate for certain classes of non-smooth stochastic convex optimization problems (Shamir and Zhang, 2012), with/without strong convexity.
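As a quick numerical illustration of these two conditions, the sketch below (the constants a = b = 1 and the horizon are arbitrary choices, not values from the paper) checks that the α = 1 polynomial schedule has diverging stepsize sums but converging squared sums, which is exactly the classical Robbins-Monro requirement:

```python
# Hedged sketch: numerically contrast the two summability conditions for a
# polynomial decay schedule eta_t = a / (b + t)**alpha. The parameters
# a = b = 1 and the horizon are illustrative.

def polynomial_schedule(a, b, alpha, horizon):
    """Return the first `horizon` stepsizes eta_t = a / (b + t)**alpha."""
    return [a / (b + t) ** alpha for t in range(1, horizon + 1)]

etas = polynomial_schedule(a=1.0, b=1.0, alpha=1.0, horizon=100_000)
sum_eta = sum(etas)                    # grows like log(horizon): non-summable
sum_eta_sq = sum(e * e for e in etas)  # converges (here toward pi^2/6 - 1)
```

Doubling the horizon keeps increasing `sum_eta` without bound, while `sum_eta_sq` stays below its finite limit.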

In practice, a widely used stepsize schedule involves cutting the learning rate (by a constant factor) every constant number of epochs; such schemes are referred to as "Step Decay" schedules. Clearly, such a scheme decays the learning rate geometrically, and is therefore a non-convergent scheme (from the stochastic approximation perspective). However, in practice, the schedule at which the rate is (geometrically) cut is tuned to obtain good performance when the algorithm is terminated (Krizhevsky et al., 2012; He et al., 2016b), as opposed to one that obtains the best rates in the limit of a large number of updates. Such schemes are so widely used that they are available as a standard option in popular deep learning libraries such as PyTorch.
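A minimal sketch of such a schedule (the base rate 0.1, cut factor 0.1, and 30-epoch interval below are illustrative defaults, not values prescribed by this paper):

```python
def step_decay_lr(epoch, base_lr=0.1, drop_factor=0.1, epochs_per_drop=30):
    """Step Decay: cut the learning rate by `drop_factor` every
    `epochs_per_drop` epochs. All default values here are illustrative."""
    return base_lr * drop_factor ** (epoch // epochs_per_drop)

# First three "phases" of the schedule: 0.1, then 0.01, then 0.001.
lrs = [step_decay_lr(e) for e in range(90)]
```

The same rule is exposed in PyTorch as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`.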

Our contributions:

This work establishes near optimality of the step-decay schedule (Algorithm 1) on the final iterate of an SGD procedure (with a known time horizon). In particular, the variance on the final iterate of a step-decay schedule is shown to offer an exponential improvement (in terms of the condition number) over that of the polynomially decaying step size schemes standard in the theory of stochastic approximation (Kushner and Yin, 2003). Figure 1 illustrates that this difference is evident (empirically) even when optimizing a two-dimensional convex quadratic. Table 1 provides a summary.
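The flavor of this gap can be reproduced in a few lines. The sketch below (with illustrative constants, and a simplified additive-noise model rather than the paper's exact Figure 1 setup) evolves the expected squared error of SGD coordinate-wise on a two-dimensional quadratic with eigenvalues (1, 1/κ) for κ = 100, and compares the final-iterate risk of a 1/t schedule against a step-decay schedule that halves the rate over log₂T phases:

```python
# Hedged numerical sketch (illustrative constants, additive-noise model):
# evolve the *expected* per-coordinate squared error of SGD on a 2-d
# quadratic and compare polynomial decay against step decay.
import math

def expected_risk(schedule, hs=(1.0, 0.01), sigma2=0.01, v0=1.0):
    """v_{i,t} = (1 - eta_t h_i)^2 v_{i,t-1} + eta_t^2 sigma2 h_i ;
    returns the final excess risk sum_i h_i v_i / 2."""
    v = [v0 for _ in hs]
    for eta in schedule:
        v = [(1 - eta * h) ** 2 * vi + eta ** 2 * sigma2 * h
             for h, vi in zip(hs, v)]
    return sum(h * vi for h, vi in zip(hs, v)) / 2

T = 10_000
poly = [1.0 / (t + 1) for t in range(T)]                  # eta_t = 1/(t+1)
phases = math.ceil(math.log2(T))
phase_len = math.ceil(T / phases)
step = [1.0 * 0.5 ** (t // phase_len) for t in range(T)]  # halve each phase

risk_poly = expected_risk(poly)
risk_step = expected_risk(step)
```

Under this toy model, the step-decay run ends with a final-iterate risk that is orders of magnitude smaller: the polynomial schedule barely decays the error along the low-curvature direction.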

Our main contributions are as follows:

• Sub-optimality of polynomially decaying learning rate schemes: For the case of optimizing strongly convex quadratics, this work shows that the final iterate of a polynomially decaying stepsize scheme (i.e., with η_t = a/(b + t)^α, for any a, b ≥ 0 and α ∈ [0, 1]) is off the statistical minimax rate by a factor of the condition number κ of the problem. For the non-strongly convex case of optimizing quadratics, any polynomially decaying stepsize scheme can achieve a rate no better than 1/√T (up to log factors), while the statistical minimax rate is σ²d/T. We note that our main Theorem 2, for the non-strongly convex case of quadratics, offers a rate on the initial error (i.e., the bias term) that is off the best known rate (Bach and Moulines, 2013) (which employs iterate averaging) by a dimension factor.

• Near-optimality of the step-decay scheme: Given a fixed end time T, the step-decay scheme (Algorithm 1) presents a final iterate that is off the statistical minimax rate by just a log T factor, for optimizing both the strongly convex and the non-strongly convex case of quadratics (this dependence can be improved to a poly-logarithmic factor of the condition number, for the strongly convex case, using a more refined stepsize decay scheme; see Proposition 3), thus indicating vast improvements over polynomially decaying stepsize schedules. Algorithm 1 is rather straightforward and requires only the knowledge of an initial learning rate and the number of iterations for its implementation.

• SGD has to query bad points (or iterates) infinitely often: For the case of optimizing strongly convex quadratics, this work shows that any stochastic gradient procedure (in a sense) must query sub-optimal iterates (off the minimax rate by nearly a factor of the condition number) infinitely often.

Table 1 summarizes this paper’s results. Note that the sub-optimality of standard polynomially decaying stepsizes for classes of smooth and strongly convex optimization doesn’t contradict the (minimax) optimality results in stochastic approximation (Polyak and Juditsky, 1992). Iterate averaging coupled with polynomially decaying learning rates clearly does achieve minimax optimal statistical rates in the limit (Ruppert, 1988; Polyak and Juditsky, 1992). In fact, recent results for the special case of quadratics indicate that a constant learning rate coupled with iterate averaging achieves anytime minimax optimal statistical rates (as opposed to results that work with the knowledge of the time horizon) (Bach and Moulines, 2013; Jain et al., 2016, 2017b). However, as mentioned previously, this work deals with the behavior of the final iterate (i.e. without iterate averaging) of a stochastic gradient procedure, which is clearly of relevance to practice.

Extending results on the performance of Step Decay schemes to more general convex optimization problems, beyond stochastic optimization of quadratics, is an important future direction.

Related work:

Stochastic Gradient Descent (SGD) and the problem of stochastic approximation were introduced in the work of Robbins and Monro (1951). That work elaborates on stepsize schemes satisfied by asymptotically convergent stochastic gradient methods: we refer to these schemes as "convergent" stepsize sequences. The asymptotic statistical optimality of SGD equipped with larger stepsize sequences and iterate averaging was shown in Ruppert (1988); Polyak and Juditsky (1992). In terms of oracle models and notions of optimality, there exist two lines of thought, as elaborated below. See also Jain et al. (2017b) for a detailed discussion in this regard.

One line of thought considers the goal of matching the excess risk of the statistically optimal estimator (Anbar, 1971; Kushner and Clark, 1978; Polyak and Juditsky, 1992) on every problem instance. Several recent works (Bach and Moulines, 2013; Frostig et al., 2015; Dieuleveut et al., 2016; Jain et al., 2016, 2017b) present non-asymptotic results in this oracle model, in conjunction with iterate averaging, and achieve minimax rates (on a per-problem basis) (Lehmann and Casella, 1998; Kushner and Clark, 1978). This paper studies the final iterate of SGD and characterizes its behavior under this oracle model, with both the standard polynomially decaying stepsizes and the step decay schedule.

The other line of thought designs algorithms under worst case assumptions such as bounded noise, with the goal of matching the lower bounds provided in Nemirovsky and Yudin (1983); Raginsky and Rakhlin (2011); Agarwal et al. (2012). Working in this oracle model, various asymptotic properties of convergent learning rate schemes in the stochastic approximation literature have been studied in great detail (Kushner and Clark, 1978; Ljung et al., 1992; Bharath and Borkar, 1999; Kushner and Yin, 2003; Lai, 2003), for broad function classes. Using iterate averaged SGD, the efforts of Lacoste-Julien et al. (2012); Rakhlin et al. (2012); Ghadimi and Lan (2012, 2013); Bubeck (2014); Dieuleveut et al. (2016) achieve (near-)minimax rates for various problem classes. The work of Shamir and Zhang (2012) is closest in spirit to ours (despite working with a different oracle model), and presents near minimax rates (up to log factors) using the final iterate of an SGD procedure for non-smooth stochastic optimization, with/without strong convexity assumptions. Note that Harvey et al. (2018) established a lower bound showing that, with the "standard" polynomially decaying stepsizes (and when the end time is not known), the final iterate of SGD on non-smooth objectives necessarily suffers the extra logarithmic dependence on time, over the minimax rate (Nemirovsky and Yudin, 1983; Raginsky and Rakhlin, 2011; Agarwal et al., 2012), that appears in the upper bounds of Shamir and Zhang (2012).

Paper organization:

Section 2 describes notation and problem setup. Section 3 presents our results on the sub-optimality of polynomial decay schemes and the near optimality of the step decay scheme. Section 3.3 presents results on the anytime behavior of SGD (i.e. the asymptotic/infinite horizon case). Section 4 presents experimental results and Section 5 presents conclusions.

2 Problem Setup

Notation: We present the setup and associated notation in this section. We represent scalars with normal font (e.g., a, η), vectors with boldface lowercase characters (e.g., w, x), and matrices with boldface uppercase characters (e.g., H). We represent positive semidefinite (PSD) ordering between two matrices using ⪯. The symbol ≲ represents that the corresponding inequality holds up to some universal constant.

Our theoretical results focus on the stochastic approximation problem of (streaming) least squares regression and this involves minimizing the following expected square loss objective:

 min_{w∈ℝ^d} f(w), where f(w) := (1/2)·E_{(x,y)∼D}[(y − w⊤x)²]. (3)

Note that the Hessian of the problem is H := ∇²f(w) = E[xx⊤]. In this paper, we are provided access to stochastic gradients, obtained by sampling a fresh input-output pair (x_t, y_t) ∼ D and using it to compute an unbiased estimate of the gradient of the objective f. This stochastic gradient, evaluated at an iterate w_t, is represented as:

 ∇̂f(w_t) = −(y_t − ⟨w_t, x_t⟩)·x_t. (4)
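For concreteness, the oracle in equation 4 can be checked for unbiasedness by exact enumeration over a small discrete data distribution (the four data points below are illustrative): the averaged stochastic gradient must equal the full gradient E[xx⊤]w − E[yx].

```python
# Hedged sketch: the stochastic gradient of eq. (4) for streaming least
# squares, checked for unbiasedness by exact enumeration over a small
# discrete (uniform) data distribution. The data itself is illustrative.

def sgd_gradient(w, x, y):
    """hat-grad f(w) = -(y - <w, x>) x, for a single example (x, y)."""
    residual = y - sum(wi * xi for wi, xi in zip(w, x))
    return [-residual * xi for xi in x]

# Uniform distribution over four (x, y) pairs. For this data,
# H = E[x x^T] = 0.75 I and E[y x] = (1.5, -0.75).
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([1.0, -1.0], 3.0)]
w = [0.5, -0.5]
avg_grad = [sum(sgd_gradient(w, x, y)[i] for x, y in data) / len(data)
            for i in range(2)]
# Full gradient H w - E[y x] = (0.75*0.5 - 1.5, 0.75*(-0.5) + 0.75).
```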

Our goal in this paper is to consider the stochastic gradient descent method (Robbins and Monro, 1951), wherein, given an initial iterate w_0 and a stepsize sequence {η_t}, we perform the following update:

 w_t = w_{t−1} − η_t·∇̂f(w_{t−1}).

For examples drawn from the underlying distribution D, the input x and the output y are related to each other as:

 y = ⟨w*, x⟩ + ϵ,

where ϵ is the noise on the example pair (x, y) and w* is a minimizer of the objective f. We assume that this noise satisfies the following condition:

 Σ := E[∇̂f(w*) ∇̂f(w*)⊤] = E_{(x,y)∼D}[(y − ⟨w*, x⟩)² xx⊤] ⪯ σ²H. (5)

Next, we assume that the covariates x satisfy the following fourth moment inequality:

 E[‖x‖² xx⊤] ⪯ R²·H. (6)

This assumption is satisfied, for instance, when the norm of the covariates is bounded (i.e., ‖x‖² ≤ R² almost surely), but it also holds in more general situations (i.e., this assumption is weaker than a bounded norm assumption).

Finally, note that both conditions 5 and 6 are fairly general and are used in several recent works (Bach and Moulines, 2013; Jain et al., 2016, 2017b) that present a sharp analysis of SGD (and its variants) for the streaming least squares regression problem. Next, we denote by

 μ := λ_min(H), L := λ_max(H), and κ := R²/μ

the smallest eigenvalue, the largest eigenvalue and the condition number of H, respectively. We have μ > 0 in the strongly convex case, but not necessarily so in the non-strongly convex case (in Section 3, the non-strongly convex quadratic objective is referred to as the "smooth" case).

Let w* ∈ argmin_w f(w). The excess risk of an iterate w is given by f(w) − f(w*). It is well known that, given t accesses to the stochastic gradient oracle in equation 4, any algorithm that uses these stochastic gradients and outputs an estimate ŵ_t has sub-optimality lower bounded by σ²d/t. More concretely, we have (Van der Vaart, 2000):

 lim_{t→∞} (E[f(ŵ_t)] − f(w*)) / (σ²d/t) ≥ 1.

There exist schemes that achieve this rate of σ²d/t, e.g., constant stepsize SGD with iterate averaging (Ruppert, 1988; Polyak and Juditsky, 1992; Bach and Moulines, 2013). This rate of σ²d/t is called the statistical minimax rate.

3 Main results

In this section, we present the main results of this paper. We begin with the sub-optimality of polynomially decaying stepsizes (Section 3.1) and the (surprising) near-optimal behavior of the step-decay schedule (Section 3.2), followed by the fundamental limitation of SGD that makes it query points with highly sub-optimal function values infinitely often (Section 3.3).

3.1 Suboptimality of polynomial decay schemes

This paper begins by showing that there exist problem instances where the traditional polynomial decay schemes presented by the theory of stochastic approximation (Robbins and Monro (1951); Polyak and Juditsky (1992)), i.e., those of the form η_t = a/(b + t)^α for any choice of a, b ≥ 0 and α ∈ [0, 1], are significantly suboptimal (by a factor of the condition number of the problem) compared to the statistical minimax rate (Kushner and Clark, 1978).

Theorem 1.

Under assumptions 5 and 6, there exists a class of problem instances where the following lower bounds hold on the final iterate of a stochastic gradient procedure with polynomially decaying stepsizes, when given access to the oracle as written in equation 4.

Strongly convex case: Suppose μ > 0. For any condition number κ, there exists a problem instance with initial suboptimality f(w_0) − f(w*) such that, for any end time T, for all a, b ≥ 0 and α ∈ [0, 1], and for the learning rate scheme η_t = a/(b + t)^α, we have

 E[f(w_T)] − f(w*) ≥ exp(−T/(κ log T))·(f(w_0) − f(w*)) + (σ²d/64)·(κ/T).

Smooth case: For any fixed end time T, there exists a problem instance such that, for all a, b ≥ 0 and α ∈ [0, 1], and for the learning rate scheme η_t = a/(b + t)^α, we have

 E[f(w_T)] − f(w*) ≥ (R²·‖w_0 − w*‖² + σ²d)·1/(√T·log T).

In both of the cases above, the statistical minimax rate is σ²d/T. In the strongly convex case, we thus have a suboptimality factor of κ, and in the smooth case, a suboptimality factor of order √T (up to log factors).

3.2 Near optimality of Step Decay schemes

This section presents results on the Step Decay schedules. In particular, given the knowledge of an end time T when the algorithm is terminated, the step decay learning rate schedule (Algorithm 1) offers significant improvements over standard polynomially decaying stepsize schemes, and obtains near minimax rates (off by only a log T factor).

Theorem 2.

Suppose we are given access to the stochastic gradient oracle 4 satisfying assumptions 5 and 6. Running Algorithm 1 with an initial stepsize of 1/(2R²) allows the algorithm to achieve the following excess risk guarantees.

• Strongly convex case: Suppose μ > 0. We have:

 E[f(w_T)] − f(w*) ≲ exp(−T/(κ log T))·(f(w_0) − f(w*)) + σ²d·(log T)/T.

• Smooth case: We have:

 E[f(w_T)] − f(w*) ≤ 2·(R²d·‖w_0 − w*‖² + 2σ²d)·(log T)/T.

We note that, while the above theorem presents significant improvements over standard polynomial decay (or constant learning rate schemes (Polyak and Juditsky, 1992; Bach and Moulines, 2013; Défossez and Bach, 2015; Jain et al., 2016)) with iterate averaging, the result presents a worse rate on the initial error (by a dimension factor) in the smooth case, compared to the best known result (Bach and Moulines, 2013), which relies heavily on iterate averaging to remove this factor. It is an open question whether this factor can be improved. The above result shows that the Step Decay scheme significantly improves over polynomial decay schemes, which are plagued by a polynomial dependence on the condition number in the variance of the final iterate. Furthermore, note that Algorithm 1 requires access only to R² (just as standard SGD for least squares (Bach and Moulines, 2013; Jain et al., 2016)) and knowledge of the end time T; it does not require access to the strong convexity parameter, in contrast to standard results for the strongly convex setting (e.g., Rakhlin et al. (2012); Shamir and Zhang (2012); Lacoste-Julien et al. (2012); Bubeck (2014)), which achieve their rates given access to the strong convexity parameter (which is often harder to obtain in practice), and, more often than not, using iterate averaging. These results are off from the statistical minimax rates achieved using iterate averaging (Kushner and Clark, 1978; Polyak and Juditsky, 1992) by only a log T factor. Note that this factor can be improved to a log²κ factor for the strongly convex quadratic case by using an additional polynomial decay scheme in the beginning, before switching to the Step Decay scheme.

Proposition 3.

Suppose we are given access to the stochastic gradient oracle 4 satisfying assumptions 5 and 6. Let μ > 0 and let κ = R²/μ. For any problem instance and fixed time horizon T, there exists a learning rate scheme that achieves

 E[f(w_T)] − f(w*) ≤ 2·exp(−T/(6κ log κ))·(f(w_0) − f(w*)) + 100 log²κ·(σ²d/T).

Note that, in order to improve the dependence on the variance from log T (in Theorem 2) to log²κ (in Proposition 3), we do require access to the strong convexity parameter μ, in addition to R² and the knowledge of the end time T. However, this is indeed the case even for standard analyses for the strongly convex setting, e.g., Rakhlin et al. (2012); Shamir and Zhang (2012); Lacoste-Julien et al. (2012); Bubeck (2014).

As a final remark, recall that our results in this section (on step decay schemes) assumed the knowledge of a fixed time horizon. In contrast, most results on SGD's averaged iterate obtain anytime (i.e., limiting/infinite horizon) guarantees. Can we hope to achieve such guarantees with the final iterate?

3.3 SGD queries bad points infinitely often

Our main result in this section shows that obtaining near statistical minimax rates with the final iterate is not possible without knowledge of the time horizon T. More concretely, we show the following limitation of SGD for the strongly convex quadratic case: for any learning rate sequence (be it polynomially decaying or step-decay), SGD must query a point whose sub-optimality is at least a κ/log κ factor above the statistical minimax rate for infinitely many time steps t.

Theorem 4.

Suppose we are given access to a stochastic gradient oracle 4 satisfying assumptions 5 and 6. There exists a universal constant C > 0 and a problem instance such that, for the SGD algorithm run with any stepsizes satisfying η_t ≲ 1/R² for all t (learning rates larger than this order will make the algorithm diverge), we have

 limsup_{T→∞} (E[f(w_T)] − f(w*)) / (σ²d/T) ≥ C·κ/log(κ + 1).

The bad points guaranteed to exist by Theorem 4 are not rare: one can in fact show that such points recur at regular intervals. This claim is formalized in Theorem 16 in Appendix D.

4 Experimental Results

We present experimental validation of the suitability of the Step Decay schedule (or, more precisely, its continuous counterpart, the exponentially decaying schedule), and compare it with polynomially decaying stepsize schedules. In particular, we consider the use of an exponentially decaying scheme (7) and two polynomially decaying schemes (8) and (9), where we perform a systematic grid search on the parameters of each scheme. In the section below, we consider a real world non-convex optimization problem of training a residual network on the cifar-10 dataset, with an aim to illustrate the practical implications of the results described in the paper. Complete details of the setup are given in Appendix E.
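The grid-search protocol can be sketched as follows. The two schedule families and the synthetic `validate` stub (a deterministic one-dimensional expected-risk computation standing in for "train, then measure validation error") are illustrative stand-ins, not the paper's exact parametrizations (7)-(9):

```python
# Hedged sketch of the grid-search protocol: evaluate each candidate
# schedule with a validation score and keep the best. All families,
# grids, and the `validate` stub are illustrative.
import itertools

def exponential(eta0, gamma):
    return lambda t: eta0 * gamma ** t

def polynomial(eta0, b, alpha):
    return lambda t: eta0 / (1.0 + b * t) ** alpha

def validate(schedule, T=200):
    """Stand-in for training + validation: deterministic expected risk of
    SGD on a 1-d quadratic (h = 1, sigma^2 = 0.01), smaller is better."""
    v, h, sigma2 = 1.0, 1.0, 0.01
    for t in range(T):
        eta = schedule(t)
        v = (1 - eta * h) ** 2 * v + eta ** 2 * sigma2 * h
    return 0.5 * h * v

candidates = [("exp", p, exponential(*p))
              for p in itertools.product([0.5, 1.0], [0.95, 0.99])]
candidates += [("poly", p, polynomial(*p))
               for p in itertools.product([0.5, 1.0], [0.1, 1.0], [0.5, 1.0])]
best = min(candidates, key=lambda c: validate(c[2]))
```

In the paper's protocol, `validate` would be replaced by an actual training run scored on the held-out validation split, and the grid would be extended until the best parameter lies in its interior.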

4.1 Non-Convex Optimization: Training a Residual Net on cifar-10

Consider the task of training a 44-layer deep residual network (He et al., 2016b) with pre-activation blocks (He et al., 2016a) (dubbed preresnet-44) for the cifar-10 classification problem; an open-source implementation of this architecture is publicly available. For all experiments, we use Nesterov's accelerated gradient method (Nesterov, 1983) as implemented in pytorch, with the momentum, batch size, number of training epochs, and ℓ₂ regularization fixed across runs as detailed in Appendix E.

Our experiments are based on grid searching for the best learning rate decay scheme over the parametric family of learning rate schemes described above ((7)-(9)); all grid searches are performed on a separate validation set (obtained by setting aside one-tenth of the training dataset), with models trained on the remaining samples. For the final numbers presented in the plots/tables, we employ the best hyperparameters from the validation stage, train on the entire training set, and average results over runs with different random seeds. The parameters for the grid searches and other details are presented in Appendix E. Furthermore, we always extend the grid so that the best performing parameter lies in the interior of our grid search.

Comparison between different schemes: Figure 2 and Table 2 present a comparison of the performance of the three schemes (7)-(9). They demonstrate that the exponential scheme outperforms the polynomial step-size schemes.

Hyperparameter selection using truncated runs: Figure 3 and Tables 3 and 4 present a comparison of the performance of three exponential decay schemes, each of which performs best at one of three increasing (truncated) epoch budgets. The key point to note is that the best performing hyperparameters at the two shorter budgets are not the best performing at the full budget (which is made stark from the perspective of the validation error). This demonstrates that hyperparameter selection using truncated runs (e.g., as in hyperband (Li et al., 2017)) might necessitate rethinking.

5 Conclusions and Discussion

The main contribution of this work is to show that the issue of learning rate scheduling is far more nuanced than suggested by prior theoretical results: one does not even need to move to non-convex optimization to show that schemes starkly different from the traditional polynomially decaying stepsizes considered in theory can be far more effective. This is important from a practical perspective, in that the Step Decay schedule is widely used in practical SGD implementations for both convex and non-convex optimization.

Is quadratic loss minimization special? One may ask whether there is something particularly special about quadratic loss minimization that makes its minimax rates differ from those of more general convex (and non-convex) optimization problems. Ideally, we would hope that our theoretical results can be formalized in more general cases: this would serve as an exciting direction for future research. Interestingly, Allen-Zhu (2018) shows marked improvements for making the gradient norm small (as opposed to function values, as considered in this paper), when working with stochastic gradients, for general function classes, with factors that appear similar to the ones obtained in this work.

Acknowledgements: Sham Kakade acknowledges funding from the Washington Research Foundation for Innovation in Data-intensive Discovery, and the NSF Award 1740551, and the ONR award N00014-18-1-2247.

Appendix A Preliminaries

Before presenting the lemmas establishing the behavior of SGD under various learning rate schemes, we introduce some notation. Recall that the SGD update rule is given by:

 wt=wt−1−ηtˆ∇f(wt−1)

We then write out the expression for the stochastic gradient ∇̂f(w_{t−1}):

 ˆ∇f(wt−1)=xtx⊤t(wt−1−w∗)−ϵtxt,

where, given an example pair (x_t, y_t) with y_t = ⟨w*, x_t⟩ + ϵ_t, the above stochastic gradient expression naturally follows. Now, in order to analyze the contraction properties of the SGD update rule, we require the following notation:

 Pt=I−ηtxtx⊤t.
Lemma 5.

[See, e.g., Appendix A.2.2 of Jain et al. (2016)] Bias-variance tradeoff: Running SGD for T steps starting from w_0 with a stepsize sequence {η_t} yields a final iterate whose excess risk is upper bounded as:

 ⟨H, E[(w_T − w*) ⊗ (w_T − w*)]⟩ ≤ 2·(⟨H, E[P_T⋯P_1·(w_0 − w*) ⊗ (w_0 − w*)·P_1⋯P_T]⟩ + ⟨H, Σ_{τ=1}^T η_τ²·E[P_T⋯P_{τ+1}·n_τ ⊗ n_τ·P_{τ+1}⋯P_T]⟩),

where n_τ := ϵ_τ x_τ and P_t := I − η_t x_t x_t⊤. Note that E[n_τ | F_{τ−1}] = 0 and E[n_τ ⊗ n_τ] ⪯ σ²H, where F_τ is the filtration formed by all samples drawn until time τ.

Proof.

One can view the two terms above as stemming from unrolling SGD's updates, which can be written as:

 w_t = w_{t−1} − η_t·∇̂f(w_{t−1})
 w_t − w* = (I − η_t x_t x_t⊤)(w_{t−1} − w*) + η_t n_t
 w_t − w* = P_t⋯P_1·(w_0 − w*) + Σ_{τ=1}^t P_t⋯P_{τ+1}·η_τ n_τ

From the above equation, the result of the lemma follows straightforwardly. Now, clearly, if the noise and the inputs are independent of each other, and if the noise is zero mean (i.e., E[n_t | F_{t−1}] = 0), the above inequality holds with equality (without the factor of two). This is true more generally iff the cross terms between the two contributions vanish in expectation.

For more details, refer to Défossez and Bach (2015).

Now, in order to bound the total error, note that the original stochastic process associated with SGD's updates can be decoupled into two (simpler) processes. The first is the noiseless process (which captures the decay of the initial error, and is termed "bias"), whose recurrence evolves as:

 w_t^bias − w* = P_t·(w_{t−1}^bias − w*), with w_0^bias = w_0. (10)

The second recursion corresponds to the dependence on the noise (termed "variance"), wherein the process is initiated at the solution, i.e., w_0^var = w*, and is driven by the noise n_t. The update for this process is:

 w_t^var − w* = P_t·(w_{t−1}^var − w*) + η_t n_t, with w_0^var = w* (11)
 = Σ_{τ=1}^t P_t⋯P_{τ+1}·(η_τ n_τ).
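Since all three recursions are linear and share the same matrices P_t and noises n_t, the full SGD error decomposes exactly as the sum of the bias process (10) and the variance process (11). This exactness can be verified numerically; the dimensions, stepsize and noise level below are illustrative:

```python
# Hedged sketch: verify that the SGD error equals bias + variance exactly
# when the two decoupled processes are driven by the same samples and
# noises. All problem constants here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma, eta = 3, 50, 0.1, 0.01
w_star = rng.normal(size=d)
w0 = rng.normal(size=d)

err, bias, var = w0 - w_star, w0 - w_star, np.zeros(d)
for t in range(T):
    x = rng.normal(size=d)
    eps = sigma * rng.normal()
    P = np.eye(d) - eta * np.outer(x, x)
    n = eps * x                  # n_t = eps_t x_t
    err = P @ err + eta * n      # full SGD error recursion
    bias = P @ bias              # noiseless bias process (10)
    var = P @ var + eta * n      # noise-driven variance process (11)
```

At every step, `err` coincides with `bias + var` up to floating point error, which is precisely why the excess risk splits into the two terms of Lemma 5.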

We denote by B_t the covariance of the t-th iterate of the bias process, i.e.,

 B_t = E[(w_t^bias − w*)(w_t^bias − w*)⊤] = E[P_t B_{t−1} P_t⊤] = E[P_t⋯P_1 B_0 P_1⋯P_t].

The quantity that routinely shows up when bounding SGD's convergence behavior is the covariance of the variance error, i.e., V_t := E[(w_t^var − w*) ⊗ (w_t^var − w*)]. Since the noise is zero mean, the cross terms vanish, and this yields the following (simplified) expression for V_t:

 V_t = Σ_{τ,τ'} E[P_t⋯P_{τ+1}·(η_τ n_τ) ⊗ (η_{τ'} n_{τ'})·P_{τ'+1}⋯P_t] = Σ_{τ=1}^t η_τ²·E[P_t⋯P_{τ+1}·n_τ ⊗ n_τ·P_{τ+1}⋯P_t].

Firstly, note that this naturally implies that the sequence of covariances, initialized at the solution (i.e., V_0 = 0), grows monotonically to its steady state covariance, i.e.,

 V_1 ⪯ V_2 ⪯ ⋯ ⪯ V_∞.

See Lemma 3 of Jain et al. (2017a) for more details. Furthermore, the recursion relating V_t to V_{t−1} naturally follows as:

 V_t ⪯ E[P_t V_{t−1} P_t⊤] + η_t² σ² H. (12)

Lemma 6 (Lemma 5 of Jain et al. (2017a)).

Running SGD with a (constant) stepsize sequence η_t = η achieves the following steady-state covariance:

 V_∞ ⪯ (η σ² / (1 − η R²))·I.
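In the scalar case with a deterministic covariate x of squared norm r², the variance recursion can be iterated exactly, giving a quick check of the steady-state bound (the numbers η = 0.1 and σ² = r² = 1 are illustrative):

```python
# Hedged scalar check of Lemma 6's steady-state bound. With a deterministic
# covariate (x^2 = r2, so H = r2 and R^2 = r2), the variance recursion is
# exactly v_t = (1 - eta*r2)^2 v_{t-1} + eta^2 sigma2 r2. Constants are
# illustrative.
r2, eta, sigma2 = 1.0, 0.1, 1.0
v = 0.0
for _ in range(10_000):
    v = (1 - eta * r2) ** 2 * v + eta ** 2 * sigma2 * r2
bound = eta * sigma2 / (1 - eta * r2)   # Lemma 6's steady-state bound
```

Here the iteration converges to η²σ²r² / (1 − (1 − ηr²)²) ≈ 0.0526, safely below the bound 0.1/0.9 ≈ 0.111.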
Lemma 7.

Suppose η ≤ 1/(2R²) and V_0 = 0. Then, for any sequence of learning rates {η_t} with η_t ≤ η for all t,

 V_t ⪯ 2ησ²·I ∀ t.
Proof.

We will prove the lemma using an inductive argument. The base case follows from the problem statement: for SGD, V_0 = 0, so the claim holds at t = 0. Now suppose V_t satisfies the claimed bound; from equation 12, we have the following bound on the covariance V_{t+1}:

 V_{t+1} ⪯ E[P_t V_t P_t⊤] + η_t²σ²H
 = E[(I − η_t x_t x_t⊤) V_t (I − η_t x_t x_t⊤)] + η_t²σ²H
 ⪯ 2ησ²·E[(I − η_t x_t x_t⊤)(I − η_t x_t x_t⊤)] + η_t²σ²H
 ⪯ 2ησ²·I − 4η_t ησ²·H + 2η_t²ησ²R²·H + η_t²σ²H
 ⪯ 2ησ²·I − 2η_t ησ²·H + η_t²σ²H
 = 2ησ²·I + η_t·(η_t − 2η)·σ²H
 ⪯ 2ησ²·I,

from which the lemma follows. ∎

Lemma 8.

(Reduction from the multiplicative noise oracle) Let V_t be the (expected) covariance of the variance error. Then the recursion connecting V_{t+1} to V_t can be expressed as:

 V_{t+1} ⪯ (I − η_t H) V_t (I − η_t H) + 2η_t²σ²H.
Proof.

From equation 12, we already know that the evolution of the covariance of the variance error follows:

 V_{t+1} ⪯ E[P_t V_t P_t⊤] + η_t²σ²H
 ⪯ (I − η_t H) V_t (I − η_t H) + η_t²·E[x_t x_t⊤ V_t x_t x_t⊤] + η_t²σ²H
 ⪯ (I − η_t H) V_t (I − η_t H) + η_t²·‖V_t‖₂ R²·H + η_t²σ²H
 ⪯ (I − η_t H) V_t (I − η_t H) + η_t²·2ησ²R²·H + η_t²σ²H
 ⪯ (I − η_t H) V_t (I − η_t H) + 2η_t²σ²H,

where the steps follow from Lemma 7 (to bound ‖V_t‖₂ ≤ 2ησ²) and from the fact that 2ηR² ≤ 1. ∎

Note: Basically, one could analyze an auxiliary process driven by noise with variance off by a factor of two and convert the analysis into one involving exact (deterministic) gradients.

Lemma 9.

[Bias decay - strongly convex case] Let the minimal eigenvalue of the Hessian be μ := λ_min(H) > 0. Consider the bias recursion as in equation 10 with the stepsize set as η = 1/(2R²). Then,

 E[‖w_t^bias − w*‖²] ≤ (1 − 1/(2κ))·E[‖w_{t−1}^bias − w*‖²].
Proof.

The proof follows through straightforward computations:

 E[‖w_t^bias − w*‖²] ≤ E[‖w_{t−1}^bias − w*‖²] − η·E[‖w_{t−1}^bias − w*‖²_H]
 ≤ (1 − ημ)·E[‖w_{t−1}^bias − w*‖²],

where the first line follows by expanding the square and using η ≤ 1/(2R²) (together with assumption 6) to absorb the second order term, and the result follows through the definitions of μ and of κ = R²/μ, since ημ = 1/(2κ). ∎
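This per-step contraction can be checked exactly on a tiny discrete covariate distribution (the distribution below is illustrative: x is one of two axis-aligned vectors with equal probability, so H = diag(2, 0.5), R² = 4, μ = 1/2, κ = 8, and every expectation is a finite sum):

```python
# Hedged exact check of the contraction factor (1 - 1/(2*kappa)) with
# eta = 1/(2 R^2), enumerated over a tiny illustrative discrete
# distribution of covariates that satisfies assumption (6) with R^2 = 4.

xs = [((2.0, 0.0), 0.5), ((0.0, 1.0), 0.5)]   # (covariate, probability)
R2, mu = 4.0, 0.5                              # R^2 from (6); mu = lambda_min(H)
eta = 1.0 / (2 * R2)
kappa = R2 / mu

def expected_contraction(v):
    """E ||(I - eta x x^T) v||^2, computed exactly by enumeration."""
    total = 0.0
    for (x, p) in xs:
        dot = sum(xi * vi for xi, vi in zip(x, v))
        u = [vi - eta * dot * xi for xi, vi in zip(x, v)]
        total += p * sum(ui * ui for ui in u)
    return total

for v in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-0.3, 0.7)]:
    norm2 = sum(vi * vi for vi in v)
    assert expected_contraction(v) <= (1 - 1 / (2 * kappa)) * norm2
```

Every direction v contracts by at least the factor 1 − 1/(2κ) = 15/16, matching the lemma.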

Lemma 10.

[Reduction of the bias recursion with multiplicative noise to one resembling the variance recursion] Consider the bias recursion that evolves as

 B_t = E[(w_t − w*)(w_t − w*)⊤] = E[(I − γ_t x_t x_t⊤) B_{t−1} (I − γ_t x_t x_t⊤)], with B_0 = (w_0 − w*)(w_0 − w*)⊤.

Then, the following recursion holds:

 B_t ⪯ (I − γ_t H) B_{t−1} (I − γ_t H) + γ_t² R² ‖w_0 − w*‖²·H.
Proof.

The result follows owing to the following computations:

 B_t = E[(w_t − w*)(w_t − w*)⊤]
 = E[(I − γ_t x_t x_t⊤) B_{t−1} (I − γ_t x_t x_t⊤)]
 ⪯ (I − γ_t H) B_{t−1} (I − γ_t H) + γ_t²·E[(x_t⊤ B_{t−1} x_t)·x_t x_t⊤]
 ⪯ (I − γ_t H) B_{t−1} (I − γ_t H) + γ_t²·E[‖w_{t−1} − w*‖²]·R² H
 ⪯ (I − γ_t H) B_{t−1} (I − γ_t H) + γ_t²·E[‖w_0 − w*‖²]·R² H,

with the last inequality holding true if the squared distance to the optimum does not grow (in expectation) along the recursion. We verify that this is indeed the case:

 E[‖w_{t−1} − w*‖²] = E[‖(I − γ_{t−1} x_{t−1} x_{t−1}⊤)(w_{t−2} − w*)‖²] ≤ E[‖w_{t−2} − w*‖²],

where the inequality holds for γ_{t−1} ≤ 1/(2R²), as in Lemma 9. Recursively applying the above argument yields the desired result. ∎

Note: This result implies that the bias error (in the smooth, non-strongly convex case of least squares regression with multiplicative noise) can be bounded with a lemma similar to that for the variance, where the quantity R²‖w_0 − w*‖² plays the role of the noise variance σ² that drives the process.

Lemma 11.

[Lower bounds for the additive noise oracle imply lower bounds for the multiplicative noise oracle] Suppose the noise covariance satisfies Σ = σ²H (i.e., equality holds in equation 5). Let V_t be the (expected) covariance of the variance error, so that the recursion connecting V_{t+1} to V_t can be expressed as:

 V_{t+1} = E[(I − η_t x_t x_t⊤) V_t (I − η_t x_t x_t⊤)] + η_t²σ²H.

Then,

 V_{t+1} ⪰ (I − η_t H) V_t (I − η_t H) + η_t²σ²H.
Proof.

Let us first consider the setting of (bounded) additive noise. Here, we have:

 ∇̂f(w_t) = H(w_t − w*) + ζ_t, with E[ζ_t | w_t] = 0 and E[ζ_t ζ_t⊤ | w_t] = σ²H.