Acceleration via Fractal Learning Rate Schedules

03/01/2021
by Naman Agarwal, et al.

When balancing the practical tradeoffs of iterative methods for large-scale optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune. We demonstrate the presence of these subtleties even in the innocuous case when the objective is a convex quadratic. We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent, and show that the problem of mitigating instability leads to a fractal ordering of step sizes. We provide some experiments and discussion to challenge current understandings of the "edge of stability" in deep learning: even in simple settings, provable acceleration can be obtained by making negative local progress on the objective.
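
As a rough illustration of the idea in the abstract, the sketch below runs vanilla gradient descent on a convex quadratic using Chebyshev step sizes (reciprocals of the Chebyshev roots mapped onto the assumed eigenvalue interval [mu, L]) and compares a monotone ordering against a bit-reversal (Lebedev–Finogenov style) permutation, a classical fractal reordering from the numerical analysis literature used to tame intermediate blow-up. The function names, the specific permutation, and the problem parameters are illustrative assumptions, not necessarily the paper's exact construction.

```python
# Minimal sketch (assumptions, not the paper's exact schedule):
# Chebyshev step sizes for gradient descent on a convex quadratic
# f(x) = 0.5 x^T A x - b^T x, with eigenvalues of A assumed in [mu, L].
# The bit-reversal-style reordering is one classical stabilization;
# reordering changes intermediate iterates, not the final error polynomial.
import numpy as np

def chebyshev_steps(mu, L, n):
    """Reciprocals of the degree-n Chebyshev roots mapped from [-1, 1] to [mu, L]."""
    k = np.arange(1, n + 1)
    roots = np.cos((2 * k - 1) * np.pi / (2 * n))       # roots of T_n in (-1, 1)
    lam = 0.5 * (L + mu) + 0.5 * (L - mu) * roots       # mapped to [mu, L]
    return 1.0 / lam                                    # step sizes eta_k

def bit_reversal_order(n):
    """Fractal (bit-reversal) permutation of {0, ..., n-1}; n a power of two."""
    bits = int(np.log2(n))
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)])

def gd_with_schedule(A, b, steps, x0):
    """Vanilla gradient descent with a prescribed step-size schedule."""
    x = x0.copy()
    for eta in steps:
        x = x - eta * (A @ x - b)                       # gradient of the quadratic
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, mu, L = 50, 64, 0.01, 1.0
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = Q @ np.diag(np.linspace(mu, L, d)) @ Q.T        # spectrum inside [mu, L]
    b = rng.standard_normal(d)
    x_star = np.linalg.solve(A, b)

    steps = chebyshev_steps(mu, L, n)
    fractal = steps[bit_reversal_order(n)]
    for name, sched in [("monotone", steps), ("fractal", fractal)]:
        x = gd_with_schedule(A, b, sched, np.zeros(d))
        print(name, np.linalg.norm(x - x_star))
```

In exact arithmetic both orderings reach the same accelerated n-step error, since the factors of the error polynomial commute; in floating point the monotone ordering typically loses accuracy to transient blow-up of the intermediate iterates, which is the kind of instability a fractal ordering is meant to mitigate.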

Related research

04/29/2019
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure
There is a stark disparity between the step size schedules used in pract...

10/09/2019
On the adequacy of untuned warmup for adaptive optimization
Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are ...

07/09/2021
REX: Revisiting Budgeted Training with an Improved Schedule
Deep learning practitioners often operate on a computational and monetar...

02/26/2020
Disentangling Adaptive Gradient Methods from Learning Rates
We investigate several confounding factors in the evaluation of optimiza...

02/24/2020
The Two Regimes of Deep Network Training
Learning rate schedule has a major impact on the performance of deep lea...

06/25/2020
Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation
Many machine learning models require a training procedure based on runni...

09/14/2023
Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule
Can we accelerate convergence of gradient descent without changing the a...
