Scheduling the Learning Rate via Hypergradients: New Insights and a New Algorithm

by Michele Donini, et al.

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate, the hypergradient, and based on this we introduce a novel online algorithm. Our method adaptively interpolates between the recently proposed techniques of Franceschi et al. (2017) and Baydin et al. (2017), featuring increased stability and faster convergence. We show empirically that the proposed method compares favourably with baselines and related methods in terms of final test accuracy.
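The online rule of Baydin et al. (2017), one endpoint of the interpolation described above, adapts the learning rate by following the hypergradient, which at each step reduces to the dot product of the current gradient with the previous one. A minimal sketch on a toy quadratic, with all names and constants (`beta`, the test function) chosen for illustration rather than taken from the paper:

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * ||w||^2
    return w

w = np.array([5.0, -3.0])
lr = 0.01               # initial learning rate
beta = 0.001            # hypergradient step size (illustrative value)
prev_g = np.zeros_like(w)

for t in range(100):
    g = grad(w)
    # Online hypergradient update: the learning rate grows when
    # successive gradients point in similar directions, and shrinks
    # when they oppose each other (overshooting).
    lr = lr + beta * float(g @ prev_g)
    w = w - lr * g
    prev_g = g

print(float(np.linalg.norm(w)), lr)
```

On this quadratic the learning rate first grows while the gradients stay aligned, then stabilises once the iterates approach the minimum; the paper's contribution is to control this adaptation using the structure of the validation hypergradient rather than the raw training gradients.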


Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate

First-order optimization methods have been playing a prominent role in d...

YellowFin and the Art of Momentum Tuning

Hyperparameter tuning is one of the big costs of deep learning. State-of...

Comment on Stochastic Polyak Step-Size: Performance of ALI-G

This is a short note on the performance of the ALI-G algorithm (Berrada ...

Network Pruning That Matters: A Case Study on Retraining Variants

Network pruning is an effective method to reduce the computational expen...

Towards a Principled Learning Rate Adaptation for Natural Evolution Strategies

Natural Evolution Strategies (NES) is a promising framework for black-bo...

An optimal scheduled learning rate for a randomized Kaczmarz algorithm

We study how the learning rate affects the performance of a relaxed rand...

Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning Model Ensembling

Ensembling deep learning models is a shortcut to promote its implementat...