Scheduling the Learning Rate via Hypergradients: New Insights and a New Algorithm

10/18/2019
by Michele Donini, et al.

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, which allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error with respect to the learning rate (the hypergradient) and, based on this, introduce a novel online algorithm. Our method adaptively interpolates between the recently proposed techniques of Franceschi et al. (2017) and Baydin et al. (2017), featuring increased stability and faster convergence. We show empirically that the proposed method compares favourably with baselines and related methods in terms of final test accuracy.
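The hypergradient idea the abstract builds on can be made concrete with the online scheme of Baydin et al. (2017): because the parameter update theta_t = theta_{t-1} - eta * grad_{t-1} depends on the learning rate, the chain rule gives dL/d(eta) = -grad_t . grad_{t-1}, so eta can itself be adjusted by gradient descent during training. The sketch below is a minimal illustration of that baseline on a toy quadratic, not the algorithm proposed in the paper; the hyper-learning rate beta, the toy objective, and the non-negativity clamp on eta are illustrative assumptions.

```python
import numpy as np

def loss_and_grad(theta, A, b):
    """Toy quadratic objective 0.5 * ||A theta - b||^2, standing in for a training loss."""
    r = A @ theta - b
    return 0.5 * r @ r, A.T @ r

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

theta = np.zeros(5)
eta = 0.01                      # learning rate, treated as an online hyperparameter
beta = 1e-4                     # hyper-learning rate for the eta update (assumed value)
prev_grad = np.zeros_like(theta)

for t in range(200):
    loss, grad = loss_and_grad(theta, A, b)
    # Hypergradient of the loss w.r.t. eta: theta_t = theta_{t-1} - eta * grad_{t-1},
    # so dL/d(eta) = grad_t . d(theta_t)/d(eta) = -grad_t . grad_{t-1}.
    hypergrad = -grad @ prev_grad
    eta = max(eta - beta * hypergrad, 0.0)   # gradient step on the learning rate itself
    theta = theta - eta * grad               # usual SGD step with the adapted rate
    prev_grad = grad

print(f"final loss {loss:.4f}, adapted learning rate {eta:.4f}")
```

When consecutive gradients align, the dot product is positive and the learning rate grows; when they oscillate, it shrinks. The paper refines this behaviour by adaptively interpolating between this rule and the forward-mode hypergradient approach of Franceschi et al. (2017).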


Related research

05/19/2018 · Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate
First-order optimization methods have been playing a prominent role in d...

06/12/2017 · YellowFin and the Art of Momentum Tuning
Hyperparameter tuning is one of the big costs of deep learning. State-of...

05/20/2021 · Comment on Stochastic Polyak Step-Size: Performance of ALI-G
This is a short note on the performance of the ALI-G algorithm (Berrada ...

05/07/2021 · Network Pruning That Matters: A Case Study on Retraining Variants
Network pruning is an effective method to reduce the computational expen...

11/22/2021 · Towards a Principled Learning Rate Adaptation for Natural Evolution Strategies
Natural Evolution Strategies (NES) is a promising framework for black-bo...

02/24/2022 · An optimal scheduled learning rate for a randomized Kaczmarz algorithm
We study how the learning rate affects the performance of a relaxed rand...

03/25/2020 · Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning Model Ensembling
Ensembling deep learning models is a shortcut to promote its implementat...