Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons

07/15/2020
by Paul Micaelli, et al.

Gradient-based hyperparameter optimization is an attractive way to perform meta-learning across a distribution of tasks, or to improve the performance of an optimizer on a single task. However, this approach has been unpopular for tasks requiring long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online or to split the horizon into smaller chunks. However, this introduces greediness, which comes with a large performance drop, since the best local hyperparameters can make for poor global solutions. In this work, we enable non-greediness over long horizons with a two-fold solution. First, we share hyperparameters that are contiguous in time, and show that this drastically mitigates gradient degradation issues. Then, we derive a forward-mode differentiation algorithm for the popular momentum-based SGD optimizer, which allows for a memory cost that is constant in the horizon size. Together, these solutions allow us to learn hyperparameters without any prior knowledge. Our method compares favorably to a baseline of hand-tuned off-the-shelf hyperparameters on simple datasets like SVHN. On CIFAR-10 we match the baseline performance, and demonstrate for the first time that learning rate, momentum, and weight decay schedules can be learned with gradients on a dataset of this size. Code is available at https://github.com/polo5/NonGreedyGradientHPO
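To make the forward-mode idea concrete, the following is a minimal sketch (not the authors' implementation) of forward-mode hypergradient computation through momentum SGD with weight decay. The toy quadratic objective, variable names, and hyperparameter values are illustrative assumptions; only the two tangent buffers dw and dv are carried forward, which is why memory stays constant in the horizon length.

import numpy as np

# Toy quadratic objective L(w) = 0.5 * w^T A w, whose gradient is A w and
# whose Hessian-vector product is simply A v. This objective and all names
# below are assumptions for illustration only.
rng = np.random.default_rng(0)
dim = 5
Q = rng.normal(size=(dim, dim))
A = Q @ Q.T + np.eye(dim)          # symmetric positive definite

def grad(w):
    return A @ w

def hvp(w, v):
    return A @ v                   # Hessian of a quadratic is constant

# Hyperparameters of momentum SGD (shared across the whole horizon).
lr, momentum, wd = 0.05, 0.9, 1e-3
T = 200                            # horizon length (number of steps)

w = rng.normal(size=dim)           # parameters
v = np.zeros(dim)                  # momentum buffer

# Forward-mode tangents d(w)/d(lr) and d(v)/d(lr): the only extra state
# carried through the horizon, so memory does not grow with T.
dw = np.zeros(dim)
dv = np.zeros(dim)

for t in range(T):
    g = grad(w) + wd * w           # gradient with weight decay
    dg = hvp(w, dw) + wd * dw      # tangent of that gradient w.r.t. lr

    v_new = momentum * v + g       # momentum buffer update
    dv = momentum * dv + dg        # its tangent w.r.t. lr

    # w_new = w - lr * v_new, so d(w_new)/d(lr) = dw - v_new - lr * dv
    dw = dw - v_new - lr * dv
    w = w - lr * v_new
    v = v_new

final_loss = 0.5 * w @ A @ w
hypergrad_lr = grad(w) @ dw        # dL(w_T)/d(lr) by the chain rule
print(f"final loss = {final_loss:.6f}, d(loss)/d(lr) = {hypergrad_lr:.6f}")

In a realistic setting the Hessian-vector product would be obtained with automatic differentiation rather than the closed form used here, and the same tangent recursion would be repeated for each shared hyperparameter (momentum, weight decay) in the schedule.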
