Gradient-based Hyperparameter Optimization through Reversible Learning

02/11/2015
by Dougal Maclaurin et al.

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
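To make the idea concrete, here is a minimal sketch of differentiating a validation loss through an unrolled SGD-with-momentum training loop. It uses ordinary reverse-mode autodiff in JAX, which stores intermediate iterates in memory; the paper instead recovers them by exactly reversing the momentum dynamics, so this is an illustration of the hypergradient being computed, not of the paper's memory-efficient method. All names, the toy regression data, and the (log learning rate, momentum logit) parameterization are illustrative assumptions, not taken from the paper's code.

```python
# Sketch: hypergradient of validation loss w.r.t. (log learning rate, momentum logit),
# obtained by chaining derivatives backwards through the whole training loop.
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # Simple quadratic (linear regression) objective used for both training and validation.
    return jnp.mean((X @ w - y) ** 2)

def train(hypers, w0, X_tr, y_tr, num_steps=100):
    log_lr, mom_logit = hypers
    lr, mom = jnp.exp(log_lr), jax.nn.sigmoid(mom_logit)
    w, v = w0, jnp.zeros_like(w0)
    for _ in range(num_steps):
        g = jax.grad(loss)(w, X_tr, y_tr)
        v = mom * v - lr * g   # momentum (velocity) update
        w = w + v              # parameter update
    return w

def val_loss(hypers, w0, X_tr, y_tr, X_va, y_va):
    # Validation performance as a function of the hyperparameters.
    return loss(train(hypers, w0, X_tr, y_tr), X_va, y_va)

# Reverse-mode autodiff through the entire (unrolled) training procedure.
hypergrad = jax.grad(val_loss)

# Tiny synthetic regression problem for illustration.
X_tr = jax.random.normal(jax.random.PRNGKey(0), (50, 5))
y_tr = X_tr @ jnp.arange(5.0)
X_va = jax.random.normal(jax.random.PRNGKey(1), (20, 5))
y_va = X_va @ jnp.arange(5.0)
w0 = jnp.zeros(5)

hypers = jnp.array([jnp.log(0.05), 0.0])  # [log learning rate, momentum logit]
print(hypergrad(hypers, w0, X_tr, y_tr, X_va, y_va))
```

The same construction extends to the hyperparameters the abstract lists (per-step learning-rate and momentum schedules, initialization scales, regularization weights) by making them additional inputs to the training loop; the paper's contribution is doing this without storing the training trajectory.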

Related research:
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm (02/17/2021)
- Forward and Reverse Gradient-Based Hyperparameter Optimization (03/06/2017)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels (06/06/2023)
- Dynamic Bilevel Learning with Inexact Line Search (08/19/2023)
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering (11/09/2020)
- Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons (07/15/2020)
- Optimizing Millions of Hyperparameters by Implicit Differentiation (11/06/2019)
