Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

10/20/2021
by Ross M. Clarke, et al.

Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
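
As a rough illustration of the one-pass hypergradient idea described above, the sketch below tunes a single learning rate online while training proceeds, using a classic hypergradient-descent style update (differentiating the current loss through the previous weight update). It is not the paper's implicit-differentiation estimator; the toy objective, constants and variable names are illustrative assumptions only.

    # Minimal sketch of one-pass hypergradient descent on a learning rate.
    # This shows the general online-tuning idea only; it is NOT the paper's
    # implicit-differentiation method. Names and constants are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy quadratic objective: L(w) = 0.5 * ||A w - b||^2
    A = rng.normal(size=(20, 5))
    b = rng.normal(size=20)

    def grad(w):
        return A.T @ (A @ w - b)

    w = np.zeros(5)
    alpha = 1e-3        # learning rate: the hyperparameter tuned online
    beta = 1e-6         # hyper-learning-rate for the hypergradient step
    prev_grad = np.zeros_like(w)

    for step in range(500):
        g = grad(w)
        # Hypergradient of the current loss w.r.t. alpha through the previous
        # update w_t = w_{t-1} - alpha * g_{t-1}:  dL/dalpha = -g_t . g_{t-1}
        hypergrad = -g @ prev_grad
        alpha -= beta * hypergrad   # hyperparameter step, no restart needed
        w -= alpha * g              # ordinary weight update with the tuned alpha
        prev_grad = g

    print(f"final loss {0.5 * np.sum((A @ w - b) ** 2):.4f}, tuned alpha {alpha:.5f}")

Updating the hyperparameter inside the same training loop is what makes the procedure single-pass; the paper generalises this idea to any continuous hyperparameter in a differentiable weight update, including independent per-parameter learning rates.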

Related research

research 08/25/2022
A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method
Hyperparameter optimization in machine learning is often achieved using ...

research 04/21/2021
Automatic model training under restrictive time constraints
We develop a hyperparameter optimisation algorithm, Automated Budget Con...

research 12/11/2022
CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
The hyperparameter optimization of neural network can be expressed as a ...

research 10/15/2018
Hyperparameter Learning via Distributional Transfer
Bayesian optimisation is a popular technique for hyperparameter learning...

research 06/01/2021
SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models
In recent years, implicit deep learning has emerged as a method to incre...

research 12/06/2022
DiffTune^+: Hyperparameter-Free Auto-Tuning using Auto-Differentiation
Controller tuning is a vital step to ensure the controller delivers its ...

research 06/29/2023
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
We introduce ordered transfer hyperparameter optimisation (OTHPO), a ver...
