Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions

12/30/2021
by Jonathan Lorraine et al.

Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and optimizing the hyperparameters using the approximate best-response function. Unlike other gradient-based approaches, we do not require differentiating the training loss with respect to the hyperparameters, allowing us to tune discrete hyperparameters, data augmentation hyperparameters, and dropout probabilities. Because the hyperparameters are adapted online, our approach discovers hyperparameter schedules that can outperform fixed hyperparameter values. Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems. We call our networks, which update their own hyperparameters online during training, Self-Tuning Networks (STNs).
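The abstract packs in two mechanisms: a best-response approximation whose hidden units are gated conditionally on the hyperparameters, and an alternating loop that fits this approximation locally and then updates the hyperparameters through it. The sketch below illustrates both in PyTorch under the standard bilevel reading min_λ L_val(w*(λ)) with w*(λ) = argmin_w L_train(w, λ). It is a minimal illustration, not the authors' released code: GatedLinear, train_stn, train_loss, and val_loss are hypothetical names, and hparams is assumed to be a leaf tensor with requires_grad=True.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Hidden layer gated conditionally on the hyperparameters.

    Hypothetical parameterization: output = (W x + b) * sigmoid(V lam + c),
    so a single network can represent the best-response w*(lambda).
    """
    def __init__(self, in_dim, out_dim, n_hparams):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)     # ordinary weights and biases
        self.gate = nn.Linear(n_hparams, out_dim)  # maps lambda to unit gates

    def forward(self, x, lam):
        return self.base(x) * torch.sigmoid(self.gate(lam))


def train_stn(model, hparams, train_loss, val_loss,
              inner_steps=5, outer_steps=1000, sigma=0.1):
    """Alternate between (a) fitting the best-response approximation around
    the current hyperparameters and (b) a hyperparameter step on the
    validation loss, taken through the approximate best-response."""
    w_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    h_opt = torch.optim.Adam([hparams], lr=1e-2)
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # Sample hyperparameters near the current values so the
            # approximation holds in a neighborhood, not just at a point.
            lam = hparams.detach() + sigma * torch.randn_like(hparams)
            w_opt.zero_grad()
            train_loss(model, lam).backward()  # lam is detached here
            w_opt.step()
        h_opt.zero_grad()
        # Gradients reach the hyperparameters only through the gates of the
        # approximate best-response network, evaluated on validation data.
        val_loss(model, hparams).backward()
        h_opt.step()
```

Because the inner loop detaches the hyperparameters, the training loss is never differentiated with respect to them, which matches the abstract's point about tuning discrete and data-augmentation hyperparameters; the hyperparameter gradient arrives solely via the validation loss through the gated network.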



Related research

03/07/2019
Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Hyperparameter optimization can be formulated as a bilevel optimization ...

11/06/2019
Optimizing Millions of Hyperparameters by Implicit Differentiation
We propose an algorithm for inexpensive gradient-based hyperparameter op...

02/15/2021
Online hyperparameter optimization by real-time recurrent learning
Conventional hyperparameter optimization methods are computationally int...

10/26/2020
Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians
Hyperparameter optimization of neural networks can be elegantly formulat...

12/11/2022
CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
The hyperparameter optimization of neural networks can be expressed as a ...

09/09/2019
Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space
Hyperparameter optimization is both a practical issue and an interesting...

06/13/2022
Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems
Gradient-based optimization methods for hyperparameter tuning guarantee ...
