Learning to Learn without Gradient Descent by Gradient Descent

by Yutian Chen, et al.

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
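To make the setup concrete, below is a minimal sketch (not the authors' code) of the meta-training idea described in the abstract. It assumes PyTorch and uses random quadratics as a stand-in for the synthetic training functions (the paper trains on differentiable functions such as Gaussian process samples). An LSTM receives the previous query point and its observed value, proposes the next query, and is trained by gradient descent on a simple cumulative meta-loss: the sum of function values observed over the horizon.

```python
# Hypothetical sketch of meta-training an RNN black-box optimizer.
# Assumptions: PyTorch, random quadratic "black-box" functions, and a
# cumulative-value meta-loss; none of these names come from the paper's code.
import torch
import torch.nn as nn

DIM, HIDDEN, HORIZON, BATCH = 2, 32, 20, 64

class RNNOptimizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.LSTMCell(DIM + 1, HIDDEN)  # input: previous query x and value y
        self.head = nn.Linear(HIDDEN, DIM)        # output: next query point

    def forward(self, x, y, state):
        h, c = self.cell(torch.cat([x, y], dim=-1), state)
        return self.head(h), (h, c)

def sample_functions(batch):
    # Synthetic training functions: f(x) = ||x - m||^2 with a random minimizer m.
    m = torch.randn(batch, DIM)
    return lambda x: ((x - m) ** 2).sum(dim=-1, keepdim=True)

opt_net = RNNOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)

for step in range(1000):
    f = sample_functions(BATCH)
    x = torch.zeros(BATCH, DIM)
    y = f(x)
    state = (torch.zeros(BATCH, HIDDEN), torch.zeros(BATCH, HIDDEN))
    loss = 0.0
    for t in range(HORIZON):
        x, state = opt_net(x, y, state)   # RNN proposes the next query
        y = f(x)                          # "black-box" evaluation (differentiable here)
        loss = loss + y.mean()            # cumulative observed value over the horizon
    meta_opt.zero_grad()
    loss.backward()                       # gradient descent through the whole rollout
    meta_opt.step()
```

At test time the same rollout loop is run on a true black-box function, except that gradients are no longer needed: the trained LSTM only consumes the observed values, which is what lets it act as a derivative-free optimizer.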

