Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

09/23/2020
by Luke Metz, et al.

Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new neural-network-parameterized, hierarchical optimizer with access to additional features, such as the validation loss, to enable automatic regularization. Most learned optimizers have been trained on only a single task or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well but also learn behaviors distinct from those of existing first-order optimizers. For instance, they generate update steps that have implicit regularization and that adapt as problem hyperparameters (e.g., batch size) or architecture (e.g., neural network width) change. Finally, these learned optimizers show evidence of being useful for out-of-distribution tasks, such as training themselves from scratch.
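At its simplest, a neural-network-parameterized optimizer like the one described above can be thought of as a small network that maps per-parameter features (e.g., the gradient, a momentum accumulator, and a validation-loss signal) to an update for each parameter. The sketch below illustrates this idea only; the feature set, architecture, and scaling are assumptions for illustration, not the paper's exact hierarchical design:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_optimizer(n_features=3, hidden=8):
    # Random weights for a tiny per-parameter MLP (illustrative sizes).
    return {
        "w1": rng.normal(0, 0.1, (n_features, hidden)),
        "w2": rng.normal(0, 0.1, (hidden, 2)),  # outputs: direction, log-step
    }

def learned_update(opt, grad, momentum, val_signal):
    # Stack per-parameter features; the validation signal is broadcast
    # so every parameter sees the same scalar (a hypothetical choice).
    feats = np.stack(
        [grad, momentum, np.full_like(grad, val_signal)], axis=-1
    )
    h = np.tanh(feats @ opt["w1"])
    out = h @ opt["w2"]
    direction, log_step = out[..., 0], out[..., 1]
    # Small multiplicative scaling keeps initial steps conservative.
    return direction * np.exp(0.01 * log_step)

# Usage: one step on a toy quadratic loss f(x) = 0.5 * ||x||^2.
opt = init_optimizer()
x = rng.normal(size=5)
grad = x                      # gradient of the quadratic
momentum = 0.1 * grad         # freshly initialized accumulator
x_new = x - learned_update(opt, grad, momentum, val_signal=0.5)
```

In the actual method, the weights of such an update network would themselves be meta-trained across thousands of tasks rather than left at random initialization as in this sketch.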

