VeLO: Training Versatile Learned Optimizers by Scaling Up

11/17/2022
by Luke Metz, et al.

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning that is itself a small neural network: it ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance but also optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open-source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
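
To make the core idea concrete, below is a minimal sketch in JAX of what "an optimizer that is itself a small neural network" can look like: a tiny MLP maps per-parameter features (here, the gradient and a momentum accumulator) to a parameter update. The feature set, network sizes, and all function names are illustrative assumptions, not VeLO's actual architecture, which is substantially larger and meta-trained; see velo-code.github.io for the real implementation.

    # Minimal sketch of a learned optimizer (not VeLO itself).
    # A tiny MLP ingests per-parameter features and emits updates.
    import jax
    import jax.numpy as jnp

    def init_opt_mlp(key, hidden=8):
        # Two-layer MLP: 2 input features -> hidden -> 1 output per parameter.
        k1, k2 = jax.random.split(key)
        return {
            "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
            "b1": jnp.zeros(hidden),
            "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
            "b2": jnp.zeros(1),
        }

    def learned_update(opt_params, grad, momentum):
        # Build one feature vector per scalar parameter, apply the MLP elementwise.
        feats = jnp.stack([grad.ravel(), momentum.ravel()], axis=-1)  # (n, 2)
        h = jnp.tanh(feats @ opt_params["w1"] + opt_params["b1"])
        out = h @ opt_params["w2"] + opt_params["b2"]                 # (n, 1)
        return out.reshape(grad.shape)

    def opt_step(opt_params, params, grad, momentum, decay=0.9):
        # Update the momentum state, then apply the MLP-produced update.
        momentum = decay * momentum + (1.0 - decay) * grad
        return params - learned_update(opt_params, grad, momentum), momentum

    # Example: one step on a toy quadratic loss.
    key = jax.random.PRNGKey(0)
    opt_params = init_opt_mlp(key)
    params = jnp.ones((3, 3))
    grad = jax.grad(lambda p: jnp.sum(p ** 2))(params)
    params, momentum = opt_step(opt_params, params, grad, jnp.zeros_like(params))

In a learned-optimizer setup, the MLP weights (opt_params above) are the meta-parameters: they are themselves trained, across many optimization tasks, to minimize the loss reached by the models they optimize.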

Related research

03/22/2022
Practical tradeoffs between memory, compute, and performance in learned optimizers
Optimization plays a costly and crucial role in developing machine learn...

03/06/2023
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks
Solving a problem with a deep learning model requires researchers to opt...

01/14/2021
Training Learned Optimizers with Randomly Initialized Learned Optimizers
Learned optimizers are increasingly effective, with performance exceedin...

02/02/2023
Mnemosyne: Learning to Train Transformers with Transformers
Training complex machine learning (ML) architectures requires a compute ...

03/13/2019
DeepOBS: A Deep Learning Optimizer Benchmark Suite
Because the choice and tuning of the optimizer affects the speed, and ul...

06/08/2019
Using learned optimizers to make models robust to input noise
State-of-the-art vision models can achieve superhuman performance on ima...

07/24/2023
An Isometric Stochastic Optimizer
The Adam optimizer is the standard choice in deep learning applications....
