Practical tradeoffs between memory, compute, and performance in learned optimizers

03/22/2022
by Luke Metz, et al.

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, such as Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to the computational and memory overhead of the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance tradeoffs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and more memory efficient than previous work.
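To make the setup concrete, below is a minimal JAX sketch of the idea the abstract describes: a tiny per-parameter MLP replaces the hand-designed update rule, and its weights (the meta-parameters) are trained by unrolling the optimizer on a task and differentiating through the unroll. This is an illustrative toy, not the paper's architecture; the MLP shape, the feature set (gradient and momentum), the output scale, and the quadratic inner task are all assumptions made for the example.

```python
# Minimal sketch of a per-parameter learned optimizer in JAX.
# Illustrative only -- not the paper's implementation.
import jax
import jax.numpy as jnp

def init_meta_params(key, hidden=8):
    # Meta-parameters: a 2-layer MLP with 2 input features per parameter
    # (gradient, momentum) and 1 output (the parameter update).
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def learned_update(meta, grad, mom):
    # Apply the MLP elementwise: stack per-parameter features, emit a step.
    feats = jnp.stack([grad, mom], axis=-1)        # (..., 2)
    h = jnp.tanh(feats @ meta["w1"] + meta["b1"])  # (..., hidden)
    out = (h @ meta["w2"] + meta["b2"])[..., 0]    # (...,)
    return 0.01 * out  # small output scale stabilizes meta-training

def inner_loss(theta):
    # Toy inner task the optimizer is meta-trained on: a fixed quadratic.
    return jnp.sum((theta - 1.0) ** 2)

def unroll(meta, theta0, steps=20, beta=0.9):
    # Run the learned optimizer for `steps` inner steps; the final inner
    # loss is the meta-objective, differentiated through the unroll.
    def step(carry, _):
        theta, mom = carry
        g = jax.grad(inner_loss)(theta)
        mom = beta * mom + (1 - beta) * g
        theta = theta + learned_update(meta, g, mom)
        return (theta, mom), None
    (theta, _), _ = jax.lax.scan(
        step, (theta0, jnp.zeros_like(theta0)), None, length=steps)
    return inner_loss(theta)

# Meta-training: plain gradient descent on the unrolled meta-objective.
meta = init_meta_params(jax.random.PRNGKey(0))
meta_grad = jax.jit(jax.grad(unroll))
theta0 = jnp.ones(10) * 3.0
for _ in range(200):
    g = meta_grad(meta, theta0)
    meta = jax.tree_util.tree_map(lambda p, gp: p - 0.1 * gp, meta, g)

print("final inner loss:", unroll(meta, theta0))
```

The sketch also makes the abstract's overhead point concrete: the learned optimizer carries extra per-parameter state (here, momentum) and runs an MLP forward pass for every parameter on every step, which is exactly the kind of memory and compute cost, beyond a hand-designed rule like SGD, that the paper quantifies.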

Related research

VeLO: Training Versatile Learned Optimizers by Scaling Up (11/17/2022)
While deep learning models have replaced hand-designed features across m...

Risks from Learned Optimization in Advanced Machine Learning Systems (06/05/2019)
We analyze the type of learned optimization that occurs when a learned m...

Learned optimizers that outperform SGD on wall-clock and test loss (10/24/2018)
Deep learning has shown that learned functions can dramatically outperfo...

Training Learned Optimizers with Randomly Initialized Learned Optimizers (01/14/2021)
Learned optimizers are increasingly effective, with performance exceedin...

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves (09/23/2020)
Much as replacing hand-designed features with learned functions has revo...

Extreme Tensoring for Low-Memory Preconditioning (02/12/2019)
State-of-the-art models are now trained with billions of parameters, rea...
