Learned Optimizers that Scale and Generalize

03/14/2017
by Olga Wichrowska, et al.

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on. We release an open source implementation of the meta-training algorithm.
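The paper's released implementation is in TensorFlow; as a rough illustration of the coordinate-wise idea described above (a small shared recurrent cell applied to every parameter, so per-parameter memory overhead stays minimal), the sketch below uses a tiny hand-rolled tanh-RNN in NumPy. The class name, the cell form, and the random (rather than meta-trained) weights are assumptions for illustration only, not the authors' architecture or API.

import numpy as np

HIDDEN = 4  # tiny per-parameter state, keeping memory overhead small

class CoordinatewiseRNNOptimizer:
    """Illustrative sketch: a shared tanh-RNN cell applied to each parameter.

    Input per step: that parameter's gradient. Output: an additive update.
    In a learned optimizer the cell weights would be meta-trained; here
    they are random, so this only demonstrates the data flow.
    """

    def __init__(self, rng):
        self.W_in = rng.normal(scale=0.1, size=(1, HIDDEN))
        self.W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
        self.b = np.zeros(HIDDEN)
        self.W_out = rng.normal(scale=0.1, size=(HIDDEN, 1))

    def init_state(self, n_params):
        return np.zeros((n_params, HIDDEN))

    def step(self, grads, state):
        g = grads.reshape(-1, 1)                      # (n_params, 1) features
        state = np.tanh(g @ self.W_in + state @ self.W_h + self.b)
        update = (state @ self.W_out).ravel()         # one update per parameter
        return update, state

# Usage on a toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
rng = np.random.default_rng(0)
opt = CoordinatewiseRNNOptimizer(rng)
x = rng.normal(size=10)
state = opt.init_state(x.size)
for _ in range(20):
    grad = x                          # gradient of the toy quadratic
    update, state = opt.step(grad, state)
    x = x + 0.01 * update             # apply the (here untrained) learned update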


