Meta-Learning with Warped Gradient Descent

by Sebastian Flennerhag, et al.

A versatile and effective approach to meta-learning is to infer a gradient-based update rule directly from data that promotes rapid learning of new tasks from the same distribution. Current methods rely on backpropagating through the learning process, limiting their scope to few-shot learning. In this work, we introduce Warped Gradient Descent (WarpGrad), a family of modular optimisers that can scale to arbitrary adaptation processes. WarpGrad methods meta-learn to warp task loss surfaces across the joint task-parameter distribution to facilitate gradient descent, which is achieved by a reparametrisation of neural networks that interleaves warp layers in the architecture. These layers are shared across task learners and fixed during adaptation. They represent a projection of task parameters into a meta-learned space that is conducive to task adaptation, and standard backpropagation through them induces a form of gradient preconditioning. WarpGrad methods are computationally efficient and easy to implement, as they rely on parameter sharing and backpropagation. They are readily combined with other meta-learners and can scale both in terms of model size and length of adaptation trajectories, as meta-learning warp parameters does not require differentiation through task adaptation processes. We show empirically that WarpGrad optimisers meta-learn a warped space where gradient descent is well behaved, with faster convergence and better performance in a variety of settings, including few-shot, standard supervised, continual, and reinforcement learning.
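The preconditioning effect described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a single linear task layer `W` followed by a single linear warp layer `omega`, a squared-error loss, and hand-written gradients. All names and values (`task_gradient`, `adapt`, `omega`, the learning rate) are hypothetical, and the warp is fixed by hand here, whereas in WarpGrad it would be meta-learned across tasks (not shown).

```python
import numpy as np

def loss(W, omega, x, y):
    """Squared-error task loss for the model omega @ W @ x."""
    return 0.5 * np.sum((omega @ W @ x - y) ** 2)

def task_gradient(W, omega, x, y):
    """Gradient of the loss with respect to the task parameters W only.

    Backpropagating through the fixed warp layer multiplies the
    output-side gradient by omega.T, which acts as a preconditioner.
    """
    residual = omega @ W @ x - y
    return omega.T @ residual @ x.T

def adapt(W, omega, x, y, lr=0.005, steps=200):
    """Plain gradient descent on W; omega is never updated here."""
    for _ in range(steps):
        W = W - lr * task_gradient(W, omega, x, y)
    return W

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))   # 8 inputs with 3 features each
y = rng.normal(size=(2, 8))   # 8 targets with 2 outputs each
W0 = np.zeros((2, 3))         # task parameters, adapted per task

# Hypothetical warp, set by hand for illustration only.
omega = np.array([[2.0, 0.0], [0.0, 0.5]])
omega_before = omega.copy()

# Gradient that a layer without the warp would see at the same output.
unwarped = (omega @ W0 @ x - y) @ x.T
warped = task_gradient(W0, omega, x, y)

W_adapted = adapt(W0, omega, x, y)
```

In this sketch `warped` equals `omega.T @ unwarped`, so the warp layer rescales the update direction for `W` even though only `W` changes during adaptation. Nonlinear warp layers and the meta-objective used to train them are outside the scope of this illustration.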




Meta-Learning with Adaptive Layerwise Metric and Subspace

Recent advances in meta-learning demonstrate that deep representations c...

Continuous-Time Meta-Learning with Forward Mode Differentiation

Drawing inspiration from gradient-based meta-learning methods with infin...

Meta Learning Backpropagation And Improving It

Many concepts have been proposed for meta learning with neural networks ...

Decoder Choice Network for Meta-Learning

Meta-learning has been widely used for implementing few-shot learning an...

Meta-Learning Bidirectional Update Rules

In this paper, we introduce a new type of generalized neural network whe...

A Modern Self-Referential Weight Matrix That Learns to Modify Itself

The weight matrix (WM) of a neural network (NN) is its program. The prog...

Covariate Distribution Aware Meta-learning

Meta-learning has proven to be successful at few-shot learning across th...