Meta Mirror Descent: Optimiser Learning for Fast Convergence

03/05/2022
by Boyan Gao, et al.

Optimisers are an essential component of training machine learning models, and their design influences both learning speed and generalisation. Several studies have attempted to learn more effective gradient-descent optimisers by solving a bi-level optimisation problem in which generalisation error is minimised with respect to optimiser parameters. However, most existing optimiser-learning methods are intuitively motivated, without clear theoretical support. We take a different perspective: starting from mirror descent rather than gradient descent, we meta-learn the corresponding Bregman divergence. Within this paradigm, we formalise a novel meta-learning objective of minimising the regret bound of learning. The resulting framework, termed Meta Mirror Descent (MetaMD), learns a divergence that accelerates optimisation. Unlike many meta-learned optimisers, it also comes with convergence and generalisation guarantees, and uniquely achieves this without requiring validation data. We evaluate our framework on a variety of tasks and architectures in terms of convergence rate and generalisation error, and demonstrate strong performance.
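To make the mirror-descent viewpoint concrete, here is a minimal sketch, not the authors' code. It assumes the simplest possible case: a quadratic potential phi(x) = 0.5 x^T A x with A positive definite, whose Bregman divergence is D(x, y) = 0.5 (x - y)^T A (x - y); the mirror-descent update then reduces to a preconditioned gradient step. MetaMD meta-learns the divergence by minimising a regret bound, which this toy example does not attempt; the names `A_inv`, `mirror_descent_step`, and the toy objective are illustrative assumptions only.

```python
import numpy as np

def mirror_descent_step(x, grad, A_inv, lr):
    """One mirror-descent step under the quadratic potential
    phi(x) = 0.5 * x @ A @ x. The update in the mirror (dual) space,
      grad_phi(x_next) = grad_phi(x) - lr * grad,
    reduces for this potential to x_next = x - lr * A^{-1} @ grad."""
    return x - lr * A_inv @ grad

# Toy objective: f(x) = 0.5 * x^T H x with an ill-conditioned Hessian.
H = np.diag([100.0, 1.0])
grad_f = lambda x: H @ x

# A well-matched divergence (here A = H) cancels the ill-conditioning,
# illustrating how choosing (or learning) the divergence speeds up convergence.
A_inv = np.linalg.inv(H)
x = np.array([1.0, 1.0])
for _ in range(20):
    x = mirror_descent_step(x, grad_f(x), A_inv, lr=0.5)
print(x)  # ~1e-6 in both coordinates after 20 steps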
