One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning

10/30/2021
by Clément Bonnet, et al.

Self-tuning algorithms that adapt the learning process online promote more effective and robust learning. Among the available methods, meta-gradients have emerged as a promising approach: they exploit the differentiability of the learning rule with respect to some hyper-parameters to adapt them online. Although meta-gradients can be accumulated over multiple learning steps to avoid myopic updates, this is rarely done in practice. In this work, we demonstrate that whilst multi-step meta-gradients do provide a better learning signal in expectation, this comes at the cost of a significant increase in variance, hindering performance. In light of this analysis, we introduce a novel method that mixes multiple inner steps and enjoys a more accurate and robust meta-gradient signal, essentially trading off bias and variance in meta-gradient estimation. When applied to the Snake game, the mixing meta-gradient algorithm cuts the variance by a factor of 3 while achieving similar or higher performance.
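
To make the idea concrete, the sketch below is a minimal, hypothetical JAX illustration of multi-step meta-gradients and of mixing estimators computed from different numbers of inner steps. The toy regression objective, the regularisation meta-parameter `eta`, the step sizes, and the uniform mixing weights are all assumptions made for illustration; this is not the authors' actor-critic setup on Snake.

```python
# Minimal, hypothetical sketch (not the authors' implementation) of
# multi-step meta-gradients in JAX. The inner objective, the meta-parameter
# `eta` (a regularisation strength here), the learning rate, and the uniform
# mixing weights are illustrative assumptions.
import jax
import jax.numpy as jnp


def inner_loss(theta, eta, batch):
    # Toy regression loss whose regulariser is controlled by eta.
    x, y = batch
    return jnp.mean((x @ theta - y) ** 2) + jax.nn.softplus(eta) * jnp.sum(theta ** 2)


def outer_loss(theta, batch):
    # Validation-style objective used to adapt the meta-parameter.
    x, y = batch
    return jnp.mean((x @ theta - y) ** 2)


def unroll(theta, eta, batches, n_steps, lr=0.1):
    # Differentiable unroll of n_steps inner SGD updates.
    for t in range(n_steps):
        grads = jax.grad(inner_loss)(theta, eta, batches[t])
        theta = theta - lr * grads
    return theta


def k_step_meta_grad(theta, eta, batches, val_batch, k):
    # Meta-gradient of the outer loss w.r.t. eta through k inner steps.
    def objective(eta_):
        theta_k = unroll(theta, eta_, batches, k)
        return outer_loss(theta_k, val_batch)
    return jax.grad(objective)(eta)


def mixed_meta_grad(theta, eta, batches, val_batch, max_k):
    # Uniform mixture of the 1..max_k step meta-gradient estimates.
    grads = jnp.stack([
        k_step_meta_grad(theta, eta, batches, val_batch, k)
        for k in range(1, max_k + 1)
    ])
    return jnp.mean(grads)


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    dim, max_k = 5, 3
    theta = jnp.zeros(dim)
    eta = jnp.array(0.0)
    keys = jax.random.split(key, 2 * (max_k + 1))

    def make_batch(k1, k2):
        x = jax.random.normal(k1, (32, dim))
        y = x @ jnp.ones(dim) + 0.1 * jax.random.normal(k2, (32,))
        return x, y

    batches = [make_batch(keys[2 * i], keys[2 * i + 1]) for i in range(max_k)]
    val_batch = make_batch(keys[-2], keys[-1])
    print("mixed meta-gradient w.r.t. eta:",
          mixed_meta_grad(theta, eta, batches, val_batch, max_k))
```

In this toy setting, a one-step unroll gives a low-variance but more biased meta-gradient estimate, while longer unrolls reduce bias at the cost of higher variance; averaging estimates across unroll lengths is one simple way to trade the two off, in the spirit of the mixing idea described in the abstract.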


