Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

11/19/2022
by   Clément Bonnet, et al.

Meta-gradient Reinforcement Learning (RL) allows agents to self-tune their hyper-parameters in an online fashion during training. In this paper, we identify a bias in the meta-gradient of current meta-gradient RL approaches. This bias arises from using a critic trained with the meta-learned discount factor to estimate advantages in the outer objective, which requires a different discount factor. Because the meta-learned discount factor is typically lower than the one used in the outer objective, the resulting bias can cause the meta-gradient to favor myopic policies. We propose a simple solution: we eliminate the bias by using an alternative, outer value function when estimating the outer loss. To obtain this outer value function, we add a second head to the critic network and train it alongside the classic critic head, using the outer-loss discount factor. On an illustrative toy problem, we show that the bias can cause catastrophic failure of current meta-gradient RL approaches, and that our proposed solution fixes it. We then apply our method to a more complex environment and demonstrate that fixing the meta-gradient bias can significantly improve performance.
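To make the proposed fix concrete, the following is a minimal sketch of a two-headed critic: an "inner" value head trained with the meta-learned discount factor and an "outer" head trained with the fixed outer-loss discount factor, with outer advantages computed from the outer head. This is an illustration only, not the authors' code; the choice of PyTorch, the one-step TD targets, and names such as TwoHeadCritic, critic_loss, and outer_advantage are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadCritic(nn.Module):
    """Shared torso with two value heads: one per discount factor."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.inner_head = nn.Linear(hidden, 1)  # V_inner, trained with the meta-learned gamma
        self.outer_head = nn.Linear(hidden, 1)  # V_outer, trained with the fixed outer gamma

    def forward(self, obs):
        h = self.torso(obs)
        return self.inner_head(h).squeeze(-1), self.outer_head(h).squeeze(-1)

def critic_loss(critic, obs, next_obs, rewards, dones,
                gamma_inner: float, gamma_outer: float):
    """One-step TD loss for each head, each using its own discount factor."""
    v_in, v_out = critic(obs)
    with torch.no_grad():
        v_in_next, v_out_next = critic(next_obs)
        target_in = rewards + gamma_inner * (1.0 - dones) * v_in_next
        target_out = rewards + gamma_outer * (1.0 - dones) * v_out_next
    return F.mse_loss(v_in, target_in) + F.mse_loss(v_out, target_out)

def outer_advantage(critic, obs, next_obs, rewards, dones, gamma_outer: float):
    """Advantage for the outer (meta) objective, computed from the outer head
    so that it is consistent with the outer-loss discount factor."""
    with torch.no_grad():
        _, v_out = critic(obs)
        _, v_out_next = critic(next_obs)
    return rewards + gamma_outer * (1.0 - dones) * v_out_next - v_out

The key design point mirrored here is that the outer (meta) gradient never sees the inner head: only V_outer, trained with the outer discount factor, enters the outer advantage estimate, which removes the source of bias described in the abstract.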
