Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

12/14/2021
by Yunhao Tang, et al.

Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate this discrepancy. In particular, (1) we show that unbiased gradient estimates have variance Θ(N), which depends linearly on the sample size N of the inner-loop updates; (2) we propose linearized score function (LSF) gradient estimates, which have bias 𝒪(1/√N) and variance 𝒪(1/N); (3) we show that most prior empirical work in fact implements variants of the LSF gradient estimates, which implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) we establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding their convergence to stationary points, showing better dependence on N than prior work when N is large.
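To make claim (1) concrete, here is a minimal toy sketch of why an unbiased meta-gradient estimate can have variance growing linearly in N. Everything in it is an assumption chosen for illustration, not taken from the paper: a one-dimensional Gaussian "policy" x ~ Normal(theta, 1), reward r(x) = -x², and one inner step theta' = theta + alpha * g_hat using an N-sample score-function estimate g_hat. The function names (`unbiased_meta_grad`, `estimator_variance`) are hypothetical. The estimator differentiates through the joint density of all N inner samples, so it contains a sum of N zero-mean score terms; that sum is the source of the Θ(N) variance. This is a standard unbiased score-function estimator, not the paper's LSF estimator, whose exact form the abstract does not give.

```python
import random
import statistics

# Toy setup (illustrative assumptions only, not from the paper):
#   policy: x ~ Normal(theta, 1), so d/dtheta log p(x | theta) = x - theta
#   reward: r(x) = -x**2, so J(theta) = E[r(x)] = -(theta**2 + 1)
#   inner update: theta' = theta + alpha * g_hat, where g_hat is the
#   N-sample score-function estimate of J'(theta)


def outer_objective(theta):
    # Exact J(theta), so the only noise left comes from the inner-loop samples.
    return -(theta ** 2 + 1.0)


def outer_grad(theta):
    # Exact J'(theta) = -2 * theta.
    return -2.0 * theta


def unbiased_meta_grad(theta, alpha, n, rng):
    """One draw of an unbiased estimate of d/dtheta E[J(theta + alpha * g_hat)].

    Uses d/dtheta E_x[F(theta, x)] = E[dF/dtheta + F * sum_i d/dtheta log p(x_i | theta)].
    """
    xs = [rng.gauss(theta, 1.0) for _ in range(n)]
    scores = [x - theta for x in xs]
    rewards = [-x * x for x in xs]
    g_hat = sum(s * r for s, r in zip(scores, rewards)) / n
    theta_prime = theta + alpha * g_hat
    # Pathwise term: with samples held fixed, d g_hat / d theta = -mean(rewards).
    dtheta_prime = 1.0 - alpha * sum(rewards) / n
    pathwise = outer_grad(theta_prime) * dtheta_prime
    # Score term over the joint density of all N inner samples:
    # a sum of N zero-mean terms, so its variance grows linearly in N.
    score_term = outer_objective(theta_prime) * sum(scores)
    return pathwise + score_term


def estimator_variance(n, trials=4000, theta=0.5, alpha=0.1, seed=0):
    # Monte Carlo estimate of the estimator's variance at inner sample size n.
    rng = random.Random(seed)
    draws = [unbiased_meta_grad(theta, alpha, n, rng) for _ in range(trials)]
    return statistics.variance(draws)


for n in (4, 16, 64):
    print(n, round(estimator_variance(n), 2))
```

In this toy, the printed variance should grow roughly in proportion to n, mirroring the Θ(N) behavior described in (1). Dropping the score term entirely would remove this growth but, in general, leaves a bias that does not vanish with N; the LSF estimates in (2) are described as trading a vanishing 𝒪(1/√N) bias for 𝒪(1/N) variance.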



research
09/22/2022

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

Meta-gradients provide a general approach for optimizing the meta-parame...
research
06/24/2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Model-agnostic meta-reinforcement learning requires estimating the Hessi...
research
10/30/2021

One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning

Self-tuning algorithms that adapt the learning process online encourage ...
research
06/14/2022

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization

Policy-gradient methods in Reinforcement Learning (RL) are very universal...
research
10/29/2019

Irrational Exuberance: Correcting Bias in Probability Estimates

We consider the common setting where one observes probability estimates ...
research
11/19/2022

Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

Meta-gradient Reinforcement Learning (RL) allows agents to self-tune the...
research
02/06/2019

On the Variance of Unbiased Online Recurrent Optimization

The recently proposed Unbiased Online Recurrent Optimization algorithm (...
