Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

10/06/2021
by Ting-Han Fan, et al.

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into three parts: a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitudes of the bias and variance, we explain the success of Emphasizing Recent Experience (ERE) sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance and are hence preferable to uniform sampling.
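To make the 1/age weighted sampling mentioned above concrete, the sketch below draws replay-buffer indices with probability proportional to the reciprocal of each transition's age, compared against uniform sampling. The function name, buffer layout, and scheme labels are assumptions made for illustration; this is a minimal sketch of the sampling rule, not the paper's implementation.

```python
import numpy as np

def sample_indices(buffer_size, batch_size, scheme="uniform", rng=None):
    """Draw replay-buffer indices under different sampling schemes.

    Index 0 is the oldest transition, index buffer_size - 1 the newest.
    Under "inverse_age", each transition is weighted by 1 / age with
    age = buffer_size - index, so recent transitions are drawn more often.
    Illustrative sketch only; not the authors' implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    if scheme == "uniform":
        probs = np.full(buffer_size, 1.0 / buffer_size)
    elif scheme == "inverse_age":
        age = buffer_size - np.arange(buffer_size)  # newest transition has age 1
        weights = 1.0 / age
        probs = weights / weights.sum()
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.choice(buffer_size, size=batch_size, p=probs)

# Example: recent transitions dominate under 1/age weighting.
idx = sample_indices(buffer_size=10_000, batch_size=256, scheme="inverse_age")
print((idx > 9_000).mean())  # typically well above the 0.1 expected under uniform sampling
```

Skewing sampling toward recent transitions keeps the sampled data closer to the current policy's state-action distribution, which is the mechanism the bias-variance comparison in the abstract credits for the improvement over uniform sampling.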

Related research

Off-Policy Actor-Critic with Shared Experience Replay (09/25/2019)
We investigate the combination of actor-critic reinforcement learning al...

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms (10/02/2020)
We investigate the discounting mismatch in actor-critic algorithm implem...

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm (02/18/2021)
In this paper, we provide finite-sample convergence guarantees for an of...

Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning (05/25/2021)
In partially observable reinforcement learning, offline training gives a...

Actor-critic is implicitly biased towards high entropy optimal policies (10/21/2021)
We show that the simplest actor-critic method, a linear softmax policy ...

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm (05/29/2023)
Multi-step learning applies lookahead over multiple time steps and has p...

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization (06/14/2022)
Policy-gradient methods in Reinforcement Learning (RL) are very universal...
