On the Reuse Bias in Off-Policy Reinforcement Learning

09/15/2022
by Chengyang Ying, et al.

Importance sampling (IS) is a popular technique in off-policy evaluation: it re-weights the returns of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable, and previous attempts to address this issue have mainly focused on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion, the Reuse Bias of IS: the bias in off-policy evaluation caused by reusing the replay buffer for both evaluation and optimization. We show theoretically that evaluating and optimizing the current policy with the same data from the replay buffer results in an overestimation of the objective, which may cause erroneous gradient updates and degrade performance. We further provide a high-probability upper bound on the Reuse Bias and, by introducing the concept of stability for off-policy algorithms, show that controlling one term of this upper bound controls the Reuse Bias. Based on these analyses, we present a novel Bias-Regularized Importance Sampling (BIRIS) framework, along with practical algorithms, that can alleviate the negative impact of the Reuse Bias. Experimental results show that our BIRIS-based methods can significantly improve sample efficiency on a series of continuous control tasks in MuJoCo.
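The overestimation described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method and does not implement BIRIS; it assumes a hypothetical multi-armed bandit in which every arm has true value 0, a uniform behavior policy, and a "learner" that simply picks the deterministic policy with the highest IS estimate. The function names `is_estimates` and `reuse_bias_demo` and all parameter values are illustrative assumptions.

```python
import numpy as np


def is_estimates(actions, rewards, num_arms):
    """IS estimate of the value of each deterministic policy "always pull arm a".

    Assumes the data were logged by a uniform behavior policy, so the
    importance weight is num_arms when the logged action matches the
    policy's arm and 0 otherwise.
    """
    n = len(actions)
    return np.array(
        [num_arms * rewards[actions == a].sum() / n for a in range(num_arms)]
    )


def reuse_bias_demo(num_arms=5, buffer_size=50, trials=2000, seed=0):
    """Compare evaluating the selected policy on the reused vs. a fresh buffer.

    Every arm has true mean reward 0, so any policy's true value is 0 and
    any systematic deviation of an estimate from 0 is bias.
    """
    rng = np.random.default_rng(seed)
    reused, fresh = [], []
    for _ in range(trials):
        # Buffer used for optimization (selecting the policy).
        a1 = rng.integers(num_arms, size=buffer_size)
        r1 = rng.normal(size=buffer_size)
        est1 = is_estimates(a1, r1, num_arms)
        best = int(np.argmax(est1))

        # Evaluating on the same buffer reuses the noise that drove the choice.
        reused.append(est1[best])

        # Evaluating on an independent buffer keeps the IS estimate unbiased.
        a2 = rng.integers(num_arms, size=buffer_size)
        r2 = rng.normal(size=buffer_size)
        fresh.append(is_estimates(a2, r2, num_arms)[best])
    return float(np.mean(reused)), float(np.mean(fresh))


if __name__ == "__main__":
    reused_mean, fresh_mean = reuse_bias_demo()
    print(f"mean estimate on reused buffer: {reused_mean:.3f}")
    print(f"mean estimate on fresh buffer:  {fresh_mean:.3f}")
```

In this toy setting the estimate on the reused buffer averages clearly above the true value of 0, while the estimate on a fresh buffer averages close to 0. This is the same qualitative effect the abstract calls Reuse Bias, shown here only for policy selection in a bandit rather than for gradient-based policy optimization.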


research
05/07/2019

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

In importance sampling (IS)-based reinforcement learning algorithms such...
research
05/04/2023

Rethinking Population-assisted Off-policy Reinforcement Learning

While off-policy reinforcement learning (RL) algorithms are sample effic...
research
06/11/2019

Importance Resampling for Off-policy Prediction

Importance sampling (IS) is a common reweighting strategy for off-policy...
research
09/17/2018

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to s...
research
05/31/2018

Sample Reuse via Importance Sampling in Information Geometric Optimization

In this paper we propose a technique to reduce the number of function ev...
research
06/26/2020

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

This paper prescribes a suite of techniques for off-policy Reinforcement...
research
08/25/2022

Variance Reduction based Experience Replay for Policy Optimization

For reinforcement learning on complex stochastic systems where many fact...
