Greedy Multi-step Off-Policy Reinforcement Learning

02/23/2021
by   Yuhui Wang, et al.

Multi-step off-policy reinforcement learning has achieved great success. However, existing multi-step methods usually impose a fixed prior on the number of bootstrap steps, while off-policy methods often require additional correction, which brings certain undesired effects. In this paper, we propose a novel bootstrapping method that greedily takes the maximum among the bootstrapping values computed with varying numbers of steps. The new method has two desirable properties: 1) it can flexibly adjust the bootstrap step based on the quality of the data and the learned value function; 2) it can safely and robustly utilize data from an arbitrary behavior policy without additional correction, whatever its quality or "off-policyness". We analyze the theoretical properties of the related operator, showing that it converges to the globally optimal value function at a rate faster than the traditional Bellman Optimality Operator. Furthermore, based on this new operator, we derive new model-free RL algorithms named Greedy Multi-Step Q Learning and Greedy Multi-Step DQN. Experiments reveal that the proposed methods are reliable, easy to implement, and achieve state-of-the-art performance on a series of standard benchmark datasets.
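To make the idea of "taking the maximum value among the bootstrapping values with varying steps" concrete, the following is a minimal sketch of how such a target might be computed for one trajectory segment. The abstract does not give the exact formula, so this is an assumption: each n-step candidate is taken to be the discounted sum of the first n rewards plus the discounted greedy value at the state reached after n steps. The function name `greedy_multistep_target` and its parameters are hypothetical, and terminal-state handling is omitted.

```python
def greedy_multistep_target(rewards, next_q_values, gamma=0.99):
    """Return the largest n-step bootstrapped value over n = 1..len(rewards).

    Illustrative sketch only, not the authors' implementation.
    rewards[i]       : reward received at step i of the segment
    next_q_values[i] : Q-values over all actions at the state reached
                       after step i (i.e. after i + 1 transitions)
    """
    best = float("-inf")
    discounted_return = 0.0
    for n, (reward, q_next) in enumerate(zip(rewards, next_q_values), start=1):
        # Accumulate the discounted reward of the first n steps.
        discounted_return += gamma ** (n - 1) * reward
        # n-step candidate: return so far plus the discounted greedy value.
        candidate = discounted_return + gamma ** n * max(q_next)
        # Greedy choice over the number of bootstrap steps.
        best = max(best, candidate)
    return best


# Two-step segment: the 2-step candidate (3.5) beats the 1-step one (2.0).
target = greedy_multistep_target(
    rewards=[1.0, 0.0],
    next_q_values=[[0.0, 2.0], [10.0, 0.0]],
    gamma=0.5,
)
```

With a single transition the sketch reduces to the usual one-step Q-learning target, which is one way to see how the maximum over step counts extends the standard bootstrapped update.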


