Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

08/28/2023
by Hanhan Zhou, et al.

Offline reinforcement learning aims to learn a policy from datasets of previously gathered environment-action interaction records, without access to the real environment. Recent work has shown that offline reinforcement learning can be formulated as a sequence modeling problem and solved via supervised learning with approaches such as the decision transformer. While these sequence-based methods achieve competitive results against return-to-go methods, especially on tasks that require longer episodes or have sparse rewards, they do not use importance sampling to correct the policy bias that arises with off-policy data, mainly because the behavior policy is unavailable and the evaluation policies are deterministic. To this end, we propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation (DPE) in a unified framework with statistically proven variance-reduction properties. We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks. Our method improves the performance of the selected methods and outperforms SOTA baselines in several tasks, demonstrating the advantages of enabling double policy estimation for sequence-modeled reinforcement learning.
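
For readers unfamiliar with the importance-sampling correction mentioned above, the following is a minimal sketch of ordinary importance sampling for off-policy evaluation when the behavior policy has to be estimated from the logged data. It is not the DPE algorithm. The function names, the count-based behavior estimate, and the one-state toy data are all assumptions made for illustration.

```python
def estimate_policy_from_counts(trajectories):
    """Hypothetical helper: estimate the behavior policy as empirical
    action frequencies per state (an assumption, not the paper's estimator)."""
    counts = {}
    for traj in trajectories:
        for state, action, _reward in traj:
            per_state = counts.setdefault(state, {})
            per_state[action] = per_state.get(action, 0) + 1

    def policy(state, action):
        total = sum(counts[state].values())
        return counts[state].get(action, 0) / total

    return policy


def importance_sampling_estimate(trajectories, eval_policy, behavior_policy, gamma=1.0):
    """Ordinary (per-trajectory) importance sampling estimate of the
    evaluation policy's expected discounted return.

    trajectories: list of trajectories, each a list of (state, action, reward).
    eval_policy, behavior_policy: callables (state, action) -> probability.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of the logged trajectory
        for t, (state, action, reward) in enumerate(traj):
            weight *= eval_policy(state, action) / behavior_policy(state, action)
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


# Toy logged data: a single state "s", two actions, reward 1 only for action 1.
logged = [[("s", 0, 0.0)], [("s", 1, 1.0)], [("s", 1, 1.0)], [("s", 0, 0.0)]]

# The behavior policy is unknown, so it is estimated from the data itself.
behavior_hat = estimate_policy_from_counts(logged)


def eval_policy(state, action):
    # Illustrative stochastic evaluation policy that prefers action 1.
    return 0.9 if action == 1 else 0.1


value_hat = importance_sampling_estimate(logged, eval_policy, behavior_hat)
```

In this toy setting the estimated behavior policy assigns probability 0.5 to each action, so the two rewarded trajectories are reweighted by 0.9 / 0.5. A deterministic evaluation policy would make most ratios zero, which is one reason the abstract cites deterministic evaluation policies as an obstacle to importance sampling.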

