Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

07/21/2022
by   Adam Villaflor, et al.

Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequence modeling problem, they struggle to disentangle the effects of the policy and the world dynamics on the return. Thus, in adversarial or stochastic environments, these methods produce overly optimistic behavior that can be dangerous in safety-critical systems such as autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
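
The optimism bias described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method (which relies on Transformer policy and world models and a test-time search); it is a minimal one-step illustration under assumed data. The dataset, the action names, and all function names are hypothetical. It contrasts choosing an action by conditioning on the best observed return with choosing an action from a separately estimated outcome model.

```python
from collections import Counter, defaultdict

# Hypothetical offline dataset of (action, return) pairs from a one-step
# stochastic environment: "safe" always returns 1.0, while "risky" returns
# 10.0 in 10% of the logged episodes and -10.0 in the remaining 90%.
DATASET = [("safe", 1.0)] * 50 + [("risky", 10.0)] * 5 + [("risky", -10.0)] * 45


def return_conditioned_action(dataset, target_return):
    """Pick the most frequent action among episodes that hit target_return.

    This mimics conditioning a joint model on a high return: it cannot tell
    whether the return came from a good action or from a lucky outcome.
    """
    counts = Counter(a for a, r in dataset if r == target_return)
    return counts.most_common(1)[0][0]


def outcome_model(dataset):
    """Empirical stand-in for a world model: P(return | action)."""
    table = defaultdict(Counter)
    for action, ret in dataset:
        table[action][ret] += 1
    return {
        action: {ret: n / sum(counts.values()) for ret, n in counts.items()}
        for action, counts in table.items()
    }


def robust_action(dataset):
    """Pick the action whose worst observed outcome is best."""
    model = outcome_model(dataset)
    # min over a dict iterates its keys, which here are the possible returns.
    return max(model, key=lambda action: min(model[action]))


def expected_return_action(dataset):
    """Pick the action with the highest expected return under the model."""
    model = outcome_model(dataset)
    return max(model, key=lambda a: sum(r * p for r, p in model[a].items()))


best_return = max(r for _, r in DATASET)
print(return_conditioned_action(DATASET, best_return))  # risky
print(robust_action(DATASET))                           # safe
print(expected_return_action(DATASET))                  # safe
```

Conditioning on the best return selects the risky action because only that action ever achieved it in the data, even though it usually fails. Modeling the outcomes separately from the action choice exposes the other possible futures, so both the worst-case and the expected-return criteria select the safe action in this assumed dataset.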

research
06/03/2021

Reinforcement Learning as One Big Sequence Modeling Problem

Reinforcement learning (RL) is typically concerned with estimating singl...
research
08/28/2023

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

Offline reinforcement learning aims to utilize datasets of previously ga...
research
10/13/2021

Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement

Reinforcement learning (RL) is a powerful data-driven control method tha...
research
11/22/2021

UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning

Offline reinforcement learning (RL) provides a framework for learning de...
research
12/29/2022

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Transformer, originally devised for natural language processing, has als...
research
09/29/2022

Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling

In offline reinforcement learning, weighted regression is a common metho...
research
01/28/2022

Can Wikipedia Help Offline Reinforcement Learning?

Fine-tuning reinforcement learning (RL) models has been challenging beca...
