Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

11/09/2021
by   Jinning Li, et al.
0

Reinforcement Learning (RL) has been shown effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to offline setting where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e. distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to the area where it is familiar by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset, so that the learned pessimistic value function lower bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, when compared to those methods merely considering OOD actions.

READ FULL TEXT

page 7

page 18

research
06/15/2023

Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

Offline reinforcement learning (RL) that learns policies from offline da...
research
07/18/2022

Back to the Manifold: Recovering from Out-of-Distribution States

Learning from previously collected datasets of expert data offers the pr...
research
10/24/2021

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn the optimal policy fro...
research
04/02/2020

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Interactive adaptive systems powered by Reinforcement Learning (RL) have...
research
11/21/2022

Data-Driven Offline Decision-Making via Invariant Representation Learning

The goal in offline data-driven decision-making is synthesize decisions ...
research
02/02/2023

Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition

Many environments contain numerous available niches of variable value, e...
research
05/24/2023

Collaborative World Models: An Online-Offline Transfer RL Approach

Training visual reinforcement learning (RL) models in offline datasets i...

Please sign up or login with your details

Forgot password? Click here to reset