Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

08/07/2023
by   Nirbhay Modhe, et al.

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation through conservative value estimation: penalizing the values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods can further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen states far away from the available offline data due to two factors: (a) very short rollout horizons, necessitated by cascading model errors, and (b) model rollouts originating solely from states observed in the offline data. We relax the second assumption and present a novel unseen state augmentation strategy that allows exploitation of unseen states where the learned model and value estimates generalize. Our strategy finds unseen states via value-informed perturbations of seen states, followed by filtering out states whose epistemic uncertainty estimates are too high (indicating high model error) or too low (too similar to seen data). We observe improved performance on several offline RL tasks and find that our augmentation strategy consistently leads to lower average dataset Q-value estimates, i.e., more conservative Q-value estimates than the baseline.
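The sketch below illustrates one plausible reading of the augmentation loop described in the abstract: perturb seen states in a value-informed direction, score the perturbed candidates with an ensemble-based epistemic uncertainty estimate, and keep only those in a middle uncertainty band. All names (q_net, policy, dynamics_ensemble) and the specific perturbation/threshold choices are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of unseen state augmentation (PyTorch).
import torch


def value_informed_perturbation(states, q_net, policy, step_size=0.05):
    """Nudge seen states along the gradient of the Q-value w.r.t. the state,
    producing nearby unseen candidates (one reading of 'value-informed
    perturbations'; the sign-gradient step is an assumption)."""
    states = states.clone().requires_grad_(True)
    q = q_net(states, policy(states)).sum()
    q.backward()
    with torch.no_grad():
        return states + step_size * states.grad.sign()


def ensemble_disagreement(states, dynamics_ensemble, policy):
    """Epistemic uncertainty proxy: disagreement (std.) of next-state
    predictions across an ensemble of learned dynamics models."""
    actions = policy(states)
    preds = torch.stack([m(states, actions) for m in dynamics_ensemble])  # (E, N, S)
    return preds.std(dim=0).mean(dim=-1)  # (N,)


def augment_rollout_starts(seen_states, q_net, policy, dynamics_ensemble,
                           low=0.01, high=0.1):
    """Keep only perturbed states whose uncertainty is neither too low
    (indistinguishable from seen data) nor too high (model likely wrong).
    Thresholds here are placeholders."""
    candidates = value_informed_perturbation(seen_states, q_net, policy)
    with torch.no_grad():
        uncertainty = ensemble_disagreement(candidates, dynamics_ensemble, policy)
    keep = (uncertainty > low) & (uncertainty < high)
    return candidates[keep].detach()
```

Per the abstract, the surviving states would serve as additional starting points for model rollouts, relaxing the usual restriction that rollouts originate only from states observed in the offline dataset.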


