Selective Uncertainty Propagation in Offline RL

We study the finite-horizon offline reinforcement learning (RL) problem. Because the action taken at a state can change the next-state distribution, the resulting distributional shift can make this problem far more statistically complex than offline policy learning for a finite sequence of stochastic contextual bandit environments. We formalize this insight by showing that the statistical hardness of an offline RL instance can be measured by estimating how strongly actions affect next-state distributions. This estimated impact also lets us propagate just enough value-function uncertainty from future steps to avoid model exploitation, yielding algorithms that improve upon traditional pessimistic approaches for offline RL on statistically simple instances. Our approach is supported by theory and simulations.
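To make the idea concrete, below is a minimal sketch of selective uncertainty propagation in a tabular finite-horizon setting. It is not the paper's exact algorithm: the function name, the beta parameter, and the total-variation-based "impact" estimate are illustrative assumptions. The sketch runs pessimistic backward induction, but scales the future-value uncertainty it propagates at each stage by an estimate of how strongly actions influence next-state distributions; when that impact is near zero (a bandit-like stage), almost no future uncertainty is propagated.

```python
import numpy as np

# Minimal illustrative sketch (not the paper's exact algorithm):
# pessimistic backward induction on a tabular finite-horizon MDP where
# the propagated next-step value uncertainty is scaled by an estimated
# "action impact" on next-state distributions. All names (counts,
# rewards, beta, selective_pessimistic_vi) are hypothetical.

def selective_pessimistic_vi(counts, rewards, H, S, A, beta=1.0):
    """counts[h][s, a, s'] = offline transition counts at step h.
    rewards[h][s, a]      = empirical mean rewards at step h.
    Returns a greedy policy w.r.t. selectively pessimistic Q-values."""
    V = np.zeros(S)                       # value estimate at step h + 1
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        n_sa = counts[h].sum(axis=2)                        # visits per (s, a)
        P_hat = counts[h] / np.maximum(n_sa, 1)[..., None]  # empirical P(s'|s,a)

        # Estimated action impact: per state, the max total-variation
        # distance between any action's next-state distribution and the
        # state's average next-state distribution. Near 0 => bandit-like.
        P_bar = P_hat.mean(axis=1, keepdims=True)
        impact = 0.5 * np.abs(P_hat - P_bar).sum(axis=2).max(axis=1)  # (S,)

        # Local estimation uncertainty plus selectively propagated
        # future-value uncertainty, scaled by the estimated impact.
        local_bonus = beta / np.sqrt(np.maximum(n_sa, 1))
        future_span = V.max() - V.min()
        bonus = local_bonus + impact[:, None] * future_span * local_bonus

        Q = rewards[h] + P_hat @ V - bonus       # pessimistic Q-values
        Q[n_sa == 0] = -np.inf                   # exclude unseen (s, a) pairs
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
        V[~np.isfinite(V)] = 0.0                 # states with no data at all
    return policy

# Hypothetical usage on random synthetic data:
H, S, A = 5, 4, 3
rng = np.random.default_rng(0)
counts = [rng.integers(0, 10, size=(S, A, S)).astype(float) for _ in range(H)]
rewards = [rng.random((S, A)) for _ in range(H)]
pi = selective_pessimistic_vi(counts, rewards, H, S, A)
```

The key design choice in this sketch is that a standard pessimistic method would always charge the full future-uncertainty bonus, whereas here that term is multiplied by the estimated impact, so statistically simple (bandit-like) stages incur much less pessimism.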


Related research

- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning (02/23/2022): Offline Reinforcement Learning (RL) aims to learn policies from previous...
- A Minimalist Approach to Offline Reinforcement Learning (06/12/2021): Offline reinforcement learning (RL) defines the task of learning from a ...
- A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning (06/13/2023): Offline reinforcement learning (RL) provides a promising solution to lea...
- Collaborative World Models: An Online-Offline Transfer RL Approach (05/24/2023): Training visual reinforcement learning (RL) models in offline datasets i...
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation (06/23/2020): Offline Reinforcement Learning (RL) is a promising approach for learning...
- Reinforcement Learning as One Big Sequence Modeling Problem (06/03/2021): Reinforcement learning (RL) is typically concerned with estimating singl...
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization (03/28/2023): Most offline reinforcement learning (RL) methods suffer from the trade-o...
