When does return-conditioned supervised learning work for offline reinforcement learning?

06/02/2022
by David Brandfonbrener, et al.

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something which is crucially missing in previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms. We provide specific examples of MDPs and datasets that illustrate the necessity of these assumptions and the limits of RCSL. Finally, we present empirical evidence that these limitations will also cause issues in practice by providing illustrative experiments in simple point-mass environments and on datasets from the D4RL benchmark.
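The two steps described in the abstract (fit the action distribution conditioned on state and return, then condition on a high return) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names `fit_rcsl` and `rcsl_policy`, the count-based estimator, the use of the undiscounted return from each step onward as the conditioning value, and the toy dataset are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_rcsl(trajectories):
    """Estimate P(action | state, return) by counting (hypothetical helper).

    Each trajectory is a list of (state, action, reward) tuples. Every step
    is labelled with the undiscounted return from that step onward.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        g = 0.0
        # Walk backwards so g accumulates the return from each step onward.
        for state, action, reward in reversed(traj):
            g += reward
            counts[(state, g)][action] += 1
    return counts


def rcsl_policy(counts, state, target_return):
    """Empirical action distribution given (state, target_return)."""
    c = counts.get((state, target_return))
    if not c:
        # A tabular estimator has nothing to say about unseen conditioning values.
        raise KeyError("conditioning value not covered by the data")
    total = sum(c.values())
    return {action: n / total for action, n in c.items()}


# Toy one-step dataset (assumed): "right" yields reward 1, "left" yields 0.
demo = [
    [("s0", "left", 0.0)],
    [("s0", "right", 1.0)],
    [("s0", "right", 1.0)],
]
model = fit_rcsl(demo)
high_return_policy = rcsl_policy(model, "s0", 1.0)
```

In this toy case, conditioning on the highest observed return recovers the action that produced it. The sketch says nothing about when that carries over to larger or noisier problems, which is the question the paper studies.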

research
10/24/2022

Dichotomy of Control: Separating What You Can Control from What You Cannot

Future- or return-conditioned supervised learning is an emerging paradig...
research
02/24/2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

Upside down reinforcement learning (UDRL) flips the conventional use of ...
research
05/29/2019

On the Generalization Gap in Reparameterizable Reinforcement Learning

Understanding generalization in reinforcement learning (RL) is a signifi...
research
02/23/2022

Learning Relative Return Policies With Upside-Down Reinforcement Learning

Lately, there has been a resurgence of interest in using supervised lear...
research
06/24/2023

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Despite the recent advancements in offline reinforcement learning via su...
research
09/08/2022

Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Recent works have shown that tackling offline reinforcement learning (RL...
research
03/16/2023

Goal-conditioned Offline Reinforcement Learning through State Space Partitioning

Offline reinforcement learning (RL) aims to infer sequential decision po...
