Sequence Modeling is a Robust Contender for Offline Reinforcement Learning

05/23/2023
by   Prajjwal Bhargava, et al.

Offline reinforcement learning (RL) allows agents to learn effective, return-maximizing policies from a static dataset. Three major paradigms for offline RL are Q-Learning, Imitation Learning, and Sequence Modeling. A key open question is: which paradigm is preferred under what conditions? We study this question empirically by exploring the performance of representative algorithms – Conservative Q-Learning (CQL), Behavior Cloning (BC), and Decision Transformer (DT) – across the commonly used D4RL and Robomimic benchmarks. We design targeted experiments to understand their behavior concerning data suboptimality and task complexity. Our key findings are: (1) Sequence Modeling requires more data than Q-Learning to learn competitive policies but is more robust; (2) Sequence Modeling is a substantially better choice than both Q-Learning and Imitation Learning in sparse-reward and low-quality data settings; and (3) Sequence Modeling and Imitation Learning are preferable as task horizon increases, or when data is obtained from suboptimal human demonstrators. Based on the overall strength of Sequence Modeling, we also investigate architectural choices and scaling trends for DT on Atari and D4RL and make design recommendations. We find that scaling the amount of data for DT by 5x gives a 2.5x average score improvement on Atari.
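To make the Sequence Modeling paradigm concrete: Decision Transformer casts offline RL as predicting actions conditioned on past states, actions, and the *return-to-go* (the sum of future rewards). The sketch below shows this input construction in plain Python; the function and variable names are illustrative and not taken from the paper's code.

```python
# Minimal sketch of Decision Transformer-style input construction.
# Each timestep is represented by a (return-to-go, state, action) triple;
# the transformer is trained to predict the action given the prefix.

def returns_to_go(rewards):
    """Suffix sums of rewards: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def build_tokens(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples, DT-style."""
    rtg = returns_to_go(rewards)
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", R), ("state", s), ("action", a)])
    return tokens

# Example: a sparse-reward trajectory where reward arrives only at the end.
# The return-to-go signal propagates that reward back to every timestep,
# which is one intuition for why sequence modeling copes well with sparse rewards.
states = [0, 1, 2]
actions = [1, 1, 0]
rewards = [0.0, 0.0, 1.0]
print(returns_to_go(rewards))  # [1.0, 1.0, 1.0]
```

At evaluation time, DT is prompted with a high target return-to-go and rolls out actions autoregressively; this conditioning is what distinguishes it from plain Behavior Cloning, which ignores returns entirely.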


