RvS: What is Essential for Offline RL via Supervised Learning?

12/20/2021
by Scott Emmons, et al.

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.
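To make the recipe in the abstract concrete, here is a minimal NumPy sketch of goal-conditioned behavior cloning with a two-layer feedforward MLP: the policy sees the state concatenated with a goal, and maximizing a fixed-variance Gaussian log-likelihood reduces to minimizing mean squared error on actions. The layer sizes, initialization, and training loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def init_mlp(in_dim, hidden, out_dim, rng):
    """Two-layer feedforward MLP, the architecture the abstract highlights."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def forward(params, x):
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h, h @ params["W2"] + params["b2"]             # predicted action mean

def train_step(params, states, goals, actions, lr=0.05):
    """One full-batch gradient step on the behavior-cloning loss.

    Conditioning is just concatenation: the policy sees (state, goal).
    With a fixed-variance Gaussian policy, maximizing log-likelihood
    reduces to minimizing mean squared error on the dataset actions.
    """
    x = np.concatenate([states, goals], axis=1)
    h, pred = forward(params, x)
    err = pred - actions
    n = len(x)
    # Manual backprop; constant factors are folded into the learning rate.
    grads = {}
    grads["W2"] = h.T @ err / n
    grads["b2"] = err.mean(axis=0)
    dh = (err @ params["W2"].T) * (h > 0)
    grads["W1"] = x.T @ dh / n
    grads["b1"] = dh.mean(axis=0)
    for k in grads:
        params[k] -= lr * grads[k]
    return float((err ** 2).mean())
```

On a toy task where the optimal action is, say, the goal minus the state, a few hundred such steps drive the loss down; swapping the goal vector for a reward (e.g., return-to-go) gives the reward-conditioned variant the abstract mentions.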

Related research

- Instabilities of Offline RL with Pre-Trained Neural Representation (03/08/2021): In offline reinforcement learning (RL), we seek to utilize offline data ...
- A Workflow for Offline Model-Free Robotic Reinforcement Learning (09/22/2021): Offline reinforcement learning (RL) enables learning control policies by...
- Online and Offline Reinforcement Learning by Planning with a Learned Model (04/13/2021): Learning efficiently from small amounts of data has long been the focus ...
- Online Decision Transformer (02/11/2022): Recent work has shown that offline reinforcement learning (RL) can be fo...
- Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning (06/08/2023): Recent works have shown the potential of diffusion models in computer vi...
- Masked Trajectory Models for Prediction, Representation, and Control (05/04/2023): We introduce Masked Trajectory Models (MTM) as a generic abstraction for...
- Interactive Visualization for Debugging RL (08/14/2020): Visualization tools for supervised learning allow users to interpret, in...
