Bootstrapped Transformer for Offline Reinforcement Learning

06/17/2022
by Kerong Wang, et al.

Offline reinforcement learning (RL) aims to learn policies from previously collected, static trajectory data without interacting with the real environment. Recent work offers a new perspective by treating offline RL as a generic sequence generation problem: it adopts sequence models such as the Transformer architecture to model distributions over trajectories and repurposes beam search as a planning algorithm. However, the training datasets used in typical offline RL tasks are quite limited and often suffer from insufficient distribution coverage, which can harm the training of sequence generation models yet has received little attention in prior work. In this paper, we propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and uses the learned model to self-generate more offline data to further boost sequence model training. We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing limitations of offline RL training and outperform other strong baseline methods. We also analyze the generated pseudo data; the characteristics it reveals may shed some light on offline RL training. The code is available at https://seqml.github.io/bootorl.
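To make the bootstrapping idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes trajectories are already discretized into integer tokens, and it swaps in a Laplace-smoothed bigram model for the Transformer so that the example stays self-contained. The function names (`fit_bigram`, `generate`, `bootstrap`), the half-trajectory prefix, and the confidence filter controlled by `keep_ratio` are all hypothetical choices made for illustration.

```python
import math
import random


def fit_bigram(trajectories, vocab_size, alpha=1.0):
    """Fit a Laplace-smoothed bigram model P(next token | current token).

    This stands in for the sequence model; it is NOT a Transformer.
    """
    counts = [[alpha] * vocab_size for _ in range(vocab_size)]
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_prob(model, traj):
    """Log-likelihood of a trajectory's transitions under the model."""
    return sum(math.log(model[cur][nxt]) for cur, nxt in zip(traj, traj[1:]))


def generate(model, prefix, length, rng):
    """Autoregressively extend a non-empty prefix up to `length` tokens."""
    traj = list(prefix)
    while len(traj) < length:
        probs = model[traj[-1]]
        traj.append(rng.choices(range(len(probs)), weights=probs)[0])
    return traj


def bootstrap(dataset, vocab_size, rounds=3, keep_ratio=0.5, seed=0):
    """Alternate between generating pseudo trajectories and refitting.

    Assumes every trajectory has at least two tokens. The confidence
    filter (keep the most likely candidates) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    data = [list(t) for t in dataset]
    model = fit_bigram(data, vocab_size)
    for _ in range(rounds):
        # Keep the first half of each real trajectory, regenerate the rest.
        candidates = [generate(model, t[: len(t) // 2], len(t), rng) for t in dataset]
        candidates.sort(key=lambda t: log_prob(model, t), reverse=True)
        kept = candidates[: max(1, int(keep_ratio * len(candidates)))]
        data.extend(kept)                      # augment the offline dataset
        model = fit_bigram(data, vocab_size)   # retrain on real + pseudo data
    return model, data


real = [[0, 1, 2, 3, 0, 1], [1, 2, 3, 0, 1, 2], [2, 3, 0, 1, 2, 3], [3, 0, 1, 2, 3, 0]]
model, augmented = bootstrap(real, vocab_size=4)
print(len(real), "real trajectories ->", len(augmented), "after bootstrapping")
```

The real trajectories are never discarded in this sketch; pseudo data is only appended, so the original coverage is preserved while the model sees additional samples near the data distribution.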


