Online Decision Transformer

02/11/2022
by   Qinqing Zheng, et al.
0

Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via taskspecific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.

READ FULL TEXT

page 15

page 17

page 21

research
01/31/2023

Skill Decision Transformer

Recent work has shown that Large Language Models (LLMs) can be incredibl...
research
01/28/2022

Can Wikipedia Help Offline Reinforcement Learning?

Fine-tuning reinforcement learning (RL) models has been challenging beca...
research
08/28/2023

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

Offline reinforcement learning aims to utilize datasets of previously ga...
research
03/07/2023

Graph Decision Transformer

Offline reinforcement learning (RL) is a challenging task, whose objecti...
research
06/26/2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Large transformer models trained on diverse datasets have shown a remark...
research
12/20/2021

RvS: What is Essential for Offline RL via Supervised Learning?

Recent work has shown that supervised learning alone, without temporal d...
research
04/18/2023

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

The success of transformer models trained with a language modeling objec...

Please sign up or login with your details

Forgot password? Click here to reset