Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

09/18/2023
by Yevgen Chebotar, et al.

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets; the method can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as a separate token, we can apply effective high-capacity sequence modeling techniques to Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse suite of real-world robotic manipulation tasks. The project's website and videos can be found at https://q-transformer.github.io
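The core idea above, one Q-value token per discretized action dimension, can be illustrated with a small sketch. It assumes uniform binning over a shared range [low, high] with high > low, a hypothetical `q_fn(state, prefix)` that returns one Q-value per bin for the next dimension given the bins already chosen (a stand-in for the Transformer), and a per-dimension backup in which every dimension except the last bootstraps from the maximum over the next dimension's bins at the same time step, while the last dimension takes the usual reward-plus-discounted-max target at the next state. All names here are illustrative rather than the project's code, and the sketch omits the other design decisions the paper describes.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (not the project's API): q_fn(state, prefix) returns
# one Q-value per bin for the next action dimension, conditioned on the bins
# already chosen for the earlier dimensions.
QFn = Callable[[object, Tuple[int, ...]], List[float]]


def discretize(action: Sequence[float], low: float, high: float,
               num_bins: int) -> Tuple[int, ...]:
    """Map each continuous action dimension to a uniform bin index in [0, num_bins - 1]."""
    bins = []
    for x in action:
        x = min(max(x, low), high)  # clip into the valid range
        idx = int((x - low) / (high - low) * num_bins)
        bins.append(min(idx, num_bins - 1))  # x == high lands in the last bin
    return tuple(bins)


def greedy_action(q_fn: QFn, state: object, num_dims: int) -> Tuple[int, ...]:
    """Choose bins one dimension at a time, each conditioned on the earlier choices."""
    prefix: Tuple[int, ...] = ()
    for _ in range(num_dims):
        q = q_fn(state, prefix)
        best_bin = max(range(len(q)), key=q.__getitem__)  # argmax over this dimension's bins
        prefix += (best_bin,)
    return prefix


def per_dimension_targets(q_fn: QFn, state: object, action_bins: Tuple[int, ...],
                          reward: float, next_state: object, gamma: float,
                          done: bool) -> List[float]:
    """TD targets for the Q-value of each dimension of a dataset action (assumed backup)."""
    d = len(action_bins)
    targets = []
    for i in range(d):
        if i < d - 1:
            # Intermediate dimension: bootstrap from the best bin of the next
            # dimension, within the same time step and with no reward.
            targets.append(max(q_fn(state, tuple(action_bins[: i + 1]))))
        else:
            # Last dimension: standard Bellman backup into the first dimension
            # of the next state (no bootstrap at a terminal transition).
            bootstrap = 0.0 if done else max(q_fn(next_state, ()))
            targets.append(reward + gamma * bootstrap)
    return targets
```

For example, with three bins over [-1, 1], `discretize([-1.0, 0.0, 1.0], -1.0, 1.0, 3)` returns `(0, 1, 2)`. Because `greedy_action` maximizes one dimension at a time, it queries `num_bins` values per dimension instead of searching all `num_bins ** num_dims` joint actions, which is what makes per-dimension tokens practical for high-dimensional actions.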
