Behavior Transformers: Cloning k modes with one stone

While behavior learning has made impressive progress in recent years, it lags behind computer vision and natural language processing because it cannot yet leverage large, human-generated datasets. Human behavior has high variance and multiple modes, and human demonstrations typically come without reward labels. These properties limit the applicability of current methods in Offline RL and Behavioral Cloning to learning from large, pre-collected datasets. In this work, we present Behavior Transformer (BeT), a new technique for modeling unlabeled demonstration data with multiple modes. BeT retrofits standard transformer architectures with action discretization coupled with a multi-task action correction inspired by offset prediction in object detection. This allows us to leverage the multi-modal modeling ability of modern transformers to predict multi-modal continuous actions. We experimentally evaluate BeT on a variety of robotic manipulation and self-driving behavior datasets. We show that BeT significantly improves over prior state-of-the-art work on solving demonstrated tasks while capturing the major modes present in the pre-collected datasets. Finally, through an extensive ablation study, we analyze the importance of every crucial component in BeT. Videos of behavior generated by BeT are available at https://notmahi.github.io/bet
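The core idea of action discretization with offset correction can be sketched as follows. This is a minimal illustration, not the paper's implementation: continuous actions are clustered into k bins (here with a small k-means routine), each action is represented as its bin index plus a continuous residual offset from the bin center, and the original action is recovered as center + offset. In the full BeT model, a transformer predicts the bin and the per-bin offset; all names below (`kmeans_fit`, `discretize`, `reconstruct`) are illustrative.

```python
import numpy as np

def kmeans_fit(actions, k, iters=50, seed=0):
    """Cluster continuous actions into k bins with Lloyd's algorithm.

    actions: (N, action_dim) array of demonstrated actions.
    Returns (k, action_dim) array of bin centers.
    """
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each action to its nearest center.
        dists = np.linalg.norm(actions[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned actions.
        for j in range(k):
            if (assign == j).any():
                centers[j] = actions[assign == j].mean(axis=0)
    return centers

def discretize(actions, centers):
    """Split each continuous action into (bin index, residual offset)."""
    dists = np.linalg.norm(actions[:, None] - centers[None], axis=-1)
    bins = dists.argmin(axis=1)
    offsets = actions - centers[bins]
    return bins, offsets

def reconstruct(bins, offsets, centers):
    """Recover continuous actions as bin center + offset."""
    return centers[bins] + offsets
```

Because the offset is defined as the residual from the nearest center, discretizing and then reconstructing is lossless; the discrete bin head lets the model represent multiple modes, while the offset head restores continuous precision within each mode.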


Related research

- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data (10/18/2022)
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation (09/12/2022)
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions (09/18/2023)
- Transformers in Vision: A Survey (01/04/2021)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection (04/20/2021)
- UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (03/23/2022)
- Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos (09/21/2022)
