Robot Learning with Sensorimotor Pre-training

06/16/2023 · by Ilija Radosavovic, et al.

We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. Given a sequence of camera images, proprioceptive robot states, and past actions, we encode the interleaved sequence into tokens, mask out a random subset, and train the model to predict the masked-out content. We hypothesize that if the robot can predict the missing content, it has acquired a good model of the physical world that can enable it to act. RPT is designed to operate on latent visual representations, which makes prediction tractable, enables scaling to 10x larger models, and supports 10 Hz inference on a real robot. To evaluate our approach, we collect a dataset of 20,000 real-world trajectories over 9 months using a combination of motion planning and model-based grasping algorithms. We find that pre-training on this data consistently outperforms training from scratch, yields 2x improvements on the block stacking task, and exhibits favorable scaling properties.
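To make the pre-training objective concrete, below is a minimal PyTorch sketch of masked sensorimotor prediction as the abstract describes it. All names, tensor dimensions, the masking scheme, and the per-modality MSE losses here are illustrative assumptions, not the authors' released implementation: the sketch interleaves latent image, proprioception, and action tokens, replaces a random subset with a learned mask token, and trains a Transformer encoder to reconstruct the masked content.

```python
# Minimal sketch of masked sensorimotor prediction (assumptions throughout;
# not the authors' code). Requires PyTorch >= 1.9 for batch_first=True.
import torch
import torch.nn as nn


class MaskedSensorimotorTransformer(nn.Module):
    """Encodes interleaved (image-latent, proprioception, action) tokens,
    masks a random subset, and predicts the masked-out content."""

    def __init__(self, latent_dim=768, proprio_dim=14, action_dim=7,
                 d_model=256, n_layers=4, n_heads=8, mask_ratio=0.5,
                 max_timesteps=32):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Per-modality encoders into a shared token space. Per the abstract,
        # the image branch consumes pre-computed latent visual representations
        # rather than raw pixels.
        self.embed_img = nn.Linear(latent_dim, d_model)
        self.embed_proprio = nn.Linear(proprio_dim, d_model)
        self.embed_action = nn.Linear(action_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos = nn.Parameter(torch.zeros(1, 3 * max_timesteps, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Per-modality heads that reconstruct the masked token content.
        self.pred_img = nn.Linear(d_model, latent_dim)
        self.pred_proprio = nn.Linear(d_model, proprio_dim)
        self.pred_action = nn.Linear(d_model, action_dim)

    def forward(self, img_latents, proprio, actions):
        B, T, _ = img_latents.shape
        # Interleave tokens as (image, state, action) per timestep:
        # (B, T, 3, d) -> (B, 3T, d).
        tokens = torch.stack([
            self.embed_img(img_latents),
            self.embed_proprio(proprio),
            self.embed_action(actions),
        ], dim=2).reshape(B, 3 * T, -1)
        # Replace a random subset of positions with the learned mask token.
        mask = torch.rand(B, 3 * T, device=tokens.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand_as(tokens), tokens)
        h = self.encoder(tokens + self.pos[:, : 3 * T])
        h = h.reshape(B, T, 3, -1)
        m = mask.reshape(B, T, 3)
        # Reconstruction loss on masked positions only, summed over the three
        # modalities (assumes each modality has at least one masked token).
        loss = (
            (self.pred_img(h[..., 0, :]) - img_latents).pow(2).mean(-1)[m[..., 0]].mean()
            + (self.pred_proprio(h[..., 1, :]) - proprio).pow(2).mean(-1)[m[..., 1]].mean()
            + (self.pred_action(h[..., 2, :]) - actions).pow(2).mean(-1)[m[..., 2]].mean()
        )
        return loss


if __name__ == "__main__":
    # Example usage on random data; shapes are assumptions.
    model = MaskedSensorimotorTransformer()
    loss = model(
        torch.randn(2, 16, 768),  # latent image tokens
        torch.randn(2, 16, 14),   # proprioceptive states
        torch.randn(2, 16, 7),    # past actions
    )
    loss.backward()
```

Note the design choice the abstract highlights: operating on pre-computed latent visual representations instead of raw pixels keeps the token sequence short, which is what makes masked prediction tractable and 10 Hz inference on a real robot plausible.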

Related research

10/06/2022 · Real-World Robot Learning with Masked Visual Pre-training
In this work, we explore self-supervised visual pre-training on images f...

10/30/2022 · token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Self-supervised pre-training has been successful in both text and speech...

09/22/2022 · PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Robotics has long been a field riddled with complex systems architecture...

01/13/2020 · ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
In this paper, we present a new sequence-to-sequence pre-training model ...

06/21/2021 · VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Video understanding relies on perceiving the global content and modeling...

10/12/2020 · Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
Masked Language Model (MLM) framework has been widely adopted for self-s...

10/13/2022 · Exploring Long-Sequence Masked Autoencoders
Masked Autoencoding (MAE) has emerged as an effective approach for pre-t...
