Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

09/12/2022
by Mohit Shridhar, et al.

Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can we still benefit from Transformers with the right problem formulation? We investigate this question with PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation. PerAct encodes language goals and RGB-D voxel observations with a Perceiver Transformer, and outputs discretized actions by "detecting the next best voxel action". Unlike frameworks that operate on 2D images, the voxelized observation and action space provides a strong structural prior for efficiently learning 6-DoF policies. With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task. Our results show that PerAct significantly outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.
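To make the "detect the next best voxel action" formulation concrete, below is a minimal PyTorch sketch of what such a policy head might look like: language tokens and a voxelized RGB-D observation are encoded through a Perceiver-style latent bottleneck, and the network outputs one Q-value per voxel (the argmax is the next best voxel to move to) plus discretized rotation and gripper outputs. The module names, channel counts, layer sizes, and rotation-bin count are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a PerAct-style "next best voxel action" policy (assumptions noted above).
import torch
import torch.nn as nn


class NextBestVoxelPolicy(nn.Module):
    def __init__(self, voxel_dim=100, patch=5, feat=64, lang_dim=512,
                 n_latents=256, n_rot_bins=72):
        super().__init__()
        # Encode each voxel patch (occupancy + RGB = 4 channels here) into one token.
        self.patch_embed = nn.Conv3d(4, feat, kernel_size=patch, stride=patch)
        self.lang_proj = nn.Linear(lang_dim, feat)
        # Perceiver-style latent bottleneck: a fixed set of latents cross-attends
        # to the (many) voxel + language tokens, keeping compute manageable.
        self.latents = nn.Parameter(torch.randn(n_latents, feat))
        self.cross_attn = nn.MultiheadAttention(feat, 4, batch_first=True)
        self.self_attn = nn.TransformerEncoderLayer(feat, 4, batch_first=True)
        self.decode_attn = nn.MultiheadAttention(feat, 4, batch_first=True)
        # Per-voxel translation Q-values, plus global discretized rotation/gripper head.
        self.trans_head = nn.Linear(feat, patch ** 3)
        self.rot_grip_head = nn.Linear(feat, 3 * n_rot_bins + 2)

    def forward(self, voxels, lang_tokens):
        # voxels: (B, 4, D, D, D) voxel grid; lang_tokens: (B, L, lang_dim)
        B = voxels.shape[0]
        vox_tok = self.patch_embed(voxels).flatten(2).transpose(1, 2)  # (B, P, feat)
        tokens = torch.cat([self.lang_proj(lang_tokens), vox_tok], dim=1)
        lat = self.latents.expand(B, -1, -1)
        lat, _ = self.cross_attn(lat, tokens, tokens)      # encode inputs into latents
        lat = self.self_attn(lat)                          # process latents
        vox_out, _ = self.decode_attn(vox_tok, lat, lat)   # decode back to voxel tokens
        trans_q = self.trans_head(vox_out).reshape(B, -1)  # one Q-value per voxel
        rot_grip_q = self.rot_grip_head(lat.mean(dim=1))   # discretized rotation + gripper
        return trans_q, rot_grip_q


# Usage: the argmax over translation Q-values is the "next best voxel" for the gripper.
policy = NextBestVoxelPolicy()
vox = torch.zeros(1, 4, 100, 100, 100)
lang = torch.zeros(1, 77, 512)
trans_q, rot_grip_q = policy(vox, lang)
best_voxel_index = trans_q.argmax(dim=-1)  # flattened index into the 100^3 grid
```

Treating action selection as classification over a fixed voxel grid is what gives the 3D structural prior mentioned in the abstract: the output space is aligned with the observation space, so a few demonstrations per task can supervise the per-voxel targets directly.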


