RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

06/20/2023
by Konstantinos Bousmalis et al.

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100–1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
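The self-improvement loop is the most algorithmic claim in the abstract: fine-tune the generalist on a small set of demonstrations for a new task, deploy the fine-tuned agent to generate additional experience, then fold that experience back into the training corpus. The sketch below illustrates one such iteration under stated assumptions; the `Agent` and `Episode` types and every function name are hypothetical placeholders, not the authors' actual API.

```python
# A minimal sketch of a RoboCat-style self-improvement iteration, as described
# in the abstract. All names here (Agent, Episode, self_improvement_cycle) are
# hypothetical stand-ins, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """One action-labelled visual trajectory, conditioned on a goal image."""
    observations: list
    actions: list
    goal_image: object


@dataclass
class Agent:
    """Stand-in for the visual goal-conditioned decision transformer."""
    dataset: List[Episode] = field(default_factory=list)

    def train(self, episodes: List[Episode]) -> None:
        # Placeholder: goal-conditioned sequence modelling would happen here.
        self.dataset.extend(episodes)

    def rollout(self, goal_image: object) -> Episode:
        # Placeholder: act in the environment towards the given goal image.
        return Episode(observations=[], actions=[], goal_image=goal_image)


def self_improvement_cycle(
    agent: Agent,
    demos: List[Episode],                 # the 100-1000 target-task examples
    goal_sampler: Callable[[], object],   # yields goal images for the new task
    n_self_generated: int = 10_000,
) -> Agent:
    """One iteration: fine-tune on demos, self-generate data, retrain."""
    # 1. Fine-tune the generalist on the small demonstration set.
    agent.train(demos)
    # 2. Deploy the fine-tuned agent to collect new experience for the task.
    self_generated = [agent.rollout(goal_sampler()) for _ in range(n_self_generated)]
    # 3. Fold the self-generated data back into the training corpus.
    agent.train(self_generated)
    return agent
```

The key design choice this sketch mirrors is that the agent's own rollouts become training data for the next iteration, which is what the abstract credits for the growing training set and the improved efficiency at adapting to new tasks.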


