Grounding Language with Visual Affordances over Unstructured Data

10/04/2022
by Oier Mees, et al.

Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires large-scale data collection and frequent human intervention to reset the environment or correct the current policies. In this work, we propose a novel approach to efficiently learn general-purpose language-conditioned robot skills from unstructured, offline, and reset-free data in the real world by exploiting a self-supervised visuo-lingual affordance model, which requires annotating as little as 1% of the data with language. We evaluate our method in extensive experiments in both simulated and real-world robotic tasks, achieving state-of-the-art performance on the challenging CALVIN benchmark and learning over 25 distinct visuomotor manipulation tasks with a single policy in the real world. We find that when paired with LLMs that break down abstract natural language instructions into subgoals via few-shot prompting, our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches. Code and videos are available at http://hulc2.cs.uni-freiburg.de
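The abstract mentions using an LLM to break abstract instructions into subgoals via few-shot prompting. The sketch below illustrates that general idea only; the example tasks, prompt format, and helper names are assumptions for illustration, not the authors' actual prompts or code. Each parsed subgoal would then be grounded and executed by the language-conditioned policy.

```python
# Illustrative sketch of few-shot subgoal decomposition (NOT the paper's
# actual prompt): we assemble a prompt from hand-written example
# decompositions, send it to an LLM, and parse numbered subgoals from
# the completion. The example tasks below are made up for illustration.

FEW_SHOT_EXAMPLES = [
    ("tidy up the table",
     ["pick up the red block",
      "place the red block in the drawer",
      "close the drawer"]),
    ("make the room brighter",
     ["push the light switch"]),
]

def build_prompt(instruction: str) -> str:
    """Assemble a few-shot prompt that asks for numbered subgoals."""
    parts = []
    for task, subgoals in FEW_SHOT_EXAMPLES:
        parts.append(f"Task: {task}")
        parts += [f"{i}. {sg}" for i, sg in enumerate(subgoals, 1)]
    parts.append(f"Task: {instruction}")  # the new instruction to decompose
    return "\n".join(parts)

def parse_subgoals(completion: str) -> list:
    """Extract numbered subgoal lines ("1. pick up ...") from an LLM completion."""
    subgoals = []
    for line in completion.splitlines():
        line = line.strip()
        if line and line[0].isdigit() and ". " in line:
            subgoals.append(line.split(". ", 1)[1])
    return subgoals
```

In use, `build_prompt("store the banana")` would be sent to an LLM, and a completion such as `"1. pick up the banana\n2. place the banana in the drawer"` would be parsed into two subgoal strings for the policy to execute in sequence.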


Related research

04/04/2022 · Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Large language models can encode a wealth of semantic knowledge about th...

04/13/2022 · What Matters in Language Conditioned Robotic Imitation Learning
A long-standing goal in robotics is to build robots that can perform a w...

10/12/2022 · Interactive Language: Talking to Robots in Real Time
We present a framework for building interactive, real-time, natural lang...

05/15/2020 · Grounding Language in Play
Natural language is perhaps the most versatile and intuitive way for hum...

07/12/2022 · Inner Monologue: Embodied Reasoning through Planning with Language Models
Recent works have shown how the reasoning capabilities of Large Language...

05/30/2023 · Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
The growing interest in language-conditioned robot manipulation aims to ...

08/24/2023 · BridgeData V2: A Dataset for Robot Learning at Scale
We introduce BridgeData V2, a large and diverse dataset of robotic manip...
