GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

07/12/2023
by Junghyun Kim, et al.

Language-Guided Robotic Manipulation (LGRM) is a challenging task as it requires a robot to understand human instructions to manipulate everyday objects. Recent approaches in LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting to manipulation environments. This results in a performance drop due to a substantial domain gap between the pre-training and real-world data. A straightforward solution is to collect additional training data, but the cost of human annotation is extortionate. In this paper, we propose Grounding Vision to Ceaselessly Created Instructions (GVCCI), a lifelong learning framework for LGRM, which continuously learns VG without human supervision. GVCCI iteratively generates synthetic instructions via object detection and trains the VG model with the generated data. We validate our framework in offline and online settings across diverse environments on different VG models. Experimental results show that accumulating synthetic data from GVCCI leads to a steady improvement in VG by up to 56.7%. Qualitative analysis shows that the unadapted VG model often fails to find correct objects due to a strong bias learned from the pre-training data. Finally, we introduce a novel VG dataset for LGRM, consisting of nearly 252k triplets of image-object-instruction from diverse manipulation environments.
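The loop described in the abstract is: detect objects in images from the manipulation environment, compose synthetic referring instructions from the detections, and retrain the VG model on the accumulated triplets. The Python sketch below illustrates one plausible shape of such a self-supervised loop; the interfaces (`detector`, `capture_image`, `vg_model.train`) and the attribute keys are assumptions for illustration, not the authors' actual implementation.

```python
# A minimal sketch of a GVCCI-style self-training loop for visual grounding.
# Everything here is illustrative: `detector`, `capture_image`, and
# `vg_model.train` are placeholder interfaces, not the paper's actual API.
import random


def make_instruction(obj):
    """Turn one detection (class name plus predicted attributes) into a
    synthetic referring instruction using simple templates."""
    templates = [
        "pick up the {name}",
        "grab the {color} {name}",
        "hand me the {name} next to the {anchor}",
    ]
    try:
        return random.choice(templates).format(**obj)
    except KeyError:
        # Fall back to the plainest template if an attribute is missing.
        return "pick up the {name}".format(**obj)


def gvcci_style_loop(vg_model, detector, capture_image,
                     rounds=5, images_per_round=100):
    """Accumulate synthetic (image, box, instruction) triplets and
    periodically retrain the grounding model on everything collected so far."""
    dataset = []
    for _ in range(rounds):
        for _ in range(images_per_round):
            image = capture_image()        # new scene from the robot camera
            for obj in detector(image):    # e.g. {"box": ..., "name": "cup", "color": "red"}
                dataset.append((image, obj["box"], make_instruction(obj)))
        vg_model.train(dataset)            # fine-tune on the accumulated data
    return vg_model, dataset
```

Each round grows the synthetic dataset, so the VG model sees progressively more environment-specific data, which is the lifelong-learning behavior the abstract describes.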


research
06/20/2023

SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

Pre-training robot policies with a rich set of skills can substantially ...
research
01/20/2022

LEMON: Language-Based Environment Manipulation via Execution-Guided Pre-training

Language-based environment manipulation requires agents to manipulate th...
research
09/22/2021

Audio-Visual Grounding Referring Expression for Robotic Manipulation

Referring expressions are commonly used when referring to a specific tar...
research
12/16/2020

Visually Grounding Instruction for History-Dependent Manipulation

This paper emphasizes the importance of robot's ability to refer its tas...
research
09/14/2023

GRID: Scene-Graph-based Instruction-driven Robotic Task Planning

Recent works have shown that Large Language Models (LLMs) can promote gr...
research
10/22/2019

Language-guided Semantic Mapping and Mobile Manipulation in Partially Observable Environments

Recent advances in data-driven models for grounded language understandin...
research
08/02/2023

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

While language-guided image manipulation has made remarkable progress, t...
