GRID: Scene-Graph-based Instruction-driven Robotic Task Planning

09/14/2023
by Zhe Ni, et al.

Recent works have shown that Large Language Models (LLMs) can help ground natural-language instructions for robotic task planning. Despite this progress, most existing works rely on raw images to convey environmental information to LLMs, which not only limits the observation scope but also typically requires massive multimodal data collection and large-scale models. In this paper, we propose a novel approach called Graph-based Robotic Instruction Decomposer (GRID), which leverages scene graphs instead of images to perceive global scene information and continuously plans subtasks at each stage for a given instruction. Our method encodes object attributes and relationships in the graph through an LLM and Graph Attention Networks, integrating instruction features to predict subtasks consisting of pre-defined robot actions and target objects in the scene graph. This strategy enables robots to acquire semantic knowledge widely observed in the environment from the scene graph. To train and evaluate GRID, we build a dataset construction pipeline that generates synthetic datasets for graph-based robotic task planning. Experiments show that our method outperforms GPT-4 by over 25.4% in task accuracy. On datasets of unseen scenes and scenes with different numbers of objects, the task accuracy of GRID declines by at most 3.8%. We validate our method in both physical simulation and the real world.
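The abstract suggests a pipeline in which object attributes and relationships from the scene graph are embedded (e.g., by an LLM encoder), refined with graph attention, and fused with an instruction embedding to score a pre-defined robot action and a target node. The sketch below is a minimal, illustrative PyTorch rendering of that idea only; the module names, dimensions, single-head attention, and mean-pooling readout are our assumptions, not details taken from the GRID paper.

```python
# Minimal sketch (not the authors' code): scene-graph encoding with graph
# attention, fused with an instruction embedding, predicting a subtask as
# (action, target object). All names and sizes below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head GAT-style attention over node features."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, in_dim) node features, e.g. LLM embeddings of object attributes
        # adj: (N, N) adjacency mask encoding object relationships (1 = edge)
        h = self.proj(x)                                    # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )                                                   # (N, N, 2*out_dim)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))      # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))          # attend only along edges
        alpha = torch.nan_to_num(torch.softmax(e, dim=-1))  # isolated nodes -> zeros
        return F.elu(alpha @ h)                             # (N, out_dim)


class SubtaskPredictor(nn.Module):
    """Fuses graph and instruction features; predicts (action, target object)."""

    def __init__(self, node_dim: int, instr_dim: int, hidden: int, num_actions: int):
        super().__init__()
        self.gat = GraphAttentionLayer(node_dim, hidden)
        self.instr_proj = nn.Linear(instr_dim, hidden)
        self.action_head = nn.Linear(2 * hidden, num_actions)
        self.object_head = nn.Linear(2 * hidden, 1)         # one score per node

    def forward(self, node_feats, adj, instr_feat):
        nodes = self.gat(node_feats, adj)                   # (N, hidden)
        instr = self.instr_proj(instr_feat)                 # (hidden,)
        graph_summary = nodes.mean(dim=0)                   # simple global readout
        fused = torch.cat([graph_summary, instr], dim=-1)
        action_logits = self.action_head(fused)             # pick a robot action
        per_node = torch.cat([nodes, instr.expand_as(nodes)], dim=-1)
        object_logits = self.object_head(per_node).squeeze(-1)  # pick target node
        return action_logits, object_logits


if __name__ == "__main__":
    # Toy usage: 5 objects with 32-dim features, 64-dim instruction embedding.
    model = SubtaskPredictor(node_dim=32, instr_dim=64, hidden=48, num_actions=6)
    feats = torch.randn(5, 32)
    adj = (torch.rand(5, 5) > 0.5).float()
    instr = torch.randn(64)
    action_logits, object_logits = model(feats, adj, instr)
    print(action_logits.shape, object_logits.shape)  # torch.Size([6]) torch.Size([5])
```

The two heads reflect the paper's framing of a subtask as a pre-defined action paired with a target object in the scene graph; scoring objects per node (rather than over a fixed label set) is what lets the predictor generalize to scenes with different numbers of objects.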


research
07/12/2023

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning

Large language models (LLMs) have demonstrated impressive results in dev...
research
11/21/2022

Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

In recent years, much progress has been made in learning robotic manipul...
research
07/12/2023

GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Language-Guided Robotic Manipulation (LGRM) is a challenging task as it ...
research
07/07/2023

Open-Vocabulary Object Detection via Scene Graph Discovery

In recent years, open-vocabulary (OV) object detection has attracted inc...
research
09/05/2023

Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning

Robotic manipulation tasks, such as object rearrangement, play a crucial...
research
01/30/2020

Graph Cepstrum: Spatial Feature Extracted from Partially Connected Microphones

In this paper, we propose an effective and robust method of spatial feat...
research
11/15/2022

Simulated Mental Imagery for Robotic Task Planning

Traditional AI-planning methods for task planning in robotics require sy...
