WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model

08/30/2023
by   Tianyu Wang, et al.

Enabling robots to understand language instructions and react accordingly to visual perception has been a long-standing goal in the robotics research community. Achieving this goal requires cutting-edge advances in natural language processing, computer vision, and robotics engineering. This paper therefore investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system to enhance the effectiveness of human-robot interaction. We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration. The system uses the ChatGPT LLM to summarize the user's preferred object into a target instruction through multi-round interactive dialogue. The target instruction is then forwarded to a visual grounding system for object pose and size estimation, after which the robot grasps the object accordingly. We deploy this LLM-empowered system on a physical robot to provide a more user-friendly interface for the instruction-guided grasping task. Experimental results in various real-world scenarios demonstrate the feasibility and efficacy of the proposed framework. See the project website at: https://star-uu-wang.github.io/WALL-E/
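The three-stage pipeline the abstract describes (dialogue summarization, visual grounding, grasp execution) can be sketched with stubbed components. All names and data structures below are illustrative placeholders, not the authors' code; the LLM, grounding model, and controller are each replaced by a simple function so the data flow is visible end to end:

```python
from dataclasses import dataclass


@dataclass
class GraspTarget:
    """Output of the grounding stage: the object plus its estimated geometry."""
    name: str
    pose: tuple  # (x, y, z) position estimate in metres
    size: tuple  # (w, h, d) bounding-box extent in metres


def summarize_preference(dialogue: list) -> str:
    """Stand-in for the LLM stage: distill a multi-round dialogue into a
    single target instruction (here, the last object the user named)."""
    for turn in reversed(dialogue):
        if turn["role"] == "user" and turn.get("object"):
            return turn["object"]
    raise ValueError("no target object found in dialogue")


def ground_object(instruction: str, scene: dict) -> GraspTarget:
    """Stand-in for the visual grounding stage: look up the instructed
    object's pose and size in a perceived scene description."""
    pose, size = scene[instruction]
    return GraspTarget(instruction, pose, size)


def grasp(target: GraspTarget) -> str:
    """Stand-in for the robot controller executing the grasp."""
    return f"grasping {target.name} at {target.pose}"


# Toy run of the full LLM -> grounding -> grasping pipeline.
dialogue = [
    {"role": "assistant", "object": None},
    {"role": "user", "object": "green apple"},
]
scene = {"green apple": ((0.40, 0.10, 0.02), (0.07, 0.07, 0.07))}

instruction = summarize_preference(dialogue)
action = grasp(ground_object(instruction, scene))
print(action)
```

In the real system each stub would be backed by a heavy component (ChatGPT for `summarize_preference`, a grounding model for `ground_object`, a motion planner for `grasp`), but the interfaces between stages stay this simple: a text instruction in, a pose-and-size estimate out.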


Related research

- HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks (08/24/2023)
  Human robot interaction is an exciting task, which aimed to guide robots...

- PROGrasp: Pragmatic Human-Robot Communication for Object Grasping (09/14/2023)
  Interactive Object Grasping (IOG) is the task of identifying and graspin...

- ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts (08/22/2023)
  In this paper, we argue that the next generation of robots can be comman...

- Interactive Robotic Grasping with Attribute-Guided Disambiguation (03/15/2022)
  Interactive robotic grasping using natural language is one of the most f...

- A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter (02/24/2023)
  We focus on the task of language-conditioned grasping in clutter, in whi...

- Learning 6-DoF Object Poses to Grasp Category-level Objects by Language Instructions (05/09/2022)
  This paper studies the task of any objects grasping from the known categ...

- End-To-End Real-Time Visual Perception Framework for Construction Automation (07/27/2021)
  In this work, we present a robotic solution to automate the task of wall...
