ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

08/22/2023
by Bilel Benjdira, et al.

In this paper, we argue that the next generation of robots can be commanded using only prompts to language models. Each prompt interrogates a specific Robotic Modality separately through its Modality Language Model (MLM), while a central Task Modality mediates the whole communication and executes the robotic mission through a Large Language Model (LLM). We name this new robotic design pattern Prompting Robotic Modalities (PRM). We then apply the PRM design pattern to build a new robotic framework, ROSGPT_Vision, which allows a robotic task to be executed using only two prompts: a Visual Prompt and an LLM Prompt. The Visual Prompt extracts, in natural language, the visual semantic features relevant to the task under consideration (Visual Robotic Modality), while the LLM Prompt regulates the robot's reaction to the resulting visual description (Task Modality). The framework automates all the mechanisms behind these two prompts, enabling the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. It comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors driver distraction on the road and issues real-time vocal notifications to the driver. We show that ROSGPT_Vision significantly reduces development cost compared to traditional methods, and that application quality can be improved by optimizing the prompting strategies alone, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to encourage more robotic frameworks that implement the PRM design pattern and enable controlling robots using only prompts.
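To make the PRM pattern concrete, it can be pictured as two cooperating ROS 2 nodes: one node turns camera frames into a natural-language scene description driven by the Visual Prompt, and a second node feeds that description, together with the LLM Prompt, to an LLM that decides the robot's reaction. The sketch below illustrates this split for the CarMate example. It is a minimal, hypothetical illustration in Python/rclpy; the helper functions describe_image() and query_llm(), the topic names, and the prompt wording are assumptions for illustration and are not the actual ROSGPT_Vision API.

```python
# Hypothetical sketch of the PRM design pattern as two ROS 2 (rclpy) nodes,
# in the spirit of the CarMate example. describe_image() and query_llm()
# are placeholders for the underlying image-language model and LLM calls;
# they are NOT the actual ROSGPT_Vision API.

import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String

# The two prompts that specify the task (wording is illustrative only).
VISUAL_PROMPT = "Describe the driver's gaze direction, hand position, and phone usage."
LLM_PROMPT = ("You monitor driver distraction. Given a scene description, reply "
              "with a short vocal warning, or with 'OK' if the driver is attentive.")


def describe_image(image_msg: Image, visual_prompt: str) -> str:
    """Placeholder for the Visual Robotic Modality (image-language model call)."""
    return "The driver is looking down at a phone held in the right hand."


def query_llm(llm_prompt: str, description: str) -> str:
    """Placeholder for the Task Modality (large language model call)."""
    return "Please put the phone down and keep your eyes on the road."


class VisionNode(Node):
    """Turns camera frames into natural-language scene descriptions."""

    def __init__(self):
        super().__init__('rosgpt_vision_node')
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
        self.description_pub = self.create_publisher(String, '/scene_description', 10)

    def on_image(self, msg: Image):
        description = describe_image(msg, VISUAL_PROMPT)
        self.description_pub.publish(String(data=description))


class TaskNode(Node):
    """Regulates the reaction to the visual description via the LLM prompt."""

    def __init__(self):
        super().__init__('rosgpt_task_node')
        self.create_subscription(String, '/scene_description', self.on_description, 10)
        self.warning_pub = self.create_publisher(String, '/driver_warning', 10)

    def on_description(self, msg: String):
        reaction = query_llm(LLM_PROMPT, msg.data)
        if reaction.strip() != 'OK':
            self.warning_pub.publish(String(data=reaction))


def main():
    rclpy.init()
    executor = MultiThreadedExecutor()
    executor.add_node(VisionNode())
    executor.add_node(TaskNode())
    try:
        executor.spin()
    finally:
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```

In this view, tuning the application amounts to refining the wording of VISUAL_PROMPT and LLM_PROMPT rather than modifying node code, which is the sense in which the paper reports improving CarMate by optimizing the prompting strategies alone.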


