DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

07/01/2023
by Yanjiang Guo, et al.

Large language models encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities. Previous research has explored how to ground language models in robotic tasks so that the action sequences they generate are both logically correct and practically executable. However, low-level execution may deviate from the high-level plan because of environmental perturbations or imperfect controller design. In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, during low-level skill execution, we use a vision question answering (VQA) model to periodically check for plan-execution misalignments. If a misalignment is detected, our method calls the language model to re-plan and recover from it. Experiments on various complex tasks, including robot arms and humanoid robots, demonstrate that our method leads to higher task success rates and shorter task completion times. Videos of DoReMi are available at https://sites.google.com/view/doremi-paper.
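The detect-and-recover loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `execute_with_recovery`, the string-based event history, and the parameters `check_every`, `max_replans`, and `max_ticks` are all hypothetical. The planner, detector, and stepper are plain callables standing in for the language model, the VQA model, and the low-level controller respectively.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins (assumptions, not the paper's API):
Planner = Callable[[List[str]], List[str]]  # event history -> list of skills to run
Detector = Callable[[str], bool]            # current skill -> True if misaligned
Stepper = Callable[[str], bool]             # advance skill one tick -> True when done


def execute_with_recovery(
    planner: Planner,
    detector: Detector,
    stepper: Stepper,
    check_every: int = 1,
    max_replans: int = 5,
    max_ticks: int = 1000,
) -> Tuple[bool, List[str]]:
    """Run a plan skill by skill, checking for misalignment during execution.

    Returns (success, history), where history is a list of event strings.
    """
    history: List[str] = []
    plan = list(planner(history))
    replans = 0
    total_ticks = 0

    while plan:
        skill = plan[0]
        tick = 0
        aborted = False
        while True:
            tick += 1
            total_ticks += 1
            if total_ticks > max_ticks:
                raise RuntimeError("tick budget exhausted")
            done = stepper(skill)
            # Check every `check_every` ticks, and once more when the skill ends.
            if (tick % check_every == 0 or done) and detector(skill):
                history.append(f"failed: {skill}")
                aborted = True
                break
            if done:
                break

        if aborted:
            if replans >= max_replans:
                return False, history
            replans += 1
            # Ask the planner for a fresh plan given what has happened so far.
            plan = list(planner(history))
        else:
            history.append(f"done: {skill}")
            plan.pop(0)

    return True, history
```

In this sketch the check runs while a skill is still executing, so a failure such as a dropped object can interrupt the skill immediately rather than being discovered only after it finishes. How often to check, and what questions the detector asks for each skill, are design choices this sketch leaves to the caller.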


research
04/04/2022

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Large language models can encode a wealth of semantic knowledge about th...
research
03/06/2023

PaLM-E: An Embodied Multimodal Language Model

Large language models excel at a wide range of complex tasks. However, e...
research
05/10/2023

Multimodal Contextualized Plan Prediction for Embodied Task Completion

Task planning is an important component of traditional robotics systems ...
research
06/16/2023

Structured Thoughts Automaton: First Formalized Execution Model for Auto-Regressive Language Models

In recent months, Language Models (LMs) have become a part of daily disc...
research
05/24/2023

Prompt Optimization of Large Language Model for Interactive Tasks without Gradient and Demonstrations

Large language models (LLMs) have demonstrated remarkable language profi...
research
08/22/2023

ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

In this paper, we argue that the next generation of robots can be comman...
research
04/17/2023

Grounding Classical Task Planners via Vision-Language Models

Classical planning systems have shown great advances in utilizing rule-b...
