JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

08/28/2022
by Kaizhi Zheng, et al.

Building a conversational embodied agent that executes real-life tasks is a long-standing and challenging research goal: it requires effective human-agent communication, multi-modal understanding, and long-range sequential decision making. Traditional symbolic methods have scaling and generalization issues, while end-to-end deep learning models suffer from data scarcity and high task complexity, and are often hard to explain. To combine the strengths of both, we propose JARVIS, a neuro-symbolic commonsense reasoning framework for modular, generalizable, and interpretable conversational embodied agents. First, it acquires symbolic representations by prompting large language models (LLMs) for language understanding and sub-goal planning, and by constructing semantic maps from visual observations. Then the symbolic module reasons over these representations for sub-goal planning and action generation, based on task- and action-level common sense. Extensive experiments on the TEACh dataset validate the efficacy and efficiency of JARVIS, which achieves state-of-the-art (SOTA) results on all three dialog-based embodied tasks: Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC). For example, our method boosts the unseen Success Rate on EDH from 6.1% to 15.8%. Moreover, we systematically analyze the essential factors that affect task performance and demonstrate the superiority of our method in few-shot settings. Our JARVIS model ranks first in the Alexa Prize SimBot Public Benchmark Challenge.
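
The two-stage flow described in the abstract (neural modules produce symbolic representations, then a symbolic module applies common sense to plan) can be illustrated with a toy sketch. This is not the JARVIS implementation: the keyword-based `parse_dialog_to_subgoals` merely stands in for the LLM prompting step, and the `(action, object)` sub-goal format, the `REQUIRED_TOOL` rule table, the dictionary-based semantic map, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

SubGoal = Tuple[str, str]      # (action, object): a hypothetical sub-goal format
Position = Tuple[int, int]     # hypothetical 2D cell in a semantic map


def parse_dialog_to_subgoals(dialog: str) -> List[SubGoal]:
    """Stand-in for the LLM prompting step (a keyword match, not a real model)."""
    text = dialog.lower()
    subgoals: List[SubGoal] = []
    if "slice" in text and "bread" in text:
        subgoals.append(("Slice", "Bread"))
    return subgoals


# Hypothetical task-level common sense: some actions need a tool in hand.
REQUIRED_TOOL: Dict[str, str] = {"Slice": "Knife"}


def approach_action(obj: str, semantic_map: Dict[str, Position]) -> str:
    # Hypothetical action-level common sense: navigate to objects already
    # on the semantic map, search for objects that are not mapped yet.
    return "Navigate" if obj in semantic_map else "Search"


def expand_with_commonsense(
    subgoals: List[SubGoal], semantic_map: Dict[str, Position]
) -> List[SubGoal]:
    """Symbolic step: insert the prerequisite sub-goals implied by the rules."""
    plan: List[SubGoal] = []
    holding: Optional[str] = None
    for action, obj in subgoals:
        tool = REQUIRED_TOOL.get(action)
        if tool is not None and holding != tool:
            # Fetch the required tool before acting on the target object.
            plan.append((approach_action(tool, semantic_map), tool))
            plan.append(("Pickup", tool))
            holding = tool
        plan.append((approach_action(obj, semantic_map), obj))
        plan.append((action, obj))
    return plan


def plan_from_dialog(dialog: str, semantic_map: Dict[str, Position]) -> List[SubGoal]:
    return expand_with_commonsense(parse_dialog_to_subgoals(dialog), semantic_map)


if __name__ == "__main__":
    # The bread has been observed; the knife has not been seen yet.
    observed = {"Bread": (4, 1)}
    for step in plan_from_dialog("Could you slice the bread?", observed):
        print(step)
```

Keeping the rules in a plain table separate from the language-understanding step is what makes a plan of this kind inspectable: each inserted sub-goal can be traced back to a specific rule, which is the sort of interpretability the abstract attributes to the symbolic side.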


research
09/17/2021

Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules

One of the challenges faced by conversational agents is their inability ...
research
06/17/2020

Conversational Neuro-Symbolic Commonsense Reasoning

One aspect of human commonsense reasoning is the ability to make presump...
research
09/16/2017

Augmenting End-to-End Dialog Systems with Commonsense Knowledge

Building dialog agents that can converse naturally with humans is a chal...
research
04/10/2022

Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

Visual Dialog requires an agent to engage in a conversation with humans ...
research
08/18/2020

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

With the arising concerns for the AI systems provided with direct access...
research
08/11/2023

Dynamic Planning with a LLM

While Large Language Models (LLMs) can solve many NLP tasks in zero-shot...
research
07/21/2023

How to Tidy Up a Table: Fusing Visual and Semantic Commonsense Reasoning for Robotic Tasks with Vague Objectives

Vague objectives in many real-life scenarios pose long-standing challeng...
