R2H: Building Multimodal Navigation Helpers that Respond to Help

05/23/2023
by Yue Fan, et al.

The ability to assist humans in a supportive role during a navigation task is crucial for intelligent agents. Such agents, equipped with environment knowledge and conversational abilities, can guide individuals through unfamiliar terrain by generating natural language responses to their inquiries, grounded in the visual information of their surroundings. However, these multimodal conversational navigation helpers are still underdeveloped. This paper proposes a new benchmark, Respond to Help (R2H), for building multimodal navigation helpers that can respond to requests for help, based on existing dialog-based embodied datasets. R2H includes two main tasks: (1) Respond to Dialog History (RDH), which assesses the helper agent's ability to generate informative responses based on a given dialog history, and (2) Respond during Interaction (RdI), which evaluates the helper agent's ability to maintain effective and consistent cooperation with a task performer agent in real time during navigation. Furthermore, we propose a novel task-oriented multimodal response generation model that can see and respond, named SeeRee, as the navigation helper that guides the task performer in embodied tasks. Through both automatic and human evaluations, we show that SeeRee produces more effective and informative responses than baseline methods when assisting the task performer with different navigation tasks. Project website: https://sites.google.com/view/respond2help/home.
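
To make the difference between the two tasks concrete, the following is a minimal Python sketch, not the benchmark's actual code. It assumes a hypothetical `Helper` and `Performer` interface, replaces the visual input with a plain string observation, and uses a one-dimensional toy corridor in place of a real environment; every name in it is illustrative. The RDH-style function produces one response for a fixed dialog history, while the RdI-style function alternates questions and responses until the performer reaches the goal or the turn budget runs out.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class Turn:
    speaker: str  # "performer" or "helper"
    text: str


class Helper(Protocol):
    # Hypothetical interface: the observation is a string stand-in for visual input.
    def respond(self, history: List[Turn], observation: str) -> str: ...


class Performer(Protocol):
    # Hypothetical interface for the task performer agent.
    def ask(self, position: int) -> str: ...
    def step(self, position: int, response: str) -> int: ...


def respond_to_dialog_history(helper: Helper, history: List[Turn], observation: str) -> str:
    """RDH-style: produce one response for a fixed, pre-recorded dialog history."""
    return helper.respond(history, observation)


def respond_during_interaction(
    helper: Helper, performer: Performer, start: int, goal: int, max_turns: int = 10
) -> Tuple[bool, List[Turn]]:
    """RdI-style: alternate questions and responses until the goal or the turn budget."""
    position = start
    history: List[Turn] = []
    for _ in range(max_turns):
        if position == goal:
            break
        history.append(Turn("performer", performer.ask(position)))
        observation = f"position={position}, goal={goal}"
        response = helper.respond(history, observation)
        history.append(Turn("helper", response))
        position = performer.step(position, response)
    return position == goal, history


class ToyHelper:
    """Toy helper for a 1-D corridor; it reads the goal from the observation string."""

    def respond(self, history: List[Turn], observation: str) -> str:
        fields = dict(part.split("=") for part in observation.split(", "))
        return "go forward" if int(fields["goal"]) > int(fields["position"]) else "go back"


class ToyPerformer:
    """Toy performer that always asks the same question and obeys the response."""

    def ask(self, position: int) -> str:
        return "Which way should I go?"

    def step(self, position: int, response: str) -> int:
        return position + 1 if response == "go forward" else position - 1


if __name__ == "__main__":
    success, history = respond_during_interaction(ToyHelper(), ToyPerformer(), start=0, goal=3)
    print(success, len(history))
```

In this sketch the interactive setting differs from the dialog-history setting only in who supplies the history: in the first function it is given, in the second it accumulates from the helper's own earlier responses, which is why consistency across turns matters there.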


research
07/10/2019

Vision-and-Dialog Navigation

Robots navigating in human environments should use language to ask for a...
research
04/28/2017

Intelligent Personal Assistant with Knowledge Navigation

An Intelligent Personal Agent (IPA) is an agent that has the purpose of ...
research
12/18/2019

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

Visual Dialog is a vision-language task that requires an AI agent to eng...
research
12/10/2018

Chat-crowd: A Dialog-based Platform for Visual Layout Composition

In this paper we introduce Chat-crowd, an interactive environment for vi...
research
02/09/2023

Learning by Asking for Embodied Visual Navigation and Task Completion

The research community has shown increasing interest in designing intell...
research
08/22/2023

Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation

This report details the methods of the winning entry of the AVDN Challen...
research
05/27/2021

Maria: A Visual Experience Powered Conversational Agent

Arguably, the visual perception of conversational agents to the physical...
