KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

03/28/2023
by Xiangyang Li, et al.

Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location in real scenes by following a natural language instruction. Most previous approaches represent navigable candidates with either whole-image features or object-centric features. However, these representations are not sufficient for an agent to choose the actions that reach the target location. Because knowledge provides crucial information that is complementary to the visible content, in this paper we propose a Knowledge Enhanced Reasoning Model (KERM) that leverages knowledge to improve the agent's navigation ability. Specifically, we first retrieve facts (i.e., knowledge described by language) for the navigation views from a constructed knowledge base, based on their local regions. The retrieved facts range from properties of a single object (e.g., color, shape) to relationships between objects (e.g., action, spatial position), providing crucial information for VLN. KERM then integrates visual, history, instruction, and fact features through purification, fact-aware interaction, and instruction-guided aggregation modules, automatically selecting and gathering crucial, relevant cues to obtain more accurate action predictions. Experimental results on the REVERIE, R2R, and SOON datasets demonstrate the effectiveness of the proposed method.
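Below is a minimal sketch of the two mechanisms the abstract describes: retrieving facts for a view's local regions by embedding similarity, and pooling the retrieved facts under instruction guidance. This is an illustrative assumption of how such a pipeline could look, not the authors' implementation; the names FactRetriever, fact_embeds, and InstructionGuidedAggregation are hypothetical, and the paper's purification and fact-aware interaction modules are omitted for brevity.

```python
# Hypothetical sketch of KERM-style fact retrieval and instruction-guided
# aggregation. Not the authors' code; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactRetriever:
    """Retrieve top-k facts per region by cosine similarity in a shared
    embedding space (e.g., CLIP-style text/visual encoders)."""

    def __init__(self, fact_embeds: torch.Tensor, k: int = 5):
        # fact_embeds: (num_facts, dim), pre-encoded knowledge-base facts
        self.fact_embeds = F.normalize(fact_embeds, dim=-1)
        self.k = k

    def __call__(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (num_regions, dim), visual features of local regions
        sims = F.normalize(region_feats, dim=-1) @ self.fact_embeds.T
        topk = sims.topk(self.k, dim=-1).indices       # (num_regions, k)
        return self.fact_embeds[topk]                  # (num_regions, k, dim)


class InstructionGuidedAggregation(nn.Module):
    """Cross-attend retrieved fact features to the instruction tokens so
    that only instruction-relevant knowledge is pooled per view."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, instr: torch.Tensor, facts: torch.Tensor) -> torch.Tensor:
        # instr: (B, L, dim) instruction tokens; facts: (B, N, dim) facts
        fused, _ = self.attn(query=facts, key=instr, value=instr)
        # Pool over facts to get one knowledge vector per view
        return fused.mean(dim=1)                       # (B, dim)


if __name__ == "__main__":
    dim = 512
    retriever = FactRetriever(torch.randn(1000, dim), k=5)
    facts = retriever(torch.randn(36, dim))            # 36 regions per view
    agg = InstructionGuidedAggregation(dim)
    instr = torch.randn(1, 20, dim)                    # 20 instruction tokens
    knowledge = agg(instr, facts.view(1, -1, dim))
    print(knowledge.shape)                             # torch.Size([1, 512])
```

In the full model, the pooled knowledge vector would be fused with the view's visual and history features before action scoring.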


Related research

- Sub-Instruction Aware Vision-and-Language Navigation (04/06/2020)
  Vision-and-language navigation requires an agent to navigate through a r...
- Object-and-Action Aware Model for Visual Language Navigation (07/29/2020)
  Vision-and-Language Navigation (VLN) is unique in that it requires turni...
- A^2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models (08/15/2023)
  We study the task of zero-shot vision-and-language navigation (ZS-VLN), ...
- Reinforced Structured State-Evolution for Vision-Language Navigation (04/20/2022)
  Vision-and-language Navigation (VLN) task requires an embodied agent to ...
- Neighbor-view Enhanced Model for Vision and Language Navigation (07/15/2021)
  Vision and Language Navigation (VLN) requires an agent to navigate to a ...
- A Dual Semantic-Aware Recurrent Global-Adaptive Network For Vision-and-Language Navigation (05/05/2023)
  Vision-and-Language Navigation (VLN) is a realistic but challenging task...
- HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation (03/22/2022)
  Pre-training has been adopted in a few recent works for Vision-and-La...
