Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

01/10/2019
by   Chih-Yao Ma, et al.
2

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using our proposed method, we set the new state of the art by a significant margin (8 available at https://github.com/chihyaoma/selfmonitoring-agent .

READ FULL TEXT

page 2

page 8

page 16

page 17

page 18

research
04/19/2021

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

Vision language navigation is the task that requires an agent to navigat...
research
03/05/2019

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

As deep learning continues to make progress for challenging perception t...
research
12/06/2020

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

Performing simple household tasks based on language directives is very n...
research
05/05/2023

A Dual Semantic-Aware Recurrent Global-Adaptive Network For Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a realistic but challenging task...
research
05/19/2023

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

The progress of autonomous web navigation has been hindered by the depen...
research
11/22/2020

Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

The emerging vision-and-language navigation (VLN) problem aims at learni...
research
09/17/2021

Realistic PointGoal Navigation via Auxiliary Losses and Information Bottleneck

We propose a novel architecture and training paradigm for training reali...

Please sign up or login with your details

Forgot password? Click here to reset