A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues

07/24/2022
by Jason Armitage, et al.

In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) face a similar challenge: detecting the supervisory signal carried by environment features and location cues in their inputs. To boost the prominence of relevant features in transformer-based architectures without costly preprocessing and pretraining, we take inspiration from priority maps, a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain on auxiliary tasks using low-sample datasets with high-level route representations and references to urban environment features. A hierarchical process first plans the trajectory, then applies parameterised visual boost filtering to visual inputs and predicts the corresponding textual spans, addressing the core challenges of cross-modal alignment and feature-level localisation. The priority map module is integrated into a feature-location framework that doubles the task completion rates of standalone transformers and attains state-of-the-art performance on the Touchdown benchmark for VLN. Code and data are referenced in Appendix C.
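To make the mechanism concrete, below is a minimal PyTorch sketch of how a priority map module of this kind might be wired: a trajectory-plan embedding parameterises a multiplicative boost filter over visual features, and the boosted features are then scored against candidate instruction spans for cross-modal alignment. The class name, tensor shapes, and the 1 + sigmoid gain are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class PriorityMapModule(nn.Module):
    # Hypothetical sketch: re-weight visual features with a boost filter
    # parameterised by a trajectory-plan embedding, then align the boosted
    # features against candidate textual spans. Names and shapes are assumptions.
    def __init__(self, vis_dim: int, txt_dim: int, plan_dim: int):
        super().__init__()
        # Predict per-channel boost gains from the high-level trajectory plan.
        self.boost = nn.Sequential(nn.Linear(plan_dim, vis_dim), nn.Sigmoid())
        # Project visual features into the text space for span scoring.
        self.vis_proj = nn.Linear(vis_dim, txt_dim)

    def forward(self, vis_feats, txt_spans, plan_emb):
        # vis_feats: (B, N, vis_dim) environment/panorama features
        # txt_spans: (B, S, txt_dim) candidate instruction-span embeddings
        # plan_emb:  (B, plan_dim)  trajectory-plan representation
        gain = 1.0 + self.boost(plan_emb).unsqueeze(1)      # (B, 1, vis_dim)
        boosted = vis_feats * gain                          # amplify plan-relevant channels
        # Alignment scores of every visual feature against every textual span.
        scores = torch.einsum("bnd,bsd->bns", self.vis_proj(boosted), txt_spans)
        return boosted, scores.softmax(dim=-1)              # (B, N, S)

# Example usage with random inputs:
# pm = PriorityMapModule(vis_dim=512, txt_dim=256, plan_dim=128)
# boosted, align = pm(torch.randn(2, 8, 512), torch.randn(2, 5, 256), torch.randn(2, 128))

The multiplicative 1 + sigmoid gain leaves the base features intact while letting the plan amplify route-relevant channels, loosely mirroring how a priority map raises the salience of relevant stimuli rather than recomputing the whole representation.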

Related research

- SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation (10/27/2021). Natural language instructions for visual navigation often use scene desc...
- Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation (09/07/2023). With strong representation capabilities, pretrained vision-language mode...
- Cross-modal Map Learning for Vision and Language Navigation (03/10/2022). We consider the problem of Vision-and-Language Navigation (VLN). The maj...
- Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation (02/13/2023). Vision-Language Navigation (VLN) is a challenging task which requires an...
- Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks (11/18/2019). Vision-Language Navigation (VLN) is a task where agents learn to navigat...
- GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation (05/26/2023). Most existing works solving Room-to-Room VLN problem only utilize RGB im...
