Masked Path Modeling for Vision-and-Language Navigation

05/23/2023
by Zi-Yi Dou, et al.

Vision-and-language navigation (VLN) agents are trained to navigate in real-world environments by following natural language instructions. A major challenge in VLN is the limited availability of training data, which hinders the models' ability to generalize. Previous approaches have addressed this issue by introducing additional supervision during training, often requiring costly human-annotated data that restricts scalability. In this paper, we introduce a masked path modeling (MPM) objective, which pretrains an agent on self-collected data for downstream navigation tasks. The agent first explores navigation environments without a specific goal and records the paths it traverses. We then train the agent on this collected data to reconstruct the original path given a randomly masked subpath. In this way, the agent can accumulate a large and diverse amount of data while learning conditional action generation. We evaluate MPM on several VLN datasets and demonstrate its versatility across different levels of instruction complexity. MPM yields significant improvements in success rate, with gains of 1.32%, 1.05%, and 1.19% on the val-unseen splits of the Room-to-Room, Room-for-Room, and Room-across-Room datasets, respectively. Our analysis further highlights the potential for additional improvements when the agent is allowed to explore unseen environments prior to testing.
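The data-collection step described in the abstract (goal-free exploration, followed by random masking of the recorded path) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the function names (`explore`, `mask_path`, `build_examples`), the `[MASK]` token, and the masking ratio are all hypothetical choices made for illustration, and the sketch covers only how training pairs might be built, not the agent model or its action-prediction loss.

```python
import random

# Hypothetical toy environment: viewpoints are nodes, navigable links are edges.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def explore(graph, start, max_steps, rng):
    """Goal-free exploration: a random walk that never revisits a viewpoint."""
    path = [start]
    visited = {start}
    for _ in range(max_steps):
        candidates = [n for n in graph[path[-1]] if n not in visited]
        if not candidates:
            break  # dead end: every neighbor was already visited
        nxt = rng.choice(candidates)
        path.append(nxt)
        visited.add(nxt)
    return path


def mask_path(path, mask_ratio, rng, mask_token="[MASK]"):
    """Hide a random subset of viewpoints, keeping at least one visible."""
    n_mask = max(1, int(len(path) * mask_ratio))
    n_mask = min(n_mask, len(path) - 1)
    masked_idx = set(rng.sample(range(len(path)), n_mask))
    return [mask_token if i in masked_idx else v for i, v in enumerate(path)]


def build_examples(graph, num_paths, max_steps, mask_ratio, seed=0):
    """Collect paths by exploration and pair each masked path with its original."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    examples = []
    for _ in range(num_paths):
        path = explore(graph, rng.choice(nodes), max_steps, rng)
        if len(path) < 2:
            continue  # nothing to mask on a single-viewpoint path
        examples.append({"input": mask_path(path, mask_ratio, rng), "target": path})
    return examples


examples = build_examples(GRAPH, num_paths=5, max_steps=4, mask_ratio=0.5)
for ex in examples:
    print(ex["input"], "->", ex["target"])
```

Each pair holds a partially hidden path as the conditioning input and the full path as the reconstruction target. In this sketch, a model trained on such pairs would be asked to produce the sequence of moves that recovers the target, which is one way to read "conditional action generation" without any human-written instructions.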

research
03/29/2022

EnvEdit: Environment Editing for Vision-and-Language Navigation

In Vision-and-Language Navigation (VLN), an agent needs to navigate thro...
research
11/17/2019

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling

Vision-and-Language Navigation (VLN) is a task where agents must decide ...
research
08/20/2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Vision-and-language navigation (VLN) aims to enable embodied agents to n...
research
11/22/2020

Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

The emerging vision-and-language navigation (VLN) problem aims at learni...
research
04/08/2019

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

A grand goal in AI is to build a robot that can accurately navigate base...
research
03/31/2020

Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation

In the Vision-and-Language Navigation (VLN) task, an agent with egocentr...
research
03/06/2019

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

We present FAST NAVIGATOR, a general framework for action decoding, whic...
