Explore the Potential Performance of Vision-and-Language Navigation Model: a Snapshot Ensemble Method

11/28/2021
by Wenda Qin, et al.

Vision-and-Language Navigation (VLN) is a challenging task in the field of artificial intelligence. Although substantial progress has been made on this task over the past few years thanks to breakthroughs in deep vision and language models, it remains difficult to build VLN models that generalize as well as humans do. In this paper, we provide a new perspective on improving VLN models. Based on our discovery that snapshots of the same VLN model behave significantly differently even when their success rates are nearly the same, we propose a snapshot-based ensemble solution that combines the predictions of multiple snapshots. Constructed on the snapshots of the existing state-of-the-art (SOTA) model ↻BERT and our past-action-aware modification, our proposed ensemble achieves new SOTA performance on the R2R dataset challenge in Navigation Error (NE) and Success weighted by Path Length (SPL).
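
The idea of combining predictions from several snapshots can be illustrated with a minimal sketch. The abstract does not specify how the snapshots' predictions are combined, so the code below assumes the simplest option: averaging each snapshot's softmax distribution over the same set of candidate actions at one navigation step. The function names `softmax` and `ensemble_action` and the input layout are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_action(snapshot_logits):
    """Average per-snapshot action distributions and pick the best action.

    snapshot_logits: one list of logits per snapshot, all scoring the same
    candidate actions in the same order (an assumption of this sketch).
    Returns (index of the chosen action, averaged distribution).
    """
    probs = [softmax(logits) for logits in snapshot_logits]
    n_actions = len(probs[0])
    avg = [sum(p[a] for p in probs) / len(probs) for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: avg[a])
    return best, avg


# Three hypothetical snapshots that disagree on the best of three actions.
best, avg = ensemble_action([
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 2.5, 0.5],
])
```

Here the first snapshot on its own prefers action 0, while the averaged distribution favors action 1. This is the kind of disagreement between similarly accurate snapshots that an ensemble can exploit.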


research
11/26/2020

A Recurrent Vision-and-Language BERT for Navigation

Accuracy of many visiolinguistic tasks has benefited significantly from ...
research
03/05/2019

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

As deep learning continues to make progress for challenging perception t...
research
09/28/2020

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Vision-and-Language Navigation (VLN) is a natural language grounding tas...
research
11/30/2022

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation

Household environments are visually diverse. Embodied agents performing ...
research
06/23/2015

A Survey of Current Datasets for Vision and Language Research

Integrating vision and language has long been a dream in work on artific...
research
03/06/2019

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

We present FAST NAVIGATOR, a general framework for action decoding, whic...
research
10/12/2021

Rethinking the Spatial Route Prior in Vision-and-Language Navigation

Vision-and-language navigation (VLN) is a trending topic which aims to n...
