Diagnosing the Environment Bias in Vision-and-Language Navigation

05/06/2020
by   Yubo Zhang, et al.

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations. These step-by-step navigational instructions are crucial when the agent navigates new environments about which it has no prior knowledge. Most recent works studying VLN observe a significant performance drop when agents are tested on unseen environments (i.e., environments not used in training), indicating that the neural agent models are highly biased towards the training environments. Although this issue is considered one of the major challenges in VLN research, it remains under-studied and needs a clearer explanation. In this work, we design novel diagnostic experiments via environment re-splitting and feature replacement to look into possible reasons for this environment bias. We observe that the bias stems neither from the language nor from the underlying navigational graph, but from the low-level visual appearance conveyed by ResNet features, which directly affects the agent model. Based on this observation, we explore several kinds of semantic representations that contain less low-level visual information, so that an agent trained with these features generalizes better to unseen testing environments. Without modifying the baseline agent model or its training method, our explored semantic features significantly decrease the performance gap between seen and unseen environments on multiple datasets (i.e., R2R, R4R, and CVDN) and achieve unseen results competitive with previous state-of-the-art models. Our code and features are available at: https://github.com/zhangybzbo/EnvBiasVLN
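To make the feature-replacement idea concrete, here is a minimal sketch (an illustration, not the authors' released code) assuming a PyTorch setup: the agent model and its training are held fixed, and only the per-view feature extractor is swapped. The ResNet-152 variant, the 40-class segmenter, and the histogram pooling below are all assumptions chosen for brevity; the paper's actual semantic representations may differ.

```python
# Minimal sketch of the feature-replacement diagnosis (assumptions noted inline).
import torch
import torchvision.models as models

# Low-level visual features: pooled ResNet activations. The paper identifies
# these appearance-heavy features as the main source of environment bias.
backbone = models.resnet152(weights=None)  # variant/weights are assumptions
backbone = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop fc head
backbone.eval()

def resnet_features(views: torch.Tensor) -> torch.Tensor:
    """views: (B, 3, H, W) panoramic view crops -> (B, 2048) pooled features."""
    with torch.no_grad():
        return backbone(views).flatten(1)

def semantic_features(seg_logits: torch.Tensor, num_classes: int = 40) -> torch.Tensor:
    """One illustrative 'semantic' representation: a normalized histogram of
    predicted segmentation classes per view, which discards most low-level
    appearance. seg_logits: (B, num_classes, H, W) from any off-the-shelf
    segmenter (a hypothetical stand-in for the paper's explored features)."""
    labels = seg_logits.argmax(dim=1)  # (B, H, W) per-pixel class ids
    hist = torch.stack([
        torch.bincount(img.flatten(), minlength=num_classes).float()
        for img in labels
    ])
    return hist / hist.sum(dim=1, keepdim=True)  # (B, num_classes) frequencies
```

Feeding either output into the same unmodified agent isolates how much of the seen-unseen performance gap is attributable to the input representation alone.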

Related research

03/01/2020
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
Recent research efforts enable study for natural language grounded navig...

03/29/2022
EnvEdit: Environment Editing for Vision-and-Language Navigation
In Vision-and-Language Navigation (VLN), an agent needs to navigate thro...

11/07/2020
Sim-to-Real Transfer for Vision-and-Language Navigation
We study the challenging problem of releasing a robot in a previously un...

09/10/2022
Anticipating the Unseen Discrepancy for Vision and Language Navigation
Vision-Language Navigation requires the agent to follow natural language...

06/06/2023
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Large language models (LLMs) encode a vast amount of world knowledge acq...

11/27/2019
Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a challenging task in which an a...

10/12/2021
Rethinking the Spatial Route Prior in Vision-and-Language Navigation
Vision-and-language navigation (VLN) is a trending topic which aims to n...
