Vision-Language Navigation with Random Environmental Mixup

06/15/2021
by Chong Liu, et al.

Vision-language navigation (VLN) tasks require an agent to navigate step by step while perceiving visual observations and comprehending a natural language instruction. A large data bias, caused by the disproportion between the small scale of the available data and the large navigation space, makes the VLN task challenging. Previous works have proposed various data augmentation methods to reduce this bias. However, they do not explicitly reduce the data bias across different house scenes. As a result, the agent tends to overfit to the seen scenes and performs poorly in unseen scenes. To tackle this problem, we propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments. Specifically, we first select key viewpoints in each scene according to its room connection graph. Then, we cross-connect the key views of different scenes to construct augmented scenes. Finally, we generate augmented instruction-path pairs in the cross-connected scenes. Experimental results on benchmark datasets show that the augmented data generated by REM helps the agent reduce the performance gap between seen and unseen environments and improves overall performance, making our model the best existing approach on the standard VLN benchmark.
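The three steps above (select key viewpoints, cross-connect scenes, generate paths) can be illustrated with a minimal Python sketch. It rests on several assumptions not taken from the paper: each scene is a plain adjacency dict of viewpoint IDs with a room label per viewpoint; the key-viewpoint rule (the inter-room edge whose rooms have the highest combined degree in the room connection graph) is a hypothetical heuristic, not REM's actual criterion; a BFS shortest path stands in for path sampling; and instruction generation, which would need a separate instruction-generating model, is omitted. Names such as `key_edge` and `cross_connect` are hypothetical.

```python
from collections import deque


def room_graph(adj, room):
    """Room connection graph: rooms are nodes, linked if any viewpoint edge crosses them."""
    rg = {r: set() for r in room.values()}
    for u, nbrs in adj.items():
        for v in nbrs:
            if room[u] != room[v]:
                rg[room[u]].add(room[v])
    return rg


def key_edge(adj, room):
    """Hypothetical heuristic: pick the inter-room viewpoint edge whose two rooms
    have the largest combined degree in the room connection graph."""
    rg = room_graph(adj, room)
    candidates = [(u, v) for u in adj for v in adj[u] if room[u] != room[v] and u < v]
    # Ties are broken deterministically by the edge's viewpoint IDs.
    return max(candidates, key=lambda e: (len(rg[room[e[0]]]) + len(rg[room[e[1]]]), e))


def prefix(adj, tag):
    """Copy a scene graph, namespacing viewpoint IDs so two scenes cannot collide."""
    return {f"{tag}/{u}": {f"{tag}/{v}" for v in nbrs} for u, nbrs in adj.items()}


def cross_connect(adj_a, edge_a, adj_b, edge_b):
    """Mix two scenes: cut one key edge in each and rewire its endpoints across scenes."""
    mixed = {**prefix(adj_a, "A"), **prefix(adj_b, "B")}
    a1, a2 = (f"A/{x}" for x in edge_a)
    b1, b2 = (f"B/{x}" for x in edge_b)
    for u, v in ((a1, a2), (b1, b2)):  # cut the key edge inside each scene
        mixed[u].discard(v)
        mixed[v].discard(u)
    for u, v in ((a1, b2), (b1, a2)):  # connect scene A to scene B and back
        mixed[u].add(v)
        mixed[v].add(u)
    return mixed


def shortest_path(adj, start, goal):
    """BFS shortest path, used here as a stand-in for sampling augmented paths."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for v in sorted(adj[u]):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if goal not in prev:
        return None
    path = [goal]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


# Two toy scenes (made-up data), each with two rooms joined by one viewpoint edge.
scene_a = {"k1": {"k2"}, "k2": {"k1", "l1"}, "l1": {"k2", "l2"}, "l2": {"l1"}}
rooms_a = {"k1": "kitchen", "k2": "kitchen", "l1": "living", "l2": "living"}
scene_b = {"b1": {"b2"}, "b2": {"b1", "h1"}, "h1": {"b2", "h2"}, "h2": {"h1"}}
rooms_b = {"b1": "bedroom", "b2": "bedroom", "h1": "hall", "h2": "hall"}

mixed = cross_connect(scene_a, key_edge(scene_a, rooms_a), scene_b, key_edge(scene_b, rooms_b))
print(shortest_path(mixed, "A/k1", "B/h2"))  # ['A/k1', 'A/k2', 'B/h1', 'B/h2']
```

In this toy run the kitchen of scene A now leads into the hall of scene B, so the sampled path crosses a room transition that neither original scene contains. In the approach described above, each such path would then be paired with a generated instruction to form an augmented training sample.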
