Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

09/28/2020
by   Jiannan Xiang, et al.

Vision-and-Language Navigation (VLN) is a natural language grounding task in which an agent learns to follow language instructions and navigate to specified destinations in real-world environments. A key challenge is recognizing the correct location and stopping there, especially in complicated outdoor environments. Existing methods treat the STOP action the same as every other action, which leads to undesirable behavior: the agent often fails to stop at the destination even though it may be on the right path. We therefore propose Learning to Stop (L2Stop), a simple yet effective policy module that differentiates STOP from the other actions. Our approach achieves a new state of the art on Touchdown, a challenging urban VLN dataset, outperforming the baseline by 6.89 (absolute improvement) on Success weighted by Edit Distance (SED).
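
The idea of differentiating STOP from the other actions can be illustrated with a minimal sketch. It is not the L2Stop architecture, which the abstract does not spell out. It assumes a hypothetical policy that emits one scalar score for stopping and separate scores for the movement actions. The function name `select_action`, the action names, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from typing import Dict

STOP = "STOP"


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_action(
    stop_logit: float,
    direction_logits: Dict[str, float],
    stop_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> str:
    """Decide whether to stop first, then pick a movement action.

    The stop decision is made by its own binary score instead of
    competing with the movement actions in a single softmax.
    """
    if not direction_logits:
        raise ValueError("direction_logits must contain at least one action")

    # Stage 1: binary stop-or-continue decision.
    if sigmoid(stop_logit) >= stop_threshold:
        return STOP

    # Stage 2: choose the highest-scoring movement action.
    return max(direction_logits, key=direction_logits.get)
```

For example, `select_action(2.0, {"FORWARD": 1.0, "LEFT": 0.2})` returns `"STOP"` because the stop probability exceeds the threshold, while a stop score of `-2.0` with the same movement scores returns `"FORWARD"`.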


research
09/30/2021

Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

In the Vision-and-Language Navigation (VLN) task an embodied agent navig...
research
07/01/2020

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

In the vision-and-language navigation (VLN) task, an agent follows natur...
research
03/07/2023

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

The main challenge in vision-and-language navigation (VLN) is how to und...
research
09/05/2019

Robust Navigation with Language Pretraining and Stochastic Sampling

Core to the vision-and-language navigation (VLN) challenge is building r...
research
11/28/2021

Explore the Potential Performance of Vision-and-Language Navigation Model: a Snapshot Ensemble Method

Vision-and-Language Navigation (VLN) is a challenging task in the field ...
research
09/19/2019

RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation

Following navigation instructions in natural language requires a composi...
research
04/08/2019

Revisiting EmbodiedQA: A Simple Baseline and Beyond

In Embodied Question Answering (EmbodiedQA), an agent interacts with an ...
