Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

07/01/2020
by Wanrong Zhu, et al.

In the vision-and-language navigation (VLN) task, an agent follows natural language instructions and navigates in visual environments. Compared to indoor navigation, which has been broadly studied, navigation in real-life outdoor environments remains a significant challenge due to its complex visual inputs and the scarcity of instructions that describe intricate urban scenes. In this paper, we introduce a Multimodal Text Style Transfer (MTST) learning approach that mitigates the data scarcity of outdoor navigation tasks by effectively leveraging external multimodal resources. We first enrich the navigation data by transferring the style of instructions generated by the Google Maps API, then pre-train the navigator on the augmented external outdoor navigation dataset. Experimental results show that our MTST learning approach is model-agnostic and significantly outperforms the baseline models on the outdoor VLN task, improving the task completion rate on the test set by a relative 22% and achieving new state-of-the-art performance.
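The abstract describes a two-stage recipe: pre-train the navigator on the augmented external data (style-transferred Google Maps instructions), then train on the target outdoor VLN data. A minimal sketch of that training schedule is below; the function names, the dictionary-based stand-in for a model, and the toy (instruction, route) pairs are all hypothetical placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of the pre-train-then-fine-tune schedule from the
# abstract. The "model" is a plain dict standing in for a real navigator.

def train_epoch(model, dataset):
    """Run one mock training pass over (instruction, route) pairs."""
    for instruction, route in dataset:
        model["seen"].append(instruction)  # stand-in for a gradient step
    model["epochs"] += 1
    return model

def pretrain_then_finetune(external_data, target_data,
                           pretrain_epochs=2, finetune_epochs=1):
    model = {"seen": [], "epochs": 0}
    # Stage 1: pre-train on the style-transferred external navigation data
    for _ in range(pretrain_epochs):
        model = train_epoch(model, external_data)
    # Stage 2: continue training on the scarce target outdoor VLN data
    for _ in range(finetune_epochs):
        model = train_epoch(model, target_data)
    return model

# Toy (instruction, route) pairs
external = [("turn left onto Main St", ["n1", "n2"])]
target = [("walk past the red awning and stop", ["n3", "n4"])]
model = pretrain_then_finetune(external, target)
```

The point of the sketch is only the ordering of the two stages; in practice each `train_epoch` call would update navigator parameters on instruction-route supervision.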
