Cross-Lingual Vision-Language Navigation

10/24/2019
by An Yan, et al.

Vision-Language Navigation (VLN) is the task in which an agent follows natural language instructions to navigate photo-realistic environments. Previous research on VLN has been conducted primarily on the Room-to-Room (R2R) dataset, which contains only English instructions. The ultimate goal of VLN, however, is to serve people speaking arbitrary languages. Toward this goal, we collect a cross-lingual R2R dataset that extends the original benchmark with corresponding Chinese instructions. Because it is impractical to collect human-annotated instructions for every existing language, we use the new dataset to propose a general cross-lingual VLN framework that enables instruction-following navigation across languages. We first explore whether a cross-lingual agent can be built when no training data in the target language is available. The agent is equipped with a meta-learner that aggregates cross-lingual representations and a visually grounded cross-lingual alignment module that aligns textual representations of different languages. In the zero-shot setting, our model achieves results competitive with a model trained on all target-language instructions. In addition, we introduce an adversarial domain adaptation loss that improves the model's transfer ability when a certain amount of target-language data is available. Our dataset and methods demonstrate the potential of building scalable cross-lingual agents that serve speakers of different languages.


