Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

10/15/2020
by Alexander Ku, et al.

We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings and for multitask learning when including Room-to-Room annotations. We also provide results for a model that learns from synchronized pose traces by focusing only on the portions of the panorama attended to in human demonstrations. The size, scope and detail of RxR dramatically expand the frontier for research on embodied language agents in simulated, photo-realistic environments.
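To make the idea of word-level time alignment concrete, here is a minimal sketch of how timestamped words could be matched to a timestamped pose trace. It is an illustration only: the `Pose` and `TimedWord` classes, their field names, and the "most recent pose at or before the word's start time" rule are assumptions made for this example, not the RxR data format or the authors' alignment procedure.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Pose:
    # Hypothetical pose record; field names are assumptions for illustration.
    time: float      # seconds since the start of the recording
    pano_id: str     # identifier of the panorama the annotator is in
    heading: float   # camera heading in radians


@dataclass
class TimedWord:
    # Hypothetical word record with a start timestamp in seconds.
    word: str
    start: float


def align_words_to_poses(words, poses):
    """Pair each word with the most recent pose at or before its start time.

    Assumes `poses` is sorted by time. Words that start before the first
    pose are paired with the first pose.
    """
    times = [p.time for p in poses]
    aligned = []
    for w in words:
        # Index of the last pose whose time is <= w.start, clamped to 0.
        i = max(bisect_right(times, w.start) - 1, 0)
        aligned.append((w.word, poses[i]))
    return aligned


poses = [
    Pose(time=0.0, pano_id="pano_a", heading=0.0),
    Pose(time=1.0, pano_id="pano_a", heading=1.2),
    Pose(time=2.5, pano_id="pano_b", heading=1.2),
]
words = [
    TimedWord("walk", 0.2),
    TimedWord("past", 1.0),
    TimedWord("sofa", 3.0),
]
pairs = align_words_to_poses(words, poses)
```

With the toy inputs above, "walk" is paired with the first pose, "past" with the second, and "sofa" with the third. A model that attends only to the regions viewed during a demonstration could, under these assumptions, use such pairs to select which part of a panorama to look at for each word.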


