Speaker-Follower Models for Vision-and-Language Navigation

06/07/2018
by   Daniel Fried, et al.

Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this presents a double challenge: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and empirically difficult to implement the reasoning process using generic sequence models. Here we describe an approach to vision-and-language navigation that addresses both these issues with an embedded speaker model. We use this speaker model to synthesize new instructions for data augmentation and to implement pragmatic reasoning for evaluating candidate action sequences. Both steps are supported by a panoramic action space that reflects the granularity of human-generated instructions. Experiments show that all three pieces of this approach (speaker-driven data augmentation, pragmatic reasoning, and the panoramic action space) dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
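The pragmatic-reasoning step described above can be illustrated with a small sketch: the follower proposes candidate routes, and each candidate is rescored by combining the follower's score with the speaker's probability of the original instruction given that route. The function names, the toy log-probabilities, and the mixing weight below are all illustrative assumptions, not the paper's exact implementation.

```python
import math

def rerank_candidates(candidates, speaker_logprob, follower_logprob, weight=0.95):
    """Pick the candidate route maximizing a weighted combination of
    log P_speaker(instruction | route) and log P_follower(route | instruction).
    The weight value here is a hypothetical choice for illustration."""
    def score(route):
        return weight * speaker_logprob(route) + (1.0 - weight) * follower_logprob(route)
    return max(candidates, key=score)

# Toy example with made-up probabilities for two candidate routes:
# the speaker strongly prefers route_a (its description matches the
# instruction), while the follower slightly prefers route_b.
speaker_scores = {"route_a": math.log(0.7), "route_b": math.log(0.2)}
follower_scores = {"route_a": math.log(0.3), "route_b": math.log(0.6)}

best = rerank_candidates(
    ["route_a", "route_b"],
    speaker_scores.__getitem__,
    follower_scores.__getitem__,
)
# With the speaker weighted heavily, route_a is selected.
```

The intuition is that a route the speaker would describe with the given instruction is more likely the intended one, even when the follower's own score is ambiguous.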


