Sub-Instruction Aware Vision-and-Language Navigation
Vision-and-language navigation requires an agent to navigate through a real 3D environment following a given natural language instruction. Despite significant advances, few previous works fully exploit the strong correspondence between the visual and textual sequences. Meanwhile, because intermediate supervision is lacking, the agent's performance at following each part of the instruction cannot be tracked during navigation. In this work, we focus on the granularity of the visual and language sequences as well as the trackability of agents through the completion of an instruction. We provide agents with fine-grained annotations during training and find that they follow the instruction better and have a higher chance of reaching the target at test time. We enrich the previous dataset with sub-instructions and their corresponding paths. To make use of this data, we propose effective sub-instruction attention and shifting modules that attend to and select a single sub-instruction at each time step. We implement our sub-instruction modules in four state-of-the-art agents, compare them with their baseline models, and show that our proposed method improves the performance of all four agents.
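The idea of attending to a single sub-instruction per time step and shifting to the next one can be illustrated with a small sketch. This is not the authors' implementation: the function names (`attend`, `shift`, `follow`), the dot-product attention, the fixed `threshold`, and the externally supplied shift probabilities are all assumptions made for illustration. In a trained agent, the shift signal would presumably come from a learned module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, token_vectors):
    """Dot-product attention over the tokens of ONE sub-instruction.

    Returns the attended context vector and the attention weights.
    (Hypothetical helper; the attention form is an assumption.)
    """
    scores = [sum(q * t for q, t in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return context, weights


def shift(index, shift_prob, num_sub_instructions, threshold=0.5):
    """Move to the next sub-instruction when the shift probability is high.

    The threshold rule is an assumption; the pointer never moves past
    the last sub-instruction.
    """
    if shift_prob > threshold and index < num_sub_instructions - 1:
        return index + 1
    return index


def follow(sub_instructions, queries, shift_probs):
    """Record which sub-instruction is attended at each time step."""
    index = 0
    trace = []
    for query, prob in zip(queries, shift_probs):
        context, _ = attend(query, sub_instructions[index])
        trace.append(index)  # the agent would act on `context` here
        index = shift(index, prob, len(sub_instructions))
    return trace


# Toy data: three sub-instructions with 2-dimensional token vectors.
sub_instructions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
]
queries = [[1.0, 0.0]] * 4
shift_probs = [0.1, 0.9, 0.2, 0.8]
print(follow(sub_instructions, queries, shift_probs))
```

Because the pointer only moves forward and only one sub-instruction is attended at a time, the recorded trace shows directly which part of the instruction the agent is working on at each step, which is the kind of trackability the abstract describes.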