VLN-Trans: Translator for the Vision and Language Navigation Agent

02/18/2023
by Yue Zhang, et al.

Language understanding is essential for a navigation agent to follow instructions. We observe two kinds of issues in instructions that can make the navigation task challenging: (1) the mentioned landmarks are not recognizable by the navigation agent because the instructor and the modeled agent have different visual abilities, and (2) the mentioned landmarks apply to multiple candidate viewpoints and are therefore not distinctive enough to identify the target. To address these issues, we design a translator module that converts the original instructions into easy-to-follow sub-instruction representations at each navigation step. The translator must focus on landmarks that are recognizable and distinctive given the agent's visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on the Room2Room (R2R), Room4Room (R4R), and Room2Room-Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.
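The abstract does not include an implementation, so the following is only a minimal, hypothetical sketch of the translator idea in PyTorch: a module that cross-attends instruction tokens over the agent's current visual features and pools the result into a per-step sub-instruction representation. Every class name, dimension, and layer choice here is an assumption for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class InstructionTranslator(nn.Module):
    """Hypothetical sketch (not the paper's architecture): fuse the
    encoded instruction with the agent's current visual observation
    to produce an easy-to-follow sub-instruction representation."""

    def __init__(self, hidden_dim=768, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        # Instruction tokens act as queries that cross-attend over
        # the visual features of the candidate viewpoints.
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, instr_tokens, visual_feats):
        # instr_tokens: (batch, n_tokens, hidden_dim) text encoding
        # visual_feats: (batch, n_views, hidden_dim) candidate-view features
        fused = self.decoder(tgt=instr_tokens, memory=visual_feats)
        # Mean-pool into one step-wise sub-instruction representation.
        return self.out(fused.mean(dim=1))

# Illustrative usage with dummy tensors:
translator = InstructionTranslator()
instr = torch.randn(2, 40, 768)   # 40 instruction tokens
views = torch.randn(2, 36, 768)   # 36 panoramic view features
sub_instr = translator(instr, views)  # (2, 768)
```

In a full agent, a representation like `sub_instr` would be recomputed at every step and used to condition action selection over the candidate viewpoints.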


Related research

09/26/2022
LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation
Understanding spatial and visual information is essential for a navigati...

05/10/2020
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Learning to follow instructions is of fundamental importance to autonomo...

02/14/2022
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
We study the problem of developing autonomous agents that can follow hum...

08/24/2019
Situational Fusion of Visual Representation for Visual Navigation
A complex visual navigation task puts an agent in different situations w...

09/04/2018
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
We propose to decompose instruction execution to goal prediction and act...

03/02/2023
MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) aims to develop intelligent agents ...

07/12/2017
Source-Target Inference Models for Spatial Instruction Understanding
Models that can execute natural language instructions for situated robot...
