LANA: A Language-Capable Navigator for Instruction Following and Generation

03/15/2023
by Xiaohan Wang, et al.

Recently, vision-and-language navigation (VLN) – which requires robot agents to follow navigation instructions – has made great progress. However, the existing literature puts most of its emphasis on interpreting instructions into actions, delivering only "dumb" wayfinding agents. In this article, we devise LANA, a language-capable navigation agent that can not only execute human-written navigation commands, but also provide route descriptions to humans. This is achieved by simultaneously learning instruction following and instruction generation with a single model. More specifically, two encoders, for route and language encoding respectively, are built and shared by two decoders, for action prediction and instruction generation respectively, so as to exploit cross-task knowledge and capture task-specific characteristics. Throughout pretraining and fine-tuning, both instruction following and generation are set as optimization objectives. We empirically verify that, compared with recent advanced task-specific solutions, LANA attains better performance on both instruction following and route description, with nearly half the model complexity. In addition, endowed with language generation capability, LANA can explain its behavior to humans and assist human wayfinding. We expect this work to foster future efforts toward building more trustworthy and socially intelligent navigation robots.
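As a rough illustration of the shared-encoder, dual-decoder layout the abstract describes, here is a minimal sketch in PyTorch. All names, dimensions, and the plain Transformer blocks are hypothetical stand-ins; the paper's actual architecture (cross-modal attention design, pretraining objectives, feature extractors) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanaSketch(nn.Module):
    """Two encoders (route, language) shared by two task decoders.

    A hypothetical sketch only; block choices and sizes are illustrative.
    """
    def __init__(self, vocab=1000, n_actions=6, d=256, feat_dim=2048):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d)
        self.route_proj = nn.Linear(feat_dim, d)  # project visual features to d
        self.lang_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        self.route_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        # Decoder for instruction following: route steps attend to language memory.
        self.follow_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        self.act_head = nn.Linear(d, n_actions)
        # Decoder for instruction generation: word positions attend to route memory.
        self.gen_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        self.word_head = nn.Linear(d, vocab)

    def forward(self, instr_ids, route_feats, act_labels, word_labels):
        lang = self.lang_enc(self.word_emb(instr_ids))        # (B, Lw, d)
        route = self.route_enc(self.route_proj(route_feats))  # (B, Lr, d)
        act_logits = self.act_head(self.follow_dec(route, lang))   # (B, Lr, A)
        word_logits = self.word_head(self.gen_dec(lang, route))    # (B, Lw, V)
        # Joint objective: both following and generation losses, as in the
        # abstract. Causal masking / target shifting for generation is omitted
        # here for brevity.
        return (F.cross_entropy(act_logits.flatten(0, 1), act_labels.flatten())
                + F.cross_entropy(word_logits.flatten(0, 1), word_labels.flatten()))
```

A quick smoke test of the sketch with dummy data:

```python
model = LanaSketch()
instr = torch.randint(0, 1000, (2, 20))  # dummy instruction tokens
feats = torch.randn(2, 8, 2048)          # dummy route features, 8 steps
acts = torch.randint(0, 6, (2, 8))       # dummy per-step action labels
loss = model(instr, feats, acts, instr)  # generation target = the instruction
loss.backward()
```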


research
03/30/2022

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

Since the rise of vision-language navigation (VLN), great progress has b...
research
08/26/2021

Visual-and-Language Navigation: A Survey and Taxonomy

An agent that can understand natural-language instruction and carry out ...
research
05/31/2023

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Much of the previous work towards digital agents for graphical user inte...
research
03/14/2023

Vision-based route following by an embodied insect-inspired sparse neural network

We compared the efficiency of the FlyHash model, an insect-inspired spar...
research
11/14/2017

Unified Pragmatic Models for Generating and Following Instructions

We extend models for both following and generating natural language inst...
research
04/30/2020

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

Following a navigation instruction such as 'Walk down the stairs and sto...
research
10/05/2021

Waypoint Models for Instruction-guided Navigation in Continuous Environments

Little inquiry has explicitly addressed the role of action spaces in lan...
