Reinforced Structured State-Evolution for Vision-Language Navigation

04/20/2022
by   Jinyu Chen, et al.

The Vision-and-Language Navigation (VLN) task requires an embodied agent to navigate to a remote location by following a natural language instruction. Previous methods usually adopt a sequence model (e.g., Transformer or LSTM) as the navigator. In this paradigm, the sequence model predicts an action at each step from a maintained navigation state, which is generally represented as a one-dimensional vector. However, the crucial navigation clues for the embodied navigation task (i.e., the object-level environment layout) are discarded, since the maintained vector is essentially unstructured. In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. Specifically, we utilise graph-based features, instead of a vector-based state, to represent the navigation state. Accordingly, we devise a Reinforced Layout clues Miner (RLM) to mine and detect the most crucial layout graph for long-term navigation via a customised reinforcement learning strategy. Moreover, a Structured Evolving Module (SEM) is proposed to maintain the structured graph-based state during navigation, where the state gradually evolves to learn the object-level spatial-temporal relationships. Experiments on the R2R and R4R datasets show that the proposed SEvol model improves VLN models' performance by large margins, e.g., +3 on the R2R test set.
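To make the contrast between a vector-based and a graph-based navigation state concrete, here is a minimal sketch in plain Python. It is not the paper's SEM or RLM: the names `GraphState`, `evolve`, `to_vector` and the `keep` blending weight are hypothetical, and the update rule (one round of neighbourhood aggregation blended with the new observation) is a generic stand-in chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GraphState:
    """Hypothetical structured state: one feature vector per object node."""
    node_feats: List[List[float]]   # [num_nodes][dim]
    adjacency: List[List[float]]    # [num_nodes][num_nodes], rows sum to 1


def evolve(state: GraphState, observed: List[List[float]],
           keep: float = 0.5) -> GraphState:
    """One illustrative evolution step (an assumption, not the paper's SEM).

    Each node aggregates its neighbours' features through the adjacency
    matrix, then blends the result with the newly observed node features.
    """
    n = len(state.node_feats)
    dim = len(state.node_feats[0])
    new_feats = []
    for i in range(n):
        # Neighbourhood aggregation: weighted sum over adjacency row i.
        agg = [sum(state.adjacency[i][j] * state.node_feats[j][d]
                   for j in range(n))
               for d in range(dim)]
        # Blend the aggregated memory with the current observation.
        new_feats.append([keep * agg[d] + (1.0 - keep) * observed[i][d]
                          for d in range(dim)])
    return GraphState(new_feats, state.adjacency)


def to_vector(state: GraphState) -> List[float]:
    """Mean-pool all nodes into one vector, as a vector-based state would."""
    n = len(state.node_feats)
    dim = len(state.node_feats[0])
    return [sum(f[d] for f in state.node_feats) / n for d in range(dim)]
```

In this toy setting, two graphs whose node features are swapped pool to the same vector under `to_vector`, which illustrates the kind of per-object layout information that a single unstructured vector cannot retain.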

research
07/19/2022

Target-Driven Structured Transformer Planner for Vision-Language Navigation

Vision-language navigation is the task of directing an embodied agent to...
research
11/10/2021

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a task that an agent is required...
research
03/31/2021

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

The ability to navigate like a human towards a language-guided target fr...
research
03/28/2023

KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

Vision-and-language navigation (VLN) is the task to enable an embodied a...
research
01/11/2023

Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a ...
research
07/11/2020

Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

The ability to perform effective planning is crucial for building an ins...
research
04/06/2020

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

We develop a language-guided navigation task set in a continuous 3D envi...
