DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

08/14/2023
by   Hanqing Wang, et al.

Vision-language navigation in continuous environments (VLN-CE) is a recently released embodied task in which AI agents must navigate a freely traversable environment to reach a distant target location, given language instructions. The task is highly challenging because the space of possible strategies is huge. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose DREAMWALKER, a world-model-based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. DREAMWALKER can simulate and evaluate possible plans entirely in this internal abstract world before executing costly actions. In contrast to existing model-free VLN-CE agents, which simply make greedy decisions in the real world and are therefore prone to shortsighted behavior, DREAMWALKER can plan strategically through large numbers of "mental experiments." Moreover, the imagined future scenarios reflect our agent's intention, making its decision-making process more transparent. Extensive experiments and ablation studies on the VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.
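The contrast between greedy decisions and planning by simulated rollouts can be illustrated with a toy sketch. This is not the DREAMWALKER implementation: the graph, the node names, the scores, and the functions `greedy_step`, `rollouts`, and `plan` are all hypothetical, and the fixed score table stands in for whatever learned evaluation an actual agent would use. It only shows the general idea of evaluating candidate plans inside a discrete abstraction before acting.

```python
# Hypothetical discrete "world model": nodes are waypoints, edges are
# traversable links. All names and numbers are made up for illustration.
GRAPH = {
    "start": ["hall", "closet"],
    "hall": ["kitchen"],
    "closet": [],
    "kitchen": [],
}

# Hypothetical score of how well each node matches the instruction.
SCORE = {"start": 0.0, "hall": 0.2, "closet": 0.5, "kitchen": 1.0}


def greedy_step(node):
    """Pick the neighbor with the best immediate score (one-step lookahead)."""
    neighbors = GRAPH[node]
    return max(neighbors, key=SCORE.get) if neighbors else None


def rollouts(node, depth):
    """Yield every path of at most `depth` steps that starts at `node`."""
    yield [node]
    if depth == 0:
        return
    for nxt in GRAPH[node]:
        for tail in rollouts(nxt, depth - 1):
            yield [node] + tail


def plan(node, depth):
    """Simulate all rollouts in the graph and return the one whose
    final node has the highest score."""
    return max(rollouts(node, depth), key=lambda path: SCORE[path[-1]])


if __name__ == "__main__":
    print("greedy first move:", greedy_step("start"))
    print("planned path:", plan("start", depth=2))
```

In this toy graph the greedy rule moves to the dead-end `closet` because it has the higher immediate score, while the two-step rollout search reaches `kitchen` through `hall`. Exhaustive enumeration is only feasible here because the graph is tiny; the depth limit also keeps the search finite if the graph contains cycles.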


