Layout-aware Dreamer for Embodied Referring Expression Grounding

11/30/2022
by Mingxiao Li, et al.

In this work, we study the problem of Embodied Referring Expression Grounding, in which an agent must navigate a previously unseen environment and localize a remote object described by a concise, high-level natural language instruction. In such a situation, a human tends to imagine what the destination may look like and to explore the environment using prior knowledge of typical layouts, such as the fact that a bathroom is more likely to be found near a bedroom than near a kitchen. To mimic this cognitive decision process, we design an autonomous agent called Layout-aware Dreamer (LAD), which includes two novel modules: the Layout Learner and the Goal Dreamer. The Layout Learner learns to infer the room category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively gives our agent common-sense knowledge of room-to-room transitions. To guide effective exploration of the environment, the Goal Dreamer imagines the destination beforehand. Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments, improving navigation success (SR) by 4.02 over the previous state of the art. The code is released at https://github.com/zehao-wang/LAD
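The abstract does not specify the Layout Learner's architecture. As a rough illustration only, the sketch below assumes a linear classification head over per-direction visual features and a hypothetical five-room label set. It predicts a room-category distribution for each unexplored neighbor, scores that prediction with cross-entropy against an annotated room label, and ranks candidate directions by the predicted probability of the room named in the instruction. Every name here (`ROOM_TYPES`, `predict_room_distribution`, `layout_loss`, `rank_candidates`) and the ranking heuristic itself are hypothetical and are not taken from the LAD code.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical room vocabulary; a real label set would come from dataset annotations.
ROOM_TYPES: List[str] = ["bedroom", "bathroom", "kitchen", "living room", "hallway"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_room_distribution(feature: Sequence[float],
                              weights: Sequence[Sequence[float]],
                              bias: Sequence[float]) -> List[float]:
    """Assumed linear layout head: one logit per room type from a candidate-view feature."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def layout_loss(probs: Sequence[float], target_index: int) -> float:
    """Cross-entropy between the predicted distribution and the annotated room label."""
    return -math.log(max(probs[target_index], 1e-12))


def rank_candidates(candidate_probs: Dict[str, Sequence[float]], goal_room: str) -> List[str]:
    """Hypothetical heuristic: prefer directions likely to lead to the instructed room type."""
    k = ROOM_TYPES.index(goal_room)
    return sorted(candidate_probs, key=lambda c: candidate_probs[c][k], reverse=True)


if __name__ == "__main__":
    # Made-up 3-d features and weights for two unexplored neighbors.
    W = [[0.2, 0.0, 0.1],
         [1.0, -0.5, 0.3],
         [-0.4, 0.8, 0.0],
         [0.1, 0.1, 0.1],
         [0.0, 0.2, -0.2]]
    b = [0.0] * len(ROOM_TYPES)
    probs = {
        "left": predict_room_distribution([1.0, 0.0, 0.5], W, b),
        "right": predict_room_distribution([0.0, 1.0, 0.0], W, b),
    }
    print(rank_candidates(probs, "bathroom"))  # ['left', 'right']
    print(layout_loss(probs["left"], ROOM_TYPES.index("bathroom")))
```

In this toy setup the layout prediction acts as an auxiliary signal: the cross-entropy term would be added to the navigation loss during training, while the ranking shows one simple way a predicted room distribution could bias exploration toward the room type an instruction mentions.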

research
07/22/2023

Learning Vision-and-Language Navigation from YouTube Videos

Vision-and-language navigation (VLN) requires an embodied agent to navig...
research
07/19/2022

Target-Driven Structured Transformer Planner for Vision-Language Navigation

Vision-language navigation is the task of directing an embodied agent to...
research
04/04/2022

A Machine With Human-Like Memory Systems

Inspired by the cognitive science theory, we explicitly model an agent w...
research
11/23/2022

Predicting Topological Maps for Visual Navigation in Unexplored Environments

We propose a robotic learning system for autonomous exploration and navi...
research
08/29/2023

iBARLE: imBalance-Aware Room Layout Estimation

Room layout estimation predicts layouts from a single panorama. It requi...
research
03/24/2021

Scene-Intuitive Agent for Remote Embodied Visual Grounding

Humans learn from life events to form intuitions towards the understandi...
research
09/10/2019

Bayesian Relational Memory for Semantic Visual Navigation

We introduce a new memory architecture, Bayesian Relational Memory (BRM)...