VTNet: Visual Transformer Network for Object Goal Navigation

05/20/2021
by Heming Du, et al.

Object goal navigation aims to steer an agent towards a target object based on the agent's observations. Designing effective visual representations of the observed scene is pivotal for determining navigation actions. In this paper, we introduce a Visual Transformer Network (VTNet) for learning informative visual representations in navigation. VTNet is a highly effective structure that embodies two key properties for visual representations: first, the relationships among all the object instances in a scene are exploited; second, the spatial locations of objects and image regions are emphasized so that directional navigation signals can be learned. Furthermore, we develop a pre-training scheme that associates the visual representations with navigation signals and thus facilitates navigation policy learning. In a nutshell, VTNet embeds object and region features with their location cues as spatial-aware descriptors and then incorporates all the encoded descriptors through attention operations to obtain an informative representation for navigation. Given such visual representations, agents are able to explore the correlations between visual observations and navigation actions. For example, an agent would prioritize "turning right" over "turning left" when the visual representation emphasizes the right side of the activation map. Experiments in the artificial environment AI2-Thor demonstrate that VTNet significantly outperforms state-of-the-art methods in unseen testing environments.
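
The idea of spatial-aware descriptors fused by attention can be illustrated with a small sketch. This is not the authors' implementation: the function names (`spatial_descriptor`, `attend`), the use of normalized box centers as the location cue, and the absence of learned projections are all assumptions made to keep the example short. It only shows how appending a location cue to each feature lets a plain scaled dot-product attention step favor descriptors on one side of the image.

```python
import math


def spatial_descriptor(feature, box, image_size):
    """Append a location cue to an appearance feature.

    Assumption: the location cue is the box center, normalized to [0, 1].
    box is (x_min, y_min, x_max, y_max); image_size is (width, height).
    """
    width, height = image_size
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / (2.0 * width)
    cy = (y_min + y_max) / (2.0 * height)
    return list(feature) + [cx, cy]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, descriptors):
    """Scaled dot-product attention with keys = values = descriptors.

    Assumption: no learned query/key/value projections are applied.
    Returns the attention weights and the weighted sum of descriptors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, d)) / scale for d in descriptors]
    weights = softmax(scores)
    fused = [
        sum(w * d[i] for w, d in zip(weights, descriptors))
        for i in range(len(query))
    ]
    return weights, fused


if __name__ == "__main__":
    image_size = (300, 300)
    # Two hypothetical detections with identical appearance features,
    # one on the left of the image and one on the right.
    left = spatial_descriptor([1.0, 0.0], (0, 100, 100, 200), image_size)
    right = spatial_descriptor([1.0, 0.0], (200, 100, 300, 200), image_size)
    weights, fused = attend(right, [left, right])
    print("attention weights:", weights)
    print("fused descriptor:", fused)
```

Because the two detections differ only in their location cue, the query built from the right-hand detection assigns more weight to the right-hand descriptor. A policy head reading the fused descriptor could, under these assumptions, pick up the kind of directional signal described above.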

research
11/29/2021

Agent-Centric Relation Graph for Object Visual Navigation

Object visual navigation aims to steer an agent towards a target object ...
research
07/23/2023

Learning Navigational Visual Representations with Semantic Map Supervision

Being able to perceive the semantics and the spatial structure of the en...
research
08/10/2023

Object Goal Navigation with Recursive Implicit Maps

Object goal navigation aims to navigate an agent to locations of a given...
research
07/21/2020

Learning Object Relation Graph and Tentative Policy for Visual Navigation

Target-driven visual navigation aims at navigating an agent towards a gi...
research
11/22/2018

Object-oriented Targets for Visual Navigation using Rich Semantic Representations

When searching for an object humans navigate through a scene using seman...
research
04/28/2021

Pushing it out of the Way: Interactive Visual Navigation

We have observed significant progress in visual navigation for embodied ...
research
09/15/2023

Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation

The task of Visual Object Navigation (VON) involves an agent's ability t...
