Self-supervised 3D Semantic Representation Learning for Vision-and-Language Navigation

01/26/2022
by Sinan Tan, et al.

In the Vision-and-Language Navigation task, an embodied agent follows linguistic instructions and navigates to a specific goal. The task is important in many practical scenarios and has attracted extensive attention from both the computer vision and robotics communities. However, most existing works use only RGB images and neglect the 3D semantic information of the scene. To this end, we develop a novel self-supervised training framework that encodes the voxel-level 3D semantic reconstruction into a 3D semantic representation. Specifically, a region query task is designed as the pretext task: it predicts the presence or absence of objects of a particular class in a specific 3D region. We then construct an LSTM-based navigation model and train it with the proposed 3D semantic representations and BERT language features on vision-language pairs. Experiments show that the proposed approach achieves success rates of 68% and 66%, respectively, which are superior to those of most RGB-based methods utilizing vision-language transformers.
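
To make the region query pretext task from the abstract concrete, the following is a minimal sketch of how a binary training label could be derived from a voxel-level semantic reconstruction. It is an illustration under assumptions, not the authors' implementation: the sparse dictionary layout, the axis-aligned box parameterization of a region, and the names `region_query_label`, `scene`, `CHAIR`, and `TABLE` are all hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]
# Assumed region format: (inclusive min corner, exclusive max corner).
Region = Tuple[Voxel, Voxel]


def region_query_label(voxels: Dict[Voxel, int], region: Region, class_id: int) -> int:
    """Return 1 if any voxel labelled class_id lies inside region, else 0."""
    (x0, y0, z0), (x1, y1, z1) = region
    for (x, y, z), label in voxels.items():
        inside = x0 <= x < x1 and y0 <= y < y1 and z0 <= z < z1
        if label == class_id and inside:
            return 1
    return 0


# Toy semantic reconstruction: sparse map from voxel index to class id.
CHAIR, TABLE = 1, 2  # hypothetical class ids
scene = {(0, 0, 0): CHAIR, (1, 0, 0): CHAIR, (4, 2, 1): TABLE}

# Query region covering x in [0, 2), y in [0, 3), z in [0, 3).
left_half = ((0, 0, 0), (2, 3, 3))
print(region_query_label(scene, left_half, CHAIR))  # chair voxels fall inside
print(region_query_label(scene, left_half, TABLE))  # table voxel lies outside
```

In a self-supervised setup of this kind, such labels come for free from the reconstruction itself, so an encoder can be trained to answer (region, class) queries without manual annotation. How regions and classes are actually sampled and encoded is not specified in the abstract.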


