Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers

07/13/2022
by   Chang Chen, et al.

Humans have an innate ability to sense their surroundings: they extract a spatial representation from egocentric perception and form an allocentric semantic map via spatial transformation and memory updating. However, endowing mobile agents with such a spatial sensing ability is still a challenge, due to two difficulties: (1) previous convolutional models are limited by their local receptive field and thus struggle to capture holistic long-range dependencies during observation; (2) the excessive computational budgets required for success often lead to a separation of the mapping pipeline into stages, rendering the entire mapping process inefficient. To address these issues, we propose an end-to-end one-stage Transformer-based framework for Mapping, termed Trans4Map. Our egocentric-to-allocentric mapping process includes three steps: (1) the efficient transformer extracts contextual features from a batch of egocentric images; (2) the proposed Bidirectional Allocentric Memory (BAM) module projects egocentric features into the allocentric memory; (3) the map decoder parses the accumulated memory and predicts the top-down semantic segmentation map. In contrast, Trans4Map achieves state-of-the-art results, reducing 67.2 [...] +3.25 [...]. Code will be made publicly available at https://github.com/jamycheung/Trans4Map.
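The three steps in the abstract can be illustrated with a toy sketch. It is not the Trans4Map code and not the BAM module: the function names `project_to_memory` and `decode_map` are hypothetical, the map cell of each pixel is assumed to be given (the abstract does not say how it is computed), features are assumed to be non-negative, and an element-wise maximum stands in for the memory update.

```python
from typing import List, Sequence, Tuple

Feature = List[float]
# One observation: (map row, map column, feature vector of one pixel).
Observation = Tuple[int, int, Feature]


def project_to_memory(
    frames: Sequence[Sequence[Observation]],
    map_size: Tuple[int, int],
    channels: int,
) -> List[List[Feature]]:
    """Accumulate egocentric features into a top-down (allocentric) grid.

    Assumption: each pixel already carries the map cell it falls into.
    How that cell is obtained is outside this sketch.
    """
    rows, cols = map_size
    memory = [[[0.0] * channels for _ in range(cols)] for _ in range(rows)]
    for frame in frames:
        for r, c, feat in frame:
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # observation falls outside the map
            cell = memory[r][c]
            for k in range(channels):
                # Element-wise max: a simple stand-in for memory updating,
                # valid here only because features are assumed non-negative.
                cell[k] = max(cell[k], feat[k])
    return memory


def decode_map(memory: List[List[Feature]]) -> List[List[int]]:
    """Toy decoder: label each cell by its strongest channel, -1 if unobserved."""
    labels = []
    for row in memory:
        out = []
        for cell in row:
            best = max(cell)
            out.append(cell.index(best) if best > 0.0 else -1)
        labels.append(out)
    return labels


# Two frames observing a 2x2 map with 2 feature channels.
frames = [
    [(0, 0, [0.2, 0.9]), (1, 1, [0.7, 0.1])],
    [(0, 0, [0.5, 0.3]), (5, 5, [1.0, 1.0])],  # (5, 5) is off the map
]
memory = project_to_memory(frames, map_size=(2, 2), channels=2)
labels = decode_map(memory)
```

In this sketch the features would come from the image encoder (step 1), `project_to_memory` plays the role of the projection into allocentric memory (step 2), and `decode_map` stands in for the map decoder (step 3), which in the paper is a learned segmentation head rather than an arg-max.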


