DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-Based Segmentation Networks

11/08/2022
by   Francesco Barbato, et al.

Most approaches to semantic segmentation use only information from color cameras to parse scenes, yet recent advances show that depth data can further improve performance. In this work, we focus on transformer-based deep learning architectures, which have achieved state-of-the-art performance on the segmentation task, and we propose to exploit depth information by embedding it in the positional encoding. Effectively, we extend the network to multimodal data without adding any parameters, in a natural way that leverages the strength of transformers' self-attention modules. We also investigate performing cross-modality operations inside the attention module by swapping the key inputs between the depth and color branches. Our approach consistently improves performance on the Cityscapes benchmark.
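
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the form of the encoding or of the attention layers, so the sinusoidal formulation, the summation of row, column, and depth encodings, the `depth_scale` factor, and the omission of learned query/key/value projections are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def sinusoidal_encoding(values, dim):
    """Map scalar values of shape (N,) to sinusoidal features (N, dim); dim must be even."""
    values = np.asarray(values, dtype=np.float64)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))  # (dim / 2,)
    angles = values[:, None] * freqs[None, :]
    enc = np.empty((values.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


def depth_positional_encoding(depth_map, dim, depth_scale=100.0):
    """Parameter-free encoding of (row, column, depth) for every pixel.

    Assumption: depth is treated as a third coordinate and its encoding is
    summed with the usual 2D ones; depth_scale is a hypothetical constant.
    """
    h, w = depth_map.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return (
        sinusoidal_encoding(rows.ravel(), dim)
        + sinusoidal_encoding(cols.ravel(), dim)
        + sinusoidal_encoding(depth_map.ravel() * depth_scale, dim)
    )


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for token matrices of shape (N, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def cross_input_attention(color_tokens, depth_tokens):
    """Swap the key inputs between the two branches.

    Assumption: learned projections are omitted (identity) to keep the
    sketch short; each branch keeps its own queries and values.
    """
    color_out = attention(color_tokens, depth_tokens, color_tokens)
    depth_out = attention(depth_tokens, color_tokens, depth_tokens)
    return color_out, depth_out


# Usage with random placeholder features for a 4x6 token grid.
rng = np.random.default_rng(0)
depth = rng.uniform(0.0, 1.0, size=(4, 6))
pe = depth_positional_encoding(depth, dim=16)
color_tokens = rng.normal(size=(24, 16)) + pe
depth_tokens = rng.normal(size=(24, 16)) + pe
color_out, depth_out = cross_input_attention(color_tokens, depth_tokens)
```

Because the encoding is computed directly from pixel coordinates and depth values, it adds no trainable parameters, which matches the claim in the abstract. In the swapped-key attention, each branch's attention weights depend on the other modality while its output remains a mixture of its own values.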


research
05/23/2023

Source-Free Domain Adaptation for RGB-D Semantic Segmentation with Vision Transformers

With the increasing availability of depth sensors, multimodal frameworks...
research
03/26/2023

RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning

Existing Transformer-based RGBT tracking methods either use cross-attent...
research
05/24/2019

ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

Compared to RGB semantic segmentation, RGBD semantic segmentation can ac...
research
12/24/2021

Realtime Global Attention Network for Semantic Segmentation

In this paper, we proposed an end-to-end realtime global attention neura...
research
02/24/2022

Attention Enables Zero Approximation Error

Deep learning models have been widely applied in various aspects of dail...
research
05/19/2019

Adaptive Attention Span in Transformers

We propose a novel self-attention mechanism that can learn its optimal a...
research
03/27/2020

Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation and Database with Automatic Labelling

In this study, we focus on the egocentric segmentation of arms to improv...
