CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers

03/09/2022
by Huayao Liu, et al.

The performance of semantic segmentation on RGB images can be improved by exploiting informative features from supplementary modalities. In this work, we propose CMX, a vision-transformer-based cross-modal fusion framework for RGB-X semantic segmentation. To generalize to different sensing modalities with their various uncertainties, we argue that comprehensive cross-modal interactions are needed. CMX is built with two streams that extract features from the RGB image and from the complementary modality (X-modality). In each feature extraction stage, we design a Cross-Modal Feature Rectification Module (CM-FRM) that calibrates the features of one modality using the features of the other, along both the spatial and channel dimensions. Given the rectified feature pairs, we deploy a Feature Fusion Module (FFM) to mix them for the final semantic prediction. FFM is built on a cross-attention mechanism, which enables the exchange of long-range context and enhances both modalities' features at a global level. Extensive experiments show that CMX generalizes to diverse multi-modal combinations, achieving state-of-the-art performance on four RGB-Depth benchmarks, as well as on RGB-Thermal and RGB-Polarization datasets. In addition, to investigate generalizability to dense-sparse data fusion, we establish an RGB-Event semantic segmentation benchmark based on the EventScape dataset, on which CMX sets a new state of the art. Code is available at https://github.com/huaaaliu/RGBX_Semantic_Segmentation
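To make the cross-attention exchange mentioned in the abstract more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention between two token streams. It is not the CMX implementation: the function names (`cross_attention`, `exchange`), the absence of learned projections, the residual addition, and the toy token values are all assumptions made for illustration. The actual FFM and CM-FRM are defined in the paper and in the linked repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token attends over all context tokens.

    Assumption: queries, keys and values are the raw tokens themselves
    (no learned projection matrices), to keep the sketch short.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product similarity between this query and every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        attended.append(
            [sum(w * tok[j] for w, tok in zip(weights, context)) for j in range(dim)]
        )
    return attended


def exchange(rgb_tokens, x_tokens):
    """Bidirectional exchange: each stream queries the other stream.

    Assumption: the attended context is added back as a residual.
    """
    rgb_ctx = cross_attention(rgb_tokens, x_tokens)
    x_ctx = cross_attention(x_tokens, rgb_tokens)
    rgb_out = [[a + b for a, b in zip(t, c)] for t, c in zip(rgb_tokens, rgb_ctx)]
    x_out = [[a + b for a, b in zip(t, c)] for t, c in zip(x_tokens, x_ctx)]
    return rgb_out, x_out


# Toy example: three 2-dimensional tokens per modality (hypothetical values).
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_tokens = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
rgb_out, x_out = exchange(rgb_tokens, x_tokens)
```

Because every token in one stream can attend to every token in the other, information can flow between spatially distant positions in a single step, which is the sense in which cross-attention exchanges long-range context.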

research
08/18/2023

Single Frame Semantic Segmentation Using Multi-Modal Spherical Images

In recent years, the research community has shown a lot of interest to p...
research
03/30/2023

Complementary Random Masking for RGB-Thermal Semantic Segmentation

RGB-thermal semantic segmentation is one potential solution to achieve r...
research
10/26/2022

RGB-T Semantic Segmentation with Location, Activation, and Sharpening

Semantic segmentation is important for scene understanding. To address t...
research
03/02/2023

Delivering Arbitrary-Modal Semantic Segmentation

Multimodal fusion can make semantic segmentation more robust. However, f...
research
01/26/2021

Global-Local Propagation Network for RGB-D Semantic Segmentation

Depth information matters in RGB-D semantic segmentation task for provid...
research
06/29/2019

RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation

Signals from RGB and depth data carry complementary information about th...
research
07/16/2023

CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation

We propose a novel approach for RGB-D salient instance segmentation usin...
