Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

07/17/2020
by Xiaokang Chen, et al.

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images because it provides a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well aligned with the RGB pixels, and model the problem as cross-modal feature fusion in order to obtain better feature representations and more accurate segmentation. This, however, may not lead to satisfactory results, because actual depth data are generally noisy, and the noise might worsen accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information over multiple stages and aggregates the two recalibrated representations alternately. The key to the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. In addition, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand to help propagate and fuse information between the two modalities, and on the other hand to preserve their specificity along the long-term propagation process. Moreover, the proposed encoder can be easily injected into previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model consistently outperforms state-of-the-art methods on challenging indoor and outdoor datasets. Code of this work is available at https://charlescxk.github.io/
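
The abstract describes the gating operation only at a high level, so the following is a rough NumPy sketch of one way a filter-then-aggregate gate of this kind could look, not the authors' implementation. Everything concrete here is an assumption made for illustration: the function names (`channel_gate`, `sa_gate_sketch`), the weight matrices `w_rgb`, `w_depth`, `v_rgb`, `v_depth` (stand-ins for learned parameters), the use of global average pooling with a sigmoid for the filtering step, and the per-pixel softmax for the aggregation step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(a, b, w):
    """Per-channel gate in (0, 1) computed from the global statistics of both inputs.

    a, b: feature maps of shape (C, H, W); w: hypothetical weights of shape (C, 2C).
    """
    stats = np.concatenate([a.mean(axis=(1, 2)), b.mean(axis=(1, 2))])  # (2C,)
    return sigmoid(w @ stats)  # (C,)


def sa_gate_sketch(rgb, depth, w_rgb, w_depth, v_rgb, v_depth):
    """Assumed filter-then-aggregate gate; returns the fused map and the spatial gates."""
    # Separation (assumed form): suppress unreliable channels of each modality,
    # then add the filtered signal to the other modality to recalibrate it.
    depth_filtered = channel_gate(rgb, depth, w_depth)[:, None, None] * depth
    rgb_filtered = channel_gate(depth, rgb, w_rgb)[:, None, None] * rgb
    rgb_rec = rgb + depth_filtered
    depth_rec = depth + rgb_filtered

    # Aggregation (assumed form): score each modality per pixel, then normalise
    # the two scores with a softmax so the gates sum to 1 at every location.
    both = np.concatenate([rgb_rec, depth_rec], axis=0)  # (2C, H, W)
    scores = np.stack([
        np.tensordot(v_rgb, both, axes=1),    # (H, W)
        np.tensordot(v_depth, both, axes=1),  # (H, W)
    ])
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    gates = np.exp(scores)
    gates = gates / gates.sum(axis=0, keepdims=True)     # (2, H, W)

    fused = gates[0] * rgb_rec + gates[1] * depth_rec    # (C, H, W)
    return fused, gates


# Toy usage with random features and random stand-in weights.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
rgb = rng.normal(size=(C, H, W))
depth = rng.normal(size=(C, H, W))
fused, gates = sa_gate_sketch(
    rgb, depth,
    w_rgb=rng.normal(size=(C, 2 * C)), w_depth=rng.normal(size=(C, 2 * C)),
    v_rgb=rng.normal(size=2 * C), v_depth=rng.normal(size=2 * C),
)
print(fused.shape, gates.shape)
```

In this sketch the fused output is a per-pixel convex combination of the two recalibrated maps, so a noisy depth region can be down-weighted locally without discarding the depth branch as a whole.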


