Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection

06/07/2022
by   Chao Zeng, et al.

Salient object detection (SOD) is the task of predicting the human-attended region in a given scene. Fusing depth information has proven effective for this task. The main challenge is how to aggregate the complementary information from the RGB and depth modalities. However, conventional deep models rely heavily on CNN feature extractors and usually ignore long-range contextual dependencies. In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet). We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model the long-range dependencies in visual inputs. Before fusing the two branches of features into one, attention-based modules are applied to enhance the features from each modality. We design a self-attention-based cross-modality interaction module and a gated modality attention module to leverage the complementary information between the two modalities. For saliency decoding, we create different stages enhanced with dense connections and keep a decoding memory while the multi-level encoding features are considered simultaneously. To address inaccurate depth maps, we collect the RGB features of early stages into a skip convolution module to give more guidance from the RGB modality to the final saliency prediction. In addition, we add edge supervision to regularize the feature learning process. Comprehensive experiments on five standard RGB-D SOD benchmark datasets over four evaluation metrics demonstrate the superiority of the proposed DTMINet method.
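The abstract names a cross-modality interaction step and a gated fusion step but does not give their layer definitions. The sketch below is therefore only a generic illustration of those two ideas, not the authors' DTMINet modules. It assumes single-head attention with no learned projections, and a gate computed from the mean depth activation. The function names (`cross_attention`, `gated_fusion`) and the toy token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys_values):
    """Each query token attends over all tokens of the other modality.

    Assumption: keys and values are the same tokens and there are no
    learned projections; a trained model would learn Q/K/V weights.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(dim)]
        )
    return attended


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_tokens, depth_tokens, gate_bias=0.0):
    """Blend the two modalities per token with a scalar gate in (0, 1).

    Assumption: the gate is a fixed function of the mean depth
    activation; a trained model would predict it with learned layers.
    """
    fused = []
    for r, d in zip(rgb_tokens, depth_tokens):
        g = sigmoid(sum(d) / len(d) + gate_bias)
        fused.append([(1.0 - g) * ri + g * di for ri, di in zip(r, d)])
    return fused


# Toy inputs: three tokens with two channels per modality (made-up values).
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[0.5, 0.5], [0.0, 2.0], [1.0, 0.0]]

rgb_enhanced = cross_attention(rgb, depth)    # RGB queries, depth keys/values
depth_enhanced = cross_attention(depth, rgb)  # depth queries, RGB keys/values
fused = gated_fusion(rgb_enhanced, depth_enhanced)
```

Lowering `gate_bias` shifts the blend toward the RGB branch, which loosely mirrors the abstract's motivation for leaning on RGB when the depth map is unreliable.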

research
12/02/2021

MTFNet: Mutual-Transformer Fusion Network for RGB-D Salient Object Detection

Salient object detection (SOD) on RGB-D images is an active problem in c...
research
10/12/2020

Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection

How to effectively fuse cross-modal information is the key problem for R...
research
12/21/2019

cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks

Image salient object detection (SOD) is an active research topic in comp...
research
08/17/2023

Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

By integrating complementary information from RGB image and depth map, t...
research
07/03/2023

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

RGB-D salient object detection (SOD) aims to detect the prominent region...
research
04/25/2021

Visual Saliency Transformer

Recently, massive saliency detection methods have achieved promising res...
research
09/10/2021

ACFNet: Adaptively-Cooperative Fusion Network for RGB-D Salient Object Detection

The reasonable employment of RGB and depth data show great significance ...
