Hierarchical Cross-modal Transformer for RGB-D Salient Object Detection

02/16/2023
by Hao Chen, et al.

Most existing RGB-D salient object detection (SOD) methods follow the CNN-based paradigm, which cannot model long-range dependencies across space and modalities because of the inherent locality of CNNs. Here we propose the Hierarchical Cross-modal Transformer (HCT), a new multi-modal transformer, to tackle this problem. Unlike previous multi-modal transformers that directly connect all patches from the two modalities, we explore the cross-modal complementarity hierarchically to respect the modality gap and the spatial discrepancy in unaligned regions. Specifically, we propose to use intra-modal self-attention to explore complementary global contexts, and to measure spatially aligned inter-modal attention locally to capture cross-modal correlations. In addition, we present a Feature Pyramid module for Transformer (FPT) to boost informative cross-scale integration, as well as a consistency-complementarity module to disentangle the multi-modal integration path and improve the fusion adaptivity. Comprehensive experiments on a large variety of public datasets verify the efficacy of our designs and show consistent improvement over state-of-the-art models.
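
The two attention levels described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `radius` parameter, the square neighbourhood, and the absence of learned query/key/value projections are all assumptions made for illustration. Intra-modal self-attention lets every token of one modality attend to all tokens of that same modality, while the local inter-modal step lets each RGB token attend only to depth tokens near the same grid position.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    # Scaled dot-product attention for one query vector.
    # Assumption: no learned projections, tokens are used directly.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def intra_modal_self_attention(tokens):
    # Global context: each token attends to every token of its own modality.
    return [attend(q, tokens, tokens) for q in tokens]

def local_cross_modal_attention(query_tokens, other_tokens, height, width, radius=1):
    # Local, spatially aligned cross-modal attention (hypothetical form):
    # the token at grid cell (r, c) attends only to tokens of the other
    # modality inside a (2 * radius + 1)^2 window centred on (r, c).
    out = []
    for r in range(height):
        for c in range(width):
            neighbours = [
                other_tokens[rr * width + cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            out.append(attend(query_tokens[r * width + c], neighbours, neighbours))
    return out

# Toy 2x2 grid of 2-dimensional tokens, flattened in row-major order.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
depth = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

rgb_context = intra_modal_self_attention(rgb)      # global, within RGB
depth_context = intra_modal_self_attention(depth)  # global, within depth
fused = local_cross_modal_attention(rgb_context, depth_context, 2, 2, radius=1)
```

With `radius=0` each RGB token sees only the depth token at the same position, and a radius that covers the whole grid reduces the local step to attention over all depth patches, so the window size controls how much spatial misalignment the cross-modal step tolerates.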


