CNN-based RGB-D Salient Object Detection: Learn, Select and Fuse

09/20/2019
by Hao Chen, et al.

The goal of this work is to present a systematic solution for RGB-D salient object detection, which addresses the following three aspects with a unified framework: modal-specific representation learning, complementary cue selection and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme, in which the well-learned source modality provides supervisory signals to facilitate the learning process for the new modality. To better extract the complementary cues, we formulate a residual function to incorporate complements from the paired modality adaptively. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal interactions and cross-level transmissions. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in zero-shot saliency detection and pre-training on a new modality, as well as the advantages in selecting and fusing cross-modal/cross-level complements.
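As a rough illustration of two ideas named in the abstract, the sketch below shows (a) a feature-level distillation loss in which a source-modality "teacher" supervises a new-modality "student", and (b) a residual-style fusion that adds a gated complement from the paired modality to a modality's own features. This is a minimal toy sketch under stated assumptions, not the paper's formulation: the function names `distillation_loss` and `residual_fuse`, the mean-squared-error loss, the sigmoid gate, and the `weight` and `bias` parameters are all hypothetical, and plain Python lists stand in for CNN feature maps.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def distillation_loss(teacher, student):
    """Mean squared error between teacher and student features.

    Assumption: MSE is used here only as a simple stand-in for
    whatever supervisory signal the source modality provides.
    """
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def residual_fuse(own, paired, weight=1.0, bias=0.0):
    """Hypothetical residual fusion: own + gate * paired, element-wise.

    The gate is a stand-in for a learned residual branch that decides
    how much of the paired modality's cue to add.
    """
    fused = []
    for o, p in zip(own, paired):
        gate = sigmoid(weight * p + bias)
        fused.append(o + gate * p)
    return fused


# Toy feature vectors (made-up values, for illustration only).
rgb_features = [0.2, 0.8, 0.5]
depth_features = [0.0, 1.0, -1.0]

fused = residual_fuse(rgb_features, depth_features)
loss = distillation_loss([1.0, 3.0], [0.0, 1.0])
```

In this toy setup, a zero-valued complement leaves the original feature unchanged, which is the property that makes the residual form attractive: the paired modality can only adjust, not replace, the modality's own representation.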


research
03/01/2017

RGB-D Salient Object Detection Based on Discriminative Cross-modal Transfer Learning

In this work, we propose to utilize Convolutional Neural Networks to boo...
research
10/12/2020

Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection

How to effectively fuse cross-modal information is the key problem for R...
research
10/04/2021

Cross-Modal Virtual Sensing for Combustion Instability Monitoring

In many cyber-physical systems, imaging can be an important but expensiv...
research
07/31/2021

Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking

The target representation learned by convolutional neural networks plays...
research
02/02/2023

MoE-Fusion: Instance Embedded Mixture-of-Experts for Infrared and Visible Image Fusion

Infrared and visible image fusion can compensate for the incompleteness ...
research
11/21/2021

TraVLR: Now You See It, Now You Don't! Evaluating Cross-Modal Transfer of Visio-Linguistic Reasoning

Numerous visio-linguistic (V+L) representation learning methods have bee...
research
03/30/2023

Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection

Temporal action detection aims to predict the time intervals and the cla...
