Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection

09/13/2023
by Guangyu Ren, et al.

RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments. However, existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features. To address this, we first propose a Multi-Modal Hybrid Loss (MMHL) that comprises supervised and self-supervised loss functions. The supervised component of MMHL distinctly utilizes semantic features from the different modalities, while the self-supervised component reduces the distance between RGB and thermal features. We further consider both spatial and channel information during feature fusion and propose the Hybrid Fusion Module to effectively fuse RGB and thermal features. Lastly, instead of jointly training the network with cross-modal features, we implement a sequential training strategy that trains only on RGB images in the first stage and then learns cross-modal features in the second stage. This training strategy improves saliency detection performance without computational overhead. Performance evaluations and ablation studies show that the proposed method outperforms existing state-of-the-art methods.
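The abstract describes MMHL only at a high level, so the following is a minimal sketch of the general idea rather than the authors' formulation. It assumes binary cross-entropy for the supervised term and a mean squared distance between RGB and thermal features for the self-supervised term. The function names and the `align_weight` parameter are hypothetical and do not come from the paper.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels.

    Assumption: the abstract does not name the supervised loss; BCE is a
    common choice for saliency maps and is used here for illustration only.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def feature_distance(rgb_feat, thermal_feat):
    """Mean squared distance between an RGB and a thermal feature vector.

    Assumption: the abstract only says the self-supervised term "reduces the
    distance" between the two modalities; the squared L2 form is illustrative.
    """
    return sum((r - t) ** 2 for r, t in zip(rgb_feat, thermal_feat)) / len(rgb_feat)


def hybrid_loss(pred, target, rgb_feat, thermal_feat, align_weight=0.5):
    """Supervised term plus a weighted cross-modal alignment term.

    `align_weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    supervised = bce(pred, target)
    self_supervised = feature_distance(rgb_feat, thermal_feat)
    return supervised + align_weight * self_supervised
```

In this sketch, the alignment term vanishes when the RGB and thermal features coincide, so the loss reduces to the supervised term alone. Any gap between the two modalities adds a penalty scaled by `align_weight`.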


