Self-Supervised Representation Learning for RGB-D Salient Object Detection

01/29/2021
by Xiaoqi Zhao, et al.

Existing CNN-based RGB-D Salient Object Detection (SOD) networks all need to be pre-trained on ImageNet to learn hierarchical features that provide a good initialization. However, collecting and annotating large-scale datasets is time-consuming and expensive. In this paper, we use Self-Supervised Representation Learning (SSL) to design two pretext tasks: the cross-modal auto-encoder and depth-contour estimation. These pretext tasks require only a small amount of unlabeled RGB-D data for pre-training. They enable the network to capture rich semantic context and reduce the gap between the two modalities, thereby providing an effective initialization for the downstream task. In addition, to address the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion (MPF) module that splits a single feature fusion into multiple fusion paths to adequately perceive both consistent and differential information. The MPF module is general and suitable for both cross-modal and cross-level feature fusion. Extensive experiments on six benchmark RGB-D SOD datasets show that our model, pre-trained on an RGB-D dataset (6,335 samples without any annotations), performs favorably against most state-of-the-art RGB-D methods pre-trained on ImageNet (1,280,000 images with image-level annotations).
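The abstract does not describe the internals of the MPF module, so the following is only a minimal sketch of the general idea of multi-path fusion, not the authors' implementation. It assumes three element-wise paths chosen purely for illustration: a sum path, a product path (large only where both inputs respond, i.e. consistent cues) and an absolute-difference path (large where the inputs disagree, i.e. differential cues). The function name `multi_path_fusion` and the choice of paths are hypothetical, and feature maps are reduced to plain per-position feature vectors to keep the example dependency-free.

```python
from typing import List

Vector = List[float]


def multi_path_fusion(feat_a: Vector, feat_b: Vector) -> Vector:
    """Hypothetical multi-path fusion of two equally sized feature vectors.

    Instead of one fusion operator, the inputs are combined along several
    parallel paths and the path outputs are concatenated. The paths used
    here are an illustrative assumption, not the paper's MPF design.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")

    # Sum path: overall joint response of both inputs.
    sum_path = [a + b for a, b in zip(feat_a, feat_b)]
    # Product path: high only where both inputs agree (consistent cues).
    prod_path = [a * b for a, b in zip(feat_a, feat_b)]
    # Absolute-difference path: high where inputs disagree (differential cues).
    diff_path = [abs(a - b) for a, b in zip(feat_a, feat_b)]

    # Concatenate along the channel axis. A real network would typically
    # follow this with a learned layer (e.g. a 1x1 convolution) to mix paths.
    return sum_path + prod_path + diff_path


if __name__ == "__main__":
    rgb_feat = [1.0, 0.0]
    depth_feat = [1.0, 2.0]
    print(multi_path_fusion(rgb_feat, depth_feat))
    # [2.0, 2.0, 1.0, 0.0, 0.0, 2.0]
```

Because the sketch treats both inputs symmetrically, the same function could fuse an RGB feature with a depth feature (cross-modal) or features from two levels of the same backbone (cross-level), mirroring the abstract's claim that MPF applies to both settings.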
