DFTR: Depth-supervised Hierarchical Feature Fusion Transformer for Salient Object Detection

03/12/2022
by Heqin Zhu, et al.

Automated salient object detection (SOD) plays an increasingly crucial role in many computer vision applications. Although existing frameworks achieve impressive SOD performance, especially with the development of deep learning techniques, there is still room for improvement. In this work, we propose a novel pure Transformer-based SOD framework, the Depth-supervised hierarchical feature Fusion TRansformer (DFTR), to further improve the accuracy of both RGB and RGB-D SOD. DFTR involves three primary improvements: 1) the backbone of the feature encoder is switched from a convolutional neural network to a Swin Transformer for more effective feature extraction; 2) a multi-scale feature aggregation (MFA) module fully exploits the multi-scale features encoded by the Swin Transformer in a coarse-to-fine manner; 3) following recent studies, we formulate an auxiliary task of depth map prediction and use the ground-truth depth maps as extra supervision signals for network learning. To enable bidirectional information flow between the saliency and depth branches, a novel multi-task feature fusion (MFF) module is integrated into DFTR. We extensively evaluate DFTR on ten benchmark datasets. Experimental results show that DFTR consistently outperforms existing state-of-the-art methods on both RGB and RGB-D SOD tasks. The code and model will be released.
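
Using depth prediction as an auxiliary task amounts to adding a depth term to the training objective. The following is a minimal sketch under loud assumptions: the abstract does not specify the loss functions or their weighting, so the binary cross-entropy saliency term, the L1 depth term, the default `depth_weight` of 0.5, and all function names are hypothetical illustrations, not the authors' formulation.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        # Clamp probabilities so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def l1_loss(pred, target):
    """Mean absolute error over flat lists of depth values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multitask_loss(sal_pred, sal_gt, depth_pred, depth_gt, depth_weight=0.5):
    """Saliency loss plus a weighted auxiliary depth loss.

    ASSUMPTION: the loss choices and depth_weight are illustrative only;
    the abstract does not state them.
    """
    return bce_loss(sal_pred, sal_gt) + depth_weight * l1_loss(depth_pred, depth_gt)


# Toy 2-pixel example: saliency probabilities and normalized depth values.
loss = multitask_loss([0.5, 0.5], [1, 0], [0.5, 0.5], [0.0, 1.0])
```

Setting `depth_weight=0` recovers the saliency-only objective in this sketch.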
