Transformer Transforms Salient Object Detection and Camouflaged Object Detection

04/20/2021
by Yuxin Mao, et al.

Transformer networks, which originate from machine translation, are particularly good at modeling long-range dependencies within a long sequence. They are currently making revolutionary progress in vision tasks ranging from high-level classification to low-level dense prediction. In this paper, we study the application of transformer networks to salient object detection (SOD). Specifically, we adopt a dense transformer backbone for fully supervised RGB image based SOD, RGB-D image pair based SOD, and weakly supervised SOD via scribble supervision. As an extension, we also apply our fully supervised model to camouflaged object detection (COD), that is, the segmentation of camouflaged objects. For the fully supervised models, we use the dense transformer backbone as the feature encoder and design a very simple decoder that produces a one-channel saliency map (or camouflage map for the COD task). For the weakly supervised model, because scribble annotations carry no structure information, we first adopt the recently proposed Gated-CRF loss to model pair-wise relationships effectively and obtain accurate predictions. We then introduce a self-supervised learning strategy that pushes the model to produce scale-invariant predictions, which has proven effective for weakly supervised models and for models trained on small datasets. Extensive experimental results on four tasks (fully supervised RGB image based SOD, fully supervised RGB-D image pair based SOD, weakly supervised SOD via scribble supervision, and fully supervised RGB image based COD) illustrate that transformer networks can transform salient object detection and camouflaged object detection, leading to new benchmarks for each related task.
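
The scale-invariance idea mentioned for the weakly supervised model can be illustrated with a small, dependency-free sketch. The abstract does not give the loss formulation, so everything here is an assumption made for illustration: the function names (`downsample2x`, `scale_consistency_loss`), the use of 2x average pooling as the rescaling step, the mean absolute difference as the penalty, and the two toy "models". It is not the authors' implementation.

```python
def downsample2x(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (assumed rescaling)."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def scale_consistency_loss(predict, image):
    """Mean absolute difference between the downsampled prediction on the
    full-size image and the prediction on the downsampled image.
    A value of 0 means the predictor commutes with the rescaling."""
    pred_then_down = downsample2x(predict(image))
    down_then_pred = predict(downsample2x(image))
    diffs = [abs(a - b)
             for row_a, row_b in zip(pred_then_down, down_then_pred)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Hypothetical stand-ins for a saliency network (per-pixel maps only).
def identity_model(img):
    return [row[:] for row in img]


def squaring_model(img):
    return [[v * v for v in row] for row in img]


image = [[0.0, 1.0, 0.0, 1.0],
         [1.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]

loss_identity = scale_consistency_loss(identity_model, image)
loss_squaring = scale_consistency_loss(squaring_model, image)
print(loss_identity, loss_squaring)
```

In this toy setting the identity predictor commutes with average pooling and incurs no penalty, while the nonlinear predictor does not. In training, a term of this kind would presumably be added to the supervised loss so that predictions at different input scales agree.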


