Visual Saliency Transformer

04/25/2021
by Nian Liu, et al.

Recently, many saliency detection methods have achieved promising results by relying on CNN-based architectures. Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which cannot be achieved by convolution. Specifically, we develop a novel unified model based on a pure transformer, namely, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD). It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches. Beyond the traditional transformer architecture used in Vision Transformer (ViT), we leverage multi-level token fusion and propose a new token upsampling method under the transformer framework to obtain high-resolution detection results. We also develop a token-based multi-task decoder that simultaneously performs saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism. Experimental results show that our model outperforms existing state-of-the-art methods on both RGB and RGB-D SOD benchmark datasets. Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models.
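To make the decoder idea concrete, the sketch below illustrates how a task-related token can interact with patch tokens through an attention step. This is a minimal NumPy illustration under stated assumptions, not the paper's exact formulation: we assume a single head with no learned projections, let each patch token query the concatenation of a saliency-task token and the patch tokens, and then apply a per-patch linear head. The function names (`patch_task_attention`) and the random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_task_attention(patch_tokens, task_token, d_k):
    """Each patch token (as query) attends over the task token plus all
    patch tokens (as keys/values), mixing the task-specific signal into
    every patch before per-patch prediction."""
    kv = np.concatenate([task_token, patch_tokens], axis=0)  # (1+N, d)
    scores = patch_tokens @ kv.T / np.sqrt(d_k)              # (N, 1+N)
    return softmax(scores, axis=-1) @ kv                     # (N, d)

# Toy setup: 16 patch tokens and one saliency-task token of width 8.
N, d = 16, 8
patches = rng.standard_normal((N, d))
saliency_token = rng.standard_normal((1, d))

mixed = patch_task_attention(patches, saliency_token, d)
saliency_logits = mixed @ rng.standard_normal((d, 1))  # per-patch head
print(mixed.shape, saliency_logits.shape)  # (16, 8) (16, 1)
```

In the same spirit, a second task token (e.g. for boundary detection) could reuse the identical patch tokens with its own attention step, which is what makes the multi-task decoder cheap to extend.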


