UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection

09/15/2023
by   Junwen Xiong, et al.
0

Video saliency prediction and detection are thriving research domains that enable computers to simulate the distribution of visual attention akin to how humans perceiving dynamic scenes. While many approaches have crafted task-specific training paradigms for either video saliency prediction or video salient object detection tasks, few attention has been devoted to devising a generalized saliency modeling framework that seamlessly bridges both these distinct tasks. In this study, we introduce the Unified Saliency Transformer (UniST) framework, which comprehensively utilizes the essential attributes of video saliency prediction and video salient object detection. In addition to extracting representations of frame sequences, a saliency-aware transformer is designed to learn the spatio-temporal representations at progressively increased resolutions, while incorporating effective cross-scale saliency information to produce a robust representation. Furthermore, a task-specific decoder is proposed to perform the final prediction for each task. To the best of our knowledge, this is the first work that explores designing a transformer structure for both saliency modeling tasks. Convincible experiments demonstrate that the proposed UniST achieves superior performance across seven challenging benchmarks for two tasks, and significantly outperforms the other state-of-the-art methods.

READ FULL TEXT

page 1

page 3

page 6

page 7

page 11

research
07/12/2018

Video Saliency Detection by 3D Convolutional Neural Networks

Different from salient object detection methods for still images, a key ...
research
03/09/2022

A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection

Humans tend to mine objects by learning from a group of images or severa...
research
09/19/2022

Panoramic Vision Transformer for Saliency Detection in 360° Videos

360^∘ video saliency detection is one of the challenging benchmarks for ...
research
08/05/2021

Unifying Global-Local Representations in Salient Object Detection with Transformer

The fully convolutional network (FCN) has dominated salient object detec...
research
11/19/2016

Multi-Scale Saliency Detection using Dictionary Learning

Saliency detection has drawn a lot of attention of researchers in variou...
research
04/12/2016

Recurrent Attentional Networks for Saliency Detection

Convolutional-deconvolution networks can be adopted to perform end-to-en...
research
03/11/2023

CASP-Net: Rethinking Video Saliency Prediction from an Audio-VisualConsistency Perceptual Perspective

Incorporating the audio stream enables Video Saliency Prediction (VSP) t...

Please sign up or login with your details

Forgot password? Click here to reset