CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective

03/11/2023
by Junwen Xiong, et al.

Incorporating the audio stream enables Video Saliency Prediction (VSP) to imitate the selective attention mechanism of the human brain. By focusing on the benefits of joint auditory and visual information, most VSP methods can exploit the semantic correlation between the visual and audio modalities, but they ignore the negative effects caused by the temporal inconsistency inherent in audio-visual signals. Inspired by the biological inconsistency correction that occurs within multi-sensory information, this study proposes a consistency-aware audio-visual saliency prediction network (CASP-Net), which comprehensively considers both audio-visual semantic interaction and consistent perception. In addition to a two-stream encoder that associates video frames with their corresponding sound source, a novel consistency-aware predictive coding is designed to iteratively improve the consistency between audio and visual representations. To further aggregate the multi-scale audio-visual information, a saliency decoder is introduced for the final saliency map generation. Extensive experiments demonstrate that the proposed CASP-Net outperforms other state-of-the-art methods on six challenging audio-visual eye-tracking datasets. For a demo of our system, please see our project webpage.
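The abstract gives no implementation details, so the following is only a toy illustration of the general idea of iteratively reducing inconsistency between two representations through prediction-error feedback. It is not CASP-Net's predictive coding. The function names, the identity predictor, the `rate` parameter, and the use of plain Python lists as stand-ins for feature vectors are all assumptions made for this sketch.

```python
from typing import List, Tuple

Vector = List[float]


def inconsistency(audio: Vector, visual: Vector) -> float:
    """Mean squared difference between the two feature vectors."""
    return sum((a - v) ** 2 for a, v in zip(audio, visual)) / len(audio)


def refine(
    audio: Vector, visual: Vector, steps: int = 3, rate: float = 0.25
) -> Tuple[Vector, Vector, List[float]]:
    """Iteratively pull two representations toward each other.

    Hypothetical toy: each step treats one modality as a prediction of the
    other (an identity predictor, chosen only to keep the example small),
    measures the prediction error, and feeds a fraction of that error back
    into both vectors.
    """
    if not audio or len(audio) != len(visual):
        raise ValueError("feature vectors must be non-empty and equal in length")

    audio, visual = list(audio), list(visual)  # do not mutate the inputs
    history = [inconsistency(audio, visual)]
    for _ in range(steps):
        # Prediction error of the visual features given the audio features.
        errors = [v - a for a, v in zip(audio, visual)]
        # Error feedback: move each modality a little toward the other.
        audio = [a + rate * e for a, e in zip(audio, errors)]
        visual = [v - rate * e for v, e in zip(visual, errors)]
        history.append(inconsistency(audio, visual))
    return audio, visual, history


if __name__ == "__main__":
    _, _, trace = refine([1.0, 0.0], [0.0, 1.0], steps=3)
    print(trace)
```

With `0 < rate < 0.5`, each step shrinks the per-element difference by a factor of `1 - 2 * rate`, so the recorded inconsistency decreases monotonically. A real model would replace the identity predictor with learned mappings between the audio and visual feature spaces.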


