STAViS: Spatio-Temporal AudioVisual Saliency Network

01/09/2020
by Antigoni Tsiami, et al.

We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features, and learns both to localize sound sources and to fuse the two saliencies into a final saliency map. The network has been designed, trained end-to-end, and evaluated on six different databases that contain audiovisual eye-tracking data for a large variety of videos. We compare our method against eight state-of-the-art visual saliency models. Evaluation results across databases indicate that our STAViS model outperforms both our visual-only variant and the other state-of-the-art models in the majority of cases. Moreover, its consistently good performance on all databases indicates that it is appropriate for estimating saliency "in-the-wild".
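
To make the idea of fusing two saliencies into one map concrete, here is a minimal sketch in plain Python. It is not the STAViS fusion module: the abstract says the network learns the fusion end-to-end, whereas this sketch uses a fixed weighted sum as a stand-in. The function names (normalize, fuse), the weights, and the toy 2x2 maps are all assumptions made for illustration.

```python
from typing import List

Map = List[List[float]]  # a 2D saliency map stored as rows of floats


def normalize(m: Map) -> Map:
    """Min-max scale a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse(visual: Map, audio: Map,
         w_visual: float = 0.5, w_audio: float = 0.5) -> Map:
    """Fuse two maps with a fixed weighted sum, then rescale to [0, 1].

    Assumption: fixed weights replace the learned fusion described in
    the abstract; they are hypothetical values, not from the paper.
    """
    v, a = normalize(visual), normalize(audio)
    fused = [
        [w_visual * pv + w_audio * pa for pv, pa in zip(row_v, row_a)]
        for row_v, row_a in zip(v, a)
    ]
    return normalize(fused)


# Toy inputs (hypothetical): the visual map peaks top-right,
# the audio localization map peaks bottom-right.
visual_map = [[0.1, 0.9], [0.2, 0.3]]
audio_map = [[0.0, 0.2], [0.0, 1.0]]
fused_map = fuse(visual_map, audio_map)
```

In this toy case the fused map keeps both the visually salient location and the sound-source location prominent, which is the qualitative behaviour an audiovisual model aims for; a trained network would instead learn how much weight each modality deserves.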


