ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos

07/24/2021
by   Yi Zhang, et al.

Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD), which mimics the human attention mechanism by segmenting salient objects with the guidance of audio-visual cues. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, distinguishing itself by its richness, diversity, and quality. Specifically, each sequence is labeled with its super- and sub-class, and the objects of each sub-class are further annotated with human eye fixations, bounding boxes, object-/instance-level masks, and associated attributes (e.g., geometrical distortion). These coarse-to-fine annotations enable detailed analysis for PV-SOD modeling, e.g., determining the major challenges for existing SOD models, and predicting scanpaths to study the long-term eye fixation behaviors of humans. We systematically benchmark 11 representative approaches on ASOD60K and derive several interesting findings. We hope this study can serve as a good starting point for advancing SOD research towards panoramic videos.


research
05/24/2021

SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos

Salient human detection (SHD) in dynamic 360 immersive videos is of grea...
research
01/22/2020

A Fixation-based 360° Benchmark Dataset for Salient Object Detection

Fixation prediction (FP) in panoramic contents has been widely investiga...
research
11/01/2016

A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection

Image-based salient object detection (SOD) has been extensively studied ...
research
09/16/2019

Motion Guided Attention for Video Salient Object Detection

Video salient object detection aims at discovering the most visually dis...
research
01/11/2021

Horizontal-to-Vertical Video Conversion

Alongside the prevalence of mobile videos, the general public leans towa...
research
07/25/2022

Salient Object Detection for Point Clouds

This paper researches the unexplored task-point cloud salient object det...
research
06/20/2022

A Novel Long-term Iterative Mining Scheme for Video Salient Object Detection

The existing state-of-the-art (SOTA) video salient object detection (VSO...
