
ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos

by Yi Zhang, et al.

Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD), which mimics the human attention mechanism by segmenting salient objects under the guidance of audio-visual cues. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, and is thus distinguished by its richness, diversity, and quality. Specifically, each sequence is labeled with its super-class and sub-class, and the objects of each sub-class are further annotated with human eye fixations, bounding boxes, object-/instance-level masks, and associated attributes (e.g., geometrical distortion). These coarse-to-fine annotations enable detailed analysis for PV-SOD modeling, e.g., determining the major challenges for existing SOD models, and predicting scanpaths to study the long-term eye fixation behaviors of humans. We systematically benchmark 11 representative approaches on ASOD60K and derive several interesting findings. We hope this study can serve as a good starting point for advancing SOD research towards panoramic videos.




SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos

Salient human detection (SHD) in dynamic 360 immersive videos is of grea...

A Fixation-based 360° Benchmark Dataset for Salient Object Detection

Fixation prediction (FP) in panoramic contents has been widely investiga...

A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection

Image-based salient object detection (SOD) has been extensively studied ...

Motion Guided Attention for Video Salient Object Detection

Video salient object detection aims at discovering the most visually dis...

Horizontal-to-Vertical Video Conversion

Alongside the prevalence of mobile videos, the general public leans towa...

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

Augmented Reality (AR) as a platform has the potential to facilitate the...

Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

This manuscript introduces the problem of prominent object detection and...
