SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos
Salient human detection (SHD) in dynamic 360 immersive videos is of great importance for various applications such as robotics, inter-human and human-object interaction in augmented reality. However, 360 video SHD has been seldom discussed in the computer vision community due to a lack of datasets with large-scale omnidirectional videos and rich annotations. To this end, we propose SHD360, the first 360 video SHD dataset collecting various real-life daily scenes, providing six-level hierarchical annotations for 6,268 key frames uniformly sampled from 37,403 omnidirectional video frames at 4K resolution. Specifically, each collected key frame is labeled with a super-class, a sub-class, associated attributes (e.g., geometrical distortion), bounding boxes and per-pixel object-/instance-level masks. As a result, our SHD360 contains totally 16,238 salient human instances with manually annotated pixel-wise ground truth. Since so far there is no method proposed for 360 SHD, we systematically benchmark 11 representative state-of-the-art salient object detection (SOD) approaches on our SHD360, and explore key issues derived from extensive experimenting results. We hope our proposed dataset and benchmark could serve as a good starting point for advancing human-centric researches towards 360 panoramic data. Our dataset and benchmark will be publicly available at https://github.com/PanoAsh/SHD360.
READ FULL TEXT