PANDA: A Gigapixel-level Human-centric Video Dataset

03/10/2020
by Xueyang Wang, et al.

We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (~1 square kilometer area) and high-resolution details (~gigapixel-level/frame). The scenes may contain 4k head counts with over 100x scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a 'global-to-local zoom-in' framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.


