X-MAS: Extremely Large-Scale Multi-Modal Sensor Dataset for Outdoor Surveillance in Real Environments

by DongKi Noh et al.

In the robotics and computer vision communities, surveillance tasks such as human detection, tracking, and motion recognition with a camera have been studied extensively, and deep learning algorithms are now widely used for these tasks, as in other computer vision problems. However, existing public datasets are insufficient for developing learning-based methods that handle diverse outdoor surveillance scenarios, including extreme situations such as harsh weather and low-illuminance conditions. We therefore introduce a new large-scale outdoor surveillance dataset, the eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), which contains more than 500,000 image pairs and first-person-view data annotated by well-trained annotators. Each pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person-view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and describe methods for exploiting it with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be made available for download through a server.
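Since each sample bundles five sensor modalities, a consumer of the dataset will typically want a container type holding one aligned pair. The sketch below is a hypothetical illustration only: the field names, array shapes, and the `make_dummy_sample` helper are assumptions for pipeline testing, not the actual X-MAS release format.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class XMASSample:
    """One multi-modal pair. Field names and shapes are illustrative,
    not taken from the actual X-MAS release."""
    rgb: np.ndarray      # H x W x 3 color image (uint8)
    ir: np.ndarray       # H x W near-infrared image (uint8)
    thermal: np.ndarray  # H x W thermal image (float32)
    depth: np.ndarray    # H x W depth map in meters (float32)
    lidar: np.ndarray    # N x 4 point cloud: x, y, z, intensity


def make_dummy_sample(h=480, w=640, n_points=1024, seed=0):
    """Build a synthetic sample with plausible shapes, useful for
    exercising a training pipeline before real data is downloaded."""
    rng = np.random.default_rng(seed)
    return XMASSample(
        rgb=rng.integers(0, 256, (h, w, 3), dtype=np.uint8),
        ir=rng.integers(0, 256, (h, w), dtype=np.uint8),
        thermal=rng.random((h, w), dtype=np.float32),
        depth=rng.random((h, w), dtype=np.float32) * 50.0,
        lidar=rng.random((n_points, 4), dtype=np.float32),
    )


sample = make_dummy_sample()
print(sample.rgb.shape, sample.lidar.shape)
```

A real loader would replace `make_dummy_sample` with reads from the released files, but keeping all modalities in one typed container makes it easy to feed multi-branch networks that fuse RGB, thermal, and LiDAR inputs.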




A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

Neural Radiance Fields (NeRF) has achieved impressive results in single ...

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

3D human pose estimation in outdoor environments has garnered increasing...

WiSARD: A Labeled Visual and Thermal Image Dataset for Wilderness Search and Rescue

Sensor-equipped unoccupied aerial vehicles (UAVs) have the potential to ...

PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding

Despite many 3D human activity benchmarks having been proposed, ...

On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Learning-based methods to solve dense 3D vision problems typically train...

Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

With the widespread use of intelligent systems, such as smart speakers, ...

Teleoperation System Using Past Image Records Considering Narrow Communication Band

Teleoperation is necessary when the robot is applied to real missions, f...
