The Audio-Visual BatVision Dataset for Research on Sight and Sound

03/13/2023
by   Amandine Brunetto, et al.

Vision research has shown remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for at least a decade. However, visual sensors may fail in some conditions, and sound has recently shown potential to complement them. Simulated room impulse responses (RIRs) in 3D apartment models have become a benchmark dataset for the community, fostering a range of audio-visual research. In simulation, depth can be predicted from sound by training a neural network to learn bat-like perception. Concurrently, the same was achieved in reality using RGB-D images and echoes of chirping sounds. Biomimicking bat perception is an exciting new direction, but it needs dedicated datasets to explore its potential. Therefore, we collected the BatVision dataset to provide the community with large-scale echo recordings in complex real-world scenes. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of the traversed spaces. We sampled environments ranging from modern US office spaces to historic French university grounds, indoor and outdoor, with large architectural variety. This dataset will allow research on robot echolocation, general audio-visual tasks and sound phenomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art work developed for simulated data can also succeed on our dataset. The data can be downloaded at https://forms.gle/W6xtshMgoXGZDwsE7
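
To make the chirp-and-echo idea concrete, the following is a minimal sketch of classical time-of-flight ranging, not the learned depth prediction described in the abstract. It generates a linear chirp, simulates a single delayed echo, and recovers the reflector distance by matched filtering. The sample rate, sweep range, chirp duration and the function names `linear_chirp` and `echo_distance` are illustrative assumptions and are not taken from the BatVision dataset; the example also uses one channel only and ignores the direct speaker-to-microphone path.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air, roughly room temperature


def linear_chirp(f0, f1, duration, sample_rate):
    """Linear frequency sweep from f0 to f1 Hz (all values are illustrative)."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    # Phase of a linear sweep: the instantaneous frequency rises from f0 to f1.
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t**2)
    return np.sin(phase)


def echo_distance(recording, chirp, sample_rate):
    """Estimate the distance to the strongest reflector by matched filtering."""
    # Cross-correlate the recording with the emitted chirp; the lag of the
    # largest peak is the round-trip delay in samples.
    corr = np.correlate(recording, chirp, mode="valid")
    delay_samples = int(np.argmax(np.abs(corr)))
    round_trip_seconds = delay_samples / sample_rate
    # Sound travels to the reflector and back, hence the division by two.
    return SPEED_OF_SOUND * round_trip_seconds / 2.0


# Synthetic check: one attenuated echo from a wall 3 m away (assumed setup).
sample_rate = 44_100
chirp = linear_chirp(20.0, 20_000.0, 0.003, sample_rate)
true_distance = 3.0
delay = int(round(2 * true_distance / SPEED_OF_SOUND * sample_rate))
recording = np.zeros(delay + len(chirp) + 1000)
recording[delay:delay + len(chirp)] += 0.3 * chirp

estimated = echo_distance(recording, chirp, sample_rate)
print(f"estimated distance: {estimated:.2f} m")
```

A single peak only yields one range value. Real scenes return many overlapping reflections in both ears, which is the motivation for learning depth maps from binaural echoes with a neural network.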


