Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

10/13/2021
by Yuichiro Koyama, et al.

Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset that includes unlabeled directional interference events remains an open research question. Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation (IRS) framework that augments spatial characteristics using simulated room impulse responses (RIRs). RIRs corresponding to a microphone array assumed to be placed in various rooms are simulated, and the source signals of the target sound events are extracted from a mixture. The simulated RIRs are then convolved with the extracted source signals to obtain an augmented multi-channel training dataset. Evaluation results obtained using the TAU-NIGENS Spatial Sound Events 2021 dataset show that the IRS improves overall SELD performance. Additionally, we conducted an ablation study to examine the contribution of, and the need for, each component within the IRS.
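
The convolution step described in the abstract can be illustrated with a minimal sketch. It assumes the source signal has already been extracted and that one RIR per microphone channel is available. The helper names (`convolve`, `spatialize`, `toy_rir`) and the toy impulse responses are hypothetical stand-ins for illustration, not the authors' simulation or extraction method.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += s * h
    return out


def spatialize(source: Sequence[float],
               rirs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convolve one mono source with each channel's RIR (one RIR per microphone)."""
    return [convolve(source, rir) for rir in rirs]


def toy_rir(delay: int, reflection_delay: int,
            reflection_gain: float, length: int) -> List[float]:
    """Hypothetical placeholder for a simulated RIR: direct path plus one reflection."""
    rir = [0.0] * length
    rir[delay] = 1.0
    rir[reflection_delay] += reflection_gain
    return rir


# Assumed inputs: a short "extracted" source and a two-channel RIR set.
source = [1.0, 0.5, -0.25]
rirs = [toy_rir(0, 3, 0.5, 5), toy_rir(1, 4, 0.4, 5)]
augmented = spatialize(source, rirs)  # list of channels, each a list of samples
```

In practice the direct loop would likely be replaced by an FFT-based convolution, since real RIRs span thousands of samples. The differing delays between channels are what carry the spatial cue in this toy setup.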

research
06/21/2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

This report describes our systems submitted to the DCASE2021 challenge t...
research
09/06/2023

Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization

As deeper and more complex models are developed for the task of sound ev...
research
06/21/2021

MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

A new impulse response (IR) dataset called "MeshRIR" is introduced. Curr...
research
01/24/2023

Perceptual evaluation of listener envelopment using spatial granular synthesis

Listener envelopment refers to the sensation of being surrounded by soun...
research
08/18/2023

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

We present Spatial LibriSpeech, a spatial audio dataset with over 650 ho...
research
12/17/2018

Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events

Learning from data in the quaternion domain enables us to exploit intern...
research
04/05/2019

Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks

Despite there being clear evidence for top-down (e.g., attentional) effe...
