A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

01/08/2021
by   Qing Wang, et al.

In this paper, we propose a novel four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection (SELD). First, we explore two spatial augmentation techniques, namely audio channel swapping (ACS) and multi-channel simulation (MCS), to deal with data sparsity in SELD. ACS and MCS augment the limited training data by expanding the set of direction-of-arrival (DOA) representations, so that acoustic models trained on the augmented data are robust to variations in the locations of acoustic sources. Next, time-domain mixing (TDM) and time-frequency masking (TFM) are investigated to deal with overlapping sound events and to increase data diversity. Finally, ACS, MCS, TDM and TFM are combined step by step to form an effective four-stage data augmentation scheme. Tested on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 data sets, the proposed augmentation approach greatly improves system performance, and our submitted system ranked first in the SELD task of the DCASE 2020 Challenge. Furthermore, we employ a ResNet-Conformer architecture to model both global and local context dependencies of an audio sequence, which yields further gains over the architectures used in the DCASE 2020 SELD evaluations.
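To make three of the four augmentation stages concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the function names, the list-based data layout, the channel permutation, and the mask positions are all assumptions chosen for the example. The abstract does not specify which channel swaps are valid for a given microphone array or how the DOA labels must be transformed after a swap, so label handling is omitted, and MCS is left out because it depends on details given only in the full text.

```python
def swap_channels(audio, perm):
    """ACS sketch: reorder the channels of a multi-channel clip.

    audio: list of channels, each a list of samples.
    perm:  hypothetical channel permutation (assumption). In a real SELD
           system the permutation must match the array geometry, and the
           DOA labels must be transformed accordingly (not shown here).
    """
    return [audio[i] for i in perm]


def mix_time_domain(clip_a, clip_b):
    """TDM sketch: add two equal-length multi-channel clips sample by sample.

    The event labels of both clips would need to be merged as well
    (not shown here).
    """
    return [[x + y for x, y in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(clip_a, clip_b)]


def mask_time_frequency(spec, t0, t_width, f0, f_width, fill=0.0):
    """TFM sketch: overwrite a band of frames and a band of frequency bins.

    spec is indexed as spec[frame][bin]. The mask positions and widths are
    passed in explicitly; in training they would usually be drawn at random.
    """
    out = [row[:] for row in spec]  # copy so the input is left unchanged
    for t, row in enumerate(out):
        for f in range(len(row)):
            in_time_band = t0 <= t < t0 + t_width
            in_freq_band = f0 <= f < f0 + f_width
            if in_time_band or in_freq_band:
                row[f] = fill
    return out


# Toy usage: two channels with two samples each.
clip_a = [[1.0, 2.0], [3.0, 4.0]]
clip_b = [[0.5, 0.5], [0.5, 0.5]]
swapped = swap_channels(clip_a, [1, 0])
mixed = mix_time_domain(swapped, clip_b)

# Toy spectrogram: 3 frames by 4 frequency bins, all ones.
spec = [[1.0] * 4 for _ in range(3)]
masked = mask_time_frequency(spec, t0=1, t_width=1, f0=2, f_width=1)
```

The sketch applies the stages one after another only to mirror the step-by-step combination described in the abstract; the order and the features on which masking is applied in the actual system are described in the full paper. Production code would typically operate on arrays rather than nested lists.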

research
08/06/2021

SpecMix: A Mixed Sample Data Augmentation Method for Training with Time-Frequency Domain Features

A mixed sample data augmentation strategy is proposed to enhance the per...
research
10/12/2021

Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

Data augmentation methods have shown great importance in diverse supervi...
research
05/19/2022

The AI Mechanic: Acoustic Vehicle Characterization Neural Networks

In a world increasingly dependent on road-based transportation, it is es...
research
08/18/2023

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

We present Spatial LibriSpeech, a spatial audio dataset with over 650 ho...
research
06/24/2022

Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

Performance of sound event localization and detection (SELD) in real sce...
research
03/16/2022

A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition

Environmental sound recognition (ESR) is an emerging research topic in a...
research
05/05/2021

Acoustic Scene Classification Using Multichannel Observation with Partially Missing Channels

Sounds recorded with smartphones or IoT devices often have partially unr...
