Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains

09/05/2022
by Jinbo Hu, et al.

Sound event localization and detection (SELD) is a joint task of sound event detection and direction-of-arrival estimation. In DCASE 2022 Task 3, the data changes from computationally generated spatial recordings to recordings of real sound scenes. Our system submitted to DCASE 2022 Task 3 is based on our previously proposed Event-Independent Network V2 (EINV2) combined with a novel data augmentation method. Our method employs EINV2 with a track-wise output format, permutation-invariant training, and a soft parameter-sharing strategy to detect sound events of the same class that occur at different locations. We extend EINV2 with the Conformer structure so that it learns both local and global features. To help the model generalize, we use a data augmentation method consisting of several data augmentation chains, each a stochastic combination of different data augmentation operations. To mitigate the scarcity of real-scene recordings in the development dataset and the class imbalance among sound events, we use FSD50K, AudioSet, and the TAU Spatial Room Impulse Response Database (TAU-SRIR DB) to generate simulated datasets for training. We present detailed results on the validation set of Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22). Experimental results indicate that generalizing to different environments and the unbalanced performance across classes are the two main challenges. We evaluated our proposed method in Task 3 of the DCASE 2022 Challenge, where it ranked second among the participating teams. The source code is released.
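To illustrate the permutation-invariant training idea mentioned above, here is a minimal sketch, not the EINV2 implementation. It assumes each output track is a single vector scored with mean squared error, whereas a full SELD model would typically combine a detection loss and a direction-of-arrival loss per track. The names `mse` and `pit_loss` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(pred_tracks: List[Vector],
             target_tracks: List[Vector]) -> Tuple[float, Tuple[int, ...]]:
    """Permutation-invariant loss over output tracks (hypothetical sketch).

    Every assignment of target tracks to predicted tracks is tried, and the
    assignment with the smallest mean per-track error is kept, so the model
    is not penalized for emitting a correct event on a different track.
    """
    if len(pred_tracks) != len(target_tracks):
        raise ValueError("number of tracks must match")
    n = len(pred_tracks)
    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        # perm[i] is the index of the target track matched to predicted track i
        loss = sum(mse(pred_tracks[i], target_tracks[p])
                   for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two tracks whose predictions are correct but in swapped order.
loss, perm = pit_loss([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                      [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
print(loss, perm)  # prints "0.0 (1, 0)"
```

The exhaustive search over permutations is cheap here because SELD systems use only a few output tracks; the number of permutations grows factorially with the track count.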

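The data augmentation chains can be pictured with the toy sketch below. It assumes a hypothetical pool of three simple operations on a mono signal (`random_gain`, `circular_shift`, `add_noise`), a fixed set of chains built once, and one chain drawn at random per example. The actual operations, number of chains, and selection rule of the submitted system are not given in this abstract, so treat every name and parameter here as illustrative.

```python
import random
from typing import Callable, List, Sequence

Signal = List[float]
Op = Callable[[Signal, random.Random], Signal]


# Hypothetical toy operations on a mono signal; a real SELD pipeline would
# operate on multichannel audio or spectrogram features instead.
def random_gain(x: Signal, rng: random.Random) -> Signal:
    g = rng.uniform(0.5, 1.5)
    return [g * v for v in x]


def circular_shift(x: Signal, rng: random.Random) -> Signal:
    if not x:
        return []
    k = rng.randrange(len(x))
    return x[-k:] + x[:-k] if k else list(x)


def add_noise(x: Signal, rng: random.Random) -> Signal:
    return [v + rng.gauss(0.0, 0.01) for v in x]


def make_chains(ops: Sequence[Op], num_chains: int, max_depth: int,
                rng: random.Random) -> List[List[Op]]:
    """Build chains, each a random-length combination of distinct ops."""
    chains = []
    for _ in range(num_chains):
        depth = rng.randint(1, min(max_depth, len(ops)))
        chains.append(rng.sample(list(ops), depth))
    return chains


def apply_random_chain(x: Signal, chains: Sequence[Sequence[Op]],
                       rng: random.Random) -> Signal:
    """Pick one chain at random and apply its ops in order."""
    chain = rng.choice(list(chains))
    for op in chain:
        x = op(x, rng)
    return x


rng = random.Random(0)
ops = [random_gain, circular_shift, add_noise]
chains = make_chains(ops, num_chains=3, max_depth=2, rng=rng)
augmented = apply_random_chain([0.0, 1.0, 0.0, -1.0], chains, rng)
print(len(chains), len(augmented))  # prints "3 4"
```

Seeding `random.Random` makes both the chains and their outputs reproducible, which helps when debugging an augmentation pipeline.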