SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

10/01/2021
by   Thi Ngoc Tho Nguyen, et al.

Sound event localization and detection (SELD) consists of two subtasks: sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses amplitude and/or phase differences between microphones to estimate source directions. As a result, it is often difficult to jointly optimize these two subtasks. We propose a novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources. The SALSA feature consists of multichannel log-spectrograms stacked along with the normalized principal eigenvector of the spatial covariance matrix at each corresponding time-frequency bin. Depending on the microphone array format, the principal eigenvector can be normalized differently to extract amplitude and/or phase differences between the microphones. As a result, SALSA features are applicable to different microphone array formats such as first-order ambisonics (FOA) and multichannel microphone array (MIC). Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset with directional interferences showed that SALSA features outperformed other state-of-the-art features. Specifically, the use of SALSA features in the FOA format increased the F1 score and localization recall by 6% each, compared to using multichannel log-mel spectrograms with intensity vectors. For the MIC format, using SALSA features increased the F1 score and localization recall by 16% and 7%, respectively, compared to using multichannel log-mel spectrograms with generalized cross-correlation spectra. Our ensemble model trained on SALSA features ranked second in the team category of the SELD task in the 2021 DCASE Challenge.
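The feature construction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only shows the general idea of stacking per-channel log-spectrograms with cues taken from the principal eigenvector of a per-bin spatial covariance matrix. The function names `stft` and `salsa_like_features`, the STFT settings, the `context` frame-averaging window, and the choice to normalize by the first channel and keep both amplitude ratios and phase differences are all assumptions made for illustration. The abstract states only that the normalization depends on the array format.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT of a (channels, samples) array -> (channels, frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=-1)


def salsa_like_features(x, n_fft=256, hop=128, context=1, eps=1e-12):
    """Illustrative SALSA-style features (hypothetical helper, not the paper's code).

    Returns an array of shape (M + 2 * (M - 1), frames, bins) for M channels:
    M log-power spectrograms, then M - 1 amplitude ratios and M - 1 phase
    differences relative to channel 0, all on the same time-frequency grid.
    """
    X = stft(x, n_fft, hop)                        # (M, T, F), complex
    n_frames = X.shape[1]
    log_spec = np.log(np.abs(X) ** 2 + eps)        # (M, T, F)

    # Per-bin outer products X X^H, shape (T, F, M, M).
    Xt = np.transpose(X, (1, 2, 0))                # (T, F, M)
    outer = Xt[..., :, None] * np.conj(Xt[..., None, :])

    # Assumption: estimate the spatial covariance by averaging the outer
    # products over 2 * context + 1 neighbouring frames (edges are repeated).
    padded = np.pad(outer, ((context, context), (0, 0), (0, 0), (0, 0)), mode="edge")
    width = 2 * context + 1
    cov = sum(padded[k:k + n_frames] for k in range(width)) / width

    # eigh returns eigenvalues in ascending order, so the principal
    # eigenvector is the last column of the eigenvector matrix.
    _, vecs = np.linalg.eigh(cov)
    u = vecs[..., :, -1]                           # (T, F, M)

    # Assumption: normalize by the first channel's component; eps guards
    # against division by zero.
    ratio = u[..., 1:] / (u[..., :1] + eps)        # (T, F, M - 1)
    spatial = np.concatenate([np.abs(ratio), np.angle(ratio)], axis=-1)
    spatial = np.transpose(spatial, (2, 0, 1))     # (2 * (M - 1), T, F)

    # Every spatial channel shares the time-frequency grid of the log-spectrograms.
    return np.concatenate([log_spec, spatial], axis=0)
```

For a three-channel input in which channels 1 and 2 are scaled copies of channel 0, the amplitude-ratio channels recover the scale factors and the phase-difference channels are close to zero in every bin. Bins with no signal energy have no meaningful principal eigenvector, so a practical pipeline would need to mask or otherwise handle them.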


research
06/29/2021

DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection

Sound event localization and detection consists of two subtasks which ar...
research
11/26/2019

A two-step system for sound event localization and detection

Sound event detection and sound event localization requires different fe...
research
07/22/2021

What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis

Sound event localization and detection (SELD) is an emerging research to...
research
11/08/2021

The complex-valued correlation coefficient accounts for binaural detection

Binaural hearing is one of the principal mechanisms enabling the localiz...
research
06/13/2021

SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform

We present a new framework SoundDet, which is an end-to-end trainable an...
research
03/19/2022

A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection

Polyphonic sound event localization and detection (SELD) aims at detecti...
research
02/14/2020

Sound Event Localization based on Sound Intensity Vector Refined By DNN-Based Denoising and Source Separation

We propose a direction-of-arrival (DOA) estimation method for Sound Even...
