Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation

08/04/2019
by   Wei Xia, et al.

In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module that explicitly models inter-dependencies between the time-frequency domain and multiple channels. The tfc-SE module consists of two parts, a tf-SE block and a c-SE block, which provide attention over the time-frequency and channel domains, respectively, to adaptively recalibrate the input feature map. The proposed tfc-SE module, together with a popular Convolutional Recurrent Neural Network (CRNN) model, is evaluated on a multi-channel sound event detection task with overlapping audio sources: the training and test data are synthesized TUT Sound Events 2018 datasets, recorded with microphone arrays. We show that the tfc-SE module can be incorporated into the CRNN model at a small additional computational cost and brings significant improvements in sound event detection accuracy. We also perform detailed ablation studies, analyzing various factors that may influence the performance of the SE blocks. With the best tfc-SE block, the error rate (ER) decreases from 0.2538 to 0.2026, a relative reduction of 20.17%, and the F1 score improves by 5.72%. The results indicate that the acoustic embeddings learned with the tfc-SE module efficiently strengthen time-frequency and channel-wise feature representations and thereby improve discriminative performance.
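To make the two-part recalibration concrete, here is a minimal NumPy sketch of a channel gate and a time-frequency gate applied to a single feature map. It is not the authors' implementation: the abstract does not give the layer details, so the sketch follows the general squeeze-and-excitation pattern. The function names (`c_se`, `tf_se`, `tfc_se`), the reduction ratio `r`, the weight shapes, and in particular the choice to combine the two branches by element-wise addition are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def c_se(x, w1, w2):
    """Channel gate (assumed form). x has shape (C, T, F)."""
    z = x.mean(axis=(1, 2))          # squeeze: average over time and frequency -> (C,)
    h = np.maximum(w1 @ z, 0.0)      # bottleneck layer with ReLU -> (C // r,)
    s = sigmoid(w2 @ h)              # one gate in (0, 1) per channel -> (C,)
    return x * s[:, None, None]      # rescale every channel by its gate


def tf_se(x, w):
    """Time-frequency gate (assumed form). w has shape (C,), acting as a 1x1 convolution."""
    m = sigmoid(np.einsum("c,ctf->tf", w, x))  # collapse channels -> one gate per (t, f) bin
    return x * m[None, :, :]                   # rescale every time-frequency bin


def tfc_se(x, w1, w2, w):
    """Combine both gated maps. Element-wise addition is an assumption of this sketch."""
    return c_se(x, w1, w2) + tf_se(x, w)


# Hypothetical sizes: 8 feature channels, 10 time frames, 6 frequency bins, reduction ratio 2.
rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 6, 2
x = rng.standard_normal((C, T, F))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
w = 0.1 * rng.standard_normal(C)

y = tfc_se(x, w1, w2, w)  # same shape as x, ready to pass to the next layer
```

Because the output has the same shape as the input, a block of this kind can sit after any convolutional layer of a CRNN without changing the rest of the network. The added parameters are the two small matrices and one vector per block, which is consistent with the small additional computational cost described above.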


research
03/29/2019

Multi-Scale Time-Frequency Attention for Rare Sound Event Detection

Attention mechanism has been widely applied to various sound-related tas...
research
06/20/2023

Frequency Channel Attention for computationally efficient sound event detection

We explore on various attention methods on frequency and channel dimensi...
research
11/15/2019

Adaptive Multi-scale Detection of Acoustic Events

The goal of acoustic (or sound) events detection (AED or SED) is to pred...
research
11/25/2021

Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

The challenges of polyphonic sound event detection (PSED) stem from the ...
research
09/05/2017

Squeeze-and-Excitation Networks

Convolutional neural networks are built upon the convolution operation, ...
research
08/31/2023

ReZero: Region-customizable Sound Extraction

We introduce region-customizable sound extraction (ReZero), a general an...
research
06/25/2020

Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs

Sound Event Localization and Detection (SELD) is a problem related to th...
