Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

12/10/2019
by Qiuqiang Kong, et al.

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of SED is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled: each audio clip carries only audio tags, without the onset and offset times of the sound events. To address the weakly labelled SED problem, we investigate segment-wise and clip-wise training methods. The proposed systems are based on variants of convolutional neural networks (CNNs), including convolutional recurrent neural networks and our proposed CNN-transformers, for audio tagging and sound event detection. Another challenge of SED is that systems predict only the presence probabilities of sound events, so thresholds are required to decide whether each sound event is present or absent. Previous work set these thresholds empirically, which is not an optimal solution. To solve this problem, we propose an automatic threshold optimization method. The first stage optimizes the system with respect to metrics that do not depend on the thresholds, such as mean average precision (mAP). The second stage optimizes the thresholds with respect to the metric that depends on those thresholds. With automatic threshold optimization, the system achieves state-of-the-art F1 scores of 0.646 for audio tagging and 0.584 for SED, outperforming the scores of 0.629 and 0.564, respectively, obtained with the best manually selected thresholds.
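The second stage described above can be illustrated with a minimal sketch. The abstract does not say which optimizer the authors use, so this example swaps in a plain per-class grid search over thresholds that maximizes F1 on held-out predictions. The function names (`f1_score`, `optimize_thresholds`, `macro_f1`), the grid, and the array shapes are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """Binary F1 for two 1-D arrays of 0/1 values."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def optimize_thresholds(probs, targets, grid=None):
    """Pick one threshold per class by grid search on F1.

    probs, targets: arrays of shape (n_clips, n_classes).
    This is a stand-in for stage two; the model that produced
    `probs` is assumed to be already trained (stage one).
    """
    if grid is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95 (includes 0.5).
        grid = np.arange(1, 20) / 20.0
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for k in range(n_classes):
        scores = [
            f1_score(targets[:, k], (probs[:, k] >= t).astype(int))
            for t in grid
        ]
        # argmax returns the first (lowest) threshold among ties.
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds


def macro_f1(probs, targets, thresholds):
    """Average the per-class F1 obtained with the given thresholds."""
    preds = (probs >= thresholds).astype(int)
    per_class = [
        f1_score(targets[:, k], preds[:, k]) for k in range(probs.shape[1])
    ]
    return float(np.mean(per_class))
```

Because the grid contains 0.5, the per-class F1 after the search can never be lower than with a fixed threshold of 0.5 on the same data. In practice the thresholds would be chosen on a validation set and then applied to unseen test data.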

research
08/06/2018

Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data

Audio tagging aims to predict one or several labels in an audio clip. Ma...
research
03/01/2021

Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning

Multi-label audio tagging consists of assigning sets of tags to audio re...
research
04/27/2019

Sound Event Detection with Sequentially Labelled Data Based on Connectionist Temporal Classification and Unsupervised Clustering

Sound event detection (SED) methods typically rely on either strongly la...
research
03/25/2022

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification

After its sweeping success in vision and language tasks, pure attention-...
research
08/10/2017

DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

This report presents our audio event detection system submitted for Task...
research
03/07/2023

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

In this paper, we propose an effective sound event detection (SED) metho...
