Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

04/12/2018
by Qiuqiang Kong, et al.
Sound event detection (SED) aims to detect what sound events happen in an audio clip and when they occur. Sound events can also be segmented in the time-frequency (T-F) domain, a task we call T-F segmentation. Many supervised SED algorithms rely on strongly labelled data, which contains the onset and offset times of sound events. However, many audio tagging datasets are only weakly labelled: the presence or absence of each sound event is known, but not its onset and offset times. In this paper, we propose a SED and T-F segmentation framework trained with weakly labelled data. In the training stage, a segmentation mapping is applied to a T-F representation of an audio clip to obtain T-F segmentation masks of sound events. A classification mapping is then applied to each T-F segmentation mask to estimate the presence probability of the corresponding sound event. The segmentation and classification mappings are trained jointly. For T-F segmentation, masks are obtained by presenting a T-F representation of an audio clip to the trained segmentation mapping. For SED, predicted onset and offset times are obtained from the T-F segmentation masks. We propose to model the segmentation mapping with a convolutional neural network and the classification mapping with global weighted rank pooling (GWRP). As a byproduct, separated waveforms of sound events can be obtained from their corresponding T-F segmentation masks. Experiments on the remixed DCASE 2013 dataset show that the proposed method achieves an area under the curve (AUC) of 0.948 in audio tagging and 0.893 in sound event detection, outperforming a deep neural network baseline that achieves 0.719 and 0.616, respectively.
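To make the pooling step concrete, the sketch below shows one common formulation of global weighted rank pooling: the values of a segmentation mask are sorted in descending order and averaged with geometrically decaying weights, so the pooled score interpolates between max pooling (decay d = 0) and average pooling (d = 1). This is a minimal NumPy illustration of that formula, not the paper's implementation; the function name and the choice of d are assumptions for the example.

```python
import numpy as np

def gwrp(mask, d=0.5):
    """Global weighted rank pooling over a T-F segmentation mask.

    Sorts all mask values in descending order and computes a weighted
    average with weights d**0, d**1, d**2, ..., normalised to sum to 1.
    d=0 reduces to max pooling; d=1 reduces to average pooling.
    (Illustrative sketch; the decay d=0.5 is an arbitrary example value.)
    """
    x = np.sort(np.ravel(mask))[::-1]          # values, largest first
    weights = d ** np.arange(x.size)           # geometric decay by rank
    return float(np.sum(weights * x) / np.sum(weights))

# Pooling a mask to a clip-level presence probability:
mask = np.array([[0.2, 0.8], [0.5, 0.1]])
prob = gwrp(mask, d=0.5)
```

Because the weights emphasise the highest-activation T-F bins without discarding the rest, GWRP gives a clip-level probability that is sensitive to small active regions (unlike average pooling) while still back-propagating gradient to every bin (unlike max pooling), which is what makes it suitable for training segmentation masks from weak labels.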


