Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data

by Dezhi Wang, et al.

Sound event detection (SED) is typically posed as a supervised learning problem that requires training data with strong temporal labels of sound events. However, producing datasets with strong labels usually incurs a prohibitive labor cost, which limits the practical application of supervised SED methods. Recent advances in SED focus on detecting sound events by taking advantage of weakly labeled or unlabeled training data. In this paper, we propose a joint framework to solve the SED task using large-scale unlabeled in-domain data. In particular, a state-of-the-art general audio tagging model is first employed to predict weak labels for the unlabeled data. A weakly supervised architecture based on the convolutional recurrent neural network (CRNN) is then developed to predict the strong annotations of sound events with the aid of the unlabeled data and its predicted labels. The SED performance is found to generally increase as more unlabeled data is added to the training set. To address the noisy label problem of the unlabeled data, an ensemble strategy is applied to increase the robustness of the system. The proposed system is evaluated on the SED dataset of the DCASE 2018 challenge. It reaches an F1-score of 21.0, outperforming the baseline system.
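The three stages named in the abstract (predicting weak labels for unlabeled clips, producing frame-level outputs, and combining several models) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 thresholds, the use of simple probability averaging as the ensemble rule, and the one-second frame hop are all assumptions made for illustration, and the tagging model and CRNN are replaced by hand-written probabilities.

```python
from typing import Dict, List, Tuple


def pseudo_weak_labels(clip_probs: Dict[str, float],
                       threshold: float = 0.5) -> List[str]:
    """Turn clip-level tagging probabilities into weak (clip-level) labels.

    The threshold value is an assumption, not taken from the paper.
    """
    return sorted(c for c, p in clip_probs.items() if p >= threshold)


def ensemble_average(frame_probs: List[List[float]]) -> List[float]:
    """Average the frame-level probabilities of several models for one class.

    Averaging is one common ensemble rule; the abstract does not say
    which rule the authors use.
    """
    n_models = len(frame_probs)
    return [sum(column) / n_models for column in zip(*frame_probs)]


def frames_to_events(probs: List[float],
                     threshold: float = 0.5,
                     hop_seconds: float = 1.0) -> List[Tuple[float, float]]:
    """Convert frame-level probabilities into (onset, offset) pairs in seconds."""
    events = []
    onset = None
    for i, p in enumerate(probs):
        if p >= threshold and onset is None:
            onset = i  # an event starts at this frame
        elif p < threshold and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None  # the event ended before this frame
    if onset is not None:
        # the event is still active at the end of the clip
        events.append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


# Hypothetical outputs standing in for the tagging model and two CRNNs.
weak = pseudo_weak_labels({"Speech": 0.9, "Dog": 0.2, "Dishes": 0.5})
averaged = ensemble_average([[0.0, 1.0, 1.0, 0.0], [0.2, 0.8, 0.6, 0.4]])
events = frames_to_events(averaged)
```

In this toy run, the clip would be weakly labeled with "Dishes" and "Speech", and the averaged frame probabilities yield a single event spanning the second and third frames.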


Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

Sound event detection is a challenging task, especially for scenes with ...

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

This paper proposes a network architecture mainly designed for audio tag...

Weakly Labeled Sound Event Detection Using Tri-training and Adversarial Learning

This paper considers a semi-supervised learning framework for weakly lab...

Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and Self-Adaptive Label Refinement

In this paper, we present a gated convolutional recurrent neural network...

Noisy Labels for Weakly Supervised Gamma Hadron Classification

Gamma hadron classification, a central machine learning task in gamma ra...

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training

In this paper, we propose a method called Hodge and Podge for sound even...

Adaptive pooling operators for weakly labeled sound event detection

Sound event detection (SED) methods are tasked with labeling segments of...
