Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection

12/27/2017
by Shao-Yen Tseng, et al.

State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets, where fine-resolution annotations are too expensive to obtain. In this paper, we propose a multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed MIL framework uses audio embeddings extracted from a pre-trained convolutional neural network as input features. We show that, with audio embeddings as input, the MIL framework can be implemented with a simple DNN whose performance is comparable to that of recurrent neural networks. We evaluate our approach by training an audio tagging system on a subset of AudioSet, a large collection of weakly labeled YouTube video excerpts. Combined with a late-fusion approach, we improve the F1 score of a baseline audio tagging system by 17%. We also show that audio embeddings extracted by the convolutional neural network significantly boost the performance of all MIL models. This framework reduces the model complexity of the AED system and is suitable for applications where computational resources are limited.
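
To make the MIL setup concrete, the following is a minimal sketch of how a clip-level (weak) label can supervise segment-level predictions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the pooling function, layer sizes, or loss, so max pooling over instances, a single ReLU hidden layer, binary cross-entropy, and all names and dimensions below (instance_scores, bag_prediction, weak_label_loss, 10 segments of 128-dimensional embeddings, 5 classes) are hypothetical choices. Random vectors stand in for the pre-trained CNN embeddings.

```python
import numpy as np

def instance_scores(embeddings, w1, b1, w2, b2):
    """Score every instance (segment embedding) with a small feed-forward net.

    embeddings: (n_instances, embed_dim)
    returns:    (n_instances, n_classes) probabilities in (0, 1)
    """
    hidden = np.maximum(0.0, embeddings @ w1 + b1)  # ReLU hidden layer
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))            # independent sigmoid per class

def bag_prediction(scores):
    """Max-pool over instances (assumed pooling): a clip is positive for a
    class if at least one of its segments is positive for that class."""
    return scores.max(axis=0)

def weak_label_loss(bag_probs, bag_labels, eps=1e-7):
    """Binary cross-entropy between clip-level predictions and weak labels."""
    p = np.clip(bag_probs, eps, 1.0 - eps)
    return float(-np.mean(bag_labels * np.log(p)
                          + (1.0 - bag_labels) * np.log(1.0 - p)))

# Hypothetical sizes: one clip (bag) of 10 segments, 128-d embeddings, 5 classes.
rng = np.random.default_rng(0)
n_instances, embed_dim, hidden_dim, n_classes = 10, 128, 32, 5

# Stand-in for embeddings from a pre-trained CNN (random here, by assumption).
embeddings = rng.normal(size=(n_instances, embed_dim))

# Randomly initialised weights; a real system would learn these by gradient descent.
w1 = 0.1 * rng.normal(size=(embed_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = 0.1 * rng.normal(size=(hidden_dim, n_classes))
b2 = np.zeros(n_classes)

scores = instance_scores(embeddings, w1, b1, w2, b2)  # segment-level predictions
bag_probs = bag_prediction(scores)                    # clip-level predictions
bag_labels = np.array([1.0, 0.0, 0.0, 1.0, 0.0])      # weak, clip-level tags
loss = weak_label_loss(bag_probs, bag_labels)
```

Only the clip-level labels enter the loss, yet the per-segment scores remain available at inference time, which is what allows a model trained on weak labels to localize events within a clip.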

research
12/27/2017

Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection

State-of-the-art audio event detection (AED) systems rely on supervised ...
research
12/27/2017

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging

The lack of strong labels has severely limited the state-of-the-art full...
research
09/02/2017

Surrey-cvssp system for DCASE2017 challenge task4

In this technique report, we present a bunch of methods for the task 4 o...
research
07/17/2018

Data-Efficient Weakly Supervised Learning for Low-Resource Audio Event Detection Using Deep Learning

We propose a method to perform audio event detection under the common co...
research
08/07/2020

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

This paper proposes a network architecture mainly designed for audio tag...
research
04/04/2022

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

The adoption of advanced deep learning (DL) architecture in stuttering d...
research
06/01/2023

Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

The adoption of advanced deep learning architectures in stuttering detec...
