Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

05/24/2019
by Liwei Lin, et al.
Sound event detection (SED) aims to recognize which sound events are present in an audio segment and to detect their onsets and offsets. SED can be treated as a supervised learning task when strong annotations (timestamps) are available during training. However, because manually producing strong labels is costly, it becomes crucial to introduce weakly supervised learning to SED, in which only weak annotations (clip-level annotations without timestamps) are available during training. In this paper, we approach SED as a multiple instance learning (MIL) problem and solve it with a neural network framework that includes an embedding-level pooling module. The pooling module aggregates the sequence of high-level features generated by the neural network feature encoder into a single contextual feature representation, which enables the model to learn from weak annotations alone. We explore how well different pooling modules learn finer-grained information on their own and propose a specialized decision surface (SDS) for the class-wise attention pooling (cATP) module. We analyze and explain, from the perspective of feature space, why a cATP module with SDS is better than other typical pooling modules. Because several categories often co-occur in the multi-label classification task, we also propose a disentangled feature (DF) to reduce interference between categories. DF optimizes the high-level feature space by disentangling it into multiple different subspaces based on class-wise identifiable information in the training set. Experiments show that our approach achieves state-of-the-art performance on Task 4 of the DCASE 2018 challenge.
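
As a rough illustration of embedding-level class-wise attention pooling, the NumPy sketch below turns a sequence of frame features into one contextual feature per class and then into clip-level probabilities. It is a generic sketch under stated assumptions, not the paper's cATP or SDS formulation: the function names, the linear attention scorer, the per-class linear classifier, and all parameter shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def classwise_attention_pool(features, attn_w, attn_b):
    """Pool frame features into one contextual feature per class.

    features: (T, D) high-level features from an encoder (assumed given).
    attn_w:   (C, D) hypothetical per-class attention weights.
    attn_b:   (C,)   hypothetical per-class attention biases.
    Returns pooled (C, D) and attention weights (T, C).
    """
    scores = features @ attn_w.T + attn_b   # (T, C) score per frame and class
    weights = softmax(scores, axis=0)       # normalize over time, per class
    pooled = weights.T @ features           # (C, D) weighted sum of frames
    return pooled, weights


def clip_probabilities(pooled, clf_w, clf_b):
    """Clip-level probability per class from its own pooled feature.

    clf_w: (C, D) and clf_b: (C,) are hypothetical classifier parameters.
    """
    logits = np.sum(pooled * clf_w, axis=1) + clf_b
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 100, 16, 10                   # frames, feature dim, classes
    feats = rng.normal(size=(T, D))
    pooled, attn = classwise_attention_pool(
        feats, rng.normal(size=(C, D)), np.zeros(C)
    )
    probs = clip_probabilities(pooled, rng.normal(size=(C, D)), np.zeros(C))
    print(pooled.shape, attn.shape, probs.shape)
```

Because only the clip-level probabilities need labels, a model of this shape can be trained on weak annotations, while the per-frame attention weights offer a signal that may be inspected for localization.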
