Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

08/17/2020
by   Soham Deshmukh, et al.
5

Weakly Labelled learning has garnered lot of attention in recent years due to its potential to scale Sound Event Detection (SED). The paper proposes a Multi-Task Learning (MTL) framework for learning from Weakly Labelled Audio data which encompasses the traditional Multiple Instance Learning (MIL) setup. The MTL framework uses two-step attention mechanism and reconstructs Time Frequency (T-F) representation of audio as the auxiliary task. By breaking the attention into two steps, the network retains better time level information without compromising classification performance. The auxiliary task uses an auto-encoder structure to encourage the network for retaining source specific information. This indirectly de-noises internal T- F representation and improves classification performance under noisy recordings. For evaluation of proposed methodology, we remix the DCASE 2019 task 1 acoustic scene data with DCASE 2018 Task 2 sounds event data under 0, 10 and 20 db SNR. The proposed network outperforms existing benchmark models over all SNRs, specifically 22.3 respectively. The results and ablation study performed demonstrates the usefulness of auto-encoder for auxiliary task and verifies that the output of decoder portion provides a cleaned Time Frequency (T-F) representation of audio/sources which can be further used for source separation. The code is publicly released.

READ FULL TEXT

page 1

page 8

research
06/12/2021

Improving weakly supervised sound event detection with self-supervised auxiliary tasks

While multitask and transfer learning has shown to improve the performan...
research
11/08/2017

A joint separation-classification model for sound event detection of weakly labelled data

Source separation (SS) aims to separate individual sources from an audio...
research
04/12/2018

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

Sound event detection (SED) aims to detect what and when sound events ha...
research
04/02/2022

Improving Target Sound Extraction with Timestamp Information

Target sound extraction (TSE) aims to extract the sound part of a target...
research
04/27/2019

Sound Event Detection with Sequentially Labelled Data Based on Connectionist Temporal Classification and Unsupervised Clustering

Sound event detection (SED) methods typically rely on either strongly la...
research
09/16/2019

Acoustic scene analysis with multi-head attention networks

Acoustic Scene Classification (ASC) is a challenging task, as a single s...
research
07/21/2020

Guided multi-branch learning systems for DCASE 2020 Task 4

In this paper, we describe in detail our systems for DCASE 2020 Task 4. ...

Please sign up or login with your details

Forgot password? Click here to reset