Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network

10/09/2017
by Sharath Adavanne, et al.

This paper proposes a neural network architecture and training scheme to learn the start and end times of sound events (strong labels) in an audio recording, given only the list of sound events present in the audio without time information (weak labels). We achieve this by using a stacked convolutional and recurrent neural network with two prediction layers in sequence: one for the strong labels, followed by one for the weak labels. The network is trained using frame-wise log mel-band energy as the input audio feature, and the weak labels provided in the dataset as targets for the weak label prediction layer. Strong labels are generated by replicating the weak labels as many times as there are frames in the input audio feature, and are used as targets for the strong label layer during training. We propose to control what the network learns from the weak and strong labels by weighting the losses computed at the two prediction layers differently. The proposed method is evaluated on a publicly available dataset of 155 hours with 17 sound event classes. The method achieves a best error rate of 0.84 for strong labels and an F-score of 43.3 on the unseen test split.
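
The label replication and loss weighting described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the use of binary cross-entropy, the function names, the weight names `w_strong` and `w_weak`, and the frame count of 240 are all assumptions made for illustration. Only the 17 classes and the idea of tiling weak labels across frames come from the abstract.

```python
import numpy as np


def replicate_weak_labels(weak, n_frames):
    """Tile clip-level labels (n_classes,) into frame-level
    pseudo-strong targets of shape (n_frames, n_classes)."""
    weak = np.asarray(weak, dtype=np.float64)
    return np.tile(weak, (n_frames, 1))


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy (assumed loss; the abstract does not name one)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_element = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_element))


def weighted_loss(weak, strong_pred, weak_pred, w_strong=1.0, w_weak=1.0):
    """Weighted sum of the losses at the strong and weak prediction layers.

    strong_pred: (n_frames, n_classes) frame-wise probabilities
    weak_pred:   (n_classes,) clip-level probabilities
    w_strong, w_weak: hypothetical loss weights for the two layers
    """
    weak = np.asarray(weak, dtype=np.float64)
    strong_target = replicate_weak_labels(weak, strong_pred.shape[0])
    loss_strong = binary_cross_entropy(strong_target, strong_pred)
    loss_weak = binary_cross_entropy(weak, weak_pred)
    return w_strong * loss_strong + w_weak * loss_weak


# Usage: one clip with 17 classes, two of which are active.
n_classes, n_frames = 17, 240  # n_frames is an arbitrary illustrative value
weak = np.zeros(n_classes)
weak[[2, 5]] = 1.0

rng = np.random.default_rng(0)
strong_pred = rng.uniform(0.05, 0.95, size=(n_frames, n_classes))
weak_pred = rng.uniform(0.05, 0.95, size=n_classes)

# Down-weighting the strong-layer loss reduces the influence of the
# replicated (and therefore noisy) frame-level targets.
loss = weighted_loss(weak, strong_pred, weak_pred, w_strong=0.5, w_weak=1.0)
print(loss)
```

Because the replicated targets mark an event as active in every frame, they are only approximately correct, which is why the relative weight of the two losses matters in such a setup.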

research
01/29/2018

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

In this paper, we propose a stacked convolutional and recurrent neural n...
research
07/10/2020

Overcoming label noise in audio event detection using sequential labeling

This paper addresses the noisy label issue in audio event detection (AED...
research
10/09/2017

A report on sound event detection with different binaural features

In this paper, we compare the performance of using binaural audio featur...
research
05/14/2021

The Benefit Of Temporally-Strong Labels In Audio Event Classification

To reveal the importance of temporal precision in ground truth audio eve...
research
11/02/2018

Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks

We propose a multi-label multi-task framework based on a convolutional r...
research
05/23/2017

Grounded Recurrent Neural Networks

In this work, we present the Grounded Recurrent Neural Network (GRNN), a...
