Language Modelling for Sound Event Detection with Teacher Forcing and Scheduled Sampling

07/19/2019
by   Konstantinos Drossos, et al.

A sound event detection (SED) method typically takes as input a sequence of audio frames and predicts the activities of sound events in each frame. In real-life recordings, sound events exhibit some temporal structure: for instance, a "car horn" will likely be followed by a "car passing by". While this temporal structure is widely exploited in sequence prediction tasks that use language models (LMs), e.g., machine translation, it is not satisfactorily modeled in SED. In this work we propose a method that allows a recurrent neural network (RNN) to learn an LM for the SED task. The method conditions the input of the RNN on the activities of classes at the previous time step. We evaluate our method using F1 score and error rate (ER) on three publicly available datasets: the TUT-SED Synthetic 2016 and the TUT Sound Events 2016 and 2017 datasets. The obtained results show an increase of 6 (lower is better) for the TUT Sound Events 2016 and 2017 datasets, respectively, when using our method. On the contrary, with our method there is a decrease of 10 Synthetic 2016 dataset.
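The conditioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `ConditionedRNN`, the helper `teacher_prob`, the plain tanh recurrence, the 0.5 binarization threshold, and the linear schedule are all assumptions made for illustration, and only the forward pass is shown (no training). At each frame the RNN input is the audio feature vector concatenated with the class activities of the previous frame; with probability `p_teacher` those activities come from the ground truth (teacher forcing), otherwise from the model's own binarized prediction (the scheduled-sampling case).

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def teacher_prob(epoch, n_epochs):
    """Hypothetical linear schedule: 1.0 (always ground truth) down to 0.0."""
    return max(0.0, 1.0 - epoch / n_epochs)


class ConditionedRNN:
    """Toy RNN whose input is [audio features ; previous class activities]."""

    def __init__(self, n_feat, n_classes, n_hidden, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.W_in = init(n_hidden, n_feat + n_classes)
        self.W_h = init(n_hidden, n_hidden)
        self.W_out = init(n_classes, n_hidden)
        self.n_classes = n_classes
        self.n_hidden = n_hidden

    def step(self, x_t, y_prev, h_prev):
        z = list(x_t) + list(y_prev)  # condition the input on y_{t-1}
        a = matvec(self.W_in, z)
        b = matvec(self.W_h, h_prev)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        probs = [sigmoid(o) for o in matvec(self.W_out, h)]
        return probs, h

    def run(self, frames, targets=None, p_teacher=0.0, rng=None):
        """Return per-frame class probabilities for a sequence of frames."""
        rng = rng or random.Random(0)
        h = [0.0] * self.n_hidden
        y_prev = [0.0] * self.n_classes  # no activity before the first frame
        outputs = []
        for t, x_t in enumerate(frames):
            probs, h = self.step(x_t, y_prev, h)
            outputs.append(probs)
            if targets is not None and rng.random() < p_teacher:
                y_prev = [float(v) for v in targets[t]]  # teacher forcing
            else:
                # feed back the model's own binarized prediction
                y_prev = [1.0 if p >= 0.5 else 0.0 for p in probs]
        return outputs
```

Calling `run(frames, targets, p_teacher=teacher_prob(epoch, n_epochs))` during training would move gradually from ground-truth conditioning toward conditioning on predictions, while `run(frames)` with no targets corresponds to inference, where only the model's own previous outputs are available.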


