Conditioned Time-Dilated Convolutions for Sound Event Detection

07/10/2020
by Konstantinos Drossos, et al.

Sound event detection (SED) is the task of identifying sound events along with their onset and offset times. A recent SED method based on convolutional neural networks proposed the use of depthwise separable (DWS) and time-dilated convolutions. DWS and time-dilated convolutions yielded state-of-the-art results for SED with a considerably small number of parameters. In this work we propose extending the time-dilated convolutions by conditioning them with jointly learned embeddings of the SED predictions made by the SED classifier. We present a novel algorithm for conditioning the time-dilated convolutions, which functions similarly to language modelling and enhances the performance of these convolutions. We employ the freely available TUT-SED Synthetic dataset, and we assess the performance of our method using the average per-frame F_1 score and the average per-frame error rate over 10 experiments. We achieve an increase of 2% (from 0.63 to 0.65) in the average F_1 score (the higher the better) and a decrease of 3% (from 0.50 to 0.47) in the error rate (the lower the better).
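
To make the two evaluation metrics concrete, the following is a minimal sketch of how a per-frame F_1 score and a per-frame error rate could be computed from binary activity matrices. It assumes the segment-based definitions commonly used in SED evaluation (substitutions, deletions and insertions counted per frame), which the abstract does not spell out. The function name `frame_f1_and_er` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def frame_f1_and_er(
    reference: List[List[int]], predicted: List[List[int]]
) -> Tuple[float, float]:
    """Per-frame F1 and error rate for binary [frames x classes] matrices.

    Assumption: error rate = (S + D + I) / N, where per frame
    S = min(FP, FN), D = max(0, FN - FP), I = max(0, FP - FN),
    and N is the total number of active reference events.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        pairs = list(zip(ref_frame, pred_frame))
        frame_tp = sum(1 for r, p in pairs if r == 1 and p == 1)
        frame_fp = sum(1 for r, p in pairs if r == 0 and p == 1)
        frame_fn = sum(1 for r, p in pairs if r == 1 and p == 0)
        tp += frame_tp
        fp += frame_fp
        fn += frame_fn
        # A false positive paired with a false negative in the same frame
        # counts as one substitution; the remainder is a deletion or insertion.
        subs += min(frame_fp, frame_fn)
        dels += max(0, frame_fn - frame_fp)
        ins += max(0, frame_fp - frame_fn)
        n_ref += sum(ref_frame)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example: 4 frames, 3 sound event classes (hypothetical data).
reference = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
predicted = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0]]
f1, er = frame_f1_and_er(reference, predicted)
print(f"F1 = {f1:.2f}, ER = {er:.2f}")
```

In this sketch a higher F_1 and a lower error rate indicate better detection, matching the direction of improvement reported above. Averaging the two values over repeated training runs would give figures comparable in kind to the reported 0.65 and 0.47.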
