Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

10/05/2021
by   Zhirong Ye, et al.

Sound event detection (SED) has gained increasing attention owing to its wide application in areas such as surveillance and video indexing. Existing SED models mainly generate frame-level predictions, casting the task as a sequence multi-label classification problem. When the model is trained on weakly labeled data, this inevitably introduces a trade-off between event boundary detection and audio tagging. Such models also require post-processing and cannot be trained end to end. This paper first presents the 1D Detection Transformer (1D-DETR), inspired by the Detection Transformer. Then, given the characteristics of SED, an audio query and a one-to-many matching strategy for fine-tuning the model are added to 1D-DETR to form the Sound Event Detection Transformer (SEDT), which generates event-level predictions and performs detection end to end. Experiments are conducted on the URBAN-SED dataset and the DCASE2019 Task4 dataset, and on both SEDT achieves results competitive with state-of-the-art (SOTA) models. The application of SEDT to SED shows that it can serve as a framework for one-dimensional signal detection and may be extended to other similar tasks.
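
To make the post-processing step mentioned above concrete, the following is a minimal sketch of how frame-level probabilities for a single class are commonly turned into events: threshold each frame, then merge consecutive active frames into (onset, offset) pairs. It is not taken from the paper. The function name `frames_to_events`, the threshold of 0.5, and the hop of 0.02 seconds are assumptions chosen for illustration.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float],
    threshold: float = 0.5,     # assumed decision threshold, not from the paper
    hop_seconds: float = 0.02,  # assumed frame hop, not from the paper
) -> List[Tuple[float, float]]:
    """Merge consecutive active frames into (onset, offset) pairs in seconds."""
    events: List[Tuple[float, float]] = []
    onset = None  # index of the first frame of the currently open event
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # an event starts at this frame
        elif not active and onset is not None:
            # the event ended at the start of this inactive frame
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None
    if onset is not None:
        # close an event that is still open at the end of the clip
        events.append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


if __name__ == "__main__":
    print(frames_to_events([0.1, 0.9, 0.8, 0.2, 0.7], hop_seconds=1.0))
```

An event-level model such as the one described in the abstract would output the (onset, offset) pairs directly, so a hand-tuned step of this kind would not be needed.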


research
01/10/2019

Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection

The design of new methods and models when only weakly-labeled data are a...
research
11/30/2021

SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer

Recently, an event-based end-to-end model (SEDT) has been proposed for s...
research
08/14/2023

Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers

We propose a shift towards end-to-end learning in bird sound monitoring ...
research
08/14/2023

DiffSED: Sound Event Detection with Denoising Diffusion

Sound Event Detection (SED) aims to predict the temporal boundaries of a...
research
01/19/2021

Towards duration robust weakly supervised sound event detection

Sound event detection (SED) is the task of tagging the absence or presen...
research
10/18/2022

A Hybrid System of Sound Event Detection Transformer and Frame-wise Model for DCASE 2022 Task 4

In this paper, we describe in detail our system for DCASE 2022 Task4. Th...
research
03/18/2023

EarCough: Enabling Continuous Subject Cough Event Detection on Hearables

Cough monitoring can enable new individual pulmonary health applications...