SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer

11/30/2021
by Zhirong Ye, et al.

Recently, an event-based end-to-end model (SEDT) has been proposed for sound event detection (SED) and achieves competitive performance. However, compared with frame-based models, it requires more training data with temporal annotations to improve its localization ability. Synthetic data is an alternative, but it suffers from a large domain gap with real recordings. Inspired by the success of UP-DETR in object detection, we propose self-supervised pre-training of SEDT (SP-SEDT) by detecting random patches (cropped only along the time axis). Experiments on the DCASE2019 task4 dataset show that the proposed SP-SEDT can outperform fine-tuned frame-based models. An ablation study is also conducted to investigate the impact of different loss functions and patch sizes.
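
The pretext task described in the abstract, cropping random patches along the time axis only, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `crop_time_patches`, its parameters, and the normalized (center, length) target format are all assumptions chosen for illustration, and the spectrogram is represented as a plain list of frames.

```python
import random


def crop_time_patches(spec, num_patches, min_frames, max_frames, seed=None):
    """Crop random patches along the time axis only (hypothetical helper).

    spec: list of frames, each frame a list of frequency-bin values.
    Returns (patches, targets), where each target is an assumed
    (center, length) pair normalized by the total number of frames.
    """
    rng = random.Random(seed)
    total = len(spec)
    if not 1 <= min_frames <= max_frames <= total:
        raise ValueError("need 1 <= min_frames <= max_frames <= len(spec)")

    patches, targets = [], []
    for _ in range(num_patches):
        length = rng.randint(min_frames, max_frames)  # bounds are inclusive
        onset = rng.randint(0, total - length)
        # Slice frames only; every frequency bin of each frame is kept.
        patches.append(spec[onset:onset + length])
        targets.append(((onset + length / 2) / total, length / total))
    return patches, targets


# Usage: a dummy 500-frame, 64-bin spectrogram.
spec = [[0.0] * 64 for _ in range(500)]
patches, targets = crop_time_patches(spec, 3, 20, 100, seed=0)
```

In a setup like this, the model would be asked to recover each patch's temporal position within the full recording, so no human annotation is needed for the pre-training targets.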

research
10/05/2021

Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

Sound event detection (SED) has gained increasing attention with its wid...
research
11/18/2020

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Object detection with transformers (DETR) reaches competitive performanc...
research
10/18/2022

A Hybrid System of Sound Event Detection Transformer and Frame-wise Model for DCASE 2022 Task 4

In this paper, we describe in detail our system for DCASE 2022 Task4. Th...
research
11/02/2020

Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss

Due to the limitation of strong-labeled sound event detection data set, ...
research
03/28/2023

TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns

We present TabRet, a pre-trainable Transformer-based model for tabular d...
research
08/20/2018

A simple model for detection of rare sound events

We propose a simple recurrent model for detecting rare sound events, whe...
research
07/19/2021

DeepSocNav: Social Navigation by Imitating Human Behaviors

Current datasets to train social behaviors are usually borrowed from sur...
