Fine-tune the pretrained ATST model for sound event detection

09/15/2023
by   Nian Shao, et al.
0

Sound event detection (SED) often suffers from the data deficiency problem. The recent baseline system in the DCASE2023 challenge task 4 leverages the large pretrained self-supervised learning (SelfSL) models to mitigate such restriction, where the pretrained models help to produce more discriminative features for SED. However, the pretrained models are regarded as a frozen feature extractor in the challenge baseline system and most of the challenge submissions, and fine-tuning of the pretrained models has been rarely studied. In this work, we study the fine-tuning method of the pretrained models for SED. We first introduce ATST-Frame, our newly proposed SelfSL model, to the SED system. ATST-Frame was especially designed for learning frame-level representations of audio signals and obtained state-of-the-art (SOTA) performances on a series of downstream tasks. We then propose a fine-tuning method for ATST-Frame using both (in-domain) unlabelled and labelled SED data. Our experiments show that, the proposed method overcomes the overfitting problem when fine-tuning the large pretrained network, and our SED system obtains new SOTA results of 0.587/0.812 PSDS1/PSDS2 scores on the DCASE challenge task 4 dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/13/2021

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

Recent pretrained language models extend from millions to billions of pa...
research
11/26/2021

Self-supervised Pretraining with Classification Labels for Temporal Activity Detection

Temporal Activity Detection aims to predict activity classes per frame, ...
research
10/13/2022

Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models

We introduce equi-tuning, a novel fine-tuning method that transforms (po...
research
07/26/2018

A Better Baseline for AVA

We introduce a simple baseline for action localization on the AVA datase...
research
03/07/2023

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

In this paper, we propose an effective sound event detection (SED) metho...
research
11/11/2022

Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks

Temporal Action Localization (TAL) methods typically operate on top of f...
research
10/06/2022

SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data

Recent success in fine-tuning large models, that are pretrained on broad...

Please sign up or login with your details

Forgot password? Click here to reset