Furnishing Sound Event Detection with Language Model Abilities

08/22/2023
by Hualei Wang, et al.

Recently, the abilities of language models (LMs) have attracted increasing attention in visual cross-modal tasks. In this paper, we further explore the generative capacity of LMs for sound event detection (SED), beyond the visual domain. Specifically, we propose an elegant method that aligns audio features with text features to accomplish sound event classification and temporal localization. The framework consists of an acoustic encoder, a contrastive module that aligns the corresponding audio and text representations, and a decoupled language decoder that generates temporal and event sequences from the audio features. Compared with conventional works, which require complicated processing and make only limited use of audio features, our model is more concise and comprehensive, since the language model directly leverages its semantic capabilities to generate the sequences. We investigate different decoupling modules to demonstrate their effectiveness for timestamp capture and event classification. Evaluation results show that the proposed method produces accurate sound event detection sequences.
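The abstract names a contrastive module that aligns audio and text representations but does not say which loss it uses. A common choice for this kind of audio-text alignment is a CLIP-style symmetric InfoNCE objective, so the sketch below assumes that choice. The function names and the default temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    # Scale to (almost) unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(audio: List[Vector], text: List[Vector],
                               temperature: float = 0.07) -> float:
    """CLIP-style symmetric InfoNCE loss (an assumed stand-in; the abstract
    does not specify the contrastive objective).

    audio[i] and text[i] are embeddings of a matched audio clip / text pair;
    every other pairing in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in audio]
    t = [l2_normalize(v) for v in text]
    n = len(a)
    # logits[i][j]: temperature-scaled cosine similarity of audio i and text j.
    logits = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Audio-to-text: each clip should score its own text highest (row-wise).
    a2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-audio: each text should score its own clip highest (column-wise).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2a = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)


if __name__ == "__main__":
    # Toy batch: two audio embeddings and two roughly matching text embeddings.
    audio = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.9, 0.1], [0.2, 0.8]]
    print(symmetric_contrastive_loss(audio, text))
```

Averaging the two directions penalizes both a clip that matches the wrong text and a text that matches the wrong clip; a smaller temperature sharpens the softmax over the in-batch candidates.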

10/29/2018

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection

In this paper, we propose a temporal-frequential attention model for sou...
04/11/2019

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2...
02/12/2023

SemanticAC: Semantics-Assisted Framework for Audio Classification

In this paper, we propose SemanticAC, a semantics-assisted framework for...
03/25/2022

Audio-text Retrieval in Context

Audio-text retrieval based on natural language descriptions is a challen...
05/04/2023

Few-shot Domain-Adaptive Visually-fused Event Detection from Text

Incorporating auxiliary modalities such as images into event detection m...
04/26/2021

Identifying Actions for Sound Event Classification

In Psychology, actions are paramount for humans to perceive and separate...
08/14/2023

DiffSED: Sound Event Detection with Denoising Diffusion

Sound Event Detection (SED) aims to predict the temporal boundaries of a...
