Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

02/28/2021
by Christopher Schymura, et al.

Sound event localization frameworks based on deep neural networks have shown increased robustness to reverberation and noise compared with classical parametric approaches. In particular, recurrent architectures that incorporate temporal context into the estimation process seem to be well suited to this task. This paper proposes a novel approach to sound event localization that uses an attention-based sequence-to-sequence model. Models of this type have been applied successfully to problems in natural language processing and automatic speech recognition. In this work, a multi-channel audio signal is encoded into a latent representation, which is subsequently decoded into a sequence of estimated directions of arrival. Attention allows the model to capture temporal dependencies in the audio signal by focusing on the specific frames that are relevant for estimating the activity and direction of arrival of sound events at the current time step. The framework is evaluated on three publicly available datasets for sound event localization. It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
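
To make the attention step described in the abstract more concrete, the following is a minimal sketch of how a decoder state could weight encoded audio frames. It assumes scaled dot-product attention, which the abstract does not specify; the function name `attention_context`, the toy latent vectors, and the query are hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
import math


def attention_context(encoder_states, query):
    """Scaled dot-product attention over encoded audio frames (assumed form).

    encoder_states: list of T latent vectors, one per frame, each of length d.
    query: decoder state at the current time step, length d.
    Returns (weights, context): T attention weights that sum to 1, and the
    weighted average of the encoder states (length d).
    """
    d = len(query)
    # Similarity between the decoder state and every encoded frame.
    scores = [
        sum(h_i * q_i for h_i, q_i in zip(h, query)) / math.sqrt(d)
        for h in encoder_states
    ]
    # Softmax, shifted by the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: frames weighted by their relevance to the query.
    context = [
        sum(w * h[j] for w, h in zip(weights, encoder_states))
        for j in range(d)
    ]
    return weights, context


# Hypothetical toy input: three frames with 2-dimensional latents.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attention_context(encoder_states, query)
```

In this toy input the first and third frames align with the query equally well, so they receive equal weight and both outweigh the second frame. In a full model, the context vector would then feed a decoder that outputs source activity and direction of arrival for the current time step.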

