SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain

07/23/2021
by Pierre-Amaury Grumiaux, et al.

In this work, we propose a novel self-attention-based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network (CRNN), we investigate the benefit of replacing the recurrent layers with self-attention encoders inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data, with up to 3 simultaneous speakers. The results indicate that most of the proposed architectures perform on par with or outperform the CRNN baseline, especially in the multisource scenario. Moreover, by avoiding recurrent layers, the proposed models lend themselves to parallel computing, which is shown to produce considerable savings in execution time.
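To illustrate why self-attention lends itself to parallel computing, the following is a minimal sketch of scaled dot-product self-attention over a sequence of feature frames. It is not the SALADnet architecture: the function names `softmax` and `self_attention` are hypothetical, the learned query/key/value projections of a real Transformer encoder are replaced by the identity for brevity, and the input frames are made-up values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors, each of dimension d.
    Returns T output vectors of dimension d.
    Assumption: queries, keys and values are the frames themselves;
    a real encoder would apply learned linear projections first.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    outputs = []
    # Each iteration reads only the input frames, never a previous output,
    # so the T iterations are independent and could run in parallel.
    # A recurrent layer, by contrast, needs the hidden state of step t-1
    # before it can compute step t.
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, frames)) for j in range(d)]
        )
    return outputs


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up input
    for row in self_attention(toy_frames):
        print(row)
```

Because every output frame is a weighted average of all input frames, each frame can attend to the whole sequence in a single step, without the sequential dependency of a recurrent layer.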


research
04/01/2021

Keyword Transformer: A Self-Attention Model for Keyword Spotting

The Transformer architecture has been successful across many domains, in...
research
05/05/2021

Improved feature extraction for CRNN-based multiple sound source localization

In this work, we propose to extend a state-of-the-art multi-source local...
research
05/08/2020

Modeling Document Interactions for Learning to Rank with Regularized Self-Attention

Learning to rank is an important task that has been successfully deploye...
research
04/17/2021

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

Recently, our proposed recurrent neural network (RNN) based all deep lea...
research
07/20/2021

Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection

Joint sound event localization and detection (SELD) is an emerging audio...
research
02/18/2022

Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation

Extracting pitch information from music recordings is a challenging but ...
research
04/05/2019

NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining

In this paper, we present neural model architecture submitted to the Sem...
