MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

04/17/2021
by   Xiyun Li, et al.

Our recently proposed recurrent neural network (RNN) based all deep learning minimum variance distortionless response (ADL-MVDR) beamformer outperformed the conventional MVDR by replacing the matrix inversion and eigenvalue decomposition with two recurrent neural networks. In this work, we present a self-attentive RNN beamformer that further improves our previous RNN-based beamformer by leveraging the modeling capability of self-attention. A temporal-spatial self-attention module is proposed to better learn the beamforming weights from the speech and noise spatial covariance matrices. The temporal self-attention module helps the RNN learn global statistics of the covariance matrices. The spatial self-attention module is designed to attend to the cross-channel correlation in the covariance matrices. Furthermore, a MIMO model, which takes multi-channel input with multi-speaker directional features and produces multi-speaker speech separation outputs, is developed to improve inference efficiency. Evaluations show that the proposed MIMO self-attentive RNN beamformer improves both automatic speech recognition (ASR) accuracy and perceptual evaluation of speech quality (PESQ) scores over prior art.
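For context, the two operations that the ADL-MVDR approach replaces can be illustrated with a minimal sketch of a conventional MVDR solution for a single frequency bin. This is the textbook formulation, not the authors' code: the function name `mvdr_weights`, the diagonal-loading factor `eps`, and the choice of the principal eigenvector of the speech covariance as the steering vector are assumptions made for illustration.

```python
import numpy as np

def mvdr_weights(phi_ss, phi_nn, eps=1e-6):
    """Conventional MVDR weights for one frequency bin (illustrative sketch).

    phi_ss, phi_nn: (C, C) Hermitian speech and noise spatial covariance
    matrices for C microphone channels. `eps` is an assumed diagonal-loading
    factor, not a value taken from the paper.
    """
    n_ch = phi_nn.shape[0]
    # Diagonal loading keeps the noise covariance well conditioned.
    phi_nn = phi_nn + eps * np.trace(phi_nn).real / n_ch * np.eye(n_ch)
    # Eigenvalue decomposition: use the principal eigenvector of the speech
    # covariance as the steering vector (eigh sorts eigenvalues ascending).
    _, eigvecs = np.linalg.eigh(phi_ss)
    v = eigvecs[:, -1]
    # Matrix inversion, done here as a linear solve: phi_nn^{-1} v.
    num = np.linalg.solve(phi_nn, v)
    # w = phi_nn^{-1} v / (v^H phi_nn^{-1} v)
    return num / (v.conj() @ num)
```

The resulting weights satisfy the distortionless constraint, that is, `np.vdot(w, v)` equals 1 for the steering vector `v`. Both the eigenvalue decomposition and the inversion can be numerically unstable when the covariance estimates are poor, which is the motivation the abstract gives for learning these steps with RNNs instead.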

research
11/09/2021

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer

Acoustic echo cancellation (AEC) is a technique used in full-duplex comm...
research
01/04/2021

Generalized RNN beamformer for target speech separation

Recently we proposed an all-deep-learning minimum variance distortionles...
research
09/02/2020

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Most existing deep learning based binaural speaker separation systems fo...
research
11/22/2022

Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

Several speech processing systems have demonstrated considerable perform...
research
07/23/2021

SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain

In this work, we propose a novel self-attention based neural network for...
research
09/28/2019

Self-Attention Transducers for End-to-End Speech Recognition

Recurrent neural network transducers (RNN-T) have been successfully appl...
research
06/28/2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Recently, deep learning-based beamforming algorithms have shown promisin...
