MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

04/17/2021
by   Xiyun Li, et al.
0

Recently, our proposed recurrent neural network (RNN) based all deep learning minimum variance distortionless response (ADL-MVDR) beamformer method yielded superior performance over the conventional MVDR by replacing the matrix inversion and eigenvalue decomposition with two recurrent neural networks. In this work, we present a self-attentive RNN beamformer to further improve our previous RNN-based beamformer by leveraging on the powerful modeling capability of self-attention. Temporal-spatial self-attention module is proposed to better learn the beamforming weights from the speech and noise spatial covariance matrices. The temporal self-attention module could help RNN to learn global statistics of covariance matrices. The spatial self-attention module is designed to attend on the cross-channel correlation in the covariance matrices. Furthermore, a multi-channel input with multi-speaker directional features and multi-speaker speech separation outputs (MIMO) model is developed to improve the inference efficiency. The evaluations demonstrate that our proposed MIMO self-attentive RNN beamformer improves both the automatic speech recognition (ASR) accuracy and the perceptual estimation of speech quality (PESQ) against prior arts.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 4

11/09/2021

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer

Acoustic echo cancellation (AEC) is a technique used in full-duplex comm...
01/04/2021

Generalized RNN beamformer for target speech separation

Recently we proposed an all-deep-learning minimum variance distortionles...
09/02/2020

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Most existing deep learning based binaural speaker separation systems fo...
02/23/2021

Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition

Self-attention models have been successfully applied in end-to-end speec...
07/23/2021

SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain

In this work, we propose a novel self-attention based neural network for...
10/20/2021

TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

In this work, we propose a new model called triple-path attentive recurr...
04/28/2020

Neural Speech Separation Using Spatially Distributed Microphones

This paper proposes a neural network based speech separation method usin...

Code Repositories

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.