Neural Spatio-Temporal Beamformer for Target Speech Separation

05/08/2020
by Yong Xu et al.

Purely neural network (NN) based speech separation and enhancement methods can achieve good objective scores, but they inevitably cause nonlinear speech distortions that are harmful to automatic speech recognition (ASR). On the other hand, the minimum variance distortionless response (MVDR) beamformer with NN-predicted masks can significantly reduce speech distortions, but its noise reduction capability is limited. In this paper, we propose a multi-tap MVDR beamformer with complex-valued masks for speech separation and enhancement. The state-of-the-art NN-mask based MVDR beamformer uses only the inter-microphone correlation, as in prior art. The multi-tap MVDR beamformer also exploits the inter-frame correlation. Further improvements include replacing the real-valued masks with complex-valued masks and jointly training the complex-mask NN. Evaluation on our multi-modal multi-channel target speech separation and enhancement platform shows that the proposed multi-tap MVDR beamformer improves both ASR accuracy and perceptual speech quality over prior art.
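The core idea can be sketched in NumPy. The sketch stacks each microphone's current frame with its previous frames, so that the covariance matrices capture both inter-microphone and inter-frame correlation. It then solves for a mask-based MVDR filter per frequency bin.

This is a minimal illustration under several assumptions that are not taken from the paper:

- The reference-channel MVDR form w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S) is used.
- Covariances are averaged over all frames, whereas the paper may normalize by mask energy instead.
- Complex masks are applied per channel before stacking.
- The mask NN is not shown; random masks stand in for its output.
- The function names `stack_taps` and `multi_tap_mvdr` and their parameters are hypothetical.

```python
import numpy as np


def stack_taps(Y, taps):
    """Stack the current frame with the (taps - 1) previous frames.

    Y: complex STFT of shape (M, T, F) = (mics, frames, frequency bins).
    Returns shape (M * taps, T, F); frames before t = 0 are zero-padded.
    Assumes taps <= T.
    """
    M, T, F = Y.shape
    out = np.zeros((M * taps, T, F), dtype=Y.dtype)
    for l in range(taps):
        # Block l holds Y(t - l, f) for every microphone.
        out[l * M:(l + 1) * M, l:, :] = Y[:, :T - l, :]
    return out


def multi_tap_mvdr(Y, mask_s, mask_n, taps=2, ref=0, eps=1e-8):
    """Hypothetical multi-tap MVDR with complex-valued masks.

    Y:       (M, T, F) complex mixture STFT.
    mask_s:  (M, T, F) complex speech mask (assumed to come from an NN).
    mask_n:  (M, T, F) complex noise mask (assumed to come from an NN).
    Returns the (T, F) enhanced STFT for reference microphone `ref`.
    """
    M, T, F = Y.shape
    D = M * taps
    Yb = stack_taps(Y, taps)            # stacked mixture, (D, T, F)
    Sb = stack_taps(mask_s * Y, taps)   # stacked masked-speech estimate
    Nb = stack_taps(mask_n * Y, taps)   # stacked masked-noise estimate

    # u picks the reference mic at the current frame (first block of the stack).
    u = np.zeros(D)
    u[ref] = 1.0

    out = np.zeros((T, F), dtype=Y.dtype)
    for f in range(F):
        # Spatio-temporal covariance matrices, averaged over frames.
        phi_s = Sb[:, :, f] @ Sb[:, :, f].conj().T / T
        phi_n = Nb[:, :, f] @ Nb[:, :, f].conj().T / T
        phi_n = phi_n + eps * np.eye(D)       # diagonal loading for stability
        num = np.linalg.solve(phi_n, phi_s)   # Phi_N^{-1} Phi_S
        w = num @ u / (np.trace(num) + eps)   # MVDR weights, shape (D,)
        out[:, f] = w.conj() @ Yb[:, :, f]    # s_hat(t, f) = w^H y_bar(t, f)
    return out


# Usage with random data standing in for a real STFT and NN masks.
rng = np.random.default_rng(0)
M, T, F = 4, 50, 3
Y = rng.standard_normal((M, T, F)) + 1j * rng.standard_normal((M, T, F))
mask_s = rng.uniform(0, 1, (M, T, F)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (M, T, F)))
mask_n = 1 - mask_s
enhanced = multi_tap_mvdr(Y, mask_s, mask_n, taps=2)
```

With taps=1 the sketch reduces to a conventional single-frame mask-based MVDR. Each extra tap enlarges the filter by M coefficients per frequency bin. The jointly trained setup described above would require the same computation in an autodiff framework, so that a loss on the beamformer output can backpropagate into the mask network; that part is not shown here.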

