Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement

10/01/2021
by Zhong-Qiu Wang, et al.

A promising approach for multi-microphone speech separation involves two deep neural networks (DNNs), where the predicted target speech from the first DNN is used to compute signal statistics for time-invariant minimum variance distortionless response (MVDR) beamforming, and the MVDR result is then used as extra features for the second DNN to predict target speech. Previous studies suggested that the MVDR result can provide complementary information for the second DNN to better predict target speech. However, on fixed-geometry arrays, both DNNs can take in, for example, the real and imaginary (RI) components of the multi-channel mixture as features to leverage the spatial and spectral information for enhancement. It has not been clearly explained why the linear MVDR result can be complementary and why it is still needed, considering that the DNNs and the beamformer use the same input, and that the DNNs perform non-linear filtering and could render the linear filtering of MVDR unnecessary. Similarly, in monaural cases, one can replace the MVDR beamformer with a monaural weighted prediction error (WPE) filter. Although the linear WPE filter and the DNNs use the same mixture RI components as input, the WPE result is found to significantly improve the second DNN. This study provides a novel explanation from the perspective of the low-distortion nature of such algorithms, and finds that they can consistently improve phase estimation. Equipped with this understanding, we investigate several low-distortion target estimation algorithms, including several beamformers, WPE, forward convolutive prediction, and their combinations, and use their results as extra features to train the second network to achieve better enhancement. Evaluation results on single- and multi-microphone speech dereverberation and enhancement tasks indicate the effectiveness of the proposed approach and the validity of the proposed view.
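The middle stage of the two-DNN pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the first DNN's output is already available as a complex multi-channel STFT estimate, and it uses one common MVDR formulation (covariance-based weights with a reference microphone) that the abstract does not specify. The function names `mvdr_time_invariant` and `stack_ri_features`, the `(channels, frames, frequencies)` array layout, and the diagonal-loading constant are all hypothetical choices made for illustration.

```python
import numpy as np


def mvdr_time_invariant(mix, target_est, ref_mic=0, diag_load=1e-6):
    """Time-invariant MVDR from a first-stage target estimate (illustrative sketch).

    mix, target_est: complex STFT arrays of shape (C, T, F).
    Returns the beamformed STFT of shape (T, F) and the weights of shape (F, C).
    """
    n_mics, n_frames, _ = mix.shape
    noise_est = mix - target_est  # everything that is not target is treated as noise

    # Per-frequency spatial covariance matrices, averaged over all frames: (F, C, C).
    phi_s = np.einsum("ctf,dtf->fcd", target_est, target_est.conj()) / n_frames
    phi_n = np.einsum("ctf,dtf->fcd", noise_est, noise_est.conj()) / n_frames

    # Diagonal loading (an assumed stabilizer) keeps phi_n well conditioned.
    load = diag_load * np.trace(phi_n, axis1=1, axis2=2).real / n_mics
    phi_n = phi_n + load[:, None, None] * np.eye(n_mics)

    # w(f) = (phi_n^{-1} phi_s) u / trace(phi_n^{-1} phi_s), u one-hot at ref_mic.
    ratio = np.linalg.solve(phi_n, phi_s)            # (F, C, C)
    trace = np.trace(ratio, axis1=1, axis2=2)        # (F,)
    weights = ratio[:, :, ref_mic] / trace[:, None]  # (F, C)

    # Apply one filter per frequency to every frame: y(t, f) = w(f)^H mix(:, t, f).
    enhanced = np.einsum("fc,ctf->tf", weights.conj(), mix)
    return enhanced, weights


def stack_ri_features(mix, beamformed):
    """Stack RI components of the mixture and the beamformed signal: (2C + 2, T, F)."""
    extra = beamformed[None]  # add a channel axis
    return np.concatenate([mix.real, mix.imag, extra.real, extra.imag], axis=0)
```

Because the weights are a single linear filter per frequency, a target that follows a fixed rank-1 spatial model passes through to the reference microphone unchanged under this formulation. That low-distortion property is what the abstract credits for the usefulness of the MVDR result as an extra input to the second DNN.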

research
02/24/2022

Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

This paper describes our submission to the L3DAS22 Challenge Task 1, whi...
research
04/21/2022

STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency

Deep learning based speech enhancement in the short-term Fourier transfo...
research
06/22/2022

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

Employing deep neural networks (DNNs) to directly learn filters for mult...
research
06/27/2022

Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement

The key advantage of using multiple microphones for speech enhancement i...
research
08/16/2021

Convolutive Prediction for Reverberant Speech Separation

We investigate the effectiveness of convolutive prediction, a novel form...
research
03/04/2020

Multi-Microphone Complex Spectral Mapping for Speech Dereverberation

This study proposes a multi-microphone complex spectral mapping approach...
research
03/14/2022

TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

While existing end-to-end beamformers achieve impressive performance in ...
