Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

12/24/2020
by   Zhuohuang Zhang, et al.

Many purely neural-network-based speech separation approaches have been proposed that greatly improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to automatic speech recognition (ASR). Minimum variance distortionless response (MVDR) filters strive to remove nonlinear distortions; however, these approaches either are not optimal for removing residual (linear) noise or are unstable when used jointly with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The MCMF ADL-MVDR handles different numbers of microphone channels in one framework and addresses both linear and nonlinear distortions. Spatio-temporal cross-correlations are also fully utilized in the proposed approach. The proposed system is evaluated on a Mandarin audio-visual corpus and is compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of our proposed framework under different scenarios and across several objective evaluation metrics, including ASR performance.
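For readers unfamiliar with the MVDR filter mentioned in the abstract, the following is a minimal sketch of the classical closed-form MVDR solution for a single frequency bin, w = Φ_nn⁻¹ v / (vᴴ Φ_nn⁻¹ v). It is not the ADL-MVDR method proposed in the paper. The function name `mvdr_weights`, the random noise covariance, and the four-microphone steering vector are all illustrative assumptions.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Classical MVDR weights for one frequency bin (illustrative helper).

    noise_cov: (M, M) Hermitian positive-definite noise covariance matrix.
    steering:  (M,) steering vector toward the target speaker.
    """
    # Solve noise_cov @ x = steering rather than forming an explicit inverse.
    numerator = np.linalg.solve(noise_cov, steering)
    # Normalize so that the filter passes the target direction with unit gain.
    return numerator / (steering.conj() @ numerator)

# Toy data (assumed, not taken from the paper): 4 microphones.
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
noise_cov = A @ A.conj().T + 1e-3 * np.eye(M)  # Hermitian positive definite
steering = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(30.0)))

w = mvdr_weights(noise_cov, steering)
# Distortionless constraint: w^H v should equal 1.
distortionless_gain = w.conj() @ steering
```

The explicit matrix solve in this closed form is one place where numerical instability can arise when the covariance estimate is poorly conditioned, which may relate to the instability the abstract mentions for MVDR used jointly with neural networks.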


research
05/08/2020

Neural Spatio-Temporal Beamformer for Target Speech Separation

Purely neural network (NN) based speech separation and enhancement metho...
research
11/01/2022

A Comparative Study on multichannel Speaker-attributed automatic speech recognition in Multi-party Meetings

Speaker-attributed automatic speech recognition (SA-ASR) in multiparty m...
research
04/05/2022

Audio-visual multi-channel speech separation, dereverberation and recognition

Despite the rapid advance of automatic speech recognition (ASR) technolo...
research
11/22/2021

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

Automatic speech recognition (ASR) of multi-channel multi-speaker overla...
research
11/22/2022

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

While current deep learning (DL)-based beamforming techniques have been ...
research
08/24/2023

MultiPA: a multi-task speech pronunciation assessment system for a closed and open response scenario

The design of automatic speech pronunciation assessment can be categoriz...
research
05/18/2022

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction

To improve speech intelligibility and speech quality in noisy environmen...
