End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

02/23/2021
by Wangyou Zhang, et al.

Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multi-channel conditions. However, severe performance degradation is still observed in reverberant and noisy scenarios, and a large performance gap remains between anechoic and reverberant conditions. In this work, we focus on the multi-channel multi-speaker reverberant condition, and propose to extend our previous framework for end-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend subnetworks, including voice activity detection (VAD)-like masks. These techniques significantly stabilize the end-to-end training process. Experiments on the spatialized wsj1-2mix corpus show that the proposed system achieves about a 35% relative WER reduction compared to our conventional multi-channel E2E ASR system, and also obtains decent speech dereverberation and separation performance (SDR = 12.5 dB) in the reverberant multi-speaker condition while trained only with the ASR criterion.
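As a reading aid for the relative WER figure quoted above, the following minimal sketch shows how a relative reduction is conventionally computed from a baseline and a proposed word error rate. The function name and the two WER values are hypothetical, chosen only to illustrate the arithmetic; they are not results reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction, in percent, of proposed vs. baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT numbers taken from the paper.
baseline = 20.0
proposed = 13.0
print(f"{relative_wer_reduction(baseline, proposed):.1f}% relative reduction")
```

Note that a relative reduction is measured against the baseline WER, so the same relative figure can correspond to very different absolute WER differences.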

research
10/15/2019

MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition

Recently, the end-to-end approach has proven its efficacy in monaural mu...
research
10/23/2019

Filterbank design for end-to-end speech separation

Single-channel speech separation has recently made great progress thanks...
research
03/31/2022

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

Recently, the end-to-end training approach for multi-channel ASR has sho...
research
07/07/2016

Single-Channel Multi-Speaker Separation using Deep Clustering

Deep clustering is a recently introduced deep learning architecture that...
research
05/21/2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming

Despite successful applications of end-to-end approaches in multi-channe...
research
09/07/2020

An End-to-end Architecture of Online Multi-channel Speech Separation

Multi-speaker speech recognition has been one of the key challenges in co...
research
04/19/2019

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

Sequence-to-sequence (S2S) modeling is becoming a popular paradigm for a...
