Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

03/31/2022
by   Keyu An, et al.
0

Recently, the end-to-end training approach for multi-channel ASR has shown its effectiveness, which usually consists of a beamforming front-end and a recognition back-end. However, the end-to-end training becomes more difficult due to the integration of multiple modules, particularly considering that multi-channel speech data recorded in real environments are limited in size. This raises the demand to exploit the single-channel data for multi-channel end-to-end ASR. In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel end-to-end ASR, namely back-end pre-training, data scheduling, and data simulation, under different settings such as the sizes of the single-channel data and the choices of the front-end. Extensive experiments on CHiME-4 and AISHELL-4 datasets demonstrate that while all three methods improve the multi-channel end-to-end speech recognition performance, data simulation outperforms the other two, at the cost of longer training time. Data scheduling outperforms back-end pre-training marginally but nearly consistently, presumably because that in the pre-training stage, the back-end tends to overfit on the single-channel data, especially when the single-channel data size is small.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/19/2022

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

Self-supervised learning representation (SSLR) has demonstrated its sign...
research
02/23/2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

Recently, the end-to-end approach has been successfully applied to multi...
research
08/30/2021

Multi-Channel Transformer Transducer for Speech Recognition

Multi-channel inputs offer several advantages over single-channel, to im...
research
03/31/2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

Transcribing meetings containing overlapped speech with only a single di...
research
10/07/2022

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

Due to the high performance of multi-channel speech processing, we can u...
research
09/23/2021

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

End-to-end (E2E) multi-channel ASR systems show state-of-the-art perform...
research
03/25/2022

Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

We present a novel multi-channel front-end based on channel shortening w...

Please sign up or login with your details

Forgot password? Click here to reset