Multi-Channel Automatic Speech Recognition Using Deep Complex Unet

by Yuxiang Kong, et al.

The front-end module in multi-channel automatic speech recognition (ASR) systems mainly uses microphone array techniques to produce enhanced signals in noisy conditions with reverberation and echoes. Recently, neural network (NN) based front-ends have shown promising improvement over conventional signal processing methods. In this paper, we propose to adopt the architecture of deep complex Unet (DCUnet) - a powerful complex-valued Unet-structured speech enhancement model - as the front-end of the multi-channel acoustic model, and to integrate the two in a multi-task learning (MTL) framework, with a cascaded framework for comparison. We also investigate the proposed methods with several training strategies to improve recognition accuracy on 1000 hours of real-world XiaoMi smart speaker data with echoes. Experiments show that our proposed DCUnet-MTL method brings about a 12.2% relative character error rate (CER) reduction compared with the traditional approach of array processing plus a single-channel acoustic model. It also outperforms the recently proposed neural beamforming method.
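
To make two of the ideas named in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a complex-valued layer can be expressed as four real-valued convolutions (the standard identity for complex multiplication), and that a multi-task objective is a weighted sum of an enhancement loss and a recognition loss. The function names `complex_conv1d` and `multitask_loss` and the weight `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex 1-D convolution built from four real convolutions.

    Uses (x_re + j*x_im) * (w_re + j*w_im)
       = (x_re*w_re - x_im*w_im) + j*(x_re*w_im + x_im*w_re),
    where * denotes convolution.
    """
    out_re = np.convolve(x_re, w_re) - np.convolve(x_im, w_im)
    out_im = np.convolve(x_re, w_im) + np.convolve(x_im, w_re)
    return out_re, out_im


def multitask_loss(enh_loss, asr_loss, alpha=0.5):
    """Weighted sum of two task losses.

    alpha is a hypothetical interpolation weight chosen for illustration;
    the abstract does not state how the two objectives are combined.
    """
    return alpha * enh_loss + (1.0 - alpha) * asr_loss


# Toy complex "spectrum" and complex filter (illustrative values only).
x = np.array([1.0 + 2.0j, 0.5 - 1.0j, -1.0 + 0.0j])
w = np.array([0.5 + 0.5j, -1.0 + 0.25j])

out_re, out_im = complex_conv1d(x.real, x.imag, w.real, w.imag)

# The real-valued construction matches NumPy's native complex convolution.
reference = np.convolve(x, w)
print(np.allclose(out_re + 1j * out_im, reference))
```

The check against `np.convolve` on complex inputs only confirms that the real-valued decomposition is arithmetically equivalent to a complex convolution. It says nothing about the layer configuration or loss weighting used in the actual DCUnet-MTL system.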



