DeepAI AI Chat
Log In Sign Up

Robust Multi-channel Speech Recognition using Frequency Aligned Network

by   Taejin Park, et al.

Conventional speech enhancement technique such as beamforming has known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly and improves the ASR performance. We investigate effects of frequency aligned network through ASR experiments on the real-world far-field data where users are interacting with an ASR system in uncontrolled acoustic environments. We show that our multi-channel acoustic model with a frequency aligned network shows up to 18


page 1

page 2

page 3

page 4


Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

Conventional far-field automatic speech recognition (ASR) systems typica...

Trainable Frontend For Robust and Far-Field Keyword Spotting

Robust and far-field speech recognition is critical to enable true hands...

Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

The use of spatial information with multiple microphones can improve far...

Mask scalar prediction for improving robust automatic speech recognition

Using neural network based acoustic frontends for improving robustness o...

Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech

Conventional Frequency Domain Linear Prediction (FDLP) technique models ...

Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks

In this paper, we introduce spatial attention for refining the informati...