
Robust Multi-channel Speech Recognition using Frequency Aligned Network

02/06/2020
by Taejin Park, et al.

Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements from training a spatial filtering layer jointly within an acoustic model. In this paper, we develop this idea further and use a frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin from influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly improves ASR performance. We investigate the effects of the frequency aligned network through ASR experiments on real-world far-field data, where users interact with an ASR system in uncontrolled acoustic environments. Our multi-channel acoustic model with a frequency aligned network shows up to 18
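The contrast between an affine layer and a frequency aligned component can be illustrated with a minimal sketch. The abstract does not give the layer's exact form, so everything here is an assumption for illustration: the function names, the complex-valued per-bin weights, and the sizes `C` (channels), `K` (frequency bins) and `D` (outputs per bin) are all hypothetical. The sketch only demonstrates the stated property that each output at bin `k` depends on the input channels at bin `k` alone, and compares parameter counts against a full affine map over all bins.

```python
import random


def frequency_aligned_layer(x, weights):
    """Apply an independent linear map to each frequency bin.

    x[c][k]          : complex spectral value for channel c, bin k
    weights[k][d][c] : complex weight for bin k, output d, channel c
    Returns y[k][d]; output at bin k uses only inputs at bin k.
    """
    num_channels = len(x)
    num_bins = len(x[0])
    return [
        [
            sum(weights[k][d][c] * x[c][k] for c in range(num_channels))
            for d in range(len(weights[k]))
        ]
        for k in range(num_bins)
    ]


def affine_param_count(num_channels, num_bins, num_outputs):
    # Full affine map (bias ignored): every output sees every channel and bin.
    return (num_channels * num_bins) * (num_outputs * num_bins)


def aligned_param_count(num_channels, num_bins, num_outputs):
    # One small (num_outputs x num_channels) matrix per frequency bin.
    return num_bins * num_outputs * num_channels


def random_complex(rng):
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


# Hypothetical toy sizes, chosen only to keep the example small.
C, K, D = 2, 4, 3
rng = random.Random(0)
x = [[random_complex(rng) for _ in range(K)] for _ in range(C)]
w = [[[random_complex(rng) for _ in range(C)] for _ in range(D)] for _ in range(K)]

y = frequency_aligned_layer(x, w)

# Perturb only bin 1 on every channel and recompute.
x_perturbed = [row[:] for row in x]
for c in range(C):
    x_perturbed[c][1] += 1.0
y_perturbed = frequency_aligned_layer(x_perturbed, w)

# Bins whose outputs changed after the perturbation.
changed_bins = [k for k in range(K) if y[k] != y_perturbed[k]]

affine_params = affine_param_count(C, K, D)    # 96 for these toy sizes
aligned_params = aligned_param_count(C, K, D)  # 24 for these toy sizes
```

With these toy sizes the per-bin form uses a factor of `K` fewer weights than the full affine map, and a change confined to one input bin reaches only that bin's outputs, which is the isolation property the abstract describes.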

