Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

10/15/2019
by Salar Jafarlou, et al.

Despite significant efforts over the last few years to build robust automatic speech recognition (ASR) systems for different acoustic settings, the performance of current state-of-the-art technologies degrades significantly in noisy, reverberant environments. Convolutional Neural Networks (CNNs) have been used successfully to achieve substantial improvements in many speech processing applications, including distant speech recognition (DSR). However, standard CNN architectures are not efficient at capturing long-term speech dynamics, which are essential to the design of a robust DSR system. In the present study, we address this issue by investigating variants of large receptive field CNNs (LRF-CNNs), which include deeply recursive networks, dilated convolutional neural networks, and stacked hourglass networks. To compare the efficacy of these architectures with the standard CNN on the Wall Street Journal (WSJ) corpus, we use a hybrid DNN-HMM based speech recognition system. We extend the study to evaluate system performance on distant speech simulated using realistic room impulse responses (RIRs). Our experiments show that, with a fixed number of parameters across all architectures, the large receptive field networks give consistent improvements over the standard CNNs for distant speech. Among the explored LRF-CNNs, the stacked hourglass network shows the largest improvements, with an 8.9% reduction in word error rate (WER) and a 10.7% improvement in accuracy compared to the standard CNNs for simulated distant speech signals.
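To illustrate why dilation is one way to enlarge the receptive field without adding parameters, the short sketch below computes the receptive field of a stack of 1-D convolutional layers. The helper name `receptive_field` and the four-layer configurations are illustrative assumptions, not the architectures evaluated in the paper.

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field, in input frames, of a stack of 1-D conv layers.

    Hypothetical helper for illustration; the layer settings used below
    are assumptions and do not come from the paper.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # before any layer, one unit sees exactly one input frame
    jump = 1  # distance, in input frames, between neighbouring units
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += (k - 1) * d * jump  # each layer widens the span by (k-1)*d steps
        jump *= s                 # striding spreads later steps further apart
    return rf


# Same kernel sizes, hence the same number of weights per layer.
standard = receptive_field([3, 3, 3, 3], [1, 1, 1, 1])
dilated = receptive_field([3, 3, 3, 3], [1, 2, 4, 8])

print(standard)  # 9
print(dilated)   # 31
```

With identical kernel sizes, the dilated stack in this toy setting covers 31 input frames against 9 for the undilated one, which is the kind of longer temporal context the abstract argues a DSR system needs.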


