Improved TDNNs using Deep Kernels and Frequency Dependent Grid-RNNs

02/18/2018
by Florian Kreyssig, et al.

Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recognition. The strength of the model can be attributed to its ability to effectively model long temporal contexts. However, current TDNN models are relatively shallow, which limits their modelling capability. This paper proposes a method of increasing the network depth by deepening the kernel used in the TDNN temporal convolutions. The best performing kernel consists of three fully connected layers with a residual (ResNet) connection from the output of the first to the output of the third. The addition of spectro-temporal processing as the input to the TDNN, in the form of a convolutional neural network (CNN) and a newly designed Grid-RNN, was also investigated. The Grid-RNN strongly outperforms a CNN if different sets of parameters are used for different frequency bands, and it can be further enhanced by using a bi-directional Grid-RNN. Experiments using the multi-genre broadcast (MGB3) English data (275h) show that deep kernel TDNNs reduce the word error rate (WER) by 6% [...] Grid-RNN gives a relative WER reduction of 9%.
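The deep kernel described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract only states that the kernel has three fully connected layers with a residual connection from the output of the first to the output of the third; the ReLU activations, the placement of the skip connection after the activation, the context offsets, the layer widths, and all function names below are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deep_kernel(spliced, params):
    """Three fully connected layers; the first layer's output is added to the third's."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(spliced @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    h3 = relu(h2 @ w3 + b3)
    # Residual connection from layer 1 to layer 3 (adding after the
    # activation is an assumption; the abstract does not specify this).
    return h1 + h3

def tdnn_layer(frames, offsets, params):
    """Apply the same deep kernel at every time step whose context is fully in range."""
    num_frames = frames.shape[0]
    start = max(0, -min(offsets))
    stop = num_frames - max(0, max(offsets))
    outputs = []
    for t in range(start, stop):
        # Splice the frames at the chosen temporal offsets into one vector.
        spliced = np.concatenate([frames[t + o] for o in offsets])
        outputs.append(deep_kernel(spliced, params))
    return np.stack(outputs)

def init_params(in_dim, hidden_dim, rng):
    """Small random weights and zero biases for the three layers (hypothetical init)."""
    dims = [(in_dim, hidden_dim), (hidden_dim, hidden_dim), (hidden_dim, hidden_dim)]
    return [(rng.standard_normal(d) * 0.1, np.zeros(d[1])) for d in dims]

rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 40))  # 20 frames of 40-dim features (made-up sizes)
offsets = (-2, 0, 2)                    # hypothetical temporal context
params = init_params(in_dim=40 * len(offsets), hidden_dim=64, rng=rng)
out = tdnn_layer(frames, offsets, params)
print(out.shape)
```

Because the skip connection adds the first layer's output to the third's, the second and third layers must share the first layer's width in this sketch. A standard (shallow) TDNN layer corresponds to keeping only the first of the three layers.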


research
06/22/2016

Learning text representation using recurrent convolutional neural network with highway layers

Recently, the rapid development of word embedding and neural networks ha...
research
08/22/2021

Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers

Recurrent neural network transducers (RNN-T) are a promising end-to-end ...
research
11/09/2016

Audio Visual Speech Recognition using Deep Recurrent Neural Networks

In this work, we propose a training algorithm for an audio-visual automa...
research
05/04/2023

Employing Hybrid Deep Neural Networks on Dari Speech

This paper is an extension of our previous conference paper. In recent y...
research
02/03/2021

Effects of Number of Filters of Convolutional Layers on Speech Recognition Model Accuracy

Inspired by the progress of the End-to-End approach [1], this paper syst...
research
12/29/2017

The CAPIO 2017 Conversational Speech Recognition System

In this paper we show how we have achieved the state-of-the-art performa...
