Building DNN Acoustic Models for Large Vocabulary Speech Recognition

06/30/2014
by   Andrew L. Maas, et al.
0

Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation on which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments use the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine performance of large DNN models -- with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produces strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.

READ FULL TEXT
research
10/18/2016

Small-footprint Highway Deep Neural Networks for Speech Recognition

State-of-the-art speech recognition systems typically employ neural netw...
research
08/17/2016

Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition

Distant speech recognition is a challenge, particularly due to the corru...
research
06/19/2016

Graph based manifold regularized deep neural networks for automatic speech recognition

Deep neural networks (DNNs) have been successfully applied to a wide var...
research
12/21/2018

End-to-End Classification of Reverberant Rooms using DNNs

Reverberation is present in our workplaces, our homes and even in places...
research
03/18/2016

A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

We study large-scale kernel methods for acoustic modeling and compare to...
research
01/13/2017

Kernel Approximation Methods for Speech Recognition

We study large-scale kernel methods for acoustic modeling in speech reco...
research
02/20/2021

The Use of Voice Source Features for Sung Speech Recognition

In this paper, we ask whether vocal source features (pitch, shimmer, jit...

Please sign up or login with your details

Forgot password? Click here to reset