Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

07/24/2015
by   Hasim Sak, et al.
0

We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve performance of LSTM RNN acoustic models for large vocabulary speech recognition. We show that frame stacking and reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/05/2014

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) archit...
research
03/25/2016

On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition

We study the problem of compressing recurrent neural networks (RNNs). In...
research
06/22/2016

A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

We present a comprehensive study of deep bidirectional long short-term m...
research
05/17/2017

Frame Stacking and Retaining for Recurrent Neural Network Acoustic Model

Frame stacking is broadly applied in end-to-end neural network training ...
research
07/12/2018

A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Recently, recurrent neural networks have become state-of-the-art in acou...
research
06/20/2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Acoustic models based on long short-term memory recurrent neural network...
research
10/30/2015

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

In this paper, we investigate the use of prediction-adaptation-correctio...

Please sign up or login with your details

Forgot password? Click here to reset