A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

06/22/2016
by Albert Zeyer, et al.

We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth, training models with up to 8 layers. We investigate training aspects and study different variants of optimization methods, batching, truncated backpropagation, regularization techniques such as dropout and L2 regularization, and gradient clipping variants. The major part of the experimental analysis was performed on the Quaero corpus; additional experiments were also performed on the Switchboard corpus. Our best LSTM model achieves a relative improvement in word error rate of over 14% compared to our best feed-forward neural network (FFNN) baseline on the Quaero task. On this task, we obtain our best result with an 8-layer bidirectional LSTM, and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs. Finally, we compare the training computation time of many of the presented experiments in relation to recognition performance. All experiments were done with RETURNN, the RWTH extensible training framework for universal recurrent neural networks, in combination with RASR, the RWTH ASR toolkit.
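As a rough illustration of the model family described above, the following PyTorch sketch stacks a deep bidirectional LSTM over frame-level acoustic features and applies the regularizers mentioned in the abstract: dropout between layers, L2 regularization via weight decay, and gradient norm clipping. The authors' actual setup uses RETURNN with RASR, not PyTorch; the layer width, number of output states, learning rate, and clipping threshold below are illustrative assumptions, not values taken from the paper.

    # Minimal sketch of a deep bidirectional LSTM acoustic model (not the
    # paper's RETURNN/RASR setup). Hyperparameters are illustrative only.
    import torch
    import torch.nn as nn

    class DeepBLSTMAcousticModel(nn.Module):
        def __init__(self, num_features=40, hidden_size=500, num_layers=8,
                     num_states=4501, dropout=0.1):
            super().__init__()
            # Stacked bidirectional LSTM; dropout is applied between layers.
            self.blstm = nn.LSTM(input_size=num_features,
                                 hidden_size=hidden_size,
                                 num_layers=num_layers,
                                 bidirectional=True,
                                 dropout=dropout,
                                 batch_first=True)
            # Linear output layer over (assumed) tied HMM state targets.
            self.output = nn.Linear(2 * hidden_size, num_states)

        def forward(self, features):  # features: (batch, time, num_features)
            hidden, _ = self.blstm(features)
            return self.output(hidden)  # per-frame state logits

    model = DeepBLSTMAcousticModel()
    # L2 regularization expressed as weight decay on the optimizer.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    criterion = nn.CrossEntropyLoss()

    def train_step(features, state_targets):
        optimizer.zero_grad()
        logits = model(features)                        # (batch, time, num_states)
        loss = criterion(logits.transpose(1, 2), state_targets)
        loss.backward()
        # Gradient norm clipping, one of the clipping variants studied.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
        return loss.item()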

Related research

07/24/2015 - Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition
  We have recently shown that deep Long Short-Term Memory (LSTM) recurrent...

07/03/2017 - Improving LSTM-CTC based ASR performance in domains with limited training data
  This paper addresses the observed performance gap between automatic spee...

11/20/2020 - Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
  We present a new training scheme for streaming automatic speech recognit...

08/07/2018 - Device-directed Utterance Detection
  In this work, we propose a classifier for distinguishing device-directed...

05/25/2022 - Heterogeneous Reservoir Computing Models for Persian Speech Recognition
  Over the last decade, deep-learning methods have been gradually incorpor...

08/02/2016 - RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks
  In this work we release our extensible and easily configurable neural ne...

03/21/2017 - Deep LSTM for Large Vocabulary Continuous Speech Recognition
  Recurrent neural networks (RNNs), especially long short-term memory (LST...
