First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

08/12/2014
by   Awni Y. Hannun, et al.
0

We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This paper extends this approach in two ways. First, we demonstrate that a straightforward recurrent neural network architecture can achieve a high level of accuracy. Second, we propose and evaluate a modified prefix-search decoding algorithm. This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems. Experiments on the Wall Street Journal corpus demonstrate fairly competitive word error rates, and the importance of bi-directional network recurrence.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/31/2016

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

We present results that show it is possible to build a competitive, grea...
research
06/09/2020

On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

Advanced neural network models have penetrated Automatic Speech Recognit...
research
07/23/2020

Applying GPGPU to Recurrent Neural Network Language Model based Fast Network Search in the Real-Time LVCSR

Recurrent Neural Network Language Models (RNNLMs) have started to be use...
research
10/07/2021

Back from the future: bidirectional CTC decoding using future information in speech recognition

In this paper, we propose a simple but effective method to decode the ou...
research
08/02/2016

Efficient Segmental Cascades for Speech Recognition

Discriminative segmental models offer a way to incorporate flexible feat...
research
10/30/2020

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech...
research
04/02/2020

Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

In hybrid HMM based speech recognition, LSTM language models have been w...

Please sign up or login with your details

Forgot password? Click here to reset