Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

10/31/2016
by   Hagen Soltau, et al.
0

We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units. We model the output vocabulary of about 100,000 words directly using deep bi-directional LSTM RNNs with CTC loss. The model is trained on 125,000 hours of semi-supervised acoustic training data, which enables us to alleviate the data sparsity problem for word models. We show that the CTC word models work very well as an end-to-end all-neural speech recognition model without the use of traditional context-dependent sub-word phone units that require a pronunciation lexicon, and without any language model removing the need to decode. We demonstrate that the CTC word models perform better than a strong, more complex, state-of-the-art baseline with sub-word units.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/02/2018

Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer

We investigate training end-to-end speech recognition models with the re...
research
08/12/2014

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

We present a method to perform first-pass large vocabulary continuous sp...
research
06/15/2016

Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Phonemic or phonetic sub-word units are the most commonly used atomic el...
research
10/12/2021

Word Order Does Not Matter For Speech Recognition

In this paper, we study training of automatic speech recognition system ...
research
10/30/2020

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech...
research
08/02/2018

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

Speech recognition is a sequence prediction problem. Besides employing v...
research
03/04/2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

In this paper, we present an improved feedforward sequential memory netw...

Please sign up or login with your details

Forgot password? Click here to reset