End-To-End Speech Recognition Using A High Rank LSTM-CTC Based Model

03/12/2019
by   Yangyang Shi, et al.
Long Short-Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition due to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck projection matrix maps the hidden feature vectors obtained from the LSTM to the softmax output layer. In this paper, we propose to replace the projection matrix with a high rank projection layer. The output of the high rank projection layer is a weighted combination of vectors that are projected from the hidden feature vectors via different projection matrices and a non-linear activation function. The high rank projection layer improves the expressiveness of LSTM-CTC models. The experimental results show that on the Wall Street Journal (WSJ) corpus and the LibriSpeech data set, the proposed method achieves 4 [...] baseline CTC system. The resulting models outperform other published CTC based end-to-end (E2E) models when no external data or data augmentation is applied. Code has been made available at https://github.com/mobvoi/lstm_ctc.
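
To make the idea of a high rank projection layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract only states that the output is a weighted combination of vectors obtained through different projection matrices and a non-linear activation. The choice of tanh as the activation, the softmax gate that produces the combination weights from the hidden vector, the dimensions, and all names (`high_rank_projection`, `projections`, `gate`, `output_layer`) are assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def high_rank_projection(hidden, projections, gate):
    """Weighted combination of K non-linearly projected vectors.

    hidden:      LSTM hidden feature vector for one frame
    projections: K matrices, each mapping hidden -> projected vector
    gate:        matrix mapping hidden -> K scores (assumed gating scheme)
    """
    # Assumption: combination weights come from a softmax over gate scores.
    weights = softmax(matvec(gate, hidden))
    # Assumption: tanh is used as the non-linear activation.
    branches = [[math.tanh(v) for v in matvec(w_k, hidden)] for w_k in projections]
    out_dim = len(branches[0])
    combined = [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(out_dim)
    ]
    return combined, weights


# Hypothetical sizes, chosen only to keep the example small.
HIDDEN_DIM, PROJ_DIM, NUM_BRANCHES, NUM_LABELS = 8, 4, 3, 5

rng = random.Random(0)
hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
projections = [random_matrix(PROJ_DIM, HIDDEN_DIM, rng) for _ in range(NUM_BRANCHES)]
gate = random_matrix(NUM_BRANCHES, HIDDEN_DIM, rng)
output_layer = random_matrix(NUM_LABELS, PROJ_DIM, rng)

combined, weights = high_rank_projection(hidden, projections, gate)
# Per-frame label posteriors that a CTC loss would consume during training.
label_posteriors = softmax(matvec(output_layer, combined))
```

With a single branch and no activation, this reduces to the conventional bottleneck projection matrix followed by the softmax output layer. Because the combination weights in this sketch depend on the hidden vector, the overall mapping from hidden vector to projected vector is no longer one fixed linear map.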

research
12/19/2017

Improved Regularization Techniques for End-to-End Speech Recognition

Regularization is important for end-to-end speech models, since the mode...
research
08/14/2019

State-of-the-art Speech Recognition using EEG and Towards Decoding of Speech Spectrum From EEG

In this paper we first demonstrate continuous noisy speech recognition u...
research
05/08/2018

Improved training of end-to-end attention models for speech recognition

Sequence-to-sequence attention-based models on subword units allow simpl...
research
04/30/2019

Very Deep Self-Attention Networks for End-to-End Speech Recognition

Recently, end-to-end sequence-to-sequence models for speech recognition ...
research
01/20/2017

End-To-End Visual Speech Recognition With LSTMs

Traditional visual speech recognition systems consist of two stages, fea...
research
03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...
research
06/18/2018

Semi-tied Units for Efficient Gating in LSTM and Highway Networks

Gating is a key technique used for integrating information from multiple...
