Layer Trajectory LSTM

08/28/2018
by Jinyu Li, et al.

It is popular to stack LSTM layers to get better modeling power, especially when a large amount of training data is available. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the gradient vanishing issue persists if the network goes too deep. This issue can be partially solved by adding skip connections between layers, as in the residual LSTM. In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM. This layer-LSTM scans the outputs of the time-LSTMs across layers and uses the summarized layer trajectory information for final senone classification. The forward propagation of the time-LSTMs and the layer-LSTM can be handled in two separate threads in parallel, so the network computation time is the same as that of the standard time-LSTM. With a layer-LSTM running through layers, a gated path is provided from the output layer to the bottom layer, alleviating the gradient vanishing issue. Trained with 30 thousand hours of EN-US Microsoft internal data, the proposed ltLSTM performed significantly better than the standard multi-layer LSTM and the residual LSTM, with up to 9.0% relative word error rate reduction across different tasks.
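The forward pass described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the `LSTMCell` class, random weight initialization, and the choice to reset the layer-LSTM state at every frame are assumptions made here for clarity; the layer-LSTM recurs over depth (layers), while the time-LSTMs recur over time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Plain LSTM cell; weights are random, for illustration only."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # one stacked weight matrix for the i, f, o, g gates
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def lt_lstm_forward(x_seq, time_cells, layer_cell):
    """ltLSTM forward sketch: a stack of time-LSTMs runs over time;
    a single layer-LSTM scans their outputs bottom-to-top at each frame."""
    n_layers = len(time_cells)
    H = time_cells[0].hidden_dim
    h = [np.zeros(H) for _ in range(n_layers)]  # per-layer time-LSTM states
    c = [np.zeros(H) for _ in range(n_layers)]
    outputs = []
    for x in x_seq:
        inp = x
        # depth-wise pass of the time-LSTM stack for this frame
        for l, cell in enumerate(time_cells):
            h[l], c[l] = cell.step(inp, h[l], c[l])
            inp = h[l]
        # layer-LSTM scans the per-layer outputs of this frame;
        # its state recurs over depth only (reset each frame -- an assumption)
        g = np.zeros(layer_cell.hidden_dim)
        cg = np.zeros(layer_cell.hidden_dim)
        for l in range(n_layers):
            g, cg = layer_cell.step(h[l], g, cg)
        outputs.append(g)  # summarized layer trajectory for senone scoring
    return np.stack(outputs)

rng = np.random.default_rng(0)
D, H, L, T = 8, 16, 3, 5  # input dim, hidden dim, layers, frames
time_cells = [LSTMCell(D if l == 0 else H, H, rng) for l in range(L)]
layer_cell = LSTMCell(H, H, rng)
y = lt_lstm_forward([rng.standard_normal(D) for _ in range(T)], time_cells, layer_cell)
print(y.shape)  # one layer-trajectory summary vector per frame: (5, 16)
```

Because the time-LSTM recursion at frame t never depends on the layer-LSTM outputs, the two loops can indeed run in separate threads, as the paper notes: the layer-LSTM for frame t only needs time-LSTM outputs that are already computed.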


