Reducing state updates via Gaussian-gated LSTMs

01/22/2019
by   Matthew Thornton, et al.
8

Recurrent neural networks can be difficult to train on long sequence data due to the well-known vanishing gradient problem. Some architectures incorporate methods to reduce RNN state updates, therefore allowing the network to preserve memory over long temporal intervals. To address these problems of convergence, this paper proposes a timing-gated LSTM RNN model, called the Gaussian-gated LSTM (g-LSTM). The time gate controls when a neuron can be updated during training, enabling longer memory persistence and better error-gradient flow. This model captures long-temporal dependencies better than an LSTM and the time gate parameters can be learned even from non-optimal initialization values. Because the time gate limits the updates of the neuron state, the number of computes needed for the network update is also reduced. By adding a computational budget term to the training loss, we can obtain a network which further reduces the number of computes by at least 10x. Finally, by employing a temporal curriculum learning schedule for the g-LSTM, we can reduce the convergence time of the equivalent LSTM network on long sequences.

READ FULL TEXT

page 6

page 7

page 12

page 13

page 15

research
04/11/2016

Deep Gate Recurrent Neural Network

This paper introduces two recurrent neural network structures called Sim...
research
11/05/2021

Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation

Recurrent neural networks with a gating mechanism such as an LSTM or GRU...
research
01/11/2016

Investigating gated recurrent neural networks for speech synthesis

Recently, recurrent neural networks (RNNs) as powerful sequence models h...
research
10/06/2018

h-detach: Modifying the LSTM Gradient Towards Better Optimization

Recurrent neural networks are known for their notorious exploding and va...
research
10/06/2017

Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling

Recurrent neural networks have shown remarkable success in modeling sequ...
research
10/04/2022

Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks

Gate functions in recurrent models, such as an LSTM and GRU, play a cent...
research
10/16/2018

Reduced-Gate Convolutional LSTM Using Predictive Coding for Spatiotemporal Prediction

Spatiotemporal sequence prediction is an important problem in deep learn...

Please sign up or login with your details

Forgot password? Click here to reset