Improving the Gating Mechanism of Recurrent Neural Networks

by Albert Gu et al.

Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono-initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
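The saturation problem described in the abstract can be made concrete in a few lines: a sigmoid gate's gradient is σ'(x) = σ(x)(1 − σ(x)), which vanishes exactly in the regime (gate output near 1) where a forget gate must operate to retain information over long delays. The sketch below is illustrative only, not the authors' code; `uniform_gate_bias` is a hypothetical helper name sketching the paper's uniform gate initialization idea as we understand it.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# A forget gate must output ~1 to retain information over long time-delays,
# which pushes its pre-activation into the saturation regime where the
# gradient vanishes, hindering gradient-based learning of the gate.
for pre_activation in (0.0, 2.0, 4.0, 8.0):
    g = sigmoid(pre_activation)
    dg = sigmoid_grad(pre_activation)
    print(f"pre-act {pre_activation:4.1f}: gate={g:.4f}, grad={dg:.6f}")

def uniform_gate_bias(eps=1e-3):
    # Sketch of uniform gate initialization (an assumption based on the
    # abstract's description): sample an initial gate activation
    # u ~ Uniform(0, 1) and set the bias to its logit, so the initial
    # gates spread across a range of effective timescales rather than
    # all starting at the same (possibly saturated) value.
    u = random.uniform(eps, 1.0 - eps)  # clip to keep the logit finite
    return math.log(u / (1.0 - u))
```

Running the loop shows the gate output approaching 1 while the gradient decays toward 0, which is precisely the tension between memory retention and learnability that the proposed modifications target.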



Related Papers

Can recurrent neural networks warp time?
Successful recurrent models such as long short-term memories (LSTMs) and...

Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay
Sequential information contains short- to long-range dependencies; howev...

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units
Recurrent neural network (RNN) has been widely studied in sequence learn...

Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation
Recurrent neural networks with a gating mechanism such as an LSTM or GRU...

Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions
Gated recurrent neural networks have achieved remarkable results in the ...

A Novel Update Mechanism for Q-Networks Based On Extreme Learning Machines
Reinforcement learning is a popular machine learning paradigm which can ...

ReLU and Addition-based Gated RNN
We replace the multiplication and sigmoid function of the conventional r...

Code Repositories


Unofficial UR-LSTM implementation in PyTorch
