1 Introduction
Recurrent Neural Networks (RNNs) (Rumelhart et al., 1986; Elman, 1990) have found widespread use across a variety of domains, from language modeling (Mikolov et al., 2010; Kiros et al., 2015; Jozefowicz et al., 2016) and machine translation (Bahdanau et al., 2014) to speech recognition (Graves et al., 2013) and recommendation systems (Hidasi et al., 2015; Wu et al., 2017). However, RNNs as originally proposed are difficult to train and are rarely used in practice. Instead, variants of RNNs such as Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Chung et al., 2014) that feature various forms of "gating" perform significantly better than their vanilla counterparts. Often, these models must be paired with techniques such as normalization layers (Ioffe & Szegedy, 2015b; Ba et al., 2016) or gradient clipping (Pascanu et al., 2013) to achieve good performance.

A rigorous explanation for the remarkable success of gated recurrent networks remains elusive (Jozefowicz et al., 2015; Greff et al., 2017). Recent work (Collins et al., 2016)
provides empirical evidence that the benefits of gating are mostly rooted in improved trainability rather than increased capacity or expressivity. The problem of disentangling trainability from expressivity is widespread in machine learning, since state-of-the-art architectures are nearly always the result of sparse searches in high-dimensional spaces of hyperparameters. As a result, we often mistake trainability for expressivity. Seminal early work (Glorot & Bengio, 2010; Bertschinger et al., 2004) showed that a major hindrance to trainability was the vanishing and exploding of gradients.

Recently, progress has been made in the feedforward setting (Schoenholz et al., 2017; Pennington et al., 2017; Yang & Schoenholz, 2017) by developing a theory of both the forward propagation of signal and the backward propagation of gradients. This theory is based on studying neural networks whose weights and biases are randomly distributed. This is equivalent to studying the behavior of neural networks after random initialization or, equivalently, to studying the prior over functions induced by a particular choice of hyperparameters (Lee et al., 2017). It was shown that randomly initialized neural networks are trainable if three conditions are satisfied: (1) the size of the output of the network is finite for finite inputs, (2) the output of the network is sensitive to changes in the input, and (3) gradients neither explode nor vanish. Moreover, neural networks achieving dynamical isometry, i.e. having input-output Jacobian matrices that are well-conditioned, were shown to train orders of magnitude faster than networks that do not.
In this work, we combine mean field theory and random matrix theory to extend these results to the recurrent setting. We will be particularly focused on understanding the role that gating plays in trainability. As we will see, there are a number of subtleties that must be addressed for (gated) recurrent networks that were not present in the feedforward setting. To clarify the discussion, we will therefore contrast vanilla RNNs with a gated RNN cell that we call the minimalRNN, which is significantly simpler than LSTMs and GRUs but implements a similar form of gating. We expect the framework introduced here to be applicable to more complicated gated architectures.
The first main contribution of this paper is the development of a mean field theory for forward propagation of signal through vanilla RNNs and minimalRNNs. In doing so, we identify the maximum timescale over which signal can propagate in each case. Next, we produce a random matrix theory for the end-to-end Jacobian of the minimalRNN. As in the feedforward setting, we establish that the duality between the forward propagation of signal and the backward propagation of gradients persists in the recurrent setting. We then show that our theory is indeed predictive of trainability in recurrent neural networks by comparing the maximum trainable number of steps of RNNs with the timescale predicted by the theory. Overall, we find remarkable alignment between theory and practice. Additionally, we develop a closed-form initialization procedure for both networks and show that on a variety of tasks RNNs initialized to be dynamically isometric are significantly easier to train than those lacking this property.
Corroborating the experimental findings of Collins et al. (2016), we show that both signal propagation and dynamical isometry in vanilla RNNs are far more precarious than in the case of the minimalRNN. Indeed, the vanilla RNN achieves dynamical isometry only if the network is initialized with orthogonal weights at the boundary between order and chaos, a one-dimensional line in parameter space. Owing to its gating mechanism, the minimalRNN, on the other hand, enjoys a robust multi-dimensional subspace of good initializations, all of which enable dynamical isometry. Based on these insights, we conjecture that more complex gated recurrent neural networks also benefit from similar effects.
2 Related Work
Identity and orthogonal initialization schemes have been identified as a promising approach to improve the trainability of deep neural networks (Le et al., 2015; Mishkin & Matas, 2015). Additionally, Arjovsky et al. (2016), Hyland & Rätsch (2017), and Xie et al. (2017) advocate going beyond initialization and constraining the transition matrix to be orthogonal throughout the entire learning process, either through reparametrization or by constraining the optimization to the Stiefel manifold (Wisdom et al., 2016). However, as was pointed out in Vorontsov et al. (2017), strictly enforcing orthogonality during training may hinder training speed and generalization performance. While these contributions are similar to our own, in the sense that they attempt to construct networks that feature dynamical isometry, it is worth noting that orthogonal weight matrices do not guarantee dynamical isometry, owing to the nonlinear nature of deep neural networks, as shown in Pennington et al. (2017). In this paper we continue this trend and show that orthogonality has little impact on the conditioning of the Jacobian (and so on trainability) in gated RNNs.
The notion of "edge of chaos" initialization has been explored previously, especially in the case of recurrent neural networks. Bertschinger et al. (2004) and Glorot & Bengio (2010) propose edge-of-chaos initialization schemes that they show lead to improved performance. Additionally, architectural innovations such as batch normalization
(Ioffe & Szegedy, 2015a), orthogonal matrix initialization (Saxe et al., 2013), random walk initialization (Sussillo & Abbott, 2014), composition kernels (Daniely et al., 2016), and residual network architectures (He et al., 2015) all share a common goal of stabilizing gradients and improving training dynamics.

There is a long history of applying mean-field-like approaches to understand the behavior of neural networks. Indeed, several pieces of seminal work used statistical physics (Derrida & Pomeau, 1986; Sompolinsky et al., 1988) and Gaussian processes (Neal, 2012) to show that neural networks exhibit remarkable regularity as the width of the network gets large. Mean field theory has also long been used to study Boltzmann machines (Ackley et al., 1985) and sigmoid belief networks (Saul et al., 1996). More recently, there has been a revitalization of mean field theory to explore questions of trainability and expressivity in fully-connected networks and residual networks (Poole et al., 2016; Schoenholz et al., 2017; Yang & Schoenholz, 2017; Karakida et al., 2018; Hayou et al., 2018; Hanin & Rolnick, 2018; Yang & Schoenholz, 2018). Our approach will closely follow these later contributions and extend many of their techniques to the case of recurrent networks with gating. Beyond mean field theory, there have been several attempts at understanding signal propagation in RNNs, e.g., using the Geršgorin circle theorem (Zilly et al., 2016) or time invariance (Tallec & Ollivier, 2018).

3 Theory and Critical Initialization
We begin by developing a mean field theory for vanilla RNNs and discuss the notion of dynamical isometry. Afterwards, we move on to a simple gated architecture to explain the role of gating in facilitating signal propagation in RNNs.
3.1 Vanilla RNN
Vanilla RNNs are described by the recurrence relation,
z^t = W h^{t-1} + V x^t + b,    h^t = φ(z^t)    (1)

Here x^t is the input, z^t is the pre-activation, and h^t is the hidden state after applying an arbitrary activation function φ. For the purposes of this discussion we set φ = tanh. Furthermore, W and V are weight matrices that multiply the hidden state and inputs respectively, and b is a bias.

Next, we apply mean-field theory to vanilla RNNs following a strategy similar to that introduced in Poole et al. (2016) and Schoenholz et al. (2017). At the level of mean-field theory, vanilla RNNs will prove to be intimately related to feedforward networks, and so this discussion proceeds analogously. For a more detailed discussion, see these earlier studies.
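To make eq. (1) concrete, the following is a minimal NumPy sketch of one recurrence step under the random initialization studied below. The function and parameter names are our own, and we assume φ = tanh:

```python
import numpy as np

def init_params(n_hidden, n_in, sigma_w=1.0, sigma_v=1.0, sigma_b=0.0, seed=0):
    """Random initialization studied by the mean-field theory:
    W_ij ~ N(0, sigma_w^2 / N), V_ij ~ N(0, sigma_v^2 / N_in), b_i ~ N(0, sigma_b^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma_w / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
    V = rng.normal(0.0, sigma_v / np.sqrt(n_in), size=(n_hidden, n_in))
    b = rng.normal(0.0, sigma_b, size=n_hidden) if sigma_b > 0 else np.zeros(n_hidden)
    return W, V, b

def vanilla_rnn_step(h_prev, x, W, V, b, phi=np.tanh):
    """One step of eq. (1): z^t = W h^{t-1} + V x^t + b, then h^t = phi(z^t)."""
    z = W @ h_prev + V @ x + b
    return phi(z), z
```

Sampling fresh W, V, b at every step corresponds to the untied-weight approximation used in the analysis below; reusing the same matrices recovers the actual (tied-weight) RNN.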
Consider two sequences of inputs x_a^t and x_b^t, described by a covariance matrix whose off-diagonal entries measure the overlap between the two sequences at each time step. To simplify notation, we assume the input sequences have been standardized so that their magnitude R is independent of time. This allows us to factor the covariance as R Σ^t, where Σ^t is a matrix whose diagonal terms are 1 and whose off-diagonal terms are the cosine similarity Σ_ab^t between the inputs at time t. These sequences are then passed into two identical copies of an RNN to produce two corresponding pre-activation sequences z_a^t and z_b^t. As in Poole et al. (2016), we let the weights and biases be Gaussian distributed so that W_ij ∼ N(0, σ_w²/N), V_ij ∼ N(0, σ_v²/N), and b_i ∼ N(0, σ_b²) (in practice we will set σ_b = 0 for the vanilla RNN), and we consider the wide-network limit, N → ∞. As in the fully-connected setting, we would like to invoke the Central Limit Theorem (CLT) to conclude that the pre-activations of hidden states are jointly Gaussian distributed. Unfortunately, the CLT is violated in the recurrent setting, as the pre-activations at different time steps are correlated through the shared weights of the RNN.

To make progress, we proceed by developing the theory of signal propagation for RNNs with untied weights. This allows for several simplifications, including the application of the CLT to conclude that the pre-activations z_a^t and z_b^t are jointly Gaussian distributed with a covariance matrix Q^t that is independent of neuron index. We explore the ramifications of this approximation by comparing simulations of RNNs with tied and untied weights. Overall, we will see that while ignoring weight tying leads to quantitative differences between theory and experiment, it does not change the qualitative picture that emerges. See figs. 1 and 2 for verification.

With this approximation in mind, we will now quantify how the pre-activation hidden states evolve by deriving the recurrence relation for the covariance matrix Q^t from the recurrence on z^t in eq. (1). Using identical arguments to Poole et al. (2016), one can show that,
Q_ab^t = σ_w² ⟨φ(u_a) φ(u_b)⟩_{Q^{t-1}} + σ_v² R Σ_ab^t + σ_b²    (2)

where (u_a, u_b) are drawn from a bivariate Gaussian with covariance Q^{t-1} and

⟨f(u_a, u_b)⟩_Q = ∫ du f(u) exp(-½ uᵀ Q⁻¹ u) / (2π √(det Q))    (3)

is a Gaussian measure with covariance matrix Q. By symmetry, our normalization allows us to define q^t = Q_aa^t = Q_bb^t to be the magnitude of the pre-activation hidden state and c^t = Q_ab^t / q^t to be the cosine similarity between the hidden states. We will be particularly concerned with understanding the dynamics of the cosine similarity, c^t.
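On the diagonal, the Gaussian average in eqs. (2)-(3) is one-dimensional and easy to estimate numerically. The following sketch iterates the variance recurrence to its fixed point q*, assuming φ = tanh and normalized inputs (R = 1); the Monte Carlo estimator and the hyperparameter values are illustrative, not taken from the paper:

```python
import numpy as np

def variance_map(q, sigma_w2, sigma_v2, sigma_b2, n_samples=400_000, seed=0):
    """One step of the diagonal of eq. (2): a Gaussian average of phi(u)^2
    with u ~ N(0, q), plus the input (R = 1) and bias contributions."""
    u = np.random.default_rng(seed).normal(0.0, np.sqrt(q), size=n_samples)
    return sigma_w2 * np.mean(np.tanh(u) ** 2) + sigma_v2 + sigma_b2

def fixed_point_variance(sigma_w2=1.0, sigma_v2=0.5, sigma_b2=0.0, n_iter=60):
    """Iterate the variance map until it settles at the fixed point q*."""
    q = 1.0
    for _ in range(n_iter):
        q = variance_map(q, sigma_w2, sigma_v2, sigma_b2)
    return q
```

Because |tanh| < 1 and its Gaussian average contracts, the iteration converges quickly; this is the sense in which the dynamics of q^t are "uninteresting" compared to those of c^t.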
In feedforward networks, the inputs dictate the initial value of the cosine similarity, and the evolution of c^t is then determined solely by the network architecture. By contrast, in recurrent networks, inputs perturb c^t at each time step. Analyzing the dynamics of c^t for arbitrary input correlations is therefore challenging; however, significant insight can be gained by studying the off-diagonal entries of eq. (2) for Σ_ab^t = Σ_ab independent of time. In the case of time-independent Σ_ab, as t → ∞ both q^t → q* and c^t → c*, where q* and c* are fixed points of the variance of the pre-activation hidden state and the cosine similarity between pre-activation hidden states, respectively. As was discussed previously (Poole et al., 2016; Schoenholz et al., 2017), the dynamics of q^t are generally uninteresting provided q* is finite. We therefore choose to normalize the hidden state such that q^0 = q*, which implies that q^t = q* independent of time.

In this setting it was shown in Schoenholz et al. (2017) that in the vicinity of a fixed point, the off-diagonal term in eq. (2) can be expanded to lowest order in ε^t = c^t - c* to give the linearized dynamics ε^{t+1} = χ_{c*} ε^t, where

χ_{c*} = σ_w² ⟨φ′(u_a) φ′(u_b)⟩_{Q*}    (4)
These dynamics have the solution ε^t = ε^{t₀} e^{-(t - t₀)/ξ} with ξ⁻¹ = -log χ_{c*}, where t₀ is the time when c^t is close enough to c* for the linear approximation to be valid. If χ_{c*} < 1 it follows that c^t approaches c* exponentially quickly over a timescale ξ, and c* is called a stable fixed point. When c^t gets too close to c* to be distinguished from it to within numerical precision, information about the initial inputs has been lost. Thus, ξ sets the maximum timescale over which we expect the RNN to be able to remember information. If χ_{c*} > 1 then c^t gets exponentially farther from c* over time, and c* is an unstable fixed point. In this case, for the activation function considered here, another fixed point that is stable will emerge. Note that χ_{c*} does not depend explicitly on Σ_ab, and so the linearized dynamics of c^t near c* do not depend directly on the inputs.
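The timescale ξ and the resulting memory horizon can be sketched directly from the solution above (function names are ours):

```python
import numpy as np

def xi(chi):
    """Decay timescale of |c^t - c*| ~ exp(-t / xi) near a stable fixed point.
    Defined by 1/xi = -log(chi); only meaningful for chi < 1 (stable c*)."""
    assert 0.0 < chi < 1.0, "c* is stable only when chi < 1"
    return -1.0 / np.log(chi)

def memory_horizon(chi, eps=1e-7):
    """Number of steps until an O(1) deviation from c* decays below numerical
    precision eps -- the maximum timescale over which the RNN can be
    expected to remember its inputs."""
    return xi(chi) * np.log(1.0 / eps)
```

As χ → 1 from below the horizon diverges, which is the mean-field statement that initializing near criticality enables arbitrarily long memory.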
In vanilla fully-connected networks, c* = 1 is always a fixed point of the cosine-similarity dynamics, but it is not always stable. Indeed, it was shown that these networks exhibit a phase transition where c* = 1 goes from being a stable fixed point to an unstable one as a function of the network's hyperparameters. This is known as the order-to-chaos transition, and it occurs exactly when χ_{c*} = 1. Since ξ → ∞ as χ_{c*} → 1, signal can propagate infinitely far at the boundary between order and chaos. Comparing the diagonal and off-diagonal entries of eq. (2), we see that in recurrent networks c* = 1 is a fixed point only when Σ_ab = 1, in which case the discussion is identical to the feedforward setting. When Σ_ab < 1, it is easy to see that c* = 1 cannot be a fixed point, since if c^t = 1 at some time then c^{t+1} < 1. We see that in recurrent networks noise from the inputs destroys the ordered phase, and there is no order-to-chaos critical point. As a result, we should expect the maximum timescale over which memory may be stored in vanilla RNNs to be fundamentally limited by noise from the inputs.

The end-to-end Jacobian of a vanilla RNN with untied weights is in fact formally identical to the input-output Jacobian of a feedforward network, and thus the results from Pennington et al. (2017) regarding conditions for dynamical isometry apply directly. In particular, dynamical isometry is achieved with orthogonal state-to-state transition matrices W, tanh nonlinearities, and small values of q*. Perhaps surprisingly, these conclusions continue to be valid if the assumption of untied weights is relaxed. To understand why this is the case, consider the example of a linear network. For untied weights, the end-to-end Jacobian is a product of independent random matrices, while for tied weights the Jacobian is a power of a single random matrix. It turns out that as N → ∞ there is sufficient self-averaging to overcome the dependencies induced by weight tying, and the asymptotic singular value distributions of the two Jacobians are actually identical (Haagerup & Larsen, 2000).

3.2 MinimalRNN
3.2.1 Mean-Field Theory
To study the role of gating, we introduce the minimalRNN, which is simpler than other gated RNN architectures but nonetheless features the same gating mechanism. A sequence of inputs x^t is first mapped to the hidden space through z^t = Φ(x^t), where Φ can be any highly flexible function such as a feedforward network; in our experiments, we take Φ to be a fully connected layer with tanh activation. From here on, we refer to z^t as the inputs to the minimalRNN. The minimalRNN is then described by the recurrence relation,
a^t = W h^{t-1} + V z^t + b,    u^t = σ(a^t),    h^t = u^t ⊙ h^{t-1} + (1 - u^t) ⊙ z^t    (5)

where a^t is the pre-activation to the gating function, u^t the update gate, and h^t the hidden state. The minimalRNN retains the most essential gate in LSTMs (Jozefowicz et al., 2015; Greff et al., 2017) and achieves competitive performance. The simplified update of this cell, on the other hand, enables us to pinpoint the role of gating in a more controlled setting.
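A minimal sketch of one minimalRNN step, following the structure of the recurrence described above (the names are ours, and the exact form of the gate pre-activation is an assumption consistent with the text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minimal_rnn_step(h_prev, z, W, V, b):
    """One minimalRNN step: the update gate u^t interpolates elementwise
    between keeping the previous state h^{t-1} and writing the input z^t."""
    a = W @ h_prev + V @ z + b          # gate pre-activation a^t
    u = sigmoid(a)                      # update gate u^t
    h = u * h_prev + (1.0 - u) * z      # gated convex combination
    return h, u
```

Pushing the bias mean toward large positive values saturates u^t → 1, so the state is copied nearly unchanged from step to step; this is the mechanism behind the long-term signal propagation discussed below.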
As in the previous case, we consider two sequences of inputs to the network, z_a^t and z_b^t. We take W_ij ∼ N(0, σ_w²/N), V_ij ∼ N(0, σ_v²/N), and b_i ∼ N(μ_b, σ_b²). By analogy to the vanilla case, we can make the mean-field approximation that the gate pre-activations a_a^t and a_b^t are jointly Gaussian distributed with covariance matrix Q^t. Here,

Q_ab^t = σ_w² M_ab^{t-1} + σ_v² R Σ_ab^t + σ_b²    (6)

where we have defined M^t as the second-moment matrix of the hidden state, with M_ab^t = E[h_a^t h_b^t]. (h^t will be centered under the mean-field approximation if h^0 is initialized with mean zero.) As in the vanilla case, R Σ_ab^t is the covariance between the inputs z_a^t and z_b^t. We note that the input covariance is fixed by the data, but it remains for us to work out M^t. We find that (see SI section B),

M_ab^t = ⟨u_a u_b⟩ M_ab^{t-1} + ⟨(1 - u_a)(1 - u_b)⟩ R Σ_ab^t    (7)

Here we assume that the expectation factorizes so that u^t and h^{t-1} are approximately independent. We believe this approximation becomes exact in the N → ∞ limit.
We choose to normalize the data in a similar manner to the vanilla case, so that the input magnitude R is independent of time. An immediate consequence of this normalization is that the diagonal entries M_aa^t and Q_aa^t approach fixed points M* and Q*. We then write c_h^t = M_ab^t / M* and c^t = Q_ab^t / Q* for the cosine similarities between the hidden states and between the pre-activations, respectively. With this normalization, we can work out the mean-field recurrence relation characterizing the covariance matrix for the minimalRNN. This analysis can be done by deriving the recurrence relation for either c_h^t or c^t. We will choose to study the dynamics of c^t; however, the two are trivially related by eq. (6). In SI section C, we analyze the dynamics of the diagonal term in the recurrence relation and prove that there is always a fixed point at some M*. In SI section D, we compute the depth scale over which the diagonal approaches its fixed point. However, as in the case of the vanilla RNN, these dynamics are generally uninteresting.
We now turn our attention to the dynamics of the cosine similarity between the pre-activations, c^t. As in the case of vanilla RNNs, we note that the diagonal approaches its fixed point quickly relative to the dynamics of c^t. We therefore choose to normalize the hidden state of the RNN so that M_aa^0 = M*, in which case both M_aa^t = M* and Q_aa^t = Q* independent of time. From eqs. (6) and (7) it follows that the cosine similarity of the pre-activation evolves as,

c^{t+1} = ⟨u_a u_b⟩ c^t + [(1 - ⟨u_a u_b⟩)(σ_v² R Σ_ab + σ_b²) + σ_w² ⟨(1 - u_a)(1 - u_b)⟩ R Σ_ab] / Q*    (8)

where the gate moments ⟨u_a u_b⟩ and ⟨(1 - u_a)(1 - u_b)⟩ are Gaussian averages over pre-activations with mean μ_b and covariance Q^t. As in the case of the vanilla RNN, we can study the behavior of c^t in the vicinity of a fixed point, c*. By expanding eq. (8) to lowest order in ε^t = c^t - c*, we arrive at a linearized recurrence relation that has an exponential solution ε^t ∝ e^{-t/ξ} with ξ⁻¹ = -log χ_{c*}, where here,

χ_{c*} = ⟨u_a u_b⟩ + σ_w² ⟨σ′(a_a) σ′(a_b)⟩ (M_ab* + R Σ_ab)    (9)

The discussion above in the vanilla case carries over directly to the minimalRNN with the appropriate replacement of χ. Unlike in the case of the vanilla RNN, here we see that χ_{c*} itself depends on Σ_ab.
Again, c* = 1 is a fixed point of the dynamics only when Σ_ab = 1. In this case, the minimalRNN experiences an order-to-chaos phase transition when χ₁ = 1, at which point the maximum timescale over which signal can propagate goes to infinity. Similar to the vanilla RNN, when Σ_ab < 1 we expect that the phase transition will be destroyed and the maximum duration of signal propagation will be severely limited. However, in a significant departure from the vanilla case, when μ_b → ∞ we notice that u^t → 1, so that ⟨u_a u_b⟩ → 1 and ⟨σ′(a_a) σ′(a_b)⟩ → 0 for all Σ_ab. Considering eq. (9), we notice that in this regime χ_{c*} → 1 independent of Σ_ab. In other words, gating allows for arbitrarily long-term signal propagation in recurrent neural networks independent of the inputs.
We explore the agreement between our theory and MC simulations of the minimalRNN in fig. 1. In this set of experiments, we consider inputs whose cosine similarity takes one fixed value up to an initial time and a different value thereafter. Fig. 1 (a,c,d) shows excellent quantitative agreement between our theory and MC simulations. In fig. 1 (a,b) we compare the MC simulations of the minimalRNN with and without weight tying. While we observe that for many choices of hyperparameters the untied-weight approximation is quite good, deeper into the chaotic phase the quantitative agreement breaks down. Nonetheless, we observe that the untied approximation describes the qualitative behavior of the real minimalRNN overall. In fig. 1 (e) we plot the timescale for signal propagation for the minimalRNN over a range of identical hyperparameter choices. We see that while the timescale diverges as μ_b gets large, a true critical point is only observed when the input perturbations vanish.
3.2.2 Dynamical Isometry
In the previous subsection, we derived a quantity χ that defines the boundary between the ordered and the chaotic phases of forward propagation. Here we show that it also defines the boundary between exploding and vanishing gradients. To see this, consider the Jacobian of the state-to-state transition operator,

J^t = ∂h^t / ∂h^{t-1} = D_{u^t} + D_{σ′(a^t) ⊙ (h^{t-1} - z^t)} W    (10)

where D_x denotes a diagonal matrix with x along its diagonal. We can compute the expected norm-squared of back-propagated error signals, which measures the growth or shrinkage of gradients. It is equal to the mean-squared singular value of the Jacobian (Poole et al., 2016; Schoenholz et al., 2017), or the first moment of J Jᵀ,

χ₁ = (1/N) E[tr(J Jᵀ)] = ⟨u²⟩ + σ_w² ⟨σ′(a)²⟩ (M* + R)    (11)

where we have used the fact that the elements of W are i.i.d. and independent of the gates and states. Since we assume convergence to the fixed point, these distributions are independent of time, and it is easy to see that χ₁ coincides with the forward quantity χ_{c*} evaluated at c* = Σ_ab = 1. The variance of back-propagated error signals through T time steps is therefore χ₁^T. As such, the constraint χ₁ = 1 defines the boundary between phases of exponentially exploding and exponentially vanishing gradient norm (variance). Note that, unlike in the case of forward signal propagation, in the case of backpropagation this quantity is independent of Σ_ab.
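The statement that gradient variance scales like χ₁^T is easy to check numerically in the linear, untied-weight case, where χ₁ reduces to σ_w². A Monte Carlo sketch (function names and parameter values are illustrative):

```python
import numpy as np

def backprop_norm2(sigma_w2, n=200, T=20, seed=0):
    """Squared norm of a unit error signal back-propagated through T steps
    of a linear untied-weight network.  Each step multiplies by an
    independent W with entries N(0, sigma_w2 / n), so the expected squared
    norm scales like chi_1**T with chi_1 = sigma_w2."""
    rng = np.random.default_rng(seed)
    v = np.ones(n) / np.sqrt(n)          # unit-norm error signal
    for _ in range(T):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / n), size=(n, n))
        v = W.T @ v                      # one step of backpropagation
    return float(v @ v)
```

For χ₁ < 1 the norm collapses exponentially, for χ₁ > 1 it explodes, and only at χ₁ = 1 does it remain of order one, which is the boundary described above.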
As argued in Pennington et al. (2017; 2018), controlling the variance of back-propagated gradients is necessary but not sufficient to guarantee trainability, especially for very deep networks. Beyond the first moment, the entire distribution of eigenvalues of J Jᵀ (or of singular values of J) is relevant. Indeed, it was found in Pennington et al. (2017; 2018) that enabling dynamical isometry, namely the condition that all singular values of the end-to-end Jacobian are close to unity, can drastically improve training speed for very deep feedforward networks.

Following Pennington et al. (2017; 2018), we use tools from free probability theory to compute the variance σ²_{JJᵀ} of the limiting spectral density of J Jᵀ; however, unlike previous work, in our case the relevant matrices are not symmetric, and therefore we must invoke tools from non-Hermitian free probability; see Cakmak (2012) for a review. As in the previous section, we make the simplifying assumption that the weights are untied, relying on the same motivations given in section 3.1. Using these tools, an unilluminating calculation reveals that,

(12)
where,

(13)

and s₁ is the first term in the Taylor expansion of the S-transform of the eigenvalue distribution of Wᵀ W (Pennington et al., 2018). For example, for Gaussian matrices s₁ = -1, and for orthogonal matrices s₁ = 0.
Some remarks are in order about eq. (12). First, we note the duality between forward and backward signal propagation (eq. (9) and eq. (13)). For critical initializations, χ₁ = 1, so σ²_{JJᵀ} does not grow exponentially, but it still grows linearly with T. This situation is entirely analogous to the feedforward analysis of Pennington et al. (2017; 2018). In the case of the vanilla RNN, the coefficient of the linear term is proportional to q*, and can only be reduced by taking the weight and bias variances to zero. A crucial difference in the minimalRNN is that the coefficient of the linear term can be made arbitrarily small by simply adjusting the bias mean μ_b to be positive, which drives the gates toward 1 and the coefficient toward zero independent of σ_w². Therefore the conditions for dynamical isometry decouple from the weight and bias variances, implying that trainability can occur on a higher-dimensional, more robust slice of parameter space. Moreover, the value of s₁ has no effect on the capacity of the minimalRNN to achieve dynamical isometry. We believe these are fundamental reasons why gated cells such as the minimalRNN perform well in practice.
Algorithm 1 describes the procedure to find σ_w and μ_b that achieve the critical condition χ₁ = 1 for the minimalRNN. Given these values, we then construct the weight matrices and biases accordingly. The fixed point of the hidden-state second moment is used to initialize h^0 to avoid a transient phase.
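Algorithm 1 itself gives a closed-form procedure; since its exact equations are not reproduced in this excerpt, the following is only a generic numerical stand-in. Given any map μ_b ↦ χ₁ that is continuous and monotonically decreasing (larger gate bias pushes the network toward the ordered phase), the critical bias solving χ₁ = 1 can be found by bisection. The toy χ₁ below is hypothetical, purely for illustration:

```python
import numpy as np

def solve_critical_bias(chi_fn, lo=-10.0, hi=10.0, tol=1e-10, n_iter=100):
    """Find mu_b with chi_fn(mu_b) = 1 by bisection, assuming chi_fn is
    continuous and monotonically decreasing in mu_b.  A generic stand-in
    for the closed-form procedure of Algorithm 1."""
    assert chi_fn(lo) > 1.0 > chi_fn(hi), "chi = 1 must be bracketed"
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if chi_fn(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical chi that decays with the bias mean, for illustration only.
toy_chi = lambda mu_b: 2.0 * np.exp(-mu_b)
mu_b_star = solve_critical_bias(toy_chi)   # chi = 1  =>  mu_b = log 2
```

In the actual procedure, χ₁ would be evaluated from the Gaussian gate moments at the fixed point rather than from a toy formula.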
4 Experiments
Having established a theory for the behavior of random vanilla RNNs and minimalRNNs, we now discuss the connection between our theory and trainability in practice. We begin by corroborating the claim that the maximum timescale over which memory can be stored in an RNN is controlled by the timescale ξ identified in the previous section. We will then investigate the role of dynamical isometry in speeding up learning.
4.1 Trainability
Dataset. To verify the results of our theoretical calculation, we consider a task that is reflective of the theory above. To that end, we constructed a sequence dataset for training RNNs from MNIST (LeCun et al., 1998). Each 28 × 28 digit image is flattened into a vector of 784 pixels and sent as the first input to an RNN. We then send T random inputs into the RNN, varying T between 10 and 1000 steps. As the only salient information about the digit is in the first input, the network will need to propagate information through T layers to accurately identify the MNIST digit. The random inputs are drawn independently for each example, and so this is a regime where the input cosine similarity vanishes at all subsequent time steps.

We then performed a series of experiments on this task to make connection with our theory. In each case we experimented with both tied and untied weights. The results are shown in fig. 2. In the case of untied weights, we observe strong quantitative agreement between our theoretical prediction for ξ and the maximum depth at which the network is still trainable. When the weights of the network are tied, we observe quantitative deviations between our theory and experiments, but the overall qualitative picture remains.
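The construction above can be sketched as a small data pipeline (the noise scale is an assumption; this excerpt does not specify the distribution of the random inputs):

```python
import numpy as np

def make_memory_task(images, labels, T, noise_std=1.0, seed=0):
    """Section 4.1 task: each example is a length-(T+1) sequence whose first
    element is a flattened 28x28 digit (784 pixels) and whose remaining T
    elements are i.i.d. random vectors, so the label can only be recovered
    by propagating the first input through T recurrent steps."""
    rng = np.random.default_rng(seed)
    n = images.shape[0]
    first = images.reshape(n, 1, 784).astype(np.float64)
    noise = rng.normal(0.0, noise_std, size=(n, T, 784))
    return np.concatenate([first, noise], axis=1), labels
```

Because the noise is drawn independently per example, any two sequences are uncorrelated after the first step, matching the regime of vanishing input cosine similarity analyzed by the theory.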
We train vanilla RNNs for around 10 epochs, varying σ_w while fixing the remaining hyperparameters. The results of this experiment are shown in fig. 2 (a-b). We train minimalRNNs for around 1 epoch with the weight variances fixed. We perform three different experiments, in each case varying one hyperparameter while holding the others fixed; the results are shown in fig. 2 (c-d), fig. 2 (e-f), and fig. 2 (g-h) respectively. Comparing fig. 2 (a,b) with fig. 2 (c,d,g,h), the minimalRNN with large depth is trainable over a much wider range of hyperparameters than the vanilla RNN, despite the fact that the network was trained for an order of magnitude less time.

4.2 Critical initialization
Dataset. To study the impact of critical initialization on training speed, we constructed a more realistic sequence dataset from MNIST. We unroll each image's pixels into a sequence of inputs, each containing a fixed number of pixels, and tested two sequence lengths to vary the difficulty of the task.
Note that we are more interested in the training speed of these networks under different initialization conditions than in the test accuracy. We compare the convergence speed of the vanilla RNN and the minimalRNN under four initialization conditions: 1) critical initialization with orthogonal weights (solid blue); 2) critical initialization with Gaussian-distributed weights (solid red); 3) off-critical initialization with orthogonal weights (dotted green); 4) off-critical initialization with Gaussian-distributed weights (dotted black).
We fix σ_b to zero in all settings. Under critical initialization, the remaining hyperparameters are carefully chosen to achieve χ₁ = 1, as defined in eq. (4) for the vanilla RNN and eq. (13) (detailed in Algorithm 1) for the minimalRNN, respectively. When testing networks off criticality, we employ a commonly used initialization procedure for the weights and biases.
Figure 3 summarizes our findings: there is a clear difference in training speed between models trained with critical initialization and models initialized far from criticality. We observe two orders of magnitude difference in training speed between a critical and an off-critical initialization for vanilla RNNs. While a critically initialized model reaches a high test accuracy after 750 optimization steps, the off-critical network takes over 16,000 updates to do so. A similar trend was observed for the minimalRNN. This difference is even more pronounced in the case of the longer sequence: both vanilla RNNs and minimalRNNs initialized off criticality failed at the task. The well-conditioned minimalRNN trains a factor of three faster than the vanilla RNN. As predicted above, the difference in training speed between orthogonal and Gaussian initialization schemes is significant for vanilla RNNs but insignificant for the minimalRNN. This is corroborated in fig. 3 (b,d), where the distribution of the weights has no impact on the training speed.
5 Language modeling
We compare the minimalRNN against more complex gated RNNs such as LSTM and GRU on the Penn TreeBank corpus (Marcus et al., 1993). Language modeling is a difficult task, and competitive performance is often achieved by more complicated RNN cells. We show that the minimalRNN achieves competitive performance despite its simplicity.
We follow the precise setup of Mikolov et al. (2010) and Zaremba et al. (2014), and train RNNs of two sizes: a small configuration with 5M parameters and a medium-sized configuration with 20M parameters (the hidden layer sizes of these networks are adjusted accordingly to reach the target model size). We report the perplexity on the validation and test sets. We focus our comparison on single-layer RNNs; however, we also report perplexities for multi-layer RNNs from the literature for reference. We follow the learning schedule of Zaremba et al. (2014) and Jozefowicz et al. (2015). We review additional hyperparameter ranges in section F of the supplementary material.
Table 1 summarizes our results. We find that single-layer RNNs perform on par with their multi-layer counterparts. Despite being a significantly simpler model, the minimalRNN performs comparably to GRUs. Given the closed-form critical initialization developed here, which significantly boosts convergence speed, the minimalRNN might be a favorable alternative to GRUs. There is a gap in perplexity between the performance of LSTMs and minimalRNNs. We hypothesize that this is due to the removal of an independent gate on the input: the same strategy is employed in GRUs and may cause a conflict between keeping longer-range memory and updating with new information, as was originally pointed out by Hochreiter & Schmidhuber (1997).
Table 1. Perplexities on the Penn TreeBank validation and test sets.

Model                                   5M (test)   20M (valid)   20M (test)
VanillaRNN (Jozefowicz et al., 2015)    122.8       103.0         97.7
GRU (Jozefowicz et al., 2015)           108.2       95.5          91.7
LSTM (Jozefowicz et al., 2015)          109.7       83.3          78.8
LSTM                                    95.4        87.5          83.8
GRU                                     99.5        93.9          89.8
minimalRNN                              101.4       94.4          89.9
6 Discussion
We have developed a theory of signal propagation for random vanilla RNNs and a simple gated RNN. We demonstrated rigorously that the theory predicts the trainability of these networks and that gating mechanisms allow for a significantly larger trainable region. We plan to extend the theory to more complicated RNN cells as well as to RNNs with multiple layers.
Acknowledgements
We thank Jascha Sohl-Dickstein and Greg Yang for helpful discussions and Ashish Bora for many contributions to early stages of this project.
References
 Ackley et al. (1985) Ackley, David H., Hinton, Geoffrey E., and Sejnowski, Terrence J. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985. ISSN 1551-6709.
 Arjovsky et al. (2016) Arjovsky, Martin, Shah, Amar, and Bengio, Yoshua. Unitary evolution recurrent neural networks. In International Conference on Machine Learning, pp. 1120–1128, 2016.
 Ba et al. (2016) Ba, Jimmy Lei, Kiros, Jamie Ryan, and Hinton, Geoffrey E. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
 Bahdanau et al. (2014) Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
 Bertschinger et al. (2004) Bertschinger, Nils, Natschläger, Thomas, and Legenstein, Robert A. At the edge of chaos: Real-time computations and self-organized criticality in recurrent neural networks. In Advances in Neural Information Processing Systems, 2004.
 Cakmak (2012) Cakmak, Burak. Non-Hermitian random matrix theory for MIMO channels. Master’s thesis, Institutt for elektronikk og telekommunikasjon, 2012.
 Chung et al. (2014) Chung, Junyoung, Gulcehre, Caglar, Cho, KyungHyun, and Bengio, Yoshua. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
 Collins et al. (2016) Collins, Jasmine, Sohl-Dickstein, Jascha, and Sussillo, David. Capacity and trainability in recurrent neural networks. ICLR, 2016.
 Daniely et al. (2016) Daniely, A., Frostig, R., and Singer, Y. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity. arXiv:1602.05897, 2016.
 Derrida & Pomeau (1986) Derrida, B. and Pomeau, Y. Random networks of automata: A simple annealed approximation. EPL (Europhysics Letters), 1(2):45, 1986.
 Elman (1990) Elman, Jeffrey L. Finding structure in time. Cognitive science, 14(2):179–211, 1990.
 Glorot & Bengio (2010) Glorot, Xavier and Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, pp. 249–256, 2010.
 Graves et al. (2013) Graves, Alex, Mohamed, Abdelrahman, and Hinton, Geoffrey. Speech recognition with deep recurrent neural networks. In ICASSP, pp. 6645–6649. IEEE, 2013.
 Greff et al. (2017) Greff, Klaus, Srivastava, Rupesh K, Koutník, Jan, Steunebrink, Bas R, and Schmidhuber, Jürgen. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2017.
 Haagerup & Larsen (2000) Haagerup, Uffe and Larsen, Flemming. Brown's spectral distribution measure for R-diagonal elements in finite von Neumann algebras. Journal of Functional Analysis, 176(2):331–367, 2000.
 Hanin & Rolnick (2018) Hanin, Boris and Rolnick, David. How to start training: The effect of initialization and architecture. arXiv preprint arXiv:1803.01719, 2018.
 Hayou et al. (2018) Hayou, Soufiane, Doucet, Arnaud, and Rousseau, Judith. On the selection of initialization and activation function for deep neural networks. arXiv preprint arXiv:1805.08266, 2018.
 He et al. (2015) He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. ArXiv eprints, December 2015.
 Hidasi et al. (2015) Hidasi, Balázs, Karatzoglou, Alexandros, Baltrunas, Linas, and Tikk, Domonkos. Sessionbased recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939, 2015.
 Hochreiter & Schmidhuber (1997) Hochreiter, Sepp and Schmidhuber, Jürgen. Long shortterm memory. Neural computation, 9(8):1735–1780, 1997.
 Hyland & Rätsch (2017) Hyland, Stephanie L and Rätsch, Gunnar. Learning unitary operators with help from u(n). In AAAI, pp. 2050–2058, 2017.
 Ioffe & Szegedy (2015b) Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, pp. 448–456, 2015b.
 Jozefowicz et al. (2015) Jozefowicz, Rafal, Zaremba, Wojciech, and Sutskever, Ilya. An empirical exploration of recurrent network architectures. In ICML, pp. 2342–2350, 2015.
 Jozefowicz et al. (2016) Jozefowicz, Rafal, Vinyals, Oriol, Schuster, Mike, Shazeer, Noam, and Wu, Yonghui. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.
 Karakida et al. (2018) Karakida, R., Akaho, S., and Amari, S.i. Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach. ArXiv eprints, June 2018.
 Kiros et al. (2015) Kiros, Ryan, Zhu, Yukun, Salakhutdinov, Ruslan R, Zemel, Richard, Urtasun, Raquel, Torralba, Antonio, and Fidler, Sanja. Skipthought vectors. In Advances in neural information processing systems, pp. 3294–3302, 2015.
 Le et al. (2015) Le, Quoc V, Jaitly, Navdeep, and Hinton, Geoffrey E. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941, 2015.
 LeCun et al. (1998) LeCun, Yann, Bottou, Léon, Bengio, Yoshua, and Haffner, Patrick. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 Lee et al. (2017) Lee, Jaehoon, Bahri, Yasaman, Novak, Roman, Schoenholz, Samuel S, Pennington, Jeffrey, and SohlDickstein, Jascha. Deep neural networks as gaussian processes. arXiv preprint arXiv:1711.00165, 2017.
 Marcus et al. (1993) Marcus, Mitchell P, Marcinkiewicz, Mary Ann, and Santorini, Beatrice. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
 Mikolov et al. (2010) Mikolov, Tomas, Karafiát, Martin, Burget, Lukas, Cernockỳ, Jan, and Khudanpur, Sanjeev. Recurrent neural network based language model. In Interspeech, volume 2, pp. 3, 2010.
 Mishkin & Matas (2015) Mishkin, Dmytro and Matas, Jiri. All you need is a good init. arXiv preprint arXiv:1511.06422, 2015.
 Neal (2012) Neal, Radford M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
 Pascanu et al. (2013) Pascanu, Razvan, Mikolov, Tomas, and Bengio, Yoshua. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318, 2013.

 Pennington et al. (2017) Pennington, Jeffrey, Schoenholz, Sam, and Ganguli, Surya. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. NIPS, 2017.
 Pennington et al. (2018) Pennington, Jeffrey, Schoenholz, Samuel S., and Ganguli, Surya. The emergence of spectral universality in deep networks. In AISTATS, pp. 1924–1932, 2018.
 Poole et al. (2016) Poole, B., Lahiri, S., Raghu, M., SohlDickstein, J., and Ganguli, S. Exponential expressivity in deep neural networks through transient chaos. NIPS, 2016.
 Rumelhart et al. (1986) Rumelhart, David E, Hinton, Geoffrey E, and Williams, Ronald J. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.

 Saul et al. (1996) Saul, Lawrence K, Jaakkola, Tommi, and Jordan, Michael I. Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4:61–76, 1996.
 Saxe et al. (2013) Saxe, Andrew M, McClelland, James L, and Ganguli, Surya. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.
 Schoenholz et al. (2017) Schoenholz, S. S., Gilmer, J., Ganguli, S., and SohlDickstein, J. Deep Information Propagation. ICLR, 2017.
 Schoenholz et al. (2017) Schoenholz, Samuel S, Pennington, Jeffrey, and SohlDickstein, Jascha. A correspondence between random neural networks and statistical field theory. arXiv preprint arXiv:1710.06570, 2017.
 Sompolinsky et al. (1988) Sompolinsky, H., Crisanti, A., and Sommers, H. J. Chaos in random neural networks. Phys. Rev. Lett., 61:259–262, Jul 1988. doi: 10.1103/PhysRevLett.61.259.
 Sussillo & Abbott (2014) Sussillo, David and Abbott, LF. Random walks: Training very deep nonlinear feedforward networks with smart initialization. CoRR, vol. abs/1412.6558, 2014.
 Tallec & Ollivier (2018) Tallec, Corentin and Ollivier, Yann. Can recurrent neural networks warp time? arXiv preprint arXiv:1804.11188, 2018.
 Vorontsov et al. (2017) Vorontsov, Eugene, Trabelsi, Chiheb, Kadoury, Samuel, and Pal, Chris. On orthogonality and learning recurrent networks with long term dependencies. arXiv preprint arXiv:1702.00071, 2017.
 Wisdom et al. (2016) Wisdom, Scott, Powers, Thomas, Hershey, John, Le Roux, Jonathan, and Atlas, Les. Fullcapacity unitary recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 4880–4888, 2016.
 Wu et al. (2017) Wu, ChaoYuan, Ahmed, Amr, Beutel, Alex, Smola, Alexander J, and Jing, How. Recurrent recommender networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 495–503. ACM, 2017.
 Xie et al. (2017) Xie, Di, Xiong, Jiang, and Pu, Shiliang. All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation. arXiv preprint arXiv:1703.01827, 2017.
 Yang & Schoenholz (2018) Yang, Greg and Schoenholz, Sam S. Deep mean field theory: Layerwise variance and width variation as methods to control gradient explosion, 2018.
 Yang & Schoenholz (2017) Yang, Greg and Schoenholz, Samuel S. Mean field residual networks: On the edge of chaos. arXiv preprint arXiv:1712.08969, 2017.
 Zaremba et al. (2014) Zaremba, Wojciech, Sutskever, Ilya, and Vinyals, Oriol. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.
 Zilly et al. (2016) Zilly, Julian Georg, Srivastava, Rupesh Kumar, Koutník, Jan, and Schmidhuber, Jürgen. Recurrent highway networks. arXiv preprint arXiv:1607.03474, 2016.
Appendix A MinimalRNN Architecture
Appendix B Diagonal Recurrence Relation
Here we analyze the mean field dynamics of the minimalRNN. The minimalRNN features a hidden state $h_t \in \mathbb{R}^N$ and inputs $x_t \in \mathbb{R}^M$. The inputs are transformed via a fully-connected network before being fed into the cell. The RNN cell is then described by the equations,

(14) $z_t = \Phi(x_t)$

(15) $\tilde u_t = W h_{t-1} + V z_t + b, \qquad u_t = \sigma(\tilde u_t)$

(16) $h_t = u_t \odot h_{t-1} + (1 - u_t) \odot z_t$

Here $\tilde u_t$ denotes the (pre)activation and $z_t$ denotes an input to the network. Thus, $u_t$ acts as a gate on the $t$'th step. We take $W_{ij} \sim \mathcal{N}(0, \sigma_w^2 / N)$, $V_{ij} \sim \mathcal{N}(0, \sigma_v^2 / N)$, and $b_i \sim \mathcal{N}(\mu_b, \sigma_b^2)$.
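To make the gated recurrence concrete, the following is a minimal numpy sketch of one step of the cell. The update rule $h_t = u_t \odot h_{t-1} + (1 - u_t) \odot z_t$ and the Gaussian initialization follow our reading of the (partially elided) cell equations; the variable names `sw`, `sv`, `sb`, and `mu_b` (for $\sigma_w$, $\sigma_v$, $\sigma_b$, $\mu_b$) are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minimal_rnn_step(h_prev, z_t, W, V, b):
    """One step of the minimalRNN cell: the gate u_t interpolates
    elementwise between the previous hidden state and the
    transformed input z_t = Phi(x_t)."""
    u_t = sigmoid(W @ h_prev + V @ z_t + b)   # gate in (0, 1)
    return u_t * h_prev + (1.0 - u_t) * z_t   # convex combination

# Random "mean field" initialization: W_ij ~ N(0, sw^2/N), etc.
N, sw, sv, sb, mu_b = 256, 1.0, 1.0, 0.5, 0.0
rng = np.random.default_rng(0)
W = rng.normal(0.0, sw / np.sqrt(N), (N, N))
V = rng.normal(0.0, sv / np.sqrt(N), (N, N))
b = rng.normal(mu_b, sb, N)

h0 = np.zeros(N)                 # initial hidden state
z = rng.normal(0.0, 1.0, N)      # normalized input, q_z ~= 1
h1 = minimal_rnn_step(h0, z, W, V, b)
```

Because each coordinate of $h_t$ is a convex combination of the corresponding coordinates of $h_{t-1}$ and $z_t$, the update can never leave the elementwise interval spanned by the two, which is the mechanism behind the bounded variance dynamics analyzed below.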
By the CLT we can make a mean field assumption that $\tilde u^i_t \sim \mathcal{N}(\mu_b, \tilde q^t)$ where,

(17) $\tilde q^t = \sigma_w^2 q^{t-1} + \sigma_v^2 q_z^t + \sigma_b^2$

where we have defined $q^t = \mathbb{E}[(h^i_t)^2]$ and $q_z^t = \mathbb{E}[(z^i_t)^2]$. We note that $q_z^t$ is fixed by the input, but it remains for us to work out $q^t$. We find that,

(18) $q^t = \mathbb{E}\!\left[\left(u^i_t h^i_{t-1} + (1 - u^i_t) z^i_t\right)^2\right]$

(19) $\phantom{q^t} = \mathbb{E}[(u^i_t)^2]\,\mathbb{E}[(h^i_{t-1})^2] + 2\,\mathbb{E}[u^i_t (1 - u^i_t)]\,\mathbb{E}[h^i_{t-1} z^i_t] + \mathbb{E}[(1 - u^i_t)^2]\,\mathbb{E}[(z^i_t)^2]$

(20) $\phantom{q^t} = \mathbb{E}[(u^i_t)^2]\, q^{t-1} + \mathbb{E}[(1 - u^i_t)^2]\, q_z^t$

where we have assumed that the expectation factorizes so that $u_t$, $h_{t-1}$, and $z_t$ are approximately independent (in particular, $\mathbb{E}[h^i_{t-1} z^i_t] \approx 0$).
We choose to normalize the data so that $q_z^t = q_z$ independent of time. An immediate consequence of this normalization is that the input contribution to eq. (20) is constant and that $\tilde q^t$ depends on time only through $q^{t-1}$. We then write $q_{z,12}^t = q_z c_z$, $q_{12}^t = q^t c^t$, and $\tilde q_{12}^t = \tilde q^t \tilde c^t$, where $c_z$, $c^t$, and $\tilde c^t$ are cosine similarities between the inputs, the hidden states, and the preactivations respectively. With this normalization, we can work out the mean field recurrence relation characterizing the covariance matrix for the minimalRNN.
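The mean field (CLT) claim above can be checked numerically: for a wide random cell the gate pre-activation $W h + V z + b$ should have second moment close to $\sigma_w^2 q + \sigma_v^2 q_z + \sigma_b^2$. The sketch below assumes this variance form; the parameter values and names are our own.

```python
import numpy as np

# Monte Carlo check of the mean field claim: for a wide random cell
# the gate pre-activation is approximately Gaussian with variance
# qtil = sw^2 * q + sv^2 * q_z + sb^2 (our reading of eq. 17).
N, sw, sv, sb = 4096, 1.2, 0.8, 0.3
rng = np.random.default_rng(1)
h = rng.normal(0.0, 1.0, N)                   # hidden state, q ~= 1
z = rng.normal(0.0, 1.0, N)                   # transformed input, q_z ~= 1
W = rng.normal(0.0, sw / np.sqrt(N), (N, N))
V = rng.normal(0.0, sv / np.sqrt(N), (N, N))
b = rng.normal(0.0, sb, N)

pre = W @ h + V @ z + b                       # gate pre-activation
qtil_emp = np.mean(pre**2)                    # empirical second moment
qtil_mft = sw**2 * np.mean(h**2) + sv**2 * np.mean(z**2) + sb**2
```

The empirical second moment agrees with the mean field prediction up to the expected $O(1/\sqrt{N})$ fluctuations.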
We begin by considering the diagonal recurrence relation. We find that the dynamics are described by the equation,

(21) $q^t = \mathbb{E}[(u^i_t)^2]\, q^{t-1} + \mathbb{E}[(1 - u^i_t)^2]\, q_z$

(22) $\phantom{q^t} = q^{t-1} \int \mathcal{D}z\, \sigma^2\!\left(\sqrt{\tilde q^t}\, z + \mu_b\right) + q_z \int \mathcal{D}z\, \sigma^2\!\left(\sqrt{\tilde q^t}\, z - \mu_b\right)$

As expected, the first and second integrands determine how much of the update of the random network is controlled by the norm of the hidden state and how much is determined by the norm of the input. Since $1 - \sigma(x) = \sigma(-x)$, it follows that when $\mu_b = 0$ the first and second integrals will be equal and so,

(23) $q^t = \left(q^{t-1} + q_z\right) \int \mathcal{D}z\, \sigma^2\!\left(\sqrt{\tilde q^t}\, z\right)$

In general, $\mu_b$ will therefore control the degree to which the hidden state of the random minimalRNN is updated based on the previous hidden state or based on the inputs, with $\mu_b = 0$ implying parity between the two. This is reflected in eq. (23).
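The diagonal recurrence can be iterated numerically by evaluating the two Gaussian integrals with Gauss-Hermite quadrature. The functional form below, $q^t = \mathbb{E}[u^2]\, q^{t-1} + \mathbb{E}[(1-u)^2]\, q_z$, is our reconstruction of the elided equations, with hypothetical parameter names (`sw`, `sv`, `sb`, `mu_b`).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gauss_int(f, deg=101):
    """E[f(z)] for z ~ N(0, 1), via Gauss-Hermite quadrature with the
    probabilists' weight exp(-z^2 / 2)."""
    x, w = np.polynomial.hermite_e.hermegauss(deg)
    return float((w * f(x)).sum() / np.sqrt(2.0 * np.pi))

def q_next(q, qz, sw, sv, sb, mu_b):
    """One step of the diagonal variance recurrence: the hidden-state
    term is weighted by E[u^2], the input term by E[(1 - u)^2], with
    the gate pre-activation distributed as N(mu_b, qtil)."""
    qtil = sw**2 * q + sv**2 * qz + sb**2
    e_u2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z + mu_b)**2)
    e_1mu2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z - mu_b)**2)
    return e_u2 * q + e_1mu2 * qz

q1 = q_next(1.0, 1.0, sw=1.0, sv=1.0, sb=0.5, mu_b=0.0)
```

Since both expectations are bounded by one, a single update can never exceed $q^{t-1} + q_z$; with $\mu_b = 0$ the two integrals coincide (the parity case), while a large positive $\mu_b$ drives the gate toward one so that the hidden-state norm is simply carried forward.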
Appendix C Existence of a Fixed Point
In the event that the norm of the inputs is time-independent, $q_z^t = q_z$ for all $t$, then the minimalRNN will have a fixed point provided there exists a $q^*$ that satisfies a transcendental equation, namely that

(24) $\left(1 - \int \mathcal{D}z\, \sigma^2\!\left(\sqrt{\tilde q^*}\, z + \mu_b\right)\right) - \frac{q_z}{q^*} \int \mathcal{D}z\, \sigma^2\!\left(\sqrt{\tilde q^*}\, z - \mu_b\right) = 0$

with $\tilde q^* = \sigma_w^2 q^* + \sigma_v^2 q_z + \sigma_b^2$. It is easy to see that such a solution always exists. When $q^* \to 0$ the first term of eq. (24) approaches a positive constant while the magnitude of the second (negative) term increases without bound, and so the left-hand side is negative. Conversely, when $q^* \to \infty$ the first term is positive while the second term vanishes, and so the left-hand side is positive. The existence of a $q^*$ satisfying the transcendental equation then follows directly from the intermediate value theorem.
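The intermediate value argument translates directly into a bisection search for $q^*$. This sketch reuses the reconstructed variance map (redefined here so the snippet is self-contained); all parameter values and names are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gauss_int(f, deg=101):
    # E[f(z)] for z ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite_e.hermegauss(deg)
    return float((w * f(x)).sum() / np.sqrt(2.0 * np.pi))

def q_next(q, qz, sw, sv, sb, mu_b):
    # Reconstructed diagonal map: q' = E[u^2] q + E[(1-u)^2] q_z.
    qtil = sw**2 * q + sv**2 * qz + sb**2
    e_u2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z + mu_b)**2)
    e_1mu2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z - mu_b)**2)
    return e_u2 * q + e_1mu2 * qz

def find_qstar(qz, sw, sv, sb, mu_b, hi=1e4, iters=100):
    """Bisection on F(q) = q_next(q) - q.  F(0) >= 0 because the input
    term is non-negative, while F(q) < 0 for large q since E[u^2] < 1,
    so the intermediate value theorem guarantees a root."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_next(mid, qz, sw, sv, sb, mu_b) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

qstar = find_qstar(qz=1.0, sw=1.0, sv=1.0, sb=0.5, mu_b=0.0)
```

Bisection is a natural fit here precisely because the existence proof is an intermediate-value argument: the sign change it relies on is the bracket the search needs.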
Appendix D Dynamics
We can now investigate the dynamics of the norm of the hidden state in the vicinity of $q^*$. To do this, suppose that $q^t = q^* + \epsilon^t$ with $|\epsilon^t| \ll q^*$. Our goal is then to expand eq. (21) about $q^*$. First, we note that,
(25)  
(26)  
(27) 
Expanding eq. (21) to linear order in $\epsilon^{t-1}$, this implies that,
(28)  
(29)  
(30)  
(31)  
(32)  
(33)  
(34) 
It follows that $\epsilon^t \to 0$ as,

(35) $\epsilon^t \sim e^{-t/\tau}$

with

(36) $\tau^{-1} = -\log \chi_{q^*}$

where $\chi_{q^*} = \partial q^t / \partial q^{t-1} \big|_{q^*}$ is the slope of the variance map at the fixed point, as expected.
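The exponential approach to the fixed point can be verified by iterating the reconstructed variance map: the perturbation should shrink by a factor of $\chi_{q^*}$ per step, giving the timescale $\tau = -1/\log \chi_{q^*}$. Everything below (map, parameters, names) is our own sketch, redefined so the snippet is self-contained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gauss_int(f, deg=101):
    # E[f(z)] for z ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite_e.hermegauss(deg)
    return float((w * f(x)).sum() / np.sqrt(2.0 * np.pi))

def q_next(q, qz, sw, sv, sb, mu_b):
    # Reconstructed diagonal map: q' = E[u^2] q + E[(1-u)^2] q_z.
    qtil = sw**2 * q + sv**2 * qz + sb**2
    e_u2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z + mu_b)**2)
    e_1mu2 = gauss_int(lambda z: sigmoid(np.sqrt(qtil) * z - mu_b)**2)
    return e_u2 * q + e_1mu2 * qz

def find_qstar(qz, sw, sv, sb, mu_b, hi=1e4, iters=100):
    # Bisection on q_next(q) - q (see the fixed-point sketch above).
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_next(mid, qz, sw, sv, sb, mu_b) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

params = dict(qz=1.0, sw=1.0, sv=1.0, sb=0.5, mu_b=0.0)
qstar = find_qstar(**params)

# Slope of the variance map at the fixed point (central difference).
d = 1e-5
chi = (q_next(qstar + d, **params) - q_next(qstar - d, **params)) / (2 * d)
tau = -1.0 / np.log(chi)    # decay timescale, valid for 0 < chi < 1

# Iterate the map from a perturbed start: the deviation eps^t should
# shrink by roughly chi per step, i.e. eps^t ~ exp(-t / tau).
q, eps = qstar + 0.1, [0.1]
for _ in range(15):
    q = q_next(q, **params)
    eps.append(q - qstar)
ratio = eps[-1] / eps[-2]
```

The measured per-step contraction `ratio` matches the analytically computed slope `chi`, confirming the exponential relaxation of eqs. (35)-(36) for this choice of hyperparameters.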