Implicit Bias of Linear RNNs

01/19/2021
by Melikasadat Emami, et al.

Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, the precise reason for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently developed kernel regime analysis, our main result shows that linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias toward elements with smaller time lags in the convolution and hence shorter memory. The degree of this bias depends on the variance of the transition kernel matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated in both synthetic and real-data experiments.
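
To make the convolutional view concrete, here is a minimal sketch (not from the paper's code; the sizes, scales, and names below are illustrative assumptions). A linear RNN h_t = W h_{t-1} + U x_t with readout y_t = c^T h_t computes y_t = sum_k (c^T W^k U) x_{t-k}, i.e. a 1D convolution of the input with kernel coefficients beta_k = c^T W^k U. The snippet estimates |beta_k| for Gaussian initializations of W at different scales sigma, showing how the initialization variance governs how strongly large time lags are suppressed.

```python
# Sketch: impulse response (convolution kernel) of a randomly initialized
# linear RNN  h_t = W h_{t-1} + U x_t,  y_t = c^T h_t.
# The kernel coefficient at lag k is beta_k = c^T W^k U.
# All parameter choices (n, K, sigma, trials) are illustrative assumptions.

import numpy as np

def impulse_response_norms(n=200, K=30, sigma=0.9, trials=10, seed=0):
    """Average |c^T W^k U| over random draws, for lags k = 0..K-1."""
    rng = np.random.default_rng(seed)
    norms = np.zeros(K)
    for _ in range(trials):
        # W entries ~ N(0, sigma^2 / n): spectral radius concentrates near sigma
        W = rng.normal(0.0, sigma / np.sqrt(n), size=(n, n))
        U = rng.normal(0.0, 1.0 / np.sqrt(n), size=n)   # input weights
        c = rng.normal(0.0, 1.0 / np.sqrt(n), size=n)   # readout weights
        v = U.copy()
        for k in range(K):
            norms[k] += abs(c @ v)   # |beta_k| = |c^T W^k U| at lag k
            v = W @ v                # advance one step: W^{k+1} U
    return norms / trials

if __name__ == "__main__":
    for sigma in (0.5, 0.9, 1.0):
        beta = impulse_response_norms(sigma=sigma)
        print(f"sigma={sigma}: |beta_k| at lags 0, 10, 20 ->",
              np.round(beta[[0, 10, 20]], 6))
```

Under these assumptions, |beta_k| falls off roughly geometrically in the lag k when sigma < 1 and much more slowly as sigma approaches 1, mirroring the abstract's point that the initialization variance of the transition matrix controls the bias toward short memory and connects to vanishing and exploding gradients.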

Related research

10/25/2022 · Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Network
Overparameterization in deep learning typically refers to settings where...

10/11/2022 · On Scrambling Phenomena for Randomly Initialized Recurrent Networks
Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics...

11/05/2020 · Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization
Training RNNs to learn long-term dependencies is difficult due to vanish...

07/27/2023 · Fading memory as inductive bias in residual recurrent networks
Residual connections have been proposed as architecture-based inductive ...

12/22/2019 · Contracting Implicit Recurrent Neural Networks: Stable Models with Improved Trainability
Stability of recurrent models is closely linked with trainability, gener...

05/30/2023 · Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
We prove an inverse approximation theorem for the approximation of nonli...

02/27/2019 · Alternating Synthetic and Real Gradients for Neural Language Modeling
Training recurrent neural networks (RNNs) with backpropagation through t...
