Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Networks

10/25/2022
by Edo Cohen-Karlik et al.

Overparameterization in deep learning typically refers to settings where a trained Neural Network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize well for sequence lengths seen in training, some of which extrapolate to longer sequences, while others do not. Numerous works have studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with solutions that generalize well. On the other hand, its tendency to fit overparameterized RNNs with solutions that extrapolate has only recently been discovered, and is far less understood. In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. In contrast to recent arguments suggesting an implicit bias towards short-term memory, we provide theoretical evidence for learning low dimensional state spaces, which can also model long-term memory. Our result relies on a dynamical characterization showing that GD (with small step size and near-zero initialization) strives to maintain a certain form of balancedness, as well as on tools developed in the context of the moment problem from statistics (recovery of a probability distribution from its moments). Experiments corroborate our theory, demonstrating extrapolation via learning low dimensional state spaces with both linear and non-linear RNNs.
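The linear RNNs discussed in the abstract follow the standard state-space recurrence h_t = A h_{t-1} + B x_t with output C h_t. The sketch below is a minimal illustration of the setting, not the authors' code: the one-dimensional teacher system, the hidden dimension, the learning rate, and all other hyperparameters are assumptions chosen for the demo. It trains an overparameterized linear RNN with plain gradient descent from near-zero initialization on sequences of a fixed training length, then inspects the singular values of the learned transition matrix A and checks extrapolation to longer sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher: a 1-dimensional linear state space with scalar inputs/outputs.
a_true, b_true, c_true = 0.8, 1.0, 1.0

def teacher(x):
    """Final output of the 1-d teacher on an input sequence x."""
    h = 0.0
    for x_t in x:
        h = a_true * h + b_true * x_t
    return c_true * h

# Overparameterized student: linear RNN h_t = A h_{t-1} + B x_t, output C h_T, with n >> 1.
n, T_train, m, lr, steps, scale = 20, 10, 100, 0.01, 5000, 1e-3
A = scale * rng.standard_normal((n, n))   # near-zero initialization
B = scale * rng.standard_normal((n, 1))
C = scale * rng.standard_normal((1, n))

# Training data: m random sequences of length T_train, labeled by the teacher.
X = rng.standard_normal((m, T_train))
y = np.array([teacher(x) for x in X])

def forward(A, B, C, X):
    """Unroll the linear RNN on all sequences at once; states have shape (n, num_sequences)."""
    H = np.zeros((n, X.shape[0]))
    Hs = []
    for t in range(X.shape[1]):
        H = A @ H + B * X[:, t]            # broadcast each sequence's scalar input
        Hs.append(H)
    return (C @ H).ravel(), Hs

for _ in range(steps):
    preds, Hs = forward(A, B, C, X)
    err = preds - y                        # residuals, shape (m,)
    # Backpropagation through time for the mean squared loss.
    gC = err[None, :] @ Hs[-1].T
    gA, gB = np.zeros_like(A), np.zeros_like(B)
    grad_H = C.T @ err[None, :]            # dL/dh_T for every sequence
    for t in range(T_train - 1, -1, -1):
        H_prev = Hs[t - 1] if t > 0 else np.zeros_like(Hs[0])
        gA += grad_H @ H_prev.T
        gB += (grad_H * X[:, t]).sum(axis=1, keepdims=True)
        grad_H = A.T @ grad_H              # dL/dh_{t-1}
    A -= lr * gA / m
    B -= lr * gB / m
    C -= lr * gC / m

# If GD indeed learns a low dimensional state space, the spectrum of A should be
# dominated by a single singular value (matching the 1-dimensional teacher).
print("singular values of A:", np.round(np.linalg.svd(A, compute_uv=False), 3))

# Extrapolation: evaluate on a sequence 5x longer than anything seen in training.
x_long = rng.standard_normal(5 * T_train)
pred_long, _ = forward(A, B, C, x_long[None, :])
print("teacher:", teacher(x_long), " student:", pred_long[0])
```

If the implicit bias described in the abstract is at play, the learned A should exhibit one dominant singular value and the prediction on the longer sequence should stay close to the teacher's output, even though training only used shorter sequences.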


Related research

02/09/2022 · On the Implicit Bias of Gradient Descent for Temporal Extrapolation
Common practice when using recurrent neural networks (RNNs) is to apply ...

01/19/2021 · Implicit Bias of Linear RNNs
Contemporary wisdom based on empirical studies suggests that standard re...

11/05/2020 · Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization
Training RNNs to learn long-term dependencies is difficult due to vanish...

05/26/2016 · Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks
Recurrent neural networks (RNNs) have drawn interest from machine learni...

09/16/2020 · On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
We study the approximation properties and optimization dynamics of recur...

10/11/2022 · On Scrambling Phenomena for Randomly Initialized Recurrent Networks
Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics...

01/09/2020 · Online Memorization of Random Firing Sequences by a Recurrent Neural Network
This paper studies the capability of a recurrent neural network model to...
