On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

09/16/2020
by Zhong Li, et al.

We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate and its relation to memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme we uncover is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when the target has long-term memory, a large number of neurons is required to approximate it, and the training process suffers from slowdowns. Both effects become exponentially more pronounced with the amount of memory, a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise when learning temporal relationships using recurrent architectures.
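To make the setting concrete, the following is a minimal sketch of the continuous-time linear RNN and the target class of linear functionals, in notation introduced here for illustration (the symbols $W$, $U$, $c$, $\rho$ are our assumptions and may differ from the paper's exact conventions):

\begin{align}
  \frac{d}{dt} h_t &= W h_t + U x_t, \qquad h_{-\infty} = 0, \\
  \hat{y}_t &= c^\top h_t \;=\; \int_0^\infty c^\top e^{sW} U \, x_{t-s} \, ds,
\end{align}

so the network itself realizes a linear functional with kernel $\rho_{\mathrm{RNN}}(s) = U^\top e^{sW^\top} c$. A bounded, continuous linear target admits, by Riesz representation, the analogous form

\begin{equation}
  y_t = H_t(x) = \int_0^\infty \rho(s)^\top x_{t-s} \, ds,
\end{equation}

with an integrable kernel $\rho$. Approximation thus amounts to matching $\rho$ with the exponential-sum kernels $c^\top e^{sW} U$, and the decay rate of $\rho$ gives one precise notion of memory: the slower $\rho$ decays, the longer the memory of the target, and the harder both approximation and optimization become.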


Related research

- Inverse Approximation Theory for Nonlinear Recurrent Neural Networks (05/30/2023)
- On the Memory Mechanism of Tensor-Power Recurrent Models (03/02/2021)
- Approximation Theory of Convolutional Architectures for Time Series Modelling (07/20/2021)
- Understanding and Controlling Memory in Recurrent Neural Networks (02/19/2019)
- Metric Entropy Limits on Recurrent Neural Network Learning of Linear Dynamical Systems (05/06/2021)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Network (10/25/2022)
- Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks (05/26/2016)
