On the Implicit Bias of Gradient Descent for Temporal Extrapolation

02/09/2022
by Edo Cohen-Karlik, et al.

Common practice when using recurrent neural networks (RNNs) is to apply a model to sequences longer than those seen in training. This "extrapolating" usage deviates from the traditional statistical learning setup, where guarantees are provided under the assumption that the train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple case where the data-generating distribution is memoryless. We first show that even with infinite training data, there exist RNN models that interpolate perfectly (i.e., they fit the training data) yet extrapolate poorly to longer sequences. We then show that if gradient descent is used for training, learning will converge to perfect extrapolation under certain assumptions on initialization. Our results complement recent studies on the implicit bias of gradient descent, showing that it plays a key role in extrapolation when learning temporal prediction models.
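To make the setting concrete, below is a minimal sketch of the kind of experiment the abstract describes: a linear RNN trained with gradient descent on short sequences whose label depends only on the last input (a memoryless target), then evaluated on much longer sequences. This is our own illustration, not the authors' code; the linear parameterization, the initialization scale, and all names and hyperparameters (T_train, T_test, learning rate, etc.) are assumptions made for the demo.

```python
# Illustrative sketch only (not the paper's code): a linear RNN trained by
# stochastic gradient descent on length-T_train sequences with a memoryless
# target y = <b, x_T>, then tested on longer sequences to probe extrapolation.
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 8                      # input and hidden dimensions (assumed)
T_train, T_test = 5, 50          # train on short sequences, test on long ones
b = rng.normal(size=d)           # memoryless target: label depends only on x_T

# Linear RNN: h_t = W h_{t-1} + U x_t, prediction y_hat = c . h_T
W = 0.1 * rng.normal(size=(k, k))   # small random init (an assumption; the
U = 0.1 * rng.normal(size=(k, d))   # paper's result hinges on initialization)
c = 0.1 * rng.normal(size=k)

def forward(X):
    """Run the RNN over X of shape (T, d); return prediction and hidden states."""
    h, hs = np.zeros(k), []
    for x in X:
        h = W @ h + U @ x
        hs.append(h)
    return c @ h, hs

def grads(X, y):
    """Gradients of 0.5 * (y_hat - y)^2 via backpropagation through time."""
    y_hat, hs = forward(X)
    err = y_hat - y
    gc = err * hs[-1]                         # dL/dc
    gW, gU = np.zeros_like(W), np.zeros_like(U)
    gh = err * c                              # dL/dh_T
    for t in range(len(X) - 1, -1, -1):
        h_prev = hs[t - 1] if t > 0 else np.zeros(k)
        gW += np.outer(gh, h_prev)
        gU += np.outer(gh, X[t])
        gh = W.T @ gh                         # propagate to h_{t-1}
    return gW, gU, gc

lr = 0.01
for _ in range(5000):
    X = rng.normal(size=(T_train, d))
    y = b @ X[-1]                             # label ignores all but the last input
    gW, gU, gc = grads(X, y)
    W -= lr * gW; U -= lr * gU; c -= lr * gc

def mse(T, n=500):
    """Mean squared error on fresh sequences of length T."""
    errs = []
    for _ in range(n):
        X = rng.normal(size=(T, d))
        y_hat, _ = forward(X)
        errs.append((y_hat - b @ X[-1]) ** 2)
    return np.mean(errs)

print(f"MSE at training length T={T_train}: {mse(T_train):.4f}")
print(f"MSE when extrapolating to T={T_test}: {mse(T_test):.4f}")
```

In this linear sketch, unrolling the recurrence gives y_hat = sum_t c^T W^(T-t) U x_t, so a perfect interpolator at training length only needs c^T U = b^T and c^T W^j U = 0 for 1 <= j < T_train; nothing in the training loss constrains the lags j >= T_train. That unconstrained region is where the abstract's bad interpolating solutions live, and where, per the paper, the implicit bias of gradient descent (under its initialization assumptions) has to do the work of keeping long-sequence error small.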
