Long Timescale Credit Assignment in Neural Networks with External Memory

01/14/2017
by Steven Stenberg Hansen, et al.

Credit assignment in traditional recurrent neural networks usually involves back-propagating through a long chain of tied weight matrices. The length of this chain scales linearly with the number of time-steps, as the same network is run at each time-step. This creates well-studied problems such as vanishing gradients. In contrast, the recurrent activity of a neural network with external memory (NNEM) does not involve a long chain of computation (though some architectures, such as the NTM, do use a traditional recurrent network as a controller). Rather, the externally stored embedding vectors are used at each time-step, but no messages are passed from previous time-steps. This means that vanishing gradients are not a problem, as all of the necessary gradient paths are short. However, these paths are extremely numerous (one per embedding vector in memory) and are reused for a very long time (until the vector leaves the memory). Thus, the forward-pass information for each memory must be stored for the entire duration of that memory. This is problematic because the additional storage far surpasses that of the memories themselves, to the extent that large memories are infeasible to back-propagate through in high-dimensional settings. One way to avoid holding onto forward-pass information is to recalculate the forward pass whenever gradient information becomes available. However, if the observations in the domain of interest are too large to store, direct reinstatement of a forward pass is not possible. Instead, we rely on a learned autoencoder to reinstate the observation, and then use the embedding network to recalculate the forward pass. Since the recalculated embedding vector is unlikely to perfectly match the one stored in memory, we try out two approximations that utilize the error gradient with respect to the vector in memory.
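As a rough illustration of the recomputation idea described above (a minimal sketch, not code from the paper), the snippet below uses PyTorch with hypothetical module names (autoencoder_enc, autoencoder_dec, embed_net). At write time it stores only the embedding and a compact autoencoder code; when a gradient with respect to the stored embedding later becomes available, it reinstates an approximate observation, recomputes the embedding with a fresh graph, and pushes the incoming gradient through that recomputed vector. The two approximations mentioned in the abstract are not specified here; this shows only the simplest substitution of the recomputed embedding for the stored one.

import torch
import torch.nn as nn

# Illustrative dimensions and modules; all names are assumptions, not the paper's.
obs_dim, code_dim, embed_dim = 784, 32, 64

autoencoder_enc = nn.Linear(obs_dim, code_dim)   # compresses the raw observation
autoencoder_dec = nn.Linear(code_dim, obs_dim)   # reinstates an approximate observation
embed_net = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.Tanh())  # embedding network

def write_to_memory(obs):
    # At write time: keep the embedding (no graph retained) plus a compact code
    # from which the observation can later be approximately reinstated.
    with torch.no_grad():
        code = autoencoder_enc(obs)
        stored_embedding = embed_net(obs)
    return stored_embedding, code

def backprop_through_memory(code, grad_wrt_stored):
    # When a gradient w.r.t. the stored vector arrives (possibly much later):
    # reinstate the observation, recompute the embedding differentiably, and
    # apply the incoming gradient to the recomputed vector as a stand-in.
    recon_obs = autoencoder_dec(code)
    recomputed = embed_net(recon_obs)
    recomputed.backward(grad_wrt_stored)

# Usage sketch with a random observation and a dummy downstream gradient.
obs = torch.randn(1, obs_dim)
stored, code = write_to_memory(obs)
fake_grad = torch.randn_like(stored)             # stands in for dL/d(stored embedding)
backprop_through_memory(code, fake_grad)
print(embed_net[0].weight.grad.norm())           # embedding net now holds gradients

The point of the sketch is only that no forward-pass activations need to be retained between write time and gradient time; everything needed is rebuilt from the compact code when the gradient arrives.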


research
02/22/2018

High Order Recurrent Neural Networks for Acoustic Modelling

Vanishing long-term gradients are a major issue in training standard rec...
research
01/09/2020

Online Memorization of Random Firing Sequences by a Recurrent Neural Network

This paper studies the capability of a recurrent neural network model to...
research
03/09/2021

Scalable Online Recurrent Learning Using Columnar Neural Networks

Structural credit assignment for recurrent learning is challenging. An a...
research
02/27/2019

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Residual neural networks can be viewed as the forward Euler discretizati...
research
05/13/2018

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

Reinforcement learning (RL) agents performing complex tasks must be able...
research
08/18/2016

Decoupled Neural Interfaces using Synthetic Gradients

Training directed neural networks typically requires forward-propagating...
research
11/05/2010

Gradient Computation In Linear-Chain Conditional Random Fields Using The Entropy Message Passing Algorithm

The paper proposes a numerically stable recursive algorithm for the exac...
