Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

11/07/2017
by   Nan Rosemary Ke, et al.

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, which stems from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic. However, truncation usually yields biased gradient estimates in which longer-term dependencies are ignored. To address this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which may also relate to principles the brain uses to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through the paths with high attention weights. This allows the model to learn long-term dependencies while backtracking for only a small number of time steps, not just from the recent past but also from attended, relevant past states.
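The core mechanism described above, attending over stored past hidden states and keeping only the top-scoring ones as backtracking paths, can be sketched as follows. This is a minimal, forward-only illustration in numpy, not the authors' implementation; the function name, dimensions, and the dot-product scoring are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_attention_step(h_t, memory, k=2):
    """Score past hidden states against the current one, keep only the
    top-k attention weights (the sparse backtracking paths), and return
    a sparsely attended summary of the past.

    During training, gradients would flow only through the k selected
    past states, rather than through every intermediate time step as in
    full BPTT.  (Hypothetical sketch, not the paper's exact method.)
    """
    scores = memory @ h_t               # dot-product attention scores
    top = np.argsort(scores)[-k:]       # indices of the k best-matching past states
    weights = softmax(scores[top])      # renormalise over the sparse set only
    summary = weights @ memory[top]     # weighted sum of the selected states
    return summary, top, weights

# Toy example: hidden size 4, memory of 6 past hidden states.
d = 4
memory = rng.standard_normal((6, d))
h_t = rng.standard_normal(d)
summary, top, weights = sparse_attention_step(h_t, memory, k=2)
```

In an actual recurrent model, `summary` would be fed back into the state update, so that the loss at the current step creates gradient paths directly to the attended past states.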


Related research

09/11/2018  Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires ...

05/25/2023  Online learning of long-range dependencies
Online learning holds the promise of enabling efficient long-term credit...

03/26/2021  Backpropagation Through Time For Networks With Long-Term Dependencies
Backpropagation through time (BPTT) is a technique of updating tuned par...

07/27/2022  Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
In many sequential tasks, a model needs to remember relevant events from...

05/23/2017  Unbiasing Truncated Backpropagation Through Time
Truncated Backpropagation Through Time (truncated BPTT) is a widespread ...

03/01/2018  Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Despite recent advances in training recurrent neural networks (RNNs), ca...

09/08/2016  Learning to learn with backpropagation of Hebbian plasticity
Hebbian plasticity is a powerful principle that allows biological brains...
