Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

07/27/2022
by   Artyom Sorokin, et al.

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This demands prohibitively large computing memory if a sequence consists of thousands or even millions of elements and, as a result, makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted using only temporally local information. Predictions affected by long-term dependencies, on the other hand, are sparse and characterized by high uncertainty when only local information is available. We propose MemUP, a new training method that makes it possible to learn long-term dependencies without backpropagating gradients through the whole sequence at once. The method can in principle be applied to any gradient-based sequence learning. A MemUP implementation for recurrent architectures performs better than or comparably to baselines while requiring significantly less computing memory.


Related research:

02/15/2017 — Generative Temporal Models with Memory
"We consider the general problem of modeling temporal data with long-range..."

01/22/2019 — Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
"Modelling long-term dependencies is a challenge for recurrent neural net..."

07/21/2023 — Improve Long-term Memory Learning Through Rescaling the Error Temporally
"This paper studies the error metric selection for long-term memory learn..."

09/11/2018 — Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
"Learning long-term dependencies in extended temporal sequences requires ..."

04/28/2021 — Optimizing Rescoring Rules with Interpretable Representations of Long-Term Information
"Analyzing temporal data (e.g., wearable device data) requires a decision..."

11/07/2017 — Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks
"A major drawback of backpropagation through time (BPTT) is the difficult..."

05/23/2019 — Population-based Global Optimisation Methods for Learning Long-term Dependencies with RNNs
"Despite recent innovations in network architectures and loss functions, ..."
