∞-former: Infinite Memory Transformer

09/01/2021
by Pedro Henrique Martins, et al.

Transformers struggle when attending to long contexts: the amount of computation grows with the context length, so they cannot model long-term memories effectively. Several variants have been proposed to alleviate this problem, but they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the ∞-former, which extends the vanilla transformer with an unbounded long-term memory. By using a continuous-space attention mechanism to attend over the long-term memory, the ∞-former's attention complexity becomes independent of the context length, so it can model arbitrarily long contexts and maintain "sticky memories" while keeping a fixed computation budget. Experiments on a synthetic sorting task demonstrate the ∞-former's ability to retain information from long sequences. We also perform language modeling experiments, both by training a model from scratch and by fine-tuning a pre-trained language model, which show the benefits of unbounded long-term memories.
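To make the idea more concrete, below is a minimal NumPy sketch of continuous attention over a compressed memory: the discrete memory is projected onto a fixed number of radial basis functions, and a query reads from it through a Gaussian density over [0, 1]. This is an illustrative approximation of the general technique, not the paper's implementation; the function names (gaussian_rbf, compress_memory, continuous_attention) and all hyperparameter values are assumptions.

```python
import numpy as np

def gaussian_rbf(t, centers, width):
    """Evaluate N Gaussian radial basis functions psi(t) at positions t in [0, 1]."""
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def compress_memory(X, num_basis=32, width=0.05, ridge=1e-4):
    """Fit coefficients B so that X[i] ~ B^T psi(t_i): a length-L discrete memory
    becomes a continuous signal with a fixed-size (N x d) representation."""
    L, d = X.shape
    t = np.linspace(0.0, 1.0, L)            # place the L states on [0, 1]
    centers = np.linspace(0.0, 1.0, num_basis)
    Psi = gaussian_rbf(t, centers, width)   # (L, N)
    # Ridge regression: B = (Psi^T Psi + ridge*I)^-1 Psi^T X, shape (N, d)
    B = np.linalg.solve(Psi.T @ Psi + ridge * np.eye(num_basis), Psi.T @ X)
    return B, centers

def continuous_attention(mu, sigma, B, centers, width=0.05, grid=512):
    """Read from the continuous memory: the context vector is the expectation of
    the reconstructed signal under a Gaussian density N(mu, sigma^2) on [0, 1]."""
    t = np.linspace(0.0, 1.0, grid)
    p = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    p /= p.sum()                                   # discretised density over the grid
    values = gaussian_rbf(t, centers, width) @ B   # reconstructed signal, (grid, d)
    return p @ values                              # (d,) context vector

# Toy usage: 10,000 past hidden states are compressed to 32 coefficients per
# dimension, so reading from them no longer scales with the original length.
X = np.random.randn(10_000, 64)
B, centers = compress_memory(X)
context = continuous_attention(mu=0.8, sigma=0.05, B=B, centers=centers)
```

The point the sketch illustrates is that the size of the coefficient matrix B depends only on the number of basis functions, not on how many past states were compressed into it, which is what makes the memory read independent of the context length.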

Related research

02/04/2021 - Adaptive Semiparametric Language Models
We present a language model that combines a large parametric neural netw...

04/15/2022 - LaMemo: Language Modeling with Look-Ahead Memory
Although Transformers with fully connected self-attentions are powerful ...

07/05/2023 - Facing off World Model Backbones: RNNs, Transformers, and S4
World models are a fundamental component in model-based reinforcement le...

05/25/2023 - Landmark Attention: Random-Access Infinite Context Length for Transformers
While transformers have shown remarkable success in natural language pro...

04/26/2023 - Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System
Large-scale Language Models (LLMs) are constrained by their inability to...

01/09/2019 - Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer networks have a potential of learning longer-term dependency...

10/20/2016 - A Growing Long-term Episodic & Semantic Memory
The long-term memory of most connectionist systems lies entirely in the ...
