Memformer: The Memory-Augmented Transformer

by Qingyang Wu et al.

Transformer models have achieved remarkable results on various NLP tasks. However, these models are inefficient on long sequences, as the complexity of their self-attention module scales quadratically with the sequence length. To remedy this limitation, we present Memformer, a novel language model that utilizes a single unified memory to encode and retrieve past information. It includes a new optimization scheme, Memory Replay Back-Propagation, which promotes long-range back-propagation through time with a significantly reduced memory requirement. Memformer achieves 𝒪(n) time complexity and 𝒪(1) space complexity in processing long sequences, meaning that the model can handle sequences of unbounded length during inference. Our model is also compatible with other self-supervised tasks to further improve performance on language modeling. Experimental results show that Memformer outperforms previous long-range sequence models on WikiText-103, including Transformer-XL and the Compressive Transformer.
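The core idea described above can be illustrated with a toy sketch: the sequence is consumed one segment at a time, and all past information must flow through a constant-size memory that each segment reads from and writes to. This is not the authors' implementation; the read/write rules, slot count, and parameter shapes below are hypothetical simplifications chosen only to show why peak state stays 𝒪(1) in sequence length while total work is 𝒪(n).

```python
import numpy as np

def process_long_sequence(tokens, segment_len=4, memory_slots=2, d_model=8, seed=0):
    """Toy sketch of segment-wise processing with a fixed-size memory.

    Hypothetical simplification: a real Memformer uses a Transformer
    encoder with learned memory read/write attention; here we use a
    single random projection per step just to show the data flow.
    """
    rng = np.random.default_rng(seed)
    embed = rng.standard_normal((256, d_model)) * 0.1   # toy embedding table
    W_read = rng.standard_normal((d_model, d_model)) * 0.1
    W_write = rng.standard_normal((d_model, d_model)) * 0.1

    memory = np.zeros((memory_slots, d_model))          # fixed-size memory state
    outputs = []
    for start in range(0, len(tokens), segment_len):
        seg = tokens[start:start + segment_len]
        x = embed[np.asarray(seg) % 256]                # (seg_len, d_model)

        # "Read": each token attends over the memory slots (softmax over slots),
        # so past context enters only through the constant-size memory.
        scores = x @ W_read @ memory.T                  # (seg_len, slots)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        x = x + attn @ memory                           # tokens enriched with past info

        # "Write": fold a pooled summary of the segment back into every slot.
        memory = np.tanh(memory + x.mean(axis=0) @ W_write)

        outputs.append(x)
    return np.concatenate(outputs, axis=0), memory
```

Because `memory` has a fixed shape regardless of how many segments have been processed, the recurrent state never grows with sequence length, which is the property the abstract summarizes as 𝒪(1) space during inference.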

