Recurrent Memory Transformer

07/14/2022
by Aydar Bulatov, et al.

Transformer-based models have shown their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer). Memory allows the model to store and process local and global information, as well as to pass information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The Transformer is then trained to control both memory operations and sequence representation processing. Experimental results show that our model performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
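The mechanism described above can be summarized in a few lines: memory tokens are prepended to each segment's input, the segment is processed by an otherwise unmodified Transformer, and the updated memory states are carried over to the next segment. Below is a minimal sketch of this idea in PyTorch; the class and parameter names (RecurrentMemoryTransformer, num_mem_tokens), the use of nn.TransformerEncoder as the backbone, and the choice to prepend memory only at the segment start are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a segment-level recurrent memory mechanism.
# Assumes a standard PyTorch TransformerEncoder backbone; names are illustrative.
import torch
import torch.nn as nn


class RecurrentMemoryTransformer(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=4, num_mem_tokens=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        # Learnable memory tokens used as the initial memory state.
        self.mem_tokens = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        """segments: list of tensors, each of shape (batch, seg_len, d_model)."""
        batch = segments[0].size(0)
        # Initial memory: learned tokens repeated for every batch element.
        memory = self.mem_tokens.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Prepend memory tokens to the segment; the backbone is unchanged.
            x = torch.cat([memory, seg], dim=1)
            h = self.backbone(x)
            # Updated memory states are passed recurrently to the next segment.
            memory = h[:, :self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory


# Usage: process a long sequence as four segments of 128 embedded tokens each.
if __name__ == "__main__":
    model = RecurrentMemoryTransformer()
    segs = [torch.randn(2, 128, 512) for _ in range(4)]
    out, final_memory = model(segs)
    print(out.shape, final_memory.shape)  # (2, 512, 512) and (2, 10, 512)
```

In this sketch the memory tokens play the role of a recurrent state: gradients can flow across segment boundaries through the carried-over memory, which is what lets the model learn to pass information along a long sequence. A decoder variant would additionally place write-memory tokens at the end of each segment, per the "input or output sequence" option mentioned in the abstract.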


