Compressive Transformers for Long-Range Sequence Modelling

11/13/2019
by Jack W. Rae, et al.

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for reinforcement learning (RL), demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
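
To make the memory-compression idea concrete, below is a minimal sketch, not the authors' implementation, of the kind of memory update the abstract describes: old activations that would otherwise be discarded are mapped into a smaller compressed memory. The function name `update_memories`, the memory lengths, and the use of mean pooling with a compression rate `c` are illustrative assumptions; the paper discusses several compression functions, of which mean pooling is one of the simplest.

```python
import torch

def update_memories(mem, comp_mem, new_hiddens, mem_len=512, comp_len=512, c=3):
    """Append new hidden states to memory; compress what falls off the end.

    mem:         (mem_len, d)   most recent past activations
    comp_mem:    (comp_len, d)  older activations, compressed by factor c
    new_hiddens: (seq_len, d)   activations from the current segment
    """
    mem = torch.cat([mem, new_hiddens], dim=0)
    overflow = mem.size(0) - mem_len
    if overflow > 0:
        old, mem = mem[:overflow], mem[overflow:]
        # Compression function (illustrative): mean-pool every c timesteps.
        # Zero-padding so the length divides c is a simplification of this sketch.
        pad = (-old.size(0)) % c
        if pad:
            old = torch.cat([old, old.new_zeros(pad, old.size(1))], dim=0)
        compressed = old.view(-1, c, old.size(1)).mean(dim=1)
        comp_mem = torch.cat([comp_mem, compressed], dim=0)[-comp_len:]
    return mem, comp_mem
```

Each layer's attention would then be computed over the concatenation of the compressed memory, the regular memory, and the current segment, extending the effective context beyond what uncompressed memory alone could hold.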


Related research

07/07/2020
Do Transformers Need Deep Long-Range Memory?
Deep attention models have advanced the modelling of sequential data acr...

10/14/2020
Memformer: The Memory-Augmented Transformer
Transformer models have obtained remarkable accomplishments in various N...

12/11/2017
Long-Range Correlation Underlying Childhood Language and Generative Models
Long-range correlation, a property of time series exhibiting long-term m...

10/10/2021
DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence
In this paper, we propose Dynamic Compressive Transformer (DCT), a trans...

06/08/2023
Decision S4: Efficient Sequence-Based RL via State Spaces Layers
Recently, sequence learning methods have been applied to the problem of ...

04/06/2020
Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Attention is a commonly used mechanism in sequence processing, but it is...
