Augmenting Self-attention with Persistent Memory

07/02/2019
by Sainbayar Sukhbaatar et al.

Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long-term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that consists solely of attention layers. More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role to the feed-forward layer. Thanks to these vectors, we can remove the feed-forward layer without degrading the performance of a transformer. Our evaluation shows the benefits brought by our model on standard character- and word-level language modeling benchmarks.
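As a minimal sketch of the idea described in the abstract (not the authors' code), the following PyTorch module shows single-head self-attention whose keys and values are extended with learned persistent vectors shared across all positions and inputs. The class name, the number of persistent slots, the single-head simplification, and the causal-mask handling are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PersistentMemoryAttention(nn.Module):
    """Self-attention augmented with learned persistent key/value vectors.

    The persistent slots are concatenated to the contextual keys and values,
    so the attention layer can take over the role of the feed-forward layer.
    Single-head version for clarity; hyperparameters are illustrative.
    """

    def __init__(self, dim: int, n_persistent: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Persistent memory: learned key/value pairs, independent of the input.
        self.persistent_k = nn.Parameter(torch.randn(n_persistent, dim) / dim ** 0.5)
        self.persistent_v = nn.Parameter(torch.randn(n_persistent, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, t, d = x.shape
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)

        # Append the persistent vectors to the contextual keys/values.
        pk = self.persistent_k.unsqueeze(0).expand(b, -1, -1)   # (b, n, d)
        pv = self.persistent_v.unsqueeze(0).expand(b, -1, -1)   # (b, n, d)
        k = torch.cat([k, pk], dim=1)                           # (b, t + n, d)
        v = torch.cat([v, pv], dim=1)                           # (b, t + n, d)

        scores = q @ k.transpose(-2, -1) / d ** 0.5             # (b, t, t + n)

        # Causal mask on the contextual part only; persistent slots are
        # visible to every query position.
        n = k.shape[1] - t
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        mask = torch.cat(
            [causal, torch.zeros(t, n, dtype=torch.bool, device=x.device)], dim=1
        )
        scores = scores.masked_fill(mask, float("-inf"))

        attn = scores.softmax(dim=-1)
        return self.out_proj(attn @ v)


if __name__ == "__main__":
    layer = PersistentMemoryAttention(dim=64, n_persistent=16)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

Per the abstract, with such persistent vectors in place the feed-forward sublayer can be removed entirely, leaving a network built from attention layers alone; the sketch above omits multi-head and positional details for brevity.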


Related research

04/08/2021 · Revisiting Simple Neural Probabilistic Language Models
Recent progress in language modeling has been driven not only by advance...

08/09/2018 · Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-...

06/06/2019 · Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
The Transformer architecture is widely used in natural language processi...

05/29/2023 · Brainformers: Trading Simplicity for Efficiency
Transformers are central to recent successes in natural language process...

03/02/2023 · Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention
Recently, a considerable number of studies in computer vision involves d...

09/09/2020 · Pay Attention when Required
Transformer-based models consist of interleaved feed-forward blocks - th...

10/22/2020 · Not all parameters are born equal: Attention is mostly what you need
Transformers are widely used in state-of-the-art machine translation, bu...
