Fine-Tuning Pre-trained Transformers into Decaying Fast Weights

10/09/2022
by Huanru Henry Mao, et al.

Autoregressive Transformers are strong language models but incur O(T) complexity during per-token generation due to the self-attention mechanism. Recent work proposes kernel-based methods to approximate causal self-attention by replacing it with recurrent formulations with various update rules and feature maps to achieve O(1) time and memory complexity. We explore these approaches and find that they are unnecessarily complex, and propose a simple alternative - decaying fast weights - that runs fast on GPU, outperforms prior methods, retains 99% of attention's performance, and shows competitive performance on WikiText-103 against more complex attention substitutes.
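The recurrent view behind such methods can be illustrated with a short sketch. Below is a minimal, self-contained NumPy example of a decay-gated outer-product ("fast weight") update that costs O(1) time and memory per generated token. The function names, the scalar decay, the softplus feature map, and the omission of any normalization term are illustrative assumptions for exposition, not the paper's exact formulation.

```python
# Minimal sketch of a decaying fast-weight recurrence (illustrative only;
# the paper's actual update rule, feature map, and decay parameterization may differ).
import numpy as np

def decaying_fast_weight_step(state, q, k, v, decay):
    """One O(1) per-token update of a fast-weight memory.

    state: (d_k, d_v) fast-weight matrix carried across time steps
    q, k:  (d_k,) query/key feature vectors (here, after a softplus feature map)
    v:     (d_v,) value vector
    decay: forgetting factor in (0, 1), scalar or broadcastable to (d_k, 1)
    """
    state = decay * state + np.outer(k, v)  # decay old memory, write new association
    out = q @ state                         # read out a (d_v,) vector for this token
    return state, out

def run_sequence(Q, K, V, decay=0.95):
    """Process a whole sequence token by token with constant memory."""
    T, d_k = Q.shape
    d_v = V.shape[1]
    state = np.zeros((d_k, d_v))
    outputs = np.empty((T, d_v))
    for t in range(T):
        state, outputs[t] = decaying_fast_weight_step(state, Q[t], K[t], V[t], decay)
    return outputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_k, d_v = 8, 4, 4
    # Softplus keeps query/key features positive, a common choice in kernelized attention.
    Q = np.log1p(np.exp(rng.normal(size=(T, d_k))))
    K = np.log1p(np.exp(rng.normal(size=(T, d_k))))
    V = rng.normal(size=(T, d_v))
    print(run_sequence(Q, K, V).shape)  # (8, 4)
```

The key point of the sketch is that the carried state is a fixed-size d_k x d_v matrix, independent of the sequence length T, which is what gives constant time and memory per generated token, in contrast to caching all past keys and values for softmax attention.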


Related research

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (06/29/2020)
Transformers achieve remarkable performance in several tasks but due to ...

Linear Transformers Are Secretly Fast Weight Memory Systems (02/22/2021)
We show the formal equivalence of linearised self-attention mechanisms a...

On the Locality of Attention in Direct Speech Translation (04/19/2022)
Transformers have achieved state-of-the-art results across multiple NLP ...

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing (06/22/2023)
Transformer models have been widely adopted in various domains over the ...

Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers (05/17/2022)
Vision transformers using self-attention or its proposed alternatives ha...

Classifying Scientific Publications with BERT – Is Self-Attention a Feature Selection Method? (01/20/2021)
We investigate the self-attention mechanism of BERT in a fine-tuning sce...

OmniNet: Omnidirectional Representations from Transformers (03/01/2021)
This paper proposes Omnidirectional Representations from Transformers (O...
