The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

06/03/2021
by Ulme Wennberg, et al.

Mechanisms for encoding positional information are central to transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both in the embeddings themselves and in their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. These findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
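To make the core idea concrete, the sketch below shows single-head self-attention in which the positional contribution to the attention logits depends only on the relative offset j - i, so shifting the whole sequence leaves the attention pattern unchanged. This is a minimal illustration of translation invariance only, not the paper's exact formulation: TISA uses its own parameterization of the positional scoring function, whereas the bias table over clipped offsets, the function names, and the toy weights here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def translation_invariant_attention(X, Wq, Wk, Wv, rel_bias, max_offset):
    """Single-head self-attention whose positional term depends only on the
    relative offset j - i (clipped to [-max_offset, max_offset]), so no
    absolute position embeddings are needed. Illustrative sketch only."""
    n, _ = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)                              # content-content term
    offsets = np.arange(n)[None, :] - np.arange(n)[:, None]    # offsets[i, j] = j - i
    offsets = np.clip(offsets, -max_offset, max_offset)
    logits = logits + rel_bias[offsets + max_offset]           # translation-invariant positional term
    return softmax(logits, axis=-1) @ V

# Toy usage with random weights; rel_bias holds one learned scalar per clipped offset.
rng = np.random.default_rng(0)
n, d_model, d_head, max_offset = 6, 8, 4, 3
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
rel_bias = rng.normal(size=(2 * max_offset + 1,)) * 0.1
out = translation_invariant_attention(X, Wq, Wk, Wv, rel_bias, max_offset)
print(out.shape)  # (6, 4)
```

Because the positional term is indexed only by j - i, its parameter count is independent of sequence length, which is consistent with the abstract's claim of orders of magnitude fewer positional parameters than full position-embedding tables.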


Related research

Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models (06/10/2021)
AttViz: Online exploration of self-attention for transparent neural language modeling (05/12/2020)
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation (05/20/2022)
RoFormer: Enhanced Transformer with Rotary Position Embedding (04/20/2021)
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency (05/23/2022)
Extending Context Window of Large Language Models via Positional Interpolation (06/27/2023)
Dynamic Position Encoding for Transformers (04/18/2022)
