Improve Transformer Models with Better Relative Position Embeddings

09/28/2020
by Zhiheng Huang, et al.

Transformer architectures rely on explicit position encodings to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the original sinusoidal embedding is fixed and not learnable. We first review absolute position embeddings and existing methods for relative position embeddings, and then propose new techniques that encourage increased interaction between query, key and relative position embeddings in the self-attention mechanism. Our most promising approach is a generalization of the absolute position embedding and improves results on SQuAD1.1 compared to previous position embedding approaches. In addition, we address the inductive property of whether a position embedding can be robust enough to handle long sequences. We demonstrate empirically that our relative position embedding method generalizes well and is robust from this inductive perspective. Finally, we show that our proposed method can be adopted as a near drop-in replacement for improving the accuracy of large models with a small computational budget.
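The core idea, as the abstract describes it, is to let relative position embeddings interact with both the query and the key inside the self-attention score. A minimal NumPy sketch of that kind of mechanism is shown below; it is illustrative only, and the function name, clipping scheme, and toy dimensions are assumptions rather than the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(Q, K, V, rel_emb, max_dist):
    """Single-head self-attention with learned relative position embeddings.

    Q, K, V:  (seq_len, d) query / key / value matrices.
    rel_emb:  (2 * max_dist + 1, d) table of relative position embeddings,
              indexed by the clipped offset j - i.
    """
    seq_len, d = Q.shape
    # Clipped relative offsets, shifted into table indices [0, 2 * max_dist].
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    idx = np.clip(offsets, -max_dist, max_dist) + max_dist
    R = rel_emb[idx]                          # (seq_len, seq_len, d)

    # Content-content term plus query-position and key-position terms,
    # so relative positions interact with both queries and keys.
    logits = Q @ K.T                          # q_i . k_j
    logits += np.einsum("id,ijd->ij", Q, R)   # q_i . r_{j-i}
    logits += np.einsum("jd,ijd->ij", K, R)   # k_j . r_{j-i}
    weights = softmax(logits / np.sqrt(d), axis=-1)
    return weights @ V

# Toy usage with random inputs.
rng = np.random.default_rng(0)
seq_len, d, max_dist = 6, 16, 4
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
rel_emb = 0.02 * rng.standard_normal((2 * max_dist + 1, d))
out = relative_attention(Q, K, V, rel_emb, max_dist)
print(out.shape)  # (6, 16)
```

Each attention logit here combines the usual content term with query-position and key-position terms, which is the kind of extra interaction the abstract refers to; the paper's generalized variants extend this family of formulations.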

Related research

09/27/2021
Multiplicative Position-aware Transformer Models for Language Understanding
Transformer models, which leverage architectural improvements like self-...

06/06/2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Without positional information, attention-based transformer neural netwo...

09/13/2021
SHAPE: Shifted Absolute Position Embedding for Transformers
Position representation is crucial for building position-aware represent...

03/13/2020
Learning to Encode Position for Transformer with Continuous Dynamical Model
We introduce a new way of learning to encode position information for no...

10/23/2022
The Curious Case of Absolute Position Embeddings
Transformer language models encode the notion of word order using positi...

08/27/2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Since the introduction of the transformer model by Vaswani et al. (2017)...

12/31/2020
Shortformer: Better Language Modeling using Shorter Inputs
We explore the benefits of decreasing the input length of transformers. ...
