CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

06/06/2021
by Tatiana Likhomanenko, et al.

Without positional information, attention-based transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed transformer models positional information. Absolute positional embeddings are simple to implement, but suffer from generalization issues when evaluated on sequences of lengths different from those seen at training time. Relative positions are more robust to length changes, but are more complex to implement and yield inferior model throughput. In this paper, we propose an augmentation-based approach (CAPE) for absolute positional embeddings, which keeps the advantages of both absolute (simplicity and speed) and relative position embeddings (better generalization). In addition, our empirical evaluation on state-of-the-art models in machine translation, image recognition, and speech recognition demonstrates that CAPE leads to better generalization performance as well as increased stability with respect to training hyper-parameters.
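
The abstract does not spell out the augmentation itself, so the following is a minimal sketch of the idea behind CAPE, assuming that training-time augmentation combines a global shift, a small per-position local jitter, and a global scaling of the continuous positions before a standard sinusoidal encoding. The function names (`cape_positions`, `sinusoidal_embedding`) and the hyper-parameter defaults shown here are illustrative assumptions, not the paper's exact recipe.

```python
import math
import torch


def sinusoidal_embedding(positions, dim):
    # Standard sinusoidal encoding evaluated at (possibly non-integer) positions.
    # positions: (batch, seq_len) float tensor; returns (batch, seq_len, dim).
    inv_freq = torch.exp(
        -math.log(10000.0) * torch.arange(0, dim, 2, dtype=torch.float32) / dim
    )
    angles = positions.unsqueeze(-1) * inv_freq          # (batch, seq_len, dim/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


def cape_positions(batch, seq_len, training=True,
                   max_global_shift=5.0,   # illustrative value
                   max_local_shift=0.5,    # illustrative value
                   max_global_scale=1.03): # illustrative value
    # Continuous positions 0, 1, ..., seq_len-1 for each sequence in the batch.
    pos = torch.arange(seq_len, dtype=torch.float32).expand(batch, seq_len).clone()
    if training:
        # Global shift: one random offset shared by all positions of a sequence.
        pos = pos + torch.empty(batch, 1).uniform_(-max_global_shift, max_global_shift)
        # Local shift: small independent jitter applied to each position.
        pos = pos + torch.empty(batch, seq_len).uniform_(-max_local_shift, max_local_shift)
        # Global scaling: one random log-uniform scale factor per sequence.
        log_scale = math.log(max_global_scale)
        pos = pos * torch.empty(batch, 1).uniform_(-log_scale, log_scale).exp()
    return pos


# Usage (hypothetical): add the augmented embeddings to token features x
# of shape (batch, seq_len, dim), exactly as with absolute sinusoidal embeddings.
# x = x + sinusoidal_embedding(cape_positions(x.size(0), x.size(1)), x.size(2))
```

Because the positions stay continuous and the encoding is still the absolute sinusoidal one, this keeps the implementation cost and throughput of absolute embeddings while the random shifts and scaling discourage the model from relying on exact absolute offsets.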

Related research

Improve Transformer Models with Better Relative Position Embeddings (09/28/2020)
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding (07/13/2021)
The Impact of Positional Encoding on Length Generalization in Transformers (05/31/2023)
PermuteFormer: Efficient Relative Position Encoding for Long Sequences (09/06/2021)
Shortformer: Better Language Modeling using Shorter Inputs (12/31/2020)
Fractally-organized Connectionist Networks: Conjectures and Preliminary Results (05/18/2015)
Relative Positional Encoding for Speech Recognition and Direct Translation (05/20/2020)
