
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

by Tatiana Likhomanenko, et al.

Without positional information, attention-based transformer neural networks are permutation-invariant. Absolute and relative positional embeddings are the most popular ways to feed positional information to transformer models. Absolute positional embeddings are simple to implement, but suffer from generalization issues when evaluated on sequences whose lengths differ from those seen at training time. Relative positional embeddings are more robust to length changes, but are more complex to implement and yield inferior model throughput. In this paper, we propose an augmentation-based approach (CAPE) for absolute positional embeddings, which keeps the advantages of both absolute (simplicity and speed) and relative (better generalization) positional embeddings. In addition, our empirical evaluation on state-of-the-art models in machine translation, image recognition, and speech recognition demonstrates that CAPE leads to better generalization performance as well as increased stability with respect to training hyper-parameters.
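The core idea described in the abstract can be illustrated with a minimal sketch: during training, continuous positions are randomly augmented (a shared global shift, small per-position local shifts, and a global scaling) before the standard sinusoidal embedding is computed. The function names and hyper-parameter values below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sinusoidal_embedding(positions, dim):
    # Standard sinusoidal encoding, evaluated at possibly fractional positions.
    inv_freq = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = positions[:, None] * inv_freq[None, :]
    emb = np.empty((len(positions), dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def cape_positions(seq_len, rng, max_global_shift=5.0,
                   max_local_shift=0.5, max_global_scale=1.03):
    # CAPE-style augmentation (applied at training time only).
    # Hyper-parameter values here are assumptions for illustration.
    pos = np.arange(seq_len, dtype=np.float64)
    pos += rng.uniform(-max_global_shift, max_global_shift)          # global shift
    pos += rng.uniform(-max_local_shift, max_local_shift, seq_len)   # local jitter
    log_s = np.log(max_global_scale)
    pos *= np.exp(rng.uniform(-log_s, log_s))                        # global scaling
    return pos

rng = np.random.default_rng(0)
emb = sinusoidal_embedding(cape_positions(10, rng), dim=16)
print(emb.shape)  # (10, 16)
```

Because the augmented positions are continuous rather than integer indices, the model cannot memorize absolute positions and must rely on relative offsets, which is what makes the embedding robust to sequence-length changes at evaluation time.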

