Randomized Positional Encodings Boost Length Generalization of Transformers

05/26/2023
by Anian Ruoss, et al.

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on longer sequences is inefficient due to the quadratic computational complexity of the global attention mechanism. In this work, we demonstrate that this failure mode is linked to positional encodings being out-of-distribution for longer sequences (even for relative encodings) and introduce a novel family of positional encodings that can overcome this problem. Concretely, our randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. Our large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that our method allows Transformers to generalize to sequences of unseen length (increasing test accuracy by 12.0% on average).
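
The core idea can be illustrated with a short sketch (not the authors' released implementation): for a training sequence of length N, sample N positions uniformly at random without replacement from a much larger simulated range, sort them to preserve ordering, and feed them to an otherwise standard positional encoding. The parameter name max_len and the sinusoidal base encoding below are illustrative assumptions.

```python
import numpy as np

def randomized_positions(seq_len, max_len=2048, rng=None):
    """Sample an ordered subset of `seq_len` positions from [0, max_len).

    max_len is an assumed maximum simulated sequence length; at training
    time the model therefore already sees positions from the full range.
    """
    rng = rng or np.random.default_rng()
    positions = rng.choice(max_len, size=seq_len, replace=False)
    return np.sort(positions)  # keep positions ordered left-to-right

def sinusoidal_encoding(positions, d_model):
    """Standard sinusoidal encoding evaluated at arbitrary (sparse) positions."""
    angles = positions[:, None] / np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    enc = np.zeros((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Example: a length-10 training sequence receives positions drawn from [0, 2048),
# so the positions encountered at longer test lengths are no longer out-of-distribution.
pe = sinusoidal_encoding(randomized_positions(10, max_len=2048), d_model=64)
```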

Related research

The Impact of Positional Encoding on Length Generalization in Transformers (05/31/2023)
Length generalization, the ability to generalize from small training con...

Length Generalization in Arithmetic Transformers (06/27/2023)
We examine how transformers cope with two challenges: learning basic int...

Exploring Length Generalization in Large Language Models (07/11/2022)
The ability to extrapolate from short problem instances to longer ones i...

Big Bird: Transformers for Longer Sequences (07/28/2020)
Transformers-based models, such as BERT, have been one of the most succe...

Giraffe: Adventures in Expanding Context Lengths in LLMs (08/21/2023)
Modern large language models (LLMs) that rely on attention mechanisms ar...

LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models (08/30/2023)
In recent years, there have been remarkable advancements in the performa...

Monotonic Location Attention for Length Generalization (05/31/2023)
We explore different ways to utilize position-based cross-attention in s...
