Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding (RPE) mechanisms. These include regular RPE techniques applied to non-geometric data, as well as novel RPEs operating on sequences of tokens embedded in higher-dimensional Euclidean spaces (e.g. point clouds). FLTs construct the optimal RPE mechanism implicitly by learning its spectral representation. In contrast to other architectures combining efficient low-rank linear attention with RPEs, FLTs remain practical in terms of memory usage and do not require additional assumptions about the structure of the RPE mask. FLTs also allow for applying certain structural inductive-bias techniques to specify masking strategies, e.g. they provide a way to learn the so-called local RPEs introduced in this paper, which yield accuracy gains over several other linear Transformers for language modeling. We also thoroughly test FLTs on other data modalities and tasks, such as image classification and 3D molecular modeling. For 3D data, FLTs are, to the best of our knowledge, the first Transformer architectures providing RPE-enhanced linear attention.
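To make the core mechanism concrete, below is a minimal NumPy sketch of how an RPE mask parameterized in the Fourier domain can be folded into linear attention without ever materializing the L x L attention matrix. This is an illustrative reconstruction under the assumptions stated in the comments, not the paper's reference implementation: the names (`fourier_rpe_features`, `linear_attention_with_rpe`, `freqs`) are hypothetical, and a plain exponential feature map stands in for the Performer-style random-feature map a production FLT would use.

```python
import numpy as np

def fourier_rpe_features(positions, freqs):
    """Complex per-token features whose inner products estimate the RPE.

    Assumption: the RPE function f is represented by its (learnable)
    spectral density, sampled at M frequencies `freqs`, so that
        f(r_i - r_j) ~ (1/M) sum_m exp(i w_m . (r_i - r_j))
                     = <psi_i, conj(psi_j)>,
    i.e. the mask factorizes into terms depending on i and j separately.
    positions: (L, d) token coordinates (d=1 for sequences, d=3 for
    point clouds); freqs: (M, d) learned frequencies.
    """
    phase = positions @ freqs.T                 # (L, M)
    return np.exp(1j * phase) / np.sqrt(freqs.shape[0])

def linear_attention_with_rpe(Q, K, V, positions, freqs):
    """Linear attention with a Fourier-parameterized RPE mask.

    exp(.) is used as a stand-in feature map for the softmax kernel.
    The masked attention matrix A_ij = k(q_i, k_j) * f(r_i - r_j) is
    never built: cost is O(L * r * M * d_v) instead of O(L^2).
    """
    phi_q = np.exp(Q)                           # (L, r) query features
    phi_k = np.exp(K)                           # (L, r) key features
    psi = fourier_rpe_features(positions, freqs)  # (L, M), complex

    # Outer products of kernel and RPE features give a rank-(r*M)
    # factorization of the masked attention matrix.
    q_feat = np.einsum('lr,lm->lrm', phi_q, psi)           # (L, r, M)
    k_feat = np.einsum('lr,lm->lrm', phi_k, psi.conj())    # (L, r, M)

    kv = np.einsum('lrm,lv->rmv', k_feat, V.astype(complex))
    num = np.einsum('lrm,rmv->lv', q_feat, kv)             # sum_j A_ij v_j
    den = np.einsum('lrm,rm->l', q_feat, k_feat.sum(axis=0))  # sum_j A_ij
    # The Monte Carlo estimate of f can be complex; a sketch simply
    # keeps the real part of the normalized output.
    return (num / den[:, None]).real
```

Because the frequencies are learned rather than fixed, the same mechanism covers 1D token indices and 3D point-cloud coordinates, which is consistent with the abstract's claim about point clouds and 3D molecular data; shaping the learned spectral density is one (hypothetical) way to encode locality-biased masks such as the local RPEs mentioned above.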

