Translational Equivariance in Kernelizable Attention

by Max Horn, et al.

While Transformer architectures have shown remarkable success, they are bound to the computation of all pairwise interactions of input elements and thus suffer from limited scalability. Recent work has succeeded in avoiding the computation of the complete attention matrix, yet this leads to problems down the line: the absence of an explicit attention matrix makes it more challenging to include inductive biases that rely on relative interactions between elements. An extremely powerful inductive bias is translational equivariance, which has been conjectured to be responsible for much of the success of Convolutional Neural Networks on image recognition tasks. In this work we show how translational equivariance can be implemented in Performers, efficient Transformers based on kernelizable attention. Our experiments highlight that the devised approach significantly improves the robustness of Performers to shifts of input images compared to their naive application. This represents an important step on the path to replacing Convolutional Neural Networks with more expressive Transformer architectures and will help to improve sample efficiency and robustness in this realm.
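
To make concrete what "avoiding the computation of the complete attention matrix" means, the following is a minimal sketch of kernelizable attention in plain Python. It is not the authors' implementation: the feature map `phi` (here `elu(x) + 1`) is an assumption chosen for simplicity, whereas Performers approximate the softmax kernel with random features, and the function names are hypothetical. The sketch only shows that when attention weights factor as `phi(q) · phi(k)`, the output can be computed from two running sums over the keys, without ever forming the pairwise weight matrix.

```python
import math


def phi(x):
    # Positive feature map elu(x) + 1 (an illustrative assumption;
    # Performers use random features instead).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q, K, V):
    # Reference version: builds every pairwise weight phi(q) . phi(k),
    # so the cost grows with len(Q) * len(K).
    dv = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, V)) / total
                    for d in range(dv)])
    return out


def linear_attention(Q, K, V):
    # Kernelized version: accumulate S = sum_j phi(k_j) v_j^T and
    # z = sum_j phi(k_j) once, then reuse them for every query.
    m, dv = len(phi(K[0])), len(V[0])
    S = [[0.0] * dv for _ in range(m)]
    z = [0.0] * m
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(m):
            z[a] += fk[a]
            for d in range(dv):
                S[a][d] += fk[a] * v[d]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][d] for a in range(m)) / denom
                    for d in range(dv)])
    return out
```

Both functions return the same values up to floating-point rounding, but `linear_attention` never holds the individual query-key weights. This is why biases that depend on the relative position of two elements cannot simply be added to an attention matrix in this setting.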


MaiT: Leverage Attention Masks for More Efficient Image Transformers

Though image transformers have shown competitive results with convolutio...

Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs

3D-related inductive biases like translational invariance and rotational...

Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers

Recently, vision transformers and MLP-based models have been developed i...

Vision Xformers: Efficient Attention for Image Classification

Although transformers have become the neural architectures of choice for...

On the Bias Against Inductive Biases

Borrowing from the transformer models that revolutionized the field of n...

An Attention Free Transformer

We introduce Attention Free Transformer (AFT), an efficient variant of T...

Code Repositories


Code for the paper "Translational Equivariance in Kernelizable Attention"
