Lightweight Structure-Aware Attention for Visual Understanding

11/29/2022
by Heeseung Kwon, et al.

Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators. Although these operators give the model flexibility through their adjustable attention kernels, they suffer from two inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy across ViT layers, and (2) computation and memory complexity is quadratic in the sequence length. In this paper, we propose a novel attention operator, called lightweight structure-aware attention (LiSA), which has greater representational power at log-linear complexity. Our operator learns structural patterns using a set of relative position embeddings (RPEs). To achieve log-linear complexity, the RPEs are approximated with fast Fourier transforms. Our experiments and ablation studies demonstrate that ViTs built on the proposed operator outperform self-attention and other existing operators, achieving state-of-the-art results on ImageNet and competitive results on other visual understanding benchmarks such as COCO and Something-Something-V2. The source code of our approach will be released online.
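
The efficiency idea stated in the abstract, applying learned relative position embeddings as structural mixing patterns in O(N log N) via the fast Fourier transform, can be illustrated with a short sketch. This is not the authors' LiSA operator: the module name FFTRelPosMixing, the per-channel kernel parameterization, and the circular-convolution formulation are assumptions made only to show how the convolution theorem turns a relative-position kernel into log-linear token mixing.

```python
import torch
import torch.nn as nn


class FFTRelPosMixing(nn.Module):
    """Token mixing with a learned relative-position kernel applied via FFT.

    Hypothetical sketch, not the paper's LiSA operator: each channel owns a
    learned relative-position kernel (a "structural pattern"), and mixing the
    N tokens with it is done as a circular convolution in O(N log N) using
    the convolution theorem instead of an explicit N x N attention map.
    """

    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        # One learned relative-position kernel per channel (assumed shape).
        self.rpe = nn.Parameter(torch.randn(dim, seq_len) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        B, N, D = x.shape
        v = x.transpose(1, 2)                           # (B, D, N)
        # Circular convolution with the RPE kernel via FFT: O(N log N).
        x_f = torch.fft.rfft(v, n=N, dim=-1)            # (B, D, N//2 + 1)
        k_f = torch.fft.rfft(self.rpe, n=N, dim=-1)     # (D, N//2 + 1)
        out = torch.fft.irfft(x_f * k_f, n=N, dim=-1)   # (B, D, N)
        return self.proj(out.transpose(1, 2))           # (B, N, D)


if __name__ == "__main__":
    layer = FFTRelPosMixing(dim=64, seq_len=196)        # e.g. 14x14 patch tokens
    tokens = torch.randn(2, 196, 64)
    print(layer(tokens).shape)                          # torch.Size([2, 196, 64])
```

For comparison, materializing an attention map over N tokens costs O(N^2) per layer, while the FFT route above costs O(N log N), which is the complexity regime the paper targets.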

Related research

FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks (02/23/2022)
Softmax-free Linear Transformers (07/05/2022)
SOFT: Softmax-free Transformer with Linear Complexity (10/22/2021)
Multiscale Attention via Wavelet Neural Operators for Vision Transformers (03/22/2023)
Efficient Content-Based Sparse Attention with Routing Transformers (03/12/2020)
Axial Attention in Multidimensional Transformers (12/20/2019)
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention (04/06/2023)
