Multiscale Attention via Wavelet Neural Operators for Vision Transformers

03/22/2023
by   Anahita Nekoozadeh, et al.

Transformers have achieved widespread success in computer vision. At their heart is a Self-Attention (SA) mechanism, an inductive bias that associates each token in the input with every other token through a weighted basis. The standard SA mechanism has quadratic complexity in the sequence length, which limits its use for the long sequences that arise in high-resolution vision. Recently, inspired by operator learning for PDEs, Adaptive Fourier Neural Operators (AFNO) were introduced for high-resolution attention based on global convolution, efficiently implemented via the FFT. However, AFNO's global filtering cannot adequately represent the small- and moderate-scale structures that commonly appear in natural images. To exploit these coarse-to-fine scale structures, we introduce Multiscale Wavelet Attention (MWA), built on wavelet neural operators, which incurs linear complexity in the sequence length. We replace the attention in ViT with MWA, and our experiments on CIFAR and ImageNet classification demonstrate significant improvement over alternative Fourier-based attentions such as AFNO and the Global Filter Network (GFN).
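To make the idea concrete, here is a minimal NumPy sketch of wavelet-domain token mixing; this is an illustration under our own assumptions, not the authors' implementation. It performs a one-level 2D Haar wavelet transform on a grid of tokens, scales each sub-band (coarse LL and detail LH/HL/HH) by a learned per-channel weight, and reconstructs, so different spatial scales can be emphasized independently at linear cost in the number of tokens. All function names (`haar_dwt2`, `wavelet_mix`, etc.) are illustrative.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2D Haar wavelet transform on an (H, W, C) token grid."""
    a = (x[0::2] + x[1::2]) / 2.0   # row averages (low-pass along H)
    d = (x[0::2] - x[1::2]) / 2.0   # row details  (high-pass along H)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]) + ll.shape[2:])
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d = np.empty_like(a)
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((2 * a.shape[0],) + a.shape[1:])
    x[0::2] = a + d
    x[1::2] = a - d
    return x

def wavelet_mix(x, w_ll, w_lh, w_hl, w_hh):
    """Token mixing in the wavelet domain: scale each sub-band by a
    learned per-channel weight, then reconstruct. Cost is linear in
    the number of tokens H*W."""
    ll, lh, hl, hh = haar_dwt2(x)
    return haar_idwt2(ll * w_ll, lh * w_lh, hl * w_hl, hh * w_hh)
```

With all-ones weights the layer is the identity; training the weights lets it amplify or suppress each scale separately, which is the multiscale behavior a single global Fourier filter struggles to capture. A full MWA layer would use a deeper wavelet decomposition and richer per-sub-band operators.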

