Global Filter Networks for Image Classification

07/01/2021
by Yongming Rao, et al.

Recent advances in self-attention and pure multi-layer perceptron (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases. These models are generally based on learning interactions among spatial locations from raw data. The complexity of self-attention and MLPs grows quadratically with image size, which makes these models hard to scale up when high-resolution features are required. In this paper, we present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform. We demonstrate favorable accuracy/complexity trade-offs for our models on both ImageNet and downstream tasks. Our results show that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness. Code is available at https://github.com/raoyongming/GFNet
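
The three operations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function name `global_filter`, the argument name `filt`, and the channels-last layout are assumptions made for illustration, and the filter here is a fixed array rather than a learned parameter. The real-input FFT is used, so the filter needs only `W // 2 + 1` frequency columns.

```python
import numpy as np

def global_filter(x, filt):
    """Mix spatial locations of a real feature map in the frequency domain.

    x:    real array of shape (H, W, D), one feature vector per location.
    filt: complex array of shape (H, W // 2 + 1, D), one filter per channel
          (hypothetical stand-in for a learnable parameter).
    """
    H, W, _ = x.shape
    X = np.fft.rfft2(x, axes=(0, 1))                 # 1) 2D DFT over the spatial axes
    X = X * filt                                     # 2) element-wise multiplication
    return np.fft.irfft2(X, s=(H, W), axes=(0, 1))   # 3) inverse 2D DFT

rng = np.random.default_rng(0)
H, W, D = 8, 8, 4
x = rng.standard_normal((H, W, D))

# An all-ones filter leaves every frequency untouched, so the input is recovered.
identity = np.ones((H, W // 2 + 1, D), dtype=complex)
y_same = global_filter(x, identity)

# The DFT of a spatial kernel acts as a circular convolution with that kernel.
# A unit impulse at row 1 therefore shifts the whole map down by one row.
kernel = np.zeros((H, W, D))
kernel[1, 0, :] = 1.0
y_shift = global_filter(x, np.fft.rfft2(kernel, axes=(0, 1)))

print(np.allclose(y_same, x))
print(np.allclose(y_shift, np.roll(x, 1, axis=0)))
```

Because one frequency-domain multiplication corresponds to a circular convolution whose kernel spans the whole feature map, every output location can depend on every input location, and the cost is dominated by the FFT rather than by a pairwise comparison of locations.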

research
10/25/2022

MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation

Recently, Visual Transformer (ViT) has been widely used in various field...
research
03/24/2022

FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization

MLP-like models built entirely upon multi-layer perceptrons have recentl...
research
07/26/2023

Adaptive Frequency Filters As Efficient Global Token Mixers

Recent vision transformers, large-kernel CNNs and MLPs have attained rem...
research
07/21/2022

Multi Resolution Analysis (MRA) for Approximate Self-Attention

Transformers have emerged as a preferred model for many tasks in natural...
research
06/24/2021

VOLO: Vision Outlooker for Visual Recognition

Visual recognition has been dominated by convolutional neural networks (...
research
11/22/2022

Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring

We present an effective and efficient method that explores the propertie...
research
06/08/2022

UHD Image Deblurring via Multi-scale Cubic-Mixer

Currently, transformer-based algorithms are making a splash in the domai...
