SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation

08/22/2023
by Guhnoo Yun, et al.

Recent studies show that self-attention behaves like a low-pass filter (unlike convolution) and that enhancing its high-pass filtering capability improves model performance. Taking the opposite view, we analyze existing convolution-based models spectrally and observe that improving the low-pass filtering of convolution operations also improves performance. To account for this observation, we hypothesize that optimal token mixers that capture a balanced representation of both high- and low-frequency components can enhance model performance. We verify this by decomposing visual features in the frequency domain and recombining them in a balanced manner. To do so, we recast the balancing problem as a mask filtering problem in the frequency domain. We then introduce a novel token mixer named SPAM and use it to derive a MetaFormer model termed SPANet. Experimental results show that the proposed method provides a way to achieve this balance, and that balanced representations of high- and low-frequency components improve model performance on multiple computer vision tasks. Our code is available at https://doranlyong.github.io/projects/spanet/.
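To make the idea of mask filtering in the frequency domain concrete, the following is a minimal NumPy sketch. It is not the authors' SPAM module: the function names, the binary radial mask, the `radius` cutoff, and the scalar weights `w_low` and `w_high` are all assumptions chosen for illustration. It only shows how a single-channel feature map can be split into low- and high-frequency parts with a mask and then recombined with adjustable weights.

```python
import numpy as np


def radial_lowpass_mask(h, w, radius):
    """Binary mask (hypothetical helper) keeping frequencies within
    `radius` cycles/sample of the DC term, for a centered spectrum."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (dist <= radius).astype(np.float64)


def split_frequencies(x, radius=0.1):
    """Split a 2D real feature map into low- and high-frequency parts."""
    spec = np.fft.fftshift(np.fft.fft2(x))          # centered spectrum
    mask = radial_lowpass_mask(x.shape[0], x.shape[1], radius)
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * (1.0 - mask))).real
    return low, high


def rebalance(x, radius=0.1, w_low=1.0, w_high=1.0):
    """Recombine the two bands with illustrative scalar weights."""
    low, high = split_frequencies(x, radius)
    return w_low * low + w_high * high


# Usage: emphasize low frequencies in a random 8x8 map.
x = np.random.default_rng(0).standard_normal((8, 8))
y = rebalance(x, radius=0.2, w_low=1.5, w_high=1.0)
```

Because the two masks sum to one, `w_low = w_high = 1` reproduces the input up to floating-point error, so the weights directly control how far the output departs from the original frequency balance.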
