An Image Patch is a Wave: Phase-Aware Vision MLP

11/24/2021
by   Yehui Tang, et al.
0

Different from traditional convolutional neural network (CNN) and vision transformer, the multilayer perceptron (MLP) is a new kind of vision model with extremely simple architecture that only stacked by fully-connected layers. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation.

READ FULL TEXT
research
04/19/2022

Multimodal Token Fusion for Vision Transformers

Many adaptations of transformers have emerged to address the single-moda...
research
04/19/2022

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

Vision transformers have achieved great successes in many computer visio...
research
06/03/2023

Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers

This paper introduces Content-aware Token Sharing (CTS), a token reducti...
research
04/02/2021

AAformer: Auto-Aligned Transformer for Person Re-Identification

Transformer is showing its superiority over convolutional architectures ...
research
12/06/2022

Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

Vision transformer has demonstrated great potential in abundant vision t...
research
09/13/2023

Dynamic Spectrum Mixer for Visual Recognition

Recently, MLP-based vision backbones have achieved promising performance...
research
03/31/2023

Rethinking Local Perception in Lightweight Vision Transformer

Vision Transformers (ViTs) have been shown to be effective in various vi...

Please sign up or login with your details

Forgot password? Click here to reset