ActiveMLP: An MLP-like Architecture with Active Token Mixer

03/11/2022
by Guoqiang Wei et al.

This paper presents ActiveMLP, a general MLP-like backbone for computer vision. The three dominant network families, i.e., CNNs, Transformers, and MLPs, differ mainly in how they fuse contextual information into a given token, which places the design of more effective token-mixing mechanisms at the core of backbone architecture development. In ActiveMLP, we propose an innovative token mixer, dubbed the Active Token Mixer (ATM), which actively incorporates contextual information from other tokens across the global scope into the given one. This fundamental operator actively predicts where to capture useful contexts and learns how to fuse the captured contexts with the original information of the given token at the channel level. In this way, the spatial range of token mixing is expanded and the manner of token mixing is reformed. With this design, ActiveMLP gains the merits of global receptive fields and more flexible, content-adaptive information fusion. Extensive experiments demonstrate that ActiveMLP is generally applicable and surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction. The code and models will be available at https://github.com/microsoft/ActiveMLP.
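The abstract describes the ATM as predicting, per channel, where to gather context from and then fusing the gathered context with the original token. The following is a minimal NumPy sketch of that idea on a 1-D token sequence, not the paper's actual implementation: the function name, the integer-offset rounding, and all weight matrices (`offset_w`, `fuse_w`) are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def active_token_mix(x, offset_w, fuse_w):
    """Sketch of an ATM-style mixer (hypothetical, simplified to 1-D).

    x:        (N, C) token sequence.
    offset_w: (C, C) hypothetical weights predicting a per-channel offset
              from the token content itself (content-adaptive "where").
    fuse_w:   (2C, C) hypothetical weights fusing the original token with
              the gathered context at the channel level ("how").
    """
    n, c = x.shape
    # Predict an integer offset for every (token, channel) pair.
    offsets = np.rint(x @ offset_w).astype(int)                  # (N, C)
    # Each channel looks at a (possibly different) token in the sequence.
    idx = np.clip(np.arange(n)[:, None] + offsets, 0, n - 1)     # (N, C)
    gathered = x[idx, np.arange(c)]                              # channel-wise gather
    # Fuse gathered context with the original token, channel by channel.
    return np.concatenate([x, gathered], axis=1) @ fuse_w        # (N, C)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))
out = active_token_mix(tokens,
                       rng.standard_normal((4, 4)) * 0.1,
                       rng.standard_normal((8, 4)) * 0.1)
print(out.shape)  # (8, 4)
```

Because the offsets are computed from the tokens themselves and clipped only at the sequence boundary, each channel can reach any position in the sequence, which is the abstract's sense of a global, content-adaptive receptive field.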


Related research

- Rethinking Token-Mixing MLP for MLP-based Vision Backbone (06/28/2021)
- Adaptive Frequency Filters As Efficient Global Token Mixers (07/26/2023)
- TFS-ViT: Token-Level Feature Stylization for Domain Generalization (03/28/2023)
- Token-Label Alignment for Vision Transformers (10/12/2022)
- UniNeXt: Exploring A Unified Architecture for Vision Recognition (04/26/2023)
- DynaMixer: A Vision MLP Architecture with Dynamic Mixing (01/28/2022)
- BiFormer: Vision Transformer with Bi-Level Routing Attention (03/15/2023)
