AS-MLP: An Axial Shifted MLP Architecture for Vision

07/18/2021
by   Dongze Lian, et al.
0

An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Different from MLP-Mixer, where the global spatial feature is encoded for the information flow through matrix transposition and one token-mixing MLP, we pay more attention to the local features communication. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc, just like designing those of convolution kernels. With the proposed AS-MLP architecture, our model obtains 83.3 ImageNet-1K dataset. Such a simple yet effective architecture outperforms all MLP-based architectures and achieves competitive performance compared to the transformer-based architectures (e.g., Swin Transformer) even with slightly lower FLOPs. In addition, AS-MLP is also the first MLP-based architecture to be applied to the downstream tasks (e.g., object detection and semantic segmentation). The experimental results are also impressive. Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures. Code is available at https://github.com/svip-lab/AS-MLP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/20/2022

Global Context Vision Transformers

We propose global context vision transformer (GC ViT), a novel architect...
research
02/14/2022

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

Token-mixing multi-layer perceptron (MLP) models have shown competitive ...
research
07/21/2021

CycleMLP: A MLP-like Architecture for Dense Prediction

This paper presents a simple MLP-like architecture, CycleMLP, which is a...
research
05/18/2020

U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

In this paper, we design a simple yet powerful deep network architecture...
research
04/15/2022

ResT V2: Simpler, Faster and Stronger

This paper proposes ResTv2, a simpler, faster, and stronger multi-scale ...
research
07/21/2022

SplitMixer: Fat Trimmed From MLP-like Models

We present SplitMixer, a simple and lightweight isotropic MLP-like archi...
research
08/05/2021

Unifying Global-Local Representations in Salient Object Detection with Transformer

The fully convolutional network (FCN) has dominated salient object detec...

Please sign up or login with your details

Forgot password? Click here to reset