X-MLP: A Patch Embedding-Free MLP Architecture for Vision

07/02/2023
by   Xinyue Wang, et al.

Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved great success in computer vision. Recently, research on multi-layer perceptron (MLP) architectures for vision has become popular again. Vision MLPs are designed to be independent of convolution and self-attention operations. However, existing vision MLP architectures still rely on convolution for patch embedding. We therefore propose X-MLP, an architecture built entirely from fully connected layers and free from patch embedding. It fully decouples the feature dimensions and uses MLPs to exchange information across the width, height, and channel dimensions independently and alternately. X-MLP is evaluated on ten benchmark datasets and outperforms other vision MLP models on all of them. It even surpasses CNNs by a clear margin on various datasets. Furthermore, by mathematically restoring the spatial weights, we visualize the information exchanged between any pair of pixels in the feature map and observe that the model captures long-range dependencies.
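The abstract describes the core idea as applying fully connected layers independently and alternately along the width, height, and channel axes of the raw feature map, with no convolutional patch embedding. The PyTorch snippet below is a minimal sketch of such an axis-wise mixing block under that description only; the layer names, hidden sizes, normalization placement, and the per-pixel linear lift are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an X-MLP-style block: MLPs mix information along width, height,
# and channel independently and alternately, directly on pixels (no patch
# embedding, no convolution). All design details here are assumptions.
import torch
import torch.nn as nn


class AxisMLP(nn.Module):
    """Two-layer MLP applied along the last axis of a tensor."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class XMLPBlock(nn.Module):
    """Mixes a (B, C, H, W) feature map along W, then H, then C."""

    def __init__(self, channels, height, width, expansion=2):
        super().__init__()
        self.norm_w = nn.LayerNorm(width)
        self.mlp_w = AxisMLP(width, width * expansion)          # width mixing
        self.norm_h = nn.LayerNorm(height)
        self.mlp_h = AxisMLP(height, height * expansion)        # height mixing
        self.norm_c = nn.LayerNorm(channels)
        self.mlp_c = AxisMLP(channels, channels * expansion)    # channel mixing

    def forward(self, x):                                # x: (B, C, H, W)
        x = x + self.mlp_w(self.norm_w(x))               # mix across width
        x = x.transpose(2, 3)                            # (B, C, W, H)
        x = x + self.mlp_h(self.norm_h(x))               # mix across height
        x = x.transpose(2, 3).permute(0, 2, 3, 1)        # (B, H, W, C)
        x = x + self.mlp_c(self.norm_c(x))               # mix across channels
        return x.permute(0, 3, 1, 2)                     # back to (B, C, H, W)


if __name__ == "__main__":
    # Per-pixel linear lift from RGB to 16 channels (no convolution involved).
    img = torch.randn(1, 3, 32, 32)
    lift = nn.Linear(3, 16)
    x = lift(img.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # (1, 16, 32, 32)
    block = XMLPBlock(channels=16, height=32, width=32)
    print(block(x).shape)  # torch.Size([1, 16, 32, 32])
```

Because each axis is mixed by its own MLP, every output pixel can, in principle, receive information from every other pixel after the width and height steps, which is consistent with the long-range dependencies the abstract reports observing in the restored spatial weights.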

