MLP-Mixer: An all-MLP Architecture for Vision

05/04/2021
by   Ilya Tolstikhin, et al.

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither is necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well-established CNNs and Transformers.
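The two layer types described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`patchify`, `init_mlp`, `mixer_block`), the sizes, the random initialization, and the use of residual connections, layer normalization, and a tanh-approximated GELU are assumptions made for illustration. The only part taken from the abstract is the pairing of an MLP applied across patches with an MLP applied to each patch independently.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumed nonlinearity for this sketch)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis; learnable scale/shift omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron acting on the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mlp(rng, dim, hidden):
    # Hypothetical initializer: small Gaussian weights, zero biases.
    return (rng.normal(0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.02, (hidden, dim)), np.zeros(dim))

def patchify(image, patch):
    # Split an (H, W, C) image into non-overlapping patches, one flat row each.
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def mixer_block(x, token_params, channel_params):
    # x has shape (num_patches, channels).
    # MLP across patches: transpose so the MLP acts on the patch axis,
    # mixing spatial information for each channel separately.
    x = x + mlp(layer_norm(x).T, *token_params).T
    # MLP per patch: acts on the channel axis, mixing per-location features.
    return x + mlp(layer_norm(x), *channel_params)

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))            # toy input, not real data
patches = patchify(image, patch=8)              # shape (16, 192)
embed = rng.normal(0, 0.02, (patches.shape[1], 64))
x = patches @ embed                             # per-patch linear embedding, (16, 64)
token_params = init_mlp(rng, dim=16, hidden=32)
channel_params = init_mlp(rng, dim=64, hidden=128)
out = mixer_block(x, token_params, channel_params)
print(out.shape)                                # (16, 64)
```

Because the block preserves the `(num_patches, channels)` shape, several such blocks could be stacked before a pooling and classification step; that part is left out of the sketch.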

03/26/2021

Understanding Robustness of Transformers for Image Classification

Deep Convolutional Neural Networks (CNNs) have long been the architectur...

06/13/2022

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing

Convolutional Neural Networks (CNNs) have been regarded as the go-to mod...

05/04/2022

Sequencer: Deep LSTM for Image Classification

In recent computer vision research, the advent of the Vision Transformer...

10/22/2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for ...

03/30/2022

Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis

The extension of convolutional neural networks (CNNs) to non-Euclidean g...

05/07/2021

ResMLP: Feedforward networks for image classification with data-efficient training

We present ResMLP, an architecture built entirely upon multi-layer perce...

01/19/2020

SlideImages: A Dataset for Educational Image Classification

In the past few years, convolutional neural networks (CNNs) have achieve...

Code Repositories

mlp-mixer-pytorch

An All-MLP solution for Vision, from Google AI

do-you-even-need-attention

Exploring whether attention is necessary for vision transformers

MLP-Mixer-pytorch

Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

MLP-Mixer-CIFAR10

Implements MLP-Mixer (https://arxiv.org/abs/2105.01601) with the CIFAR-10 dataset.

mlp-mixer-tf

Unofficial Implementation of MLP-Mixer in TensorFlow