MAXIM: Multi-Axis MLP for Image Processing

01/09/2022
by   Zhengzhong Tu, et al.

Recent progress on Transformers and multi-layer perceptron (MLP) models has provided new network architectural designs for computer vision tasks. Although these models have proved effective in many vision tasks such as image recognition, challenges remain in adapting them for low-level vision. The inflexibility in supporting high-resolution images and the limitations of local attention are perhaps the main bottlenecks for using Transformers and MLPs in image restoration. In this work, we present a multi-axis MLP-based architecture, called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning. Both modules are based exclusively on MLPs, yet benefit from being both global and `fully-convolutional', two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, while requiring fewer or comparable parameters and FLOPs than competing models.
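The core ideas in the abstract — gMLP-style spatial gating applied along two axes, one local (blocked windows) and one global (a dilated grid) — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function names (`block_partition`, `grid_partition`, `spatial_gating`, `multi_axis_gated_mlp`), the random mixing weights, and the fixed block/grid sizes are all hypothetical, chosen only to show the token-grouping and gating mechanics.

```python
import numpy as np

def block_partition(x, b):
    # Local axis: (H, W, C) -> (num_windows, b*b, C),
    # non-overlapping b x b windows of contiguous pixels.
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, b * b, C)

def grid_partition(x, g):
    # Global axis: (H, W, C) -> (num_cells, g*g, C).
    # Each group holds g*g pixels strided across the whole image
    # (a dilated grid), so mixing within a group is global.
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, g * g, C)

def spatial_gating(tokens, W_s, b_s):
    # gMLP-style spatial gating unit: split channels in half, mix one
    # half along the token (spatial) axis, then use it to gate the
    # other half element-wise.
    u, v = np.split(tokens, 2, axis=-1)
    v = np.einsum('ts,nsc->ntc', W_s, v) + b_s   # spatial mixing
    return u * v

def multi_axis_gated_mlp(x, b=4, g=4, seed=0):
    # Hypothetical two-branch block: local (blocked) and global (gridded)
    # spatial gating run in parallel on the same feature map.
    rng = np.random.default_rng(seed)
    local = block_partition(x, b)   # each window mixes its b*b tokens
    glob = grid_partition(x, g)     # each cell mixes g*g strided tokens

    def branch(t):
        n_tok = t.shape[1]
        W_s = rng.standard_normal((n_tok, n_tok)) * 0.02  # toy weights
        b_s = np.ones(1)  # bias near 1 keeps the gate near-identity
        return spatial_gating(t, W_s, b_s)

    return branch(local), branch(glob)

# Example: an 8x8 feature map with 16 channels.
x = np.random.default_rng(1).standard_normal((8, 8, 16))
local_out, global_out = multi_axis_gated_mlp(x)
# Gating halves the channel dim: each branch yields (groups, tokens, C//2).
```

Note the design point the abstract alludes to: the spatial-mixing weight `W_s` has a fixed size of `b*b` (or `g*g`) regardless of image resolution, which is what makes the block `fully-convolutional' and scalable to high-resolution inputs, unlike a full-image token-mixing MLP.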


