Masked Autoencoders as Image Processors

03/30/2023
by Huiyu Duan, et al.

Transformers have shown significant effectiveness on various vision tasks, spanning both high-level and low-level vision. Recently, masked autoencoder (MAE) pre-training has further unleashed the potential of Transformers, leading to state-of-the-art performance on various high-level vision tasks. However, the significance of MAE pre-training for low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present CSformer, an efficient Transformer model that combines channel attention with shifted-window-based self-attention. We then develop an effective masked autoencoder architecture for image processing (MAEIP) tasks. Extensive experiments show that, with MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.
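The abstract names two ingredients: MAE-style masked pre-training and a Transformer block pairing channel attention with shifted-window self-attention. The PyTorch sketch below illustrates both ideas in miniature; it is not the authors' code, and every identifier here (random_masking, ChannelAttention, CSBlock, mask_ratio, window) is an illustrative assumption. As a simplification, windows are contiguous 1D runs of tokens and the Swin-style window shift between blocks is omitted.

```python
import torch
import torch.nn as nn


def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """MAE-style masking: keep a random subset of patch tokens.

    patches: (batch, num_patches, dim). Returns the visible tokens and
    the random permutation needed to restore the original token order.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1.0 - mask_ratio))
    noise = torch.rand(b, n, device=patches.device)   # one score per patch
    ids_shuffle = noise.argsort(dim=1)                # random permutation
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d)
    )
    return visible, ids_shuffle


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style re-weighting of feature channels."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.GELU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (b, n, d)
        weights = self.mlp(x.mean(dim=1, keepdim=True))  # pool over tokens
        return x * weights                               # broadcast over n


class CSBlock(nn.Module):
    """One block pairing window self-attention with channel attention.

    Assumes num_tokens is divisible by `window`; the shift step used by
    shifted-window attention is left out to keep the sketch short.
    """

    def __init__(self, dim: int, heads: int = 4, window: int = 8):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.channel = ChannelAttention(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (b, n, d)
        b, n, d = x.shape
        w = self.window
        # Split tokens into non-overlapping windows, attend within each.
        h = self.norm1(x).reshape(b * (n // w), w, d)
        h, _ = self.attn(h, h, h, need_weights=False)
        x = x + h.reshape(b, n, d)
        # Channel attention then re-weights channels with a global view.
        return x + self.channel(self.norm2(x))


if __name__ == "__main__":
    tokens = torch.randn(2, 64, 96)                   # 2 images, 64 patches
    visible, _ = random_masking(tokens, mask_ratio=0.75)
    print(visible.shape)                              # torch.Size([2, 16, 96])
    print(CSBlock(dim=96)(visible).shape)             # torch.Size([2, 16, 96])
```

In an MAE-style pipeline, only the visible tokens would pass through the encoder blocks, and a lightweight decoder would reconstruct the masked patches from the restored token order; the two attention types are complementary, with window attention modeling local spatial structure and channel attention capturing global feature statistics.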


