Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study on Image Restoration

01/29/2023
by Peng Qiao, et al.

The great success of Deep Neural Networks (DNNs) has inspired the algorithmic development of DNN-based Fixed-Point (DNN-FP) methods for computer vision tasks. DNN-FP methods, trained by Back-Propagation Through Time or by computing an inexact inverse of the Jacobian, suffer from inferior representation ability. Motivated by the representation power of the Transformer, we propose a framework, called FPformer, that unrolls the fixed-point iteration and approximates each unrolled step with a Transformer block. To reduce the high memory and computation cost, we propose FPRformer, which shares parameters between successive blocks. We further adapt Anderson acceleration to FPRformer, yielding FPAformer, which enlarges the number of effective unrolled iterations and improves performance. To fully exploit the capability of the Transformer, we apply the proposed models to image restoration using self-supervised pre-training and supervised fine-tuning. 161 tasks from 4 categories of image restoration problems are used in the pre-training phase; the pre-trained FPformer, FPRformer, and FPAformer are then fine-tuned for the comparison scenarios. With this training scheme, the proposed FPformer, FPRformer, and FPAformer achieve performance competitive with state-of-the-art image restoration methods, with better training efficiency. FPAformer uses only 29.82% of the parameters of SwinIR and provides superior performance after fine-tuning; training these comparison models takes only 26.9% of the time used to train SwinIR. This offers a promising way to introduce the Transformer into low-level vision tasks.
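The abstract packs three mechanisms into one paragraph: unrolling the fixed-point iteration into Transformer blocks (FPformer), sharing one block's parameters across iterations (FPRformer), and wrapping the shared map in Anderson acceleration (FPAformer). The paper's code is not reproduced here, so the PyTorch sketch below is only a hedged illustration of those three ideas; every name (ToyBlock, fpformer, fprformer, fpaformer) and every hyperparameter is hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one Transformer block approximating the fixed-point map f."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(),
                                 nn.Linear(2 * dim, dim))

    def forward(self, x):                       # x: (batch, tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

def fpformer(x, blocks):
    """FPformer-style unrolling: one distinct block per unrolled iteration."""
    for blk in blocks:
        x = blk(x)
    return x

def fprformer(x, block, iters):
    """FPRformer-style unrolling: a single block shared across iterations."""
    for _ in range(iters):
        x = block(x)
    return x

def fpaformer(f, x0, iters=10, m=5, lam=1e-4):
    """FPAformer-style sketch: Anderson acceleration around the shared map f.
    Past residuals g_k = f(x_k) - x_k are mixed with least-squares weights
    that sum to one."""
    bsz = x0.shape[0]
    flat = lambda t: t.reshape(bsz, -1)
    X, F = [flat(x0)], [flat(f(x0))]            # histories of x_k and f(x_k)
    x = F[0]
    for _ in range(1, iters):
        X.append(x)
        F.append(flat(f(x.view_as(x0))))
        n = min(m, len(X))
        # Residual matrix G: columns are the last n residuals f(x_k) - x_k.
        G = torch.stack([F[-i] - X[-i] for i in range(1, n + 1)], dim=-1)
        # Regularized normal equations; normalizing enforces sum(alpha) = 1.
        H = G.transpose(1, 2) @ G + lam * torch.eye(n, device=x.device)
        alpha = torch.linalg.solve(H, torch.ones(bsz, n, 1, device=x.device))
        alpha = alpha / alpha.sum(dim=1, keepdim=True)
        Fm = torch.stack([F[-i] for i in range(1, n + 1)], dim=-1)
        x = (Fm @ alpha).squeeze(-1)            # mixed next iterate
    return x.view_as(x0)
```

For example, `fpaformer(ToyBlock(32), torch.randn(2, 64, 32))` runs ten accelerated steps on dummy tokens. The design point the abstract hinges on: weight sharing cuts parameters and memory, and Anderson's least-squares mixing of the last m evaluations recovers the benefit of more unrolled iterations without adding any parameters.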


