DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer

09/13/2022
by Dafeng Zhang et al.

Recent works have achieved excellent results on dual-pixel defocus deblurring using convolutional neural networks (CNNs), but the scarcity of data has limited attempts to apply vision transformers to this task. In addition, existing works deblur images with different distributions and content using fixed parameters and a fixed network architecture, which also limits the generalization ability of the model. In this paper, we propose a dynamic multi-scale network, named DMTNet, for dual-pixel image defocus deblurring. DMTNet consists of two main modules: a feature extraction module and a reconstruction module. The feature extraction module is composed of several vision transformer blocks, whose strong feature extraction capability yields richer features and improves the robustness of the model. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Modules (DMSSRMs). A DMSSRM restores the image by adaptively assigning weights to features from different scales according to the blur distribution and content of the input image. DMTNet combines the advantages of transformers and CNNs: the vision transformer raises the performance ceiling of the CNN, while the inductive bias of the CNN lets the transformer extract robust features without relying on a large amount of data. DMTNet may be the first attempt to use a vision transformer to restore blurred images to sharp ones. Combined with a CNN, the vision transformer may achieve better performance on small datasets. Experimental results on popular benchmarks demonstrate that DMTNet significantly outperforms state-of-the-art methods.
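The abstract does not say how a DMSSRM computes its per-scale weights. As a rough illustration of the general idea of input-dependent multi-scale fusion, the NumPy sketch below assumes three branches (average pooling by factors 1, 2 and 4, then nearest-neighbour upsampling back to full resolution) and a softmax gate driven by a global-average-pooled descriptor of the input. The function names, the choice of scales and the gating scheme are all hypothetical and are not taken from the paper; in a real network `gate_w` would be learned.

```python
import numpy as np


def avg_pool(x, s):
    """Average-pool a (C, H, W) map by integer factor s (H and W divisible by s)."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def upsample_nearest(x, s):
    """Nearest-neighbour upsampling of a (C, H, W) map by integer factor s."""
    return x.repeat(s, axis=1).repeat(s, axis=2)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def dynamic_multiscale_fuse(feat, gate_w, scales=(1, 2, 4)):
    """Hypothetical DMSSRM-style fusion (illustrative assumption, not the paper's design).

    feat:   (C, H, W) feature map
    gate_w: (len(scales), C) gating weights (learned in a real network)
    Returns the fused (C, H, W) map and the per-scale weights.
    """
    # Multi-scale branches: pool down, then upsample back to full resolution.
    branches = [upsample_nearest(avg_pool(feat, s), s) for s in scales]
    # Input-dependent descriptor: global average pooling over H and W -> (C,).
    descriptor = feat.mean(axis=(1, 2))
    # One score per scale; softmax turns the scores into weights summing to 1.
    weights = softmax(gate_w @ descriptor)
    # Weighted sum of the branches, so the mix of scales depends on the input.
    fused = sum(w * b for w, b in zip(weights, branches))
    return fused, weights


# Usage: random features and random (untrained) gating weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
gate_w = rng.standard_normal((3, 8))
fused, weights = dynamic_multiscale_fuse(feat, gate_w)
print(fused.shape, weights.round(3))
```

Because the weights are recomputed from each input's descriptor, two images with different content receive different mixtures of coarse and fine branches, which is the property the abstract attributes to DMSSRM; a fixed-weight fusion would apply the same mixture to every image.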
