VRT: A Video Restoration Transformer

by Jingyun Liang, et al.

Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with either a sliding-window strategy or a recurrent architecture; the former is restricted to frame-by-frame restoration, while the latter lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on three tasks (video super-resolution, video deblurring and video denoising) demonstrate that VRT outperforms state-of-the-art methods by large margins (up to 2.16 dB) on nine benchmark datasets.
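The clip partitioning, per-layer shifting and mutual attention described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the clip size of 2, the half-clip cyclic shift, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def partition_into_clips(frames, clip_size=2, shift=0):
    """Cyclically roll the frame list by `shift`, then split it into clips.

    Assumption: the shift is a cyclic roll; the abstract only says the
    sequence is shifted for every other layer.
    """
    n = len(frames)
    if n % clip_size != 0:
        raise ValueError("number of frames must be divisible by clip_size")
    s = shift % n
    rolled = frames[s:] + frames[:s]
    return [rolled[i:i + clip_size] for i in range(0, n, clip_size)]


def clips_per_layer(num_frames, num_layers, clip_size=2):
    """Return the clip layout for each layer, shifting on every other layer.

    Assumption: the shift amount is half the clip size.
    """
    frames = list(range(num_frames))
    layouts = []
    for layer in range(num_layers):
        shift = clip_size // 2 if layer % 2 == 1 else 0
        layouts.append(partition_into_clips(frames, clip_size, shift))
    return layouts


def mutual_attention(query_frame, reference_frame):
    """Each token of `query_frame` attends over the tokens of `reference_frame`.

    Frames are lists of tokens; tokens are equal-length lists of floats.
    Assumption: raw features serve as queries, keys and values (no learned
    projections), which keeps the sketch dependency-free.
    """
    dim = len(query_frame[0])
    output = []
    for q in query_frame:
        # Scaled dot-product scores against every token in the other frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in reference_frame]
        # Numerically stable softmax over the reference tokens.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of reference tokens: features "pulled" from the other frame.
        output.append([sum(w * v[j] for w, v in zip(weights, reference_frame))
                       for j in range(dim)])
    return output


if __name__ == "__main__":
    for layer, layout in enumerate(clips_per_layer(num_frames=6, num_layers=2)):
        print(layer, layout)
```

In this sketch, the unshifted layer pairs frames (0, 1), (2, 3), (4, 5), while the shifted layer pairs (1, 2), (3, 4), (5, 0), so information can propagate across clip boundaries as layers are stacked. Within a two-frame clip, mutual attention would be applied in both directions, with each frame acting once as the query and once as the reference.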



Recurrent Video Restoration Transformer with Guided Deformable Attention

Video restoration aims at restoring multiple high-quality frames from mu...

No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers

Video restoration, aiming at restoring clear frames from degraded videos...

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

Video deblurring methods, aiming at recovering consecutive sharp frames ...

Revisiting Temporal Alignment for Video Restoration

Long-range temporal alignment is critical yet challenging for video rest...

Temporal Feature Warping for Video Shadow Detection

While single image shadow detection has been improving rapidly in recent...

Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-ahead Forward Ones

While significant progress has been made in deep video denoising, it rem...

Temporal Consistency Learning of inter-frames for Video Super-Resolution

Video super-resolution (VSR) is a task that aims to reconstruct high-res...
