DeViT: Deformed Vision Transformers in Video Inpainting

09/28/2022
by Jiayin Cai, et al.

This paper proposes a novel video inpainting method with three main contributions. First, we extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH), which improves patch-level feature alignment without additional supervision and benefits challenging scenes with various deformations. Second, we introduce Mask Pruning-based Patch Attention (MPPA), which improves patch-wise feature matching by pruning out less essential features and using a saliency map. MPPA enhances matching accuracy between warped tokens with invalid pixels. Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens under the guidance of the Deformation Factor learned from DePtH, especially for videos with agile motions. Experimental results demonstrate that our method outperforms recent methods qualitatively and quantitatively and achieves a new state of the art.
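As a rough illustration of the pruning idea behind MPPA, the sketch below runs scaled dot-product attention over patch tokens while dropping keys that contain too few valid pixels. It is not the paper's implementation: the function name `masked_patch_attention`, the `valid_ratio` input, and the `min_valid` threshold are all assumptions made for illustration, and the saliency map and the DePtH warping step are omitted entirely.

```python
import math


def masked_patch_attention(queries, keys, values, valid_ratio, min_valid=0.5):
    """Scaled dot-product attention that ignores keys with too few valid pixels.

    queries, keys, values: lists of float vectors, one per patch token.
    valid_ratio: per-key fraction of valid (unmasked) pixels, in [0, 1].
    min_valid: hypothetical pruning threshold; keys below it are dropped.
    """
    dim = len(keys[0])
    # Prune: keep only the indices of keys with enough valid pixels.
    keep = [j for j, ratio in enumerate(valid_ratio) if ratio >= min_valid]
    if not keep:
        raise ValueError("all keys were pruned")

    outputs = []
    for q in queries:
        # Similarity between the query and each surviving key.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in keep
        ]
        # Numerically stable softmax over the surviving keys only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the surviving values.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, keep))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Because pruned keys are removed before the softmax, they receive exactly zero attention weight, so a patch that is mostly hole cannot contaminate the aggregated feature.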


