TFill: Image Completion via a Transformer-Based Architecture

04/02/2021
by Chuanxia Zheng, et al.

Bridging distant context interactions is important for high-quality image completion with large masks. Previous methods attempted this through deep or large-receptive-field (RF) convolutions, but these cannot escape the dominance of nearby interactions, which can be suboptimal. In this paper, we propose treating image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependencies in the encoder in a first phase. Crucially, we employ a restrictive CNN with a small, non-overlapping RF for token representation, which allows the transformer to explicitly model long-range context relations with equal importance in all layers, without the implicit confounding of neighboring tokens that arises when larger RFs are used. In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related features and to avoid the insular effect of standard attention. Extensive experiments demonstrate superior performance over state-of-the-art methods on several datasets.
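The first-phase design can be illustrated with a minimal PyTorch sketch: a convolution whose stride equals its kernel size has a small, non-overlapping receptive field, so each token encodes only its own patch, and all long-range mixing is delegated to the transformer's attention. The class names (RestrictiveTokenizer, TFillEncoderSketch) and the hyperparameters (patch size 8, embedding dimension 256, 6 layers) are illustrative assumptions rather than the paper's actual configuration, and mask handling is omitted for brevity.

```python
import torch
import torch.nn as nn

class RestrictiveTokenizer(nn.Module):
    """Token embedding via a conv whose receptive field is small and
    non-overlapping (stride == kernel size), so each token sees only
    its own patch; long-range interaction is left to attention."""
    def __init__(self, in_ch=3, dim=256, patch=8):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, C, H, W)
        t = self.proj(x)                       # (B, dim, H/p, W/p)
        return t.flatten(2).transpose(1, 2)    # (B, N, dim) token sequence


class TFillEncoderSketch(nn.Module):
    """Tokenize the masked image, then let a transformer encoder model
    context relations between all token pairs with equal importance."""
    def __init__(self, dim=256, depth=6, heads=8):
        super().__init__()
        self.tokenizer = RestrictiveTokenizer(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, masked_img):
        tokens = self.tokenizer(masked_img)
        return self.encoder(tokens)            # long-range context features


# Usage: a 256x256 masked RGB image yields a 32x32 = 1024-token sequence.
feats = TFillEncoderSketch()(torch.randn(1, 3, 256, 256))
print(feats.shape)  # torch.Size([1, 1024, 256])
```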

Related research

10/21/2022 · Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
11/11/2022 · Token Transformer: Can class token help window-based transformer build better long-range interactions?
10/11/2022 · Memory transformers for full context and high-resolution 3D Medical Segmentation
04/25/2023 · CompletionFormer: Depth Completion with Convolutions and Vision Transformers
07/05/2022 · Efficient Representation Learning via Adaptive Context Pooling
09/08/2023 · CNN Injected Transformer for Image Exposure Correction
12/11/2020 · Cyclic orthogonal convolutions for long-range integration of features
