High-Fidelity Pluralistic Image Completion with Transformers

by Ziyu Wan, et al.

Image completion has made tremendous progress with convolutional neural networks (CNNs), thanks to their powerful texture modeling capacity. However, due to some inherent properties (e.g., local inductive prior, spatially invariant kernels), CNNs neither understand global structures well nor naturally support pluralistic completion. Recently, transformers have demonstrated their power in modeling long-range relationships and generating diverse results, but their computational complexity is quadratic in the input length, which hampers their application to high-resolution images. This paper brings the best of both worlds to pluralistic image completion: appearance prior reconstruction with a transformer and texture replenishment with a CNN. The transformer recovers pluralistic, coherent structures together with some coarse textures, while the CNN enhances the local texture details of the coarse priors, guided by the high-resolution masked images. The proposed method outperforms state-of-the-art methods by a large margin in three respects: 1) a large boost in image fidelity, even compared to deterministic completion methods; 2) better diversity and higher fidelity for pluralistic completion; 3) exceptional generalization to large masks and generic datasets such as ImageNet.
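To make the quadratic-cost argument concrete, the following is a minimal arithmetic sketch rather than anything taken from the paper. It assumes dense self-attention with one token per pixel, and the helper name `attention_pairs` and the chosen resolutions are purely illustrative.

```python
def attention_pairs(height: int, width: int) -> int:
    """Count the token pairs one dense self-attention layer must score.

    Assumption for illustration: every pixel is one token, so the
    sequence length is height * width and the attention matrix has
    (height * width) ** 2 entries.
    """
    tokens = height * width
    return tokens * tokens


if __name__ == "__main__":
    # Illustrative resolutions only; they are not quoted from the paper.
    for side in (32, 64, 256):
        tokens = side * side
        pairs = attention_pairs(side, side)
        print(f"{side}x{side}: {tokens:,} tokens, {pairs:,} attention pairs")
```

Under these assumptions, doubling the side length quadruples the token count and multiplies the attention pairs by 16. This is why it is attractive to run the transformer on a low-resolution representation and leave high-resolution detail to a CNN.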




Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Image inpainting is an underdetermined inverse problem, it naturally all...

Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation

We address the problem of generating a 360-degree image from a single im...

CompletionFormer: Depth Completion with Convolutions and Vision Transformers

Given sparse depths and the corresponding RGB images, depth completion a...

Taming Transformers for High-Resolution Image Synthesis

Designed to learn long-range interactions on sequential data, transforme...

3D Human Texture Estimation from a Single Image with Transformers

We propose a Transformer-based framework for 3D human texture estimation...

Spectrally Consistent UNet for High Fidelity Image Transformations

Convolutional Neural Networks (CNNs) are the current de-facto approach u...

Pluralistic Image Completion with Probabilistic Mixture-of-Experts

Pluralistic image completion focuses on generating both visually realist...

Code Repositories


High-Fidelity Pluralistic Image Completion with Transformers

