MaskGIT: Masked Generative Image Transformer

02/08/2022
by Huiwen Chang, et al.

Generative transformers have rapidly gained popularity in the computer vision community for synthesizing high-fidelity, high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens and decode it sequentially in raster-scan order (i.e., line by line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins by generating all tokens of an image simultaneously and then refines the image iteratively, conditioned on the previous generation. Our experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset and accelerates autoregressive decoding by up to 64x. In addition, we show that MaskGIT can be easily extended to various image editing tasks, such as inpainting, extrapolation, and image manipulation.
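The inference procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the bidirectional transformer (it returns random tokens and confidences), `MASK` is an assumed id for the mask token, and both the cosine schedule and the rule of keeping the most confident predictions are assumptions about how the refinement could be scheduled, since the abstract does not specify them.

```python
import math
import random

MASK = -1  # hypothetical id for the special mask token


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for the bidirectional transformer (assumption).

    Returns one (predicted_token, confidence) pair per position.
    A real model would condition on the unmasked tokens in `tokens`.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_decode(num_tokens=16, vocab_size=8, steps=4, seed=0):
    """Fill a fully masked token grid over a fixed number of parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens  # start with every position masked

    for t in range(1, steps + 1):
        # Predict all positions at once, in parallel.
        preds = toy_predictor(tokens, vocab_size, rng)

        # Assumed cosine schedule: how many positions stay masked after step t.
        if t == steps:
            n_masked = 0
        else:
            n_masked = math.floor(num_tokens * math.cos(math.pi / 2 * t / steps))

        # Among still-masked positions, commit the most confident predictions
        # and leave the rest masked for the next refinement step.
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_fill = max(0, len(masked) - n_masked)
        for i in masked[:n_fill]:
            tokens[i] = preds[i][0]

    return tokens


if __name__ == "__main__":
    print(iterative_decode())
```

In this sketch, 16 tokens are produced with 4 calls to the predictor, whereas a token-by-token raster-scan decoder would need 16 sequential calls. The speedup reported in the abstract depends on the real model and settings, which the sketch does not reproduce.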

research
03/07/2023

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

Generative transformers have shown their superiority in synthesizing hig...
research
03/29/2022

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

Recent studies have shown the importance of modeling long-range interact...
research
03/20/2023

Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

Autoregressive transformers have shown remarkable success in video gener...
research
10/03/2022

Visual Prompt Tuning for Generative Transfer Learning

Transferring knowledge from an image synthesis model trained on a large ...
research
11/05/2021

Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

We present a new perspective of achieving image synthesis by viewing thi...
research
03/01/2023

StraIT: Non-autoregressive Generation with Stratified Image Transformer

We propose Stratified Image Transformer (StraIT), a pure non-autoregressi...
research
11/30/2021

EdiBERT, a generative model for image editing

Advances in computer vision are pushing the limits of image manipulatio...