Progressive Denoising Model for Fine-Grained Text-to-Image Generation

10/05/2022
by   Zhengcong Fei, et al.

Recently, vector quantized autoregressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by predicting discrete image tokens in the latent space in a fixed order, from top left to bottom right, treating every token equally. Although this simple generative process works surprisingly well, is it the best way to generate an image? Human creation, for instance, tends to proceed from a rough outline to fine detail, whereas VQ-AR models do not consider the relative importance of each image component. In this paper, we present a progressive denoising model for high-fidelity text-to-image generation. The proposed method creates new image tokens from coarse to fine, in parallel, based on the existing context, and applies this procedure recursively until the image sequence is complete. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments demonstrate that the progressive model achieves significantly better FID scores than the previous VQ-AR method across a wide variety of categories and aspects. Moreover, the text-to-image generation time of traditional AR models increases linearly with the output image resolution and is therefore quite time-consuming even for normal-size images. In contrast, our approach achieves a better trade-off between generation quality and speed.
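
To make the staged, parallel generation described in the abstract concrete, here is a minimal Python sketch of a generic confidence-ordered filling loop. It is not the authors' implementation: the names `toy_predict` and `progressive_decode`, the random stand-in for a trained model, and the even per-stage quota are all assumptions made purely for illustration.

```python
import random

MASK = None  # placeholder for a token that has not been generated yet


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained model (assumption, not the paper's
    network): propose a (token, confidence) pair for every masked position."""
    return {
        i: (rng.randrange(vocab_size), rng.random())
        for i, tok in enumerate(tokens)
        if tok is MASK
    }


def progressive_decode(seq_len, num_stages, vocab_size=16, seed=0):
    """Fill a token sequence over `num_stages` rounds, several tokens per round."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    order = []  # positions in the order they were committed
    for stage in range(num_stages):
        # One "model call" proposes tokens for all open positions at once.
        proposals = toy_predict(tokens, vocab_size, rng)
        if not proposals:
            break
        remaining_stages = num_stages - stage
        # Ceiling division, so the final stage always completes the sequence.
        quota = -(-len(proposals) // remaining_stages)
        # Commit the most confident proposals; the rest wait for a later stage.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:quota]:
            tokens[i] = proposals[i][0]
            order.append(i)
    return tokens, order


tokens, order = progressive_decode(seq_len=64, num_stages=8)
```

In this toy setting, a token-by-token decoder would need 64 model calls for the 64 positions, while the loop above uses 8, one per stage. That gap is the kind of quality-versus-speed trade-off the abstract refers to, although how positions are chosen and revised at each stage in the actual model is not specified here.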

research
11/29/2021

Vector Quantized Diffusion Model for Text-to-Image Synthesis

We present the vector quantized diffusion (VQ-Diffusion) model for text-...
research
03/07/2023

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

Generative transformers have shown their superiority in synthesizing hig...
research
08/19/2021

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Autoregressive models and their sequential factorization of the data lik...
research
07/20/2022

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Generating sound effects that humans want is an important topic. However...
research
03/03/2022

Autoregressive Image Generation using Residual Quantization

For autoregressive (AR) modeling of high-resolution images, vector quant...
research
05/16/2023

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Diffusion models have gained significant attention in the realm of image...
research
12/01/2021

Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Recently, vector-quantized image modeling has demonstrated impressive pe...
