Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

03/07/2023 · by Jiacheng Li, et al.
Generative transformers have shown their superiority in synthesizing high-fidelity, high-resolution images, offering good diversity and training stability. However, they suffer from slow generation, since they must produce a long token sequence autoregressively. To accelerate generative transformers while preserving generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model. Lformer first encodes an image into h×h discrete tokens, then divides these tokens into h mirrored L-shaped blocks running from the top left to the bottom right, and at each step decodes all tokens within one block in parallel. Like autoregressive models, Lformer always predicts the area adjacent to the previously generated context, so it remains stable while accelerating. By exploiting the 2D structure of image tokens, Lformer achieves faster generation than existing transformer-based methods while maintaining good quality. Moreover, a pretrained Lformer can edit images without finetuning: one can roll back to an early step and regenerate, or edit an image given a bounding box and a text prompt.
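To make the block structure concrete, here is a minimal NumPy sketch of the partition the abstract describes, together with a schematic decode loop. The partition follows directly from the text (block k is the mirrored L of cells whose larger coordinate equals k); the `model.predict` interface is hypothetical and stands in for whatever transformer call the paper actually uses.

```python
import numpy as np

def l_shape_blocks(h: int) -> list:
    """Partition an h x h token grid into h mirrored L-shaped blocks.

    Block k contains every cell (i, j) with max(i, j) == k: the k-th
    row segment plus the k-th column segment, meeting at (k, k).
    Block k holds 2k + 1 tokens, and the blocks sum to h * h tokens.
    """
    grid = np.arange(h * h).reshape(h, h)  # flat token indices
    blocks = []
    for k in range(h):
        row = grid[k, : k + 1]   # row k, columns 0..k
        col = grid[:k, k]        # column k, rows 0..k-1
        blocks.append(np.concatenate([row, col]))
    return blocks

def decode(model, text_emb, h: int) -> np.ndarray:
    """Semi-autoregressive decoding: one parallel step per L-shaped
    block, so an h x h image takes h steps instead of h * h."""
    tokens = np.full(h * h, -1, dtype=np.int64)  # -1 = not yet generated
    for block in l_shape_blocks(h):
        # Hypothetical call: predict all tokens in this block at once,
        # conditioned on the text and the previously decoded context.
        tokens[block] = model.predict(text_emb, tokens, positions=block)
    return tokens.reshape(h, h)
```

Each block borders the already-decoded top-left region on its inner edge, which is why the paper can claim autoregressive-style stability while cutting the number of sequential steps from h² to h.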
