Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

04/04/2023
by Jaewoong Lee, et al.

Token-based masked generative models are gaining popularity for their fast inference time with parallel decoding. While recent token-based approaches achieve performance competitive with diffusion-based models, their generation quality remains suboptimal because they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts. To further improve the image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), applied to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS on various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.
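To make the underlying decoding loop concrete, below is a minimal sketch of the iterative parallel decoding used by masked generative models, where a per-position selection score decides which sampled tokens to commit at each step. The `score_fn` argument stands in for a learned, text-conditioned token selector in the spirit of TCTS; its interface here is a hypothetical simplification (the paper's model, schedule, and FAS grouping are not reproduced). The cosine unmasking schedule follows common practice for masked generative decoding.

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-generated token

def masked_decode(logits_fn, score_fn, seq_len, steps=8):
    """Iterative parallel decoding for a masked generative model.

    logits_fn(tokens, mask) -> (seq_len, vocab) logits for each position.
    score_fn(tokens, mask)  -> per-position selection scores; a stand-in for a
        learned text-conditioned token selector (hypothetical interface).
    """
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        mask = tokens == MASK
        if not mask.any():
            break
        # greedy proposal at every position; only masked ones may be committed
        proposal = logits_fn(tokens, mask).argmax(axis=-1)
        scores = np.where(mask, score_fn(tokens, mask), -np.inf)
        # cosine schedule: commit more tokens as decoding progresses
        n_keep = max(1, int(seq_len * (1 - np.cos(np.pi * (step + 1) / (2 * steps)))))
        n_keep = min(n_keep, int(mask.sum()))
        keep = np.argsort(scores)[-n_keep:]  # highest-scoring masked positions
        tokens[keep] = proposal[keep]
    # fill any positions still masked after the last step
    mask = tokens == MASK
    if mask.any():
        tokens[mask] = logits_fn(tokens, mask).argmax(axis=-1)[mask]
    return tokens

# toy demo: position i prefers token i % 5; scores favor later positions
def toy_logits(tokens, mask):
    return np.eye(5)[np.arange(10) % 5]

def toy_scores(tokens, mask):
    return np.arange(10, dtype=float)

out = masked_decode(toy_logits, toy_scores, seq_len=10, steps=4)
```

Replacing `score_fn` with raw model confidence recovers plain confidence-based unmasking; the point of a learned selector is that it can account for text alignment and inter-token dependence when choosing which positions to commit.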


