Localized Text-to-Image Generation for Free via Cross Attention Control

06/26/2023
by Yutong He, et al.

Despite the tremendous success of text-to-image generative models, localized text-to-image generation (that is, generating objects or features at specific locations in an image while maintaining a consistent overall generation) still requires either explicit training or substantial additional inference time. In this work, we show that localized generation can be achieved by simply controlling cross attention maps during inference. With no additional training, model architecture modification, or inference time, our proposed cross attention control (CAC) provides new open-vocabulary localization abilities to standard text-to-image models. When applied at inference time, CAC also enhances models that are already trained for localized generation. Furthermore, to assess localized text-to-image generation performance automatically, we develop a standardized suite of evaluations using large pretrained recognition models. Our experiments show that CAC improves localized generation performance with various types of location information, ranging from bounding boxes to semantic segmentation maps, and enhances the compositional capability of state-of-the-art text-to-image generative models.
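
To make the idea of controlling cross attention maps more concrete, here is a minimal NumPy sketch. It assumes (this is an illustration, not the formulation from the paper) that control means masking attention logits so that a text token tied to a region can only be attended to by pixels inside that region. The function names `box_to_mask` and `masked_cross_attention`, the normalized box format, and the toy shapes are all hypothetical.

```python
import numpy as np


def box_to_mask(box, height, width):
    """Rasterize a normalized (x0, y0, x1, y1) box into a flat boolean pixel mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    rows = slice(int(y0 * height), int(np.ceil(y1 * height)))
    cols = slice(int(x0 * width), int(np.ceil(x1 * width)))
    mask[rows, cols] = True
    return mask.reshape(-1)  # row-major: pixel index = y * width + x


def masked_cross_attention(q, k, v, token_regions, height, width):
    """Cross attention with per-token spatial masks (illustrative assumption).

    q: (height * width, d) pixel queries; k, v: (T, d) text keys and values.
    token_regions: dict mapping a token index to a normalized box.
    Tokens that are not listed stay visible to every pixel.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # (pixels, tokens)

    allowed = np.ones_like(logits, dtype=bool)
    for token_index, box in token_regions.items():
        allowed[:, token_index] = box_to_mask(box, height, width)

    # Every pixel needs at least one visible token, otherwise softmax is undefined.
    assert allowed.any(axis=1).all(), "each pixel must see at least one token"

    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = w = 4
    q = rng.normal(size=(h * w, 8))
    k = rng.normal(size=(3, 8))
    v = rng.normal(size=(3, 8))
    # Restrict token 1 to the left half of the image; tokens 0 and 2 are free.
    out, weights = masked_cross_attention(q, k, v, {1: (0.0, 0.0, 0.5, 1.0)}, h, w)
    print(out.shape, weights[:, 1].reshape(h, w).round(2))
```

Because the mask only changes the attention weights of an existing layer, a scheme like this adds no parameters and no extra denoising steps, which is in the spirit of the "no additional training, model architecture modification, or inference time" claim above.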

research
11/27/2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

One of the major challenges in training text-to-image generation models ...
research
07/12/2023

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Despite the stunning ability to generate high-quality images by recent t...
research
04/04/2023

Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

Token-based masked generative models are gaining popularity for their fa...
research
04/26/2023

Training-Free Location-Aware Text-to-Image Synthesis

Current large-scale generative models have impressive efficiency in gene...
research
01/04/2023

Attribute-Centric Compositional Text-to-Image Generation

Despite the recent impressive breakthroughs in text-to-image generation,...
research
06/16/2023

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models

Despite the remarkable performance of text-to-image diffusion models in ...
research
06/26/2023

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

While recent developments in text-to-image generative models have led to...
