Dense Text-to-Image Generation with Attention Modulation

08/24/2023
by Yunji Kim, et al.

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions, as measured by both automatic and human evaluation scores. In addition, we achieve visual results of similar quality to those of models specifically trained with layout conditions.
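
As a rough illustration of what layout-guided attention modulation can look like, the sketch below biases raw attention scores so that each text token attends more strongly to the image locations inside its assigned region. It is a minimal toy under stated assumptions, not the formulation used by DenseDiffusion: the function names, the additive bias, and the `strength` parameter are all hypothetical and chosen only to make the idea concrete.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def modulate_attention(scores, region_mask, strength=2.0):
    """Bias raw attention scores toward a layout, then normalize.

    scores[i][j]      : raw score between image location i and text token j
    region_mask[i][j] : 1 if token j's region covers location i, else 0
    strength          : hypothetical knob, not a value taken from the paper
    """
    probs = []
    for score_row, mask_row in zip(scores, region_mask):
        # Raise scores inside the token's region, lower them outside it.
        biased = [
            s + strength if m else s - strength
            for s, m in zip(score_row, mask_row)
        ]
        probs.append(softmax(biased))
    return probs


# Toy example: 4 image locations, 2 text tokens ("dog", "cat").
# The layout places "dog" on locations 0-1 and "cat" on locations 2-3.
scores = [[0.0, 0.0] for _ in range(4)]
region_mask = [[1, 0], [1, 0], [0, 1], [0, 1]]
attn = modulate_attention(scores, region_mask)
```

With uniform raw scores, the modulated attention for each location concentrates on the token whose region covers it, while every row still sums to one. A real diffusion model would apply a modification of this kind inside its attention layers at each denoising step; how the bias is scaled and scheduled there is left out of this sketch.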

research
02/16/2023

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

Layout-to-image generation refers to the task of synthesizing photo-real...
research
05/08/2023

IIITD-20K: Dense captioning for Text-Image ReID

Text-to-Image (T2I) ReID has attracted a lot of attention in the recent ...
research
03/25/2023

Freestyle Layout-to-Image Synthesis

Typical layout-to-image synthesis (LIS) models generate images for a clo...
research
04/06/2023

Training-Free Layout Control with Cross-Attention Guidance

Recent diffusion-based generators can produce high-quality images based ...
research
11/25/2022

SpaText: Spatio-Textual Representation for Controllable Image Generation

Recent text-to-image diffusion models are able to generate convincing re...
research
06/20/2015

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

Recent progress on automatic generation of image captions has shown that...
research
02/04/2023

Semantic-Guided Image Augmentation with Pre-trained Models

Image augmentation is a common mechanism to alleviate data scarcity in c...
