Training-Free Layout Control with Cross-Attention Guidance

04/06/2023
by Minghao Chen et al.

Recent diffusion-based generators can produce high-quality images from textual prompts alone. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that achieves robust layout control without training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information, steering the generation toward a desired composition, e.g., a user-specified layout. To determine how best to guide attention, we study the role of different attention maps during image generation and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.
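The backward-guidance strategy mentioned in the abstract can be made concrete with an energy defined over cross-attention maps. Below is a minimal, self-contained PyTorch sketch under stated assumptions: `layout_energy`, the stand-in attention computation, and all shapes and names are illustrative, not the authors' implementation. In the actual method, the attention maps would come from the cross-attention layers of a diffusion U-Net, backpropagated through at each denoising step.

```python
# Toy sketch of backward layout guidance (illustrative assumptions, not
# the authors' code). We define an energy that is low when a chosen text
# token's cross-attention mass falls inside a user-given bounding box,
# then take its gradient with respect to the latent.
import torch

def layout_energy(attn, token_idx, box, hw):
    """attn: (batch, H*W, tokens) attention; box: (x0, y0, x1, y1) in [0, 1]."""
    h, w = hw
    ys = torch.linspace(0, 1, h).view(h, 1).expand(h, w)
    xs = torch.linspace(0, 1, w).view(1, w).expand(h, w)
    # Binary mask: 1 inside the user-specified box, 0 outside.
    mask = ((xs >= box[0]) & (xs <= box[2]) &
            (ys >= box[1]) & (ys <= box[3])).float().flatten()
    a = attn[:, :, token_idx]                 # (batch, H*W)
    inside = (a * mask).sum(dim=1)            # attention mass inside the box
    total = a.sum(dim=1) + 1e-8
    # Quadratic penalty on the fraction of attention that escapes the box.
    return ((1.0 - inside / total) ** 2).mean()

torch.manual_seed(0)
latent = torch.randn(1, 4, 16, 16, requires_grad=True)

# Stand-in for a U-Net cross-attention layer: project per-pixel latent
# features onto 77 token slots (CLIP prompt length), softmax over tokens.
proj = torch.randn(4, 77)
feats = latent.permute(0, 2, 3, 1).reshape(1, 16 * 16, 4)  # (1, H*W, C)
attn = torch.softmax(feats @ proj, dim=-1)                 # (1, H*W, 77)

# Pull token 5's attention into the top-left quarter of the image.
energy = layout_energy(attn, token_idx=5, box=(0.0, 0.0, 0.5, 0.5), hw=(16, 16))
energy.backward()

# In a real sampler, a gradient step on the latent would precede each
# denoising update, e.g. latent = latent - eta * latent.grad.
print(f"energy={energy.item():.4f}, "
      f"grad magnitude={latent.grad.abs().mean().item():.2e}")
```

Forward guidance, the alternative strategy named in the abstract, would instead bias the attention maps toward the layout directly at inference time without a gradient step; the sketch above corresponds to the backward variant.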

