Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models

06/26/2023
by   Luozhou Wang, et al.

Text-to-image diffusion models have advanced towards more controllable generation by supporting various image conditions (e.g., a depth map) beyond text. However, these models are trained on the premise of perfect alignment between the text and image conditions. If this alignment is not satisfied, the final output may be dominated by one condition, or ambiguity may arise, failing to meet user expectations. To address this issue, we present a training-free approach called "Decompose and Realign" to further improve the controllability of existing models when provided with partially aligned conditions. The "Decompose" phase separates conditions based on pair relationships, computing scores individually for each pair. This ensures that each pair no longer has conflicting conditions. The "Realign" phase aligns these independently calculated scores via a cross-attention mechanism to avoid new conflicts when combining them back. Both qualitative and quantitative results demonstrate the effectiveness of our approach in handling unaligned conditions: it performs favorably against recent methods and, more importantly, adds flexibility to the controllable image generation process.
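
To make the "Decompose" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each (text, image condition) pair is scored on its own and that the per-pair scores are merged as weighted offsets from an unconditional score, in the style of classifier-free guidance; that merge rule is an assumption, and the cross-attention "Realign" step is not shown. The names `combine_pair_scores`, `score_fn`, and `toy_score` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# A (text, image_condition) pair; either side may be None when absent.
Pair = Tuple[Optional[str], Optional[str]]


def combine_pair_scores(
    pairs: Sequence[Pair],
    score_fn: Callable[[Optional[str], Optional[str]], Vector],
    weights: Sequence[float],
) -> Vector:
    """Score each pair separately, then add weighted offsets from the unconditional score.

    Assumption: guidance-style merging; the abstract does not specify the rule.
    """
    if len(pairs) != len(weights):
        raise ValueError("one weight per pair is required")
    base = score_fn(None, None)  # unconditional score
    combined = list(base)
    for (text, cond), w in zip(pairs, weights):
        pair_score = score_fn(text, cond)  # computed independently per pair
        for i, (s, b) in enumerate(zip(pair_score, base)):
            combined[i] += w * (s - b)
    return combined


def toy_score(text: Optional[str], cond: Optional[str]) -> Vector:
    # Hypothetical stand-in for a diffusion model's noise prediction.
    return [float(len(text or "")), float(len(cond or ""))]


# "a dog" is paired with a depth map; "a hat" has no image condition.
pairs = [("a dog", "depth"), ("a hat", None)]
result = combine_pair_scores(pairs, toy_score, [1.0, 0.5])
```

Because each pair is scored in isolation, a text phrase with no matching image condition (here "a hat") cannot be overridden by an unrelated condition during scoring; any conflict can only appear at the merge step, which is where the abstract places the realignment.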

research
09/15/2023

Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models

Image cartoonization has attracted significant interest in the field of ...
research
11/11/2022

HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation

Text-driven person image generation is an emerging and challenging task ...
research
05/19/2023

Late-Constraint Diffusion Guidance for Controllable Image Synthesis

Diffusion models, either with or without text condition, have demonstrat...
research
06/19/2023

Conditional Text Image Generation with Diffusion Models

Current text recognition systems, including those for handwritten script...
research
04/15/2022

Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer

Though deep generative models have gained a lot of attention, most of th...
research
02/20/2023

Composer: Creative and Controllable Image Synthesis with Composable Conditions

Recent large-scale generative models learned on big data are capable of ...
