Divide & Bind Your Attention for Improved Generative Semantic Nursing

07/20/2023
by   Yumeng Li, et al.

Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited impressive results with high fidelity. Despite this remarkable progress, current state-of-the-art models still struggle to generate images that fully adhere to the input prompt. Prior work, Attend-and-Excite, introduced the concept of Generative Semantic Nursing (GSN), which optimizes cross-attention at inference time to better incorporate the prompt semantics. It demonstrates promising results on simple prompts, e.g., “a cat and a dog”, but its efficacy declines on more complex prompts, and it does not explicitly address the problem of improper attribute binding. To handle complex prompts or scenes involving multiple entities, and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: an attendance loss and a binding loss. Our approach faithfully synthesizes the desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page <https://sites.google.com/view/divide-and-bind>.
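For readers unfamiliar with GSN, the sketch below illustrates the generic inference-time update that underlies both Attend-and-Excite and Divide & Bind: a loss computed on the cross-attention maps of the target tokens is backpropagated to the noisy latents at each denoising step. The loss here is only a hypothetical peak-attention surrogate for illustration, not the actual attendance and binding losses of Divide & Bind, and the tensors standing in for SD latents and attention maps are toy placeholders rather than outputs of a real diffusion pipeline.

```python
import torch

def attendance_surrogate_loss(attn_maps: torch.Tensor) -> torch.Tensor:
    # attn_maps: [num_target_tokens, H, W] cross-attention maps for the tokens
    # that should appear in the image. Encourage every token to have at least
    # one strongly attended spatial location (a peak-attention surrogate; NOT
    # the exact Divide & Bind attendance/binding objectives).
    per_token_peak = attn_maps.flatten(start_dim=1).max(dim=1).values
    return (1.0 - per_token_peak).mean()

def gsn_latent_update(latents: torch.Tensor,
                      attn_maps: torch.Tensor,
                      step_size: float = 20.0) -> torch.Tensor:
    # One GSN step: shift the latents along the negative gradient of the loss
    # so the next denoising step attends more strongly to every target token.
    loss = attendance_surrogate_loss(attn_maps)
    grad = torch.autograd.grad(loss, latents)[0]
    return latents - step_size * grad

# Toy usage: random tensors stand in for SD latents; a fake differentiable
# mapping stands in for the UNet's cross-attention so a gradient path exists.
latents = torch.randn(1, 4, 64, 64, requires_grad=True)
fake_attn = torch.sigmoid(latents.mean(dim=1, keepdim=True)).expand(1, 2, 64, 64)[0]
updated_latents = gsn_latent_update(latents, fake_attn)
print(updated_latents.shape)  # torch.Size([1, 4, 64, 64])
```

In an actual pipeline this update would be applied at selected denoising timesteps, with the attention maps read out from the UNet's cross-attention layers for the specific token indices of interest.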


