Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

06/14/2023
by Zhiyu Jin, et al.

Diffusion models (DMs) have recently gained attention for their state-of-the-art performance in text-to-image synthesis. Following common practice in deep learning, DMs are trained and evaluated on images of a fixed size. Users, however, want images in specific sizes and a range of aspect ratios. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First, we observe that during synthesis, lower-resolution images suffer from incomplete object portrayal, while higher-resolution images exhibit repetitive presentation. Next, we establish a statistical relationship showing that attention entropy changes with the number of tokens, which suggests that models aggregate spatial information in proportion to image resolution. We therefore interpret our observations as follows: objects are depicted incompletely at low resolutions because spatial information is limited, while repetitive presentation at high resolutions arises from redundant spatial information. From this perspective, we propose a scaling factor that alleviates the change in attention entropy and mitigates the observed defective patterns. Extensive experiments validate the efficacy of the proposed scaling factor, which enables the model to achieve better visual effects, image quality, and text alignment. Notably, these improvements require no additional training or fine-tuning.
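The abstract's central technical claim is that attention entropy grows with the number of tokens, and that rescaling the attention logits can counteract this drift. The following is a minimal sketch in plain Python that illustrates the idea under stated assumptions. Random Gaussian queries and keys stand in for real UNet activations, the training token count T = 1024 (for example a 32x32 latent) is assumed, and the resolution-aware factor sqrt(log N / log T), applied on top of the usual 1/sqrt(d), is a hypothetical choice for illustration. It is not necessarily the paper's exact formula.

```python
import math
import random


def make_query_and_keys(n_tokens, dim, seed=0):
    """Draw one query and n_tokens keys with i.i.d. standard-normal entries.

    Toy stand-in for real attention activations (an assumption, not real UNet data).
    """
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
    return query, keys


def attention_entropy(query, keys, scale):
    """Shannon entropy (in nats) of softmax(scale * q . k_i) over all keys."""
    logits = [scale * sum(a * b for a, b in zip(query, k)) for k in keys]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def token_scale(n_tokens, train_tokens):
    """Hypothetical resolution-aware factor sqrt(log N / log T).

    Equals 1 at the training token count, exceeds 1 for more tokens
    (sharper attention) and falls below 1 for fewer tokens (flatter attention).
    """
    return math.sqrt(math.log(n_tokens) / math.log(train_tokens))


DIM = 64
TRAIN_TOKENS = 1024  # assumed training token count, e.g. a 32x32 latent

results = {}
for n in (256, 1024, 4096):
    q, ks = make_query_and_keys(n, DIM)
    base = 1.0 / math.sqrt(DIM)  # standard scaled dot-product attention factor
    h_plain = attention_entropy(q, ks, base)
    h_scaled = attention_entropy(q, ks, base * token_scale(n, TRAIN_TOKENS))
    results[n] = (h_plain, h_scaled)
    print(f"N={n:5d}  H_plain={h_plain:.3f}  H_scaled={h_scaled:.3f}  log N={math.log(n):.3f}")
```

With the plain 1/sqrt(d) scale, the entropy tracks log N and therefore rises as resolution grows. Because the entropy of softmax(beta * z) never increases as beta grows, the extra factor lowers entropy when N > T and raises it when N < T. In this toy setting, the factor narrows the gap between low and high resolutions without fully removing it. How closely this matches the behaviour of a real diffusion UNet is not something the sketch can show.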
