Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

08/31/2023
by Qingping Zheng, et al.

Stable Diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is infeasible, as it would require an immense number of text-image pairs and entail substantial computational expense. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed images of any size while minimizing the need for high-memory GPU resources. Specifically, the first stage, dubbed Any Ratio Adaptability Diffusion (ARAD), uses a selected set of images with a restricted range of aspect ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to diverse image sizes. To support the creation of images at any desired size, the second stage introduces a technique called Fast Seamless Tiled Diffusion (FSTD). This method rapidly enlarges the ASD output to any high-resolution size while avoiding seaming artifacts and memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks demonstrate that ASD can produce well-structured images of arbitrary sizes while reducing inference time by 2x compared to the traditional tiled algorithm.
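To make the idea of tiled processing concrete, the following is a minimal sketch of how a canvas of arbitrary size can be covered by fixed-size, overlapping tiles so that each tile fits in GPU memory. It illustrates the generic overlapping-tile scheme that tiled upscalers commonly rely on, not the authors' FSTD method, whose details are not given in the abstract. The function name `tile_boxes` and the `tile` and `overlap` defaults are assumptions chosen for illustration.

```python
def tile_boxes(height, width, tile=512, overlap=64):
    """Return (top, left, bottom, right) boxes covering a height x width canvas.

    Generic overlapping-tile layout (an illustrative assumption, not FSTD):
    neighbouring tiles share `overlap` pixels so their results can be blended.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(length):
        # A canvas no larger than one tile needs a single tile at offset 0.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        # Place the final tile flush with the far border so nothing is left uncovered.
        positions.append(length - tile)
        return positions

    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in starts(height)
        for left in starts(width)
    ]


if __name__ == "__main__":
    # Hypothetical target resolution; each box would be processed independently.
    for box in tile_boxes(1024, 1536):
        print(box)
```

Because each tile has a bounded size, peak memory depends on `tile` rather than on the full output resolution. The overlap is what a conventional tiled approach uses to hide seams, at the cost of processing the shared pixels more than once.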


