SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

07/04/2023
by Dustin Podell, et al.

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone. The increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model, which improves the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
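The abstract only names the "novel conditioning schemes." In the full paper these include conditioning the UNet on the original image size and on the crop coordinates, plus the target (bucket) size during multi-aspect training. Each scalar is encoded with Fourier features, the encodings are concatenated, and the result is added to the timestep embedding. The following is a minimal pure-Python sketch of building that conditioning vector. It assumes a 256-dimensional sinusoidal embedding per scalar and a cos-then-sin ordering, and the helper names `sinusoidal_embedding` and `size_crop_conditioning` are hypothetical. It is an illustration, not the reference implementation.

```python
import math
from typing import List, Sequence, Tuple


def sinusoidal_embedding(value: float, dim: int = 256, max_period: float = 10000.0) -> List[float]:
    """Fourier-feature (timestep-style) embedding of a single scalar.

    Assumption: 256 dims and cos-then-sin ordering are illustrative choices.
    """
    half = dim // 2
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [value * f for f in freqs]
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]


def size_crop_conditioning(
    original_size: Tuple[int, int],
    crop_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
    dim: int = 256,
) -> List[float]:
    """Embed each conditioning scalar independently and concatenate them.

    Hypothetical helper: the resulting vector would be projected and added
    to the UNet's timestep embedding.
    """
    values: Sequence[int] = (*original_size, *crop_top_left, *target_size)
    embedding: List[float] = []
    for v in values:
        embedding.extend(sinusoidal_embedding(float(v), dim))
    return embedding


if __name__ == "__main__":
    # A 768x1024 training image, uncropped, rendered into a 1024x1024 bucket.
    cond = size_crop_conditioning((768, 1024), (0, 0), (1024, 1024))
    print(len(cond))  # six scalars x 256 dims each
```

Embedding each scalar independently keeps the scheme simple. At sampling time, the same function can be called with a large "original size" and zero crop offsets to request a high-resolution, centered result.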
