Wuerstchen: Efficient Pretraining of Text-to-Image Models

06/01/2023
by Pablo Pernias, et al.

We introduce Wuerstchen, a novel technique for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness and ease of training on constrained hardware. Building on recent advances in machine learning, our approach applies latent diffusion at strong latent image compression rates, significantly reducing the computational burden typically associated with state-of-the-art models while preserving, if not enhancing, the quality of generated images. Wuerstchen also achieves notable speed improvements at inference time, making real-time applications more viable. A key advantage of our method is its modest training requirement of only 9,200 GPU hours, cutting the usual costs significantly without compromising end performance. In comparisons against the state-of-the-art, the approach proves strongly competitive. This paper opens a new line of research that prioritizes both performance and computational accessibility, thereby democratizing access to sophisticated AI technologies. Through Wuerstchen, we demonstrate a compelling step forward in text-to-image synthesis and an innovative path for future research.
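The abstract's central idea — running diffusion in a strongly compressed latent space — can be sketched roughly as follows. This is a toy illustration only: the average-pooling "encoder", the 32x compression factor, and the linear noise schedule are illustrative assumptions, not the paper's actual components. The point it demonstrates is that diffusing at high spatial compression shrinks the tensor a denoiser must process, which is where the compute savings come from.

```python
import numpy as np

def spatial_compress(image, factor):
    """Toy stand-in for a learned encoder: average-pool by `factor` per spatial dim."""
    h, w, c = image.shape
    assert h % factor == 0 and w % factor == 0
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def noise_latent(latent, t, rng):
    """One forward-diffusion step: mix the latent with Gaussian noise at level t
    under a toy linear schedule (assumption, not the paper's schedule)."""
    alpha = 1.0 - t
    return np.sqrt(alpha) * latent + np.sqrt(1.0 - alpha) * rng.standard_normal(latent.shape)

rng = np.random.default_rng(0)
image = rng.standard_normal((512, 512, 3))

latent = spatial_compress(image, factor=32)   # 512x512x3 image -> 16x16x3 latent
noisy = noise_latent(latent, t=0.5, rng=rng)  # diffusion operates on the small latent

# 32x spatial compression reduces the element count by 32*32 = 1024x.
print(image.size // latent.size)  # -> 1024
```

Because a denoiser's cost scales with the latent's spatial size, each diffusion step over the 16x16 latent touches roughly 1/1024 as many elements as one over the full image, which is the intuition behind the training- and inference-cost reductions claimed above.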

