SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

06/01/2023
by Yanyu Li, et al.

Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run. As a result, high-end GPUs and cloud-based inference are required to run diffusion models at scale. This is costly and has privacy implications, especially when user data is sent to a third party. To overcome these challenges, we present a generic approach that, for the first time, unlocks running text-to-image diffusion models on mobile devices in less than 2 seconds. We achieve this by introducing an efficient network architecture and improving step distillation. Specifically, we propose an efficient UNet by identifying the redundancy of the original model and reducing the computation of the image decoder via data distillation. Further, we enhance the step distillation by exploring training strategies and introducing regularization from classifier-free guidance. Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps. Our work democratizes content creation by bringing powerful text-to-image diffusion models into the hands of users.
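
For readers unfamiliar with classifier-free guidance, which the abstract names as a source of regularization for step distillation, the following is a minimal sketch of the standard guidance combination only. It is not the paper's distillation loss, which the abstract does not specify. The function name `classifier_free_guidance`, its parameters, and the use of flat Python lists in place of noise-prediction tensors are all assumptions made for illustration.

```python
from typing import List


def classifier_free_guidance(
    eps_uncond: List[float],
    eps_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Combine unconditional and text-conditioned noise predictions.

    Hypothetical helper: applies the standard rule
        eps = eps_uncond + w * (eps_cond - eps_uncond)
    element by element, where w is the guidance scale.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Illustrative values only, not taken from the paper.
guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
```

With a guidance scale of 1 the result equals the conditioned prediction, and with a scale of 0 it equals the unconditional one; larger scales push the output further in the direction of the text condition.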


