On Architectural Compression of Text-to-Image Diffusion Models

05/25/2023
by   Bo-Kyeong Kim, et al.

Exceptional text-to-image (T2I) generation results of Stable Diffusion models (SDMs) come with substantial computational demands. To address this, recent research on efficient SDMs has prioritized reducing the number of sampling steps and applying network quantization. Orthogonal to these directions, this study highlights the power of classical architectural compression for general-purpose T2I synthesis by introducing block-removed knowledge-distilled SDMs (BK-SDMs). We eliminate several residual and attention blocks from the U-Net of SDMs, obtaining over a 30% reduction in the number of parameters, MACs per sampling step, and latency. We conduct distillation-based pretraining with only 0.22M LAION pairs (fewer than 0.1% of the full training pairs) on a single A100 GPU. Despite being trained with limited resources, our compact models can imitate the original SDM by benefiting from transferred knowledge and achieve competitive results against larger multi-billion parameter models on the zero-shot MS-COCO benchmark. Moreover, we demonstrate the applicability of our lightweight pretrained models in personalized generation with DreamBooth finetuning.
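The two ideas named in the abstract, removing blocks from a network and training the smaller network to imitate the original, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the blocks are plain dictionaries with made-up parameter counts rather than real U-Net layers, the names `remove_blocks` and `distillation_loss` are hypothetical, and the loss is a generic output-matching term with an arbitrary weight, not the objective used in the paper.

```python
# Toy illustration only: blocks are plain dicts with invented sizes,
# not real U-Net residual or attention layers.

def count_params(blocks):
    """Sum the parameter counts of a list of blocks."""
    return sum(b["params"] for b in blocks)


def remove_blocks(blocks, names_to_drop):
    """Return a new block list without the named blocks."""
    drop = set(names_to_drop)
    return [b for b in blocks if b["name"] not in drop]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, kd_weight=1.0):
    """Task loss plus a weighted term pulling the student toward the teacher.

    kd_weight is a hypothetical hyperparameter chosen for this sketch.
    """
    task = mse(student_out, target)
    imitate = mse(student_out, teacher_out)
    return task + kd_weight * imitate


# Hypothetical teacher made of alternating residual and attention blocks.
teacher = [
    {"name": "res1", "params": 100},
    {"name": "attn1", "params": 200},
    {"name": "res2", "params": 100},
    {"name": "attn2", "params": 200},
    {"name": "res3", "params": 100},
]

# The student keeps the teacher's layout minus one residual/attention pair.
student = remove_blocks(teacher, ["res2", "attn2"])
reduction = 1 - count_params(student) / count_params(teacher)
```

In this toy setup `reduction` is the fraction of parameters removed, and `distillation_loss` falls to zero only when the student matches both the training target and the teacher's output.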


research
09/12/2023

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Diffusion models have revolutionized text-to-image generation with its e...
research
06/01/2023

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

Text-to-image diffusion models can create stunning images from natural l...
research
05/05/2023

Data Curation for Image Captioning with Text-to-Image Generative Models

Recent advances in image captioning are mainly driven by large-scale vis...
research
03/14/2023

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Text-to-3D generation has shown rapid progress in recent days with the a...
research
02/05/2023

ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval

Diffusion models show promising generation capability for a variety of d...
research
08/31/2023

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Recent advances in neural text-to-speech (TTS) models bring thousands of...
research
04/02/2023

A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Virtual humans have gained considerable attention in numerous industries...
