InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

09/12/2023
by   Xingchao Liu, et al.

Diffusion models have revolutionized text-to-image generation with their exceptional quality and creativity. However, their multi-step sampling process is notoriously slow, often requiring tens of inference steps to obtain satisfactory results. Previous attempts to improve sampling speed and reduce computational cost through distillation have failed to produce a functional one-step model. In this paper, we explore a recent method called Rectified Flow, which, thus far, has only been applied to small datasets. The core of Rectified Flow lies in its reflow procedure, which straightens the trajectories of probability flows, refines the coupling between noises and images, and facilitates distillation with student models. We propose a novel text-conditioned pipeline to turn Stable Diffusion (SD) into an ultra-fast one-step model, in which we find reflow plays a critical role in improving the assignment between noise and images. Leveraging our new pipeline, we create, to the best of our knowledge, the first one-step diffusion-based text-to-image generator with SD-level image quality, achieving an FID (Frechet Inception Distance) of 23.3 on MS COCO 2017-5k and surpassing the previous state-of-the-art technique, progressive distillation, by a significant margin (37.2 → 23.3 in FID). By utilizing an expanded network with 1.7B parameters, we further improve the FID to 22.4. We call our one-step models InstaFlow. On MS COCO 2014-30k, InstaFlow yields an FID of 13.1 in just 0.09 seconds, the best in the ≤ 0.1 second regime, outperforming the recent StyleGAN-T (13.9 in 0.1 seconds). Notably, training InstaFlow costs only 199 A100 GPU days. Project page: <https://github.com/gnobitab/InstaFlow>.
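The abstract's key idea — that reflow "straightens the trajectories of probability flows" so one sampling step suffices — can be illustrated with the standard rectified-flow training objective: interpolate linearly between a noise sample and a data sample, and regress a velocity model onto the constant straight-line velocity. The sketch below is a minimal, self-contained illustration of that objective, not the authors' implementation; the function and variable names (`reflow_pair_loss`, `v_model`, `oracle`) are illustrative placeholders.

```python
import numpy as np

def reflow_pair_loss(x0, x1, v_model, t):
    """Rectified-flow objective on one (noise, image) pair.

    x0: noise sample, x1: data sample, t: scalar in [0, 1].
    v_model(xt, t) predicts the velocity at the interpolated point;
    the regression target is the straight-line velocity x1 - x0.
    """
    xt = t * x1 + (1.0 - t) * x0   # linear interpolation along the path
    target = x1 - x0               # constant velocity of a straight trajectory
    pred = v_model(xt, t)
    return float(np.mean((pred - target) ** 2))

# Toy check: an oracle that already knows the pairing incurs zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
x1 = rng.standard_normal(4)
oracle = lambda xt, t: x1 - x0

loss = reflow_pair_loss(x0, x1, oracle, 0.3)   # exactly 0.0 for the oracle

# One-step generation: if the learned trajectory is perfectly straight,
# a single Euler step from the noise recovers the image: x1 = x0 + v(x0, 0).
x1_hat = x0 + oracle(x0, 0.0)
```

With a curved (multi-step) flow, a single Euler step overshoots; reflow reduces that curvature, which is why the distilled student can generate in one step.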


Related research

02/01/2022 — Progressive Distillation for Fast Sampling of Diffusion Models
  Diffusion models have recently shown great promise for generative modeli...

05/25/2023 — On Architectural Compression of Text-to-Image Diffusion Models
  Exceptional text-to-image (T2I) generation results of Stable Diffusion m...

11/22/2022 — Accelerating Diffusion Sampling with Classifier-based Feature Distillation
  Although diffusion model has shown great potential for generating higher...

11/15/2022 — Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
  The recent advances in diffusion models have set an impressive milestone...

10/25/2022 — Lafite2: Few-shot Text-to-Image Generation
  Text-to-image generation models have progressed considerably in recent y...

07/12/2023 — Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
  Large-scale image generation models, with impressive quality made possib...

05/31/2023 — Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
  Watermarking the outputs of generative models is a crucial technique for...
