Scalable Diffusion Models with Transformers

12/19/2022
by William Peebles, et al.

We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward-pass complexity as measured by Gflops. We find that DiTs with higher Gflops (through increased transformer depth/width or an increased number of input tokens) consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512×512 and 256×256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.
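As a rough illustration of why the number of input tokens drives forward-pass Gflops, the sketch below computes DiT's token count for different patch sizes on a 32×32 latent (ImageNet 256×256 through an 8× downsampling VAE, per the paper) together with a crude per-block mult-add estimate. The flop formula here is a generic transformer approximation for illustration, not the paper's own flop counter; `rough_layer_flops` and its constants are assumptions.

```python
def num_tokens(latent_size: int, patch_size: int) -> int:
    """Number of input tokens T = (I/p)^2 after patchifying an I x I latent."""
    assert latent_size % patch_size == 0
    return (latent_size // patch_size) ** 2

def rough_layer_flops(tokens: int, width: int) -> int:
    """Crude mult-add estimate for one transformer block (an assumption,
    not the paper's counter): ~12*width^2 per token for the QKV, output
    projection, and MLP matmuls, plus ~2*tokens*width per token for the
    attention score and weighted-sum terms, which scale with tokens^2."""
    return tokens * (12 * width ** 2) + 2 * tokens ** 2 * width

# Halving the patch size quadruples the token count, so DiT-XL/2
# (patch size 2) is far more expensive per block than DiT-XL/8.
for p in (8, 4, 2):
    t = num_tokens(32, p)
    print(f"patch {p}: {t} tokens, ~{rough_layer_flops(t, 1152):,} mult-adds/block")
```

With patch sizes 8, 4, and 2 the latent yields 16, 64, and 256 tokens respectively, which matches the scaling axis the abstract refers to: more tokens means higher Gflops at fixed depth/width.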


Related research:

- Exploring Transformer Backbones for Image Diffusion Models (12/27/2022): We present an end-to-end Transformer based Latent Diffusion model for im...
- Fast Training of Diffusion Models with Masked Transformers (06/15/2023): We propose an efficient approach to train large diffusion models with ma...
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion (01/23/2023): Real-world data generation often involves complex inter-dependencies amo...
- DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation (07/04/2023): Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerf...
- Masked Diffusion Transformer is a Strong Image Synthesizer (03/25/2023): Despite its success in image synthesis, we observe that diffusion probab...
- All are Worth Words: a ViT Backbone for Score-based Diffusion Models (09/25/2022): Vision transformers (ViT) have shown promise in various vision tasks inc...
- An investigation into the adaptability of a diffusion-based TTS model (03/03/2023): Given the recent success of diffusion in producing natural-sounding synt...
