Exploring Vision Transformers as Diffusion Learners

12/28/2022
by He Cao, et al.

Score-based diffusion models have captured widespread attention and fueled fast progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has been largely neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of a vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implications of disentangling the generative backbone into an encoder-decoder structure, and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with an ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird, and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on a text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.
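The core modeling change the abstract describes is feeding the noisy image to a Transformer rather than a U-Net, which requires flattening the image into a sequence of patch tokens. As a minimal illustrative sketch (the `patchify` function and its parameters are assumptions for illustration, not the paper's code), the tokenization step looks like this:

```python
def patchify(image, patch=4):
    """Split a C x H x W image (nested lists) into a sequence of
    flattened patch tokens, as a ViT-style backbone would consume.
    Each token concatenates all channels of one patch-by-patch tile."""
    C = len(image)
    H = len(image[0])
    W = len(image[0][0])
    tokens = []
    for py in range(0, H, patch):          # walk patches top-to-bottom
        for px in range(0, W, patch):      # then left-to-right
            tok = []
            for c in range(C):
                for y in range(py, py + patch):
                    for x in range(px, px + patch):
                        tok.append(image[c][y][x])
            tokens.append(tok)
    return tokens

# A 3 x 8 x 8 image with patch size 4 yields 4 tokens of dimension
# 3 * 4 * 4 = 48; the Transformer then denoises this token sequence.
img = [[[0.0] * 8 for _ in range(8)] for _ in range(3)]
toks = patchify(img)
print(len(toks), len(toks[0]))  # 4 48
```

In a full diffusion learner, these tokens (plus a timestep embedding) would pass through Transformer blocks and be un-patchified back to image space; the asymmetric encoder-decoder idea then amounts to allocating more capacity to the encoder half of that stack.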


