Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model

08/16/2022
by Xiulong Yang, et al.

Denoising Diffusion Probabilistic Models (DDPM) and Vision Transformers (ViT) have demonstrated significant progress in generative and discriminative tasks, respectively, and thus far these models have largely been developed in their own domains. In this paper, we establish a direct connection between DDPM and ViT by integrating the ViT architecture into DDPM, and introduce a new generative model called Generative ViT (GenViT). The modeling flexibility of ViT enables us to further extend GenViT to hybrid discriminative-generative modeling and introduce a Hybrid ViT (HybViT). Our work is among the first to explore a single ViT for joint image generation and classification. We conduct a series of experiments to analyze the performance of the proposed models and demonstrate their superiority over prior state-of-the-art methods in both generative and discriminative tasks. Our code and pre-trained models are available at https://github.com/sndnyang/Diffusion_ViT.

research
08/18/2023

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

Recently, large-scale diffusion models, e.g., Stable diffusion and DallE...
research
05/25/2023

Are Diffusion Models Vision-And-Language Reasoners?

Text-conditioned image generation models have recently shown immense qua...
research
06/01/2023

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

Recent advances in vision-language pre-training have enabled machines to...
research
03/17/2023

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

Learning from a large corpus of data, pre-trained models have achieved i...
research
01/24/2023

Model soups to increase inference without increasing compute time

In this paper, we compare Model Soups performances on three different mo...
research
03/14/2023

Interpretable ODE-style Generative Diffusion Model via Force Field Construction

For a considerable time, researchers have focused on developing a method...
research
07/24/2018

A Hybrid of Deep Audio Feature and i-vector for Artist Recognition

Artist recognition is a task of modeling the artist's musical style. Thi...
