Are Diffusion Models Vision-And-Language Reasoners?

05/25/2023
by Benno Krojer, et al.

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, these diffusion-based generative models are hard to subject to automatic, fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we introduce two innovations. First, we adapt diffusion-based models (in our case, Stable Diffusion) to any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench), which comprises 7 complex vision-and-language tasks, a bias evaluation, and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction, bringing discriminative and generative model evaluation closer together. We will release code and benchmark setup soon.
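The abstract does not spell out how DiffusionITM turns a generative model into an image-text matcher. The sketch below illustrates one plausible reading, stated here as an assumption rather than the paper's actual procedure: noise the image, ask a text-conditioned denoiser to predict that noise, and treat a lower prediction error as a better match. The names `denoising_error`, `itm_score`, `pick_caption`, `toy_denoiser`, and `PROTOTYPES` are hypothetical, and the toy denoiser is a stand-in that has nothing to do with Stable Diffusion.

```python
import math
import random

# Hypothetical toy "knowledge": one prototype image (a flat vector) per caption.
PROTOTYPES = {
    "a red cube": [1.0, 0.0, 0.0, 1.0],
    "a blue sphere": [0.0, 0.0, 1.0, -1.0],
}


def toy_denoiser(noisy, t, text):
    """Stand-in for a text-conditioned noise predictor (NOT Stable Diffusion).

    It assumes the clean image equals the caption's prototype and solves
    noisy = sqrt(1 - t) * clean + sqrt(t) * noise for the noise.
    """
    proto = PROTOTYPES[text]
    return [(y - math.sqrt(1 - t) * p) / math.sqrt(t) for y, p in zip(noisy, proto)]


def denoising_error(image, text, denoiser, num_samples=32, seed=0):
    """Average squared noise-prediction error over random noise levels."""
    rng = random.Random(seed)  # fixed seed: every caption sees the same noise
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(0.1, 0.9)  # assumed noise level in (0, 1)
        noise = [rng.gauss(0.0, 1.0) for _ in image]
        noisy = [math.sqrt(1 - t) * x + math.sqrt(t) * n for x, n in zip(image, noise)]
        pred = denoiser(noisy, t, text)
        total += sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(image)
    return total / num_samples


def itm_score(image, text, denoiser, **kwargs):
    """Assumed matching score: higher (less negative) means a better match."""
    return -denoising_error(image, text, denoiser, **kwargs)


def pick_caption(image, captions, denoiser):
    """Return the caption with the highest assumed matching score."""
    return max(captions, key=lambda c: itm_score(image, c, denoiser))


if __name__ == "__main__":
    image = [1.0, 0.0, 0.0, 1.0]
    print(pick_caption(image, list(PROTOTYPES), toy_denoiser))
```

Reusing the same seed for every caption keeps the comparison fair, since each candidate is scored on identical noise draws. A real setup would replace `toy_denoiser` with the diffusion model's noise predictor and operate on image latents rather than short lists.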


