Discriminative Diffusion Models as Few-shot Vision and Language Learners

05/18/2023
by Xuehai He, et al.

Diffusion models, such as Stable Diffusion, have shown strong performance on text-to-image generation. Text-to-image generation often requires a model to render visual concepts with the fine-grained details and attributes specified in a text prompt. Can we therefore leverage the representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose Discriminative Stable Diffusion (DSD), an approach that turns pre-trained text-to-image diffusion models into few-shot discriminative learners. DSD uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information, and fine-tunes the model via attention-based prompt learning to perform image-text matching. Comparing DSD with state-of-the-art methods on several benchmark datasets, we obtain superior results on few-shot image-text matching, which demonstrates the potential of pre-trained diffusion models for discriminative tasks.
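As a rough illustration of how a cross-attention score could serve as an image-text matching signal, the sketch below computes scaled dot-product scores between image-side query vectors and text-side key vectors, then pools them into a single number per caption. This is not the DSD implementation: the function names, the toy vectors, and the LogSumExp pooling step are all assumptions made for the example, and a real model would take its queries and keys from the cross-attention layers of the Stable Diffusion U-Net.

```python
import math

def attention_scores(image_queries, text_keys):
    """Scaled dot-product scores: one row per image patch, one column per text token."""
    d = len(text_keys[0])
    return [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in text_keys]
        for q in image_queries
    ]

def logsumexp_pool(scores):
    """Pool a score matrix into one scalar (hypothetical pooling choice for this sketch)."""
    flat = [s for row in scores for s in row]
    m = max(flat)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in flat))

def match_score(image_queries, text_keys):
    """Single image-text matching score derived from the cross-attention scores."""
    return logsumexp_pool(attention_scores(image_queries, text_keys))

def pick_caption(image_queries, candidate_text_keys):
    """Return the index of the candidate caption with the highest matching score."""
    scores = [match_score(image_queries, keys) for keys in candidate_text_keys]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (made up): two image patches and two candidate captions of two tokens each.
image_queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_caption = [[2.0, 0.0], [0.0, 2.0]]       # keys pointing the same way as the queries
misaligned_caption = [[-2.0, 0.0], [0.0, -2.0]]  # keys pointing the opposite way

best = pick_caption(image_queries, [aligned_caption, misaligned_caption])
```

In this toy setup the aligned caption scores higher, so `best` selects it. In a few-shot setting, the pooled score would typically feed a small trainable head rather than a bare argmax, but that design is likewise outside what the abstract specifies.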


