Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

09/10/2023
by Guisheng Liu, et al.

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose Prefix-diffusion, a lightweight image captioning network that incorporates continuous diffusion. To achieve diversity, we design an efficient method that injects prefix image embeddings into the denoising process of the diffusion model. To reduce the number of trainable parameters, we employ a pre-trained model to extract image features and design an additional mapping network. Prefix-diffusion generates diverse captions with relatively few parameters while maintaining the fluency and relevance of the captions, benefiting from the generative capabilities of the diffusion model. Our work paves the way for scaling up diffusion models for image captioning and achieves promising performance compared with recent approaches.
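
The prefix-injection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`map_to_prefix`, `add_noise`, `denoiser_input`), the single linear layer standing in for the mapping network, the choice to prepend the prefix to the caption tokens, and the standard Gaussian forward-noising step are all assumptions made for illustration. The sketch only shows how a frozen image feature might be mapped to prefix embeddings and combined with noisy caption embeddings before a denoiser sees them.

```python
import random


def map_to_prefix(image_feature, weights, prefix_len, dim):
    """Hypothetical mapping network: a single linear layer that turns one
    frozen image feature into `prefix_len` embeddings of size `dim`."""
    flat = [sum(w * x for w, x in zip(row, image_feature)) for row in weights]
    return [flat[i * dim:(i + 1) * dim] for i in range(prefix_len)]


def add_noise(caption_embeddings, alpha_bar, rng):
    """Standard Gaussian forward diffusion on continuous embeddings
    (assumed here): x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    signal, noise = alpha_bar ** 0.5, (1 - alpha_bar) ** 0.5
    return [[signal * v + noise * rng.gauss(0, 1) for v in token]
            for token in caption_embeddings]


def denoiser_input(prefix, noisy_caption):
    """Assumed injection scheme: the clean image prefix is prepended to the
    noisy caption tokens, so the denoiser can attend to image information."""
    return prefix + noisy_caption


# Toy sizes, chosen only for the example.
rng = random.Random(0)
feat_dim, dim, prefix_len, cap_len = 8, 4, 2, 5

image_feature = [rng.gauss(0, 1) for _ in range(feat_dim)]  # stands in for a frozen encoder output
weights = [[rng.gauss(0, 0.1) for _ in range(feat_dim)]
           for _ in range(prefix_len * dim)]                # the only trainable part in this sketch
caption = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(cap_len)]

prefix = map_to_prefix(image_feature, weights, prefix_len, dim)
noisy = add_noise(caption, alpha_bar=0.5, rng=rng)
sequence = denoiser_input(prefix, noisy)
print(len(sequence), len(sequence[0]))  # 7 tokens, each of size 4
```

In this sketch only `weights` would be trained alongside the denoiser, which is one way to read the abstract's point about keeping the trainable parameter count small; the actual architecture is described in the full text.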

research
10/10/2022

CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning

Image captioning task has been extensively researched by previous work. ...
research
10/06/2017

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved subst...
research
12/04/2022

Controllable Image Captioning via Prompting

Despite the remarkable progress of image captioning, existing captioners...
research
01/30/2023

PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

Many deep learning tasks require annotations that are too time consuming...
research
09/30/2022

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

Recent advances in image captioning have focused on scaling the data and...
research
05/08/2023

IIITD-20K: Dense captioning for Text-Image ReID

Text-to-Image (T2I) ReID has attracted a lot of attention in the recent ...
research
09/13/2023

Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement

While diffusion models demonstrate a remarkable capability for generatin...
