Multimodal Data Augmentation for Image Captioning using Diffusion Models

05/03/2023
by   Changrong Xiao, et al.
0

Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we proposed a multimodal data augmentation method, leveraging a recent text-to-image model called Stable Diffusion, to expand the training set via high-quality generation of image-caption pairs. Extensive experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods, and particularly a significant boost when having fewer training instances. In addition, models trained on our augmented datasets also outperform prior unpaired image captioning methods by a large margin. Finally, further improvement regarding the training efficiency and effectiveness can be obtained after intentionally filtering the generated data based on quality assessment.

READ FULL TEXT

page 6

page 12

page 16

research
05/05/2023

Data Curation for Image Captioning with Text-to-Image Generative Models

Recent advances in image captioning are mainly driven by large-scale vis...
research
06/10/2021

Data augmentation to improve robustness of image captioning solutions

In this paper, we study the impact of motion blur, a common quality flaw...
research
01/30/2023

PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

Many deep learning tasks require annotations that are too time consuming...
research
07/19/2023

Improving Multimodal Datasets with Image Captioning

Massive web datasets play a key role in the success of large vision-lang...
research
04/28/2023

Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

Automated image captioning has the potential to be a useful tool for peo...
research
05/29/2023

Text-Only Image Captioning with Multi-Context Data Generation

Text-only Image Captioning (TIC) is an approach that aims to construct a...
research
02/28/2022

Interactive Machine Learning for Image Captioning

We propose an approach for interactive learning for an image captioning ...

Please sign up or login with your details

Forgot password? Click here to reset