ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora

08/02/2023
by Kanzhi Cheng, et al.
Generating visually grounded image captions in specific linguistic styles from unpaired stylistic corpora is a challenging task, especially because the stylized captions are expected to cover a wide variety of stylistic patterns. In this paper, we propose a novel framework to generate Accurate and Diverse Stylized Captions (ADS-Cap). ADS-Cap first uses a contrastive learning module to align image and text features, which unifies the paired factual and unpaired stylistic corpora during training. A conditional variational auto-encoder is then used to automatically memorize diverse stylistic patterns in latent space and to enhance diversity through sampling. We also design a simple but effective recheck module that boosts style accuracy by filtering style-specific captions. Experimental results on two widely used stylized image captioning datasets show that, in terms of consistency with the image, style accuracy, and diversity, ADS-Cap achieves outstanding performance compared to various baselines. Finally, we conduct extensive analyses to understand the effectiveness of our method. Our code is available at https://github.com/njucckevin/ADS-Cap.
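The sample-then-filter idea in the abstract (draw latent codes for diversity, then recheck the style of each candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `sample_and_recheck`, its parameters, the standard-normal prior, the score threshold, and the toy `toy_decode` and `toy_style_score` stand-ins are all assumptions made for illustration. In a real system the decoder and the style scorer would be trained models.

```python
import random
from typing import Callable, List, Sequence


def sample_and_recheck(
    decode: Callable[[Sequence[float]], str],
    style_score: Callable[[str], float],
    num_samples: int = 5,
    latent_dim: int = 4,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Sample latent codes, decode captions, keep those that pass a style check.

    All names and defaults here are hypothetical, not taken from ADS-Cap.
    """
    rng = random.Random(seed)
    kept: List[str] = []
    for _ in range(num_samples):
        # Assumption: latent codes are drawn from a standard normal prior.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        caption = decode(z)
        # Recheck step: discard candidates whose style score is too low.
        if style_score(caption) >= threshold:
            kept.append(caption)
    return kept


# Toy stand-ins so the sketch runs without any trained model.
def toy_decode(z: Sequence[float]) -> str:
    # Pretend the first latent dimension switches a stylistic pattern on or off.
    if z[0] > 0:
        return "a dog runs happily to greet its best friend"
    return "a dog runs on the grass"


def toy_style_score(caption: str) -> float:
    # Pretend style classifier: 1.0 if a stylistic cue word is present.
    return 1.0 if "happily" in caption else 0.0


if __name__ == "__main__":
    for caption in sample_and_recheck(toy_decode, toy_style_score, num_samples=10):
        print(caption)
```

Separating sampling from filtering keeps the two goals independent in this sketch: more samples widen the candidate pool, while the threshold controls how strictly style is enforced.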

research
05/03/2022

Diverse Image Captioning with Grounded Style

Stylized image captioning as presented in prior work aims to generate ca...
research
07/12/2022

Learning Diverse Tone Styles for Image Retouching

Image retouching, aiming to regenerate the visually pleasing renditions ...
research
05/28/2022

Variational Transformer: A Framework Beyond the Trade-off between Accuracy and Diversity for Image Captioning

Accuracy and Diversity are two essential metrizable manifestations in ge...
research
04/23/2020

Conditional Variational Image Deraining

Image deraining is an important yet challenging image processing task. T...
research
01/26/2023

Style-Aware Contrastive Learning for Multi-Style Image Captioning

Existing multi-style image captioning methods show promising results in ...
research
08/14/2019

Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process

Although significant progress has been made in the field of automatic im...
research
07/31/2023

Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences

Stylized visual captioning aims to generate image or video descriptions ...
