
Style-Aware Contrastive Learning for Multi-Style Image Captioning

by Yucheng Zhou, et al.
University of Technology Sydney

Existing multi-style image captioning methods show promising results in generating captions with accurate visual content and the desired linguistic style. However, these methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine visual content potentially relevant to a style. Second, we propose a style-aware triplet contrast objective to distinguish whether an image, a style and a caption match. To provide positive and negative samples for contrastive learning, we present three retrieval schemes (object-based retrieval, RoI-based retrieval and triplet-based retrieval) and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
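The abstract does not give the form of the triplet contrast objective. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over (image, style, caption) triplets: a triplet's matching score is the cosine similarity between a fused image-plus-style vector and a caption vector, and negatives are triplets whose style or caption has been swapped. The element-wise-sum fusion, the helper names `cosine`, `triplet_score` and `info_nce_loss`, the toy embeddings and the temperature of 0.1 are all hypothetical choices, not the authors' method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_score(image: Vector, style: Vector, caption: Vector) -> float:
    """Hypothetical matching score: fuse image and style by element-wise sum
    (an assumption), then compare the fused vector with the caption."""
    fused = [i + s for i, s in zip(image, style)]
    return cosine(fused, caption)


def info_nce_loss(positive: float, negatives: List[float],
                  temperature: float = 0.1) -> float:
    """InfoNCE: negative log-softmax of the positive score among all scores."""
    logits = [positive / temperature] + [n / temperature for n in negatives]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# Toy embeddings (made up for illustration).
image = [1.0, 0.0, 0.0]
romantic = [0.0, 1.0, 0.0]
humorous = [0.0, 0.0, 1.0]
romantic_caption = [1.0, 1.0, 0.0]
humorous_caption = [1.0, 0.0, 1.0]

pos = triplet_score(image, romantic, romantic_caption)      # matched triplet
negs = [
    triplet_score(image, romantic, humorous_caption),       # wrong caption
    triplet_score(image, humorous, romantic_caption),       # wrong style
]
loss = info_nce_loss(pos, negs)
```

Scoring both swapped-caption and swapped-style triplets as negatives is what makes this toy objective "style-aware": a caption that fits the image but not the requested style is pushed away just like an unrelated caption. The retrieval schemes described in the abstract would be one way to supply harder negatives than these random swaps.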


Related Research

Few-shot Font Generation by Learning Style Difference and Similarity

Few-shot font generation (FFG) aims to preserve the underlying global st...

Improving the Latent Space of Image Style Transfer

Existing neural style transfer research has studied matching statisti...

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

Generating stylized captions for an image is an emerging topic in image ...

A Self-Explainable Stylish Image Captioning Framework via Multi-References

In this paper, we propose to build a stylish image captioning model thro...

Diverse Image Captioning with Grounded Style

Stylized image captioning as presented in prior work aims to generate ca...

SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization

Image harmonization aims to achieve visual consistency in composite imag...

Prototype-to-Style: Dialogue Generation with Style-Aware Editing on Retrieval Memory

The ability of a dialog system to express prespecified language style du...