Distinctive Image Captioning via CLIP Guided Group Optimization

08/08/2022
by Youyuan Zhang, et al.

Image captioning models are usually trained on human-annotated ground-truth captions, which tends to produce captions that are accurate but generic. In this paper, we focus on generating distinctive captions that can distinguish the target image from other similar images. To evaluate the distinctiveness of captions, we introduce a series of metrics that use the large-scale vision-language pre-trained model CLIP to quantify distinctiveness. To further improve the distinctiveness of captioning models, we propose a simple and effective training strategy that trains the model by comparing the target image with a group of similar images and optimizing the group embedding gap. Extensive experiments are conducted on various baseline models to demonstrate the wide applicability of our strategy and the consistency of the metric results with human evaluation. By comparing the performance of our best model with existing state-of-the-art models, we claim that our model achieves a new state of the art on the distinctiveness objective.
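The abstract does not define the "group embedding gap", so the following is only an illustrative sketch of one plausible reading: the gap is taken to be the cosine similarity between a caption embedding and the target image embedding, minus the mean cosine similarity between that caption and the embeddings of the similar images. The function names (`cosine`, `group_embedding_gap`) and the toy 2-D vectors are hypothetical; in practice the embeddings would come from CLIP's text and image encoders, and the paper's actual formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_embedding_gap(caption_emb, target_img_emb, similar_img_embs):
    """Assumed definition, not taken from the paper:
    similarity to the target image minus the mean similarity
    to the group of similar images. Larger means more distinctive."""
    target_sim = cosine(caption_emb, target_img_emb)
    group_sim = sum(cosine(caption_emb, e) for e in similar_img_embs) / len(similar_img_embs)
    return target_sim - group_sim


# Toy embeddings standing in for CLIP outputs (hypothetical values).
target_image = [1.0, 0.0]
similar_images = [[0.0, 1.0], [0.0, 1.0]]

distinctive_caption = [1.0, 0.0]  # aligned only with the target image
generic_caption = [1.0, 1.0]      # equally aligned with every image

gap_distinctive = group_embedding_gap(distinctive_caption, target_image, similar_images)
gap_generic = group_embedding_gap(generic_caption, target_image, similar_images)
```

Under this assumed definition, a caption that matches the target image and the similar images equally well gets a gap near zero, while a caption that matches only the target gets a larger gap, which is the behavior a distinctiveness score or training signal would be expected to reward.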

research
08/20/2021

Group-based Distinctive Image Captioning with Memory Attention

Describing images using natural language is widely known as image captio...
research
04/08/2022

On Distinctive Image Captioning via Comparing and Reweighting

Recent image captioning models are achieving impressive results based on...
research
07/22/2022

Rethinking the Reference-based Distinctive Image Captioning

Distinctive Image Captioning (DIC) – generating distinctive captions tha...
research
09/08/2020

Towards Unique and Informative Captioning of Images

Despite considerable progress, state of the art image captioning models ...
research
04/27/2022

CapOnImage: Context-driven Dense-Captioning on Image

Existing image captioning systems are dedicated to generating narrative ...
research
12/24/2020

WEmbSim: A Simple yet Effective Metric for Image Captioning

The area of automatic image caption evaluation is still undergoing inten...
research
06/15/2023

Pragmatic Inference with a CLIP Listener for Contrastive Captioning

We propose a simple yet effective and robust method for contrastive capt...
