Describing like humans: on diversity in image captioning

03/28/2019
by Qingzhong Wang, et al.

Recently, state-of-the-art image captioning models have surpassed human performance on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? These metrics only measure the similarity of a generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, so there are many possible captions that express different concepts and details, each of which might interest different people. Evaluating accuracy alone is therefore not sufficient for measuring the performance of captioning models; the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in terms of both diversity and accuracy. We find that there is still a large gap between model and human performance in both accuracy and diversity, and that models optimized for accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and the CIDEr reward in reinforcement learning during training can effectively control the tradeoff between the diversity and accuracy of the generated captions.
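
The diversity metric summarized above can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' code: the function names `bow_vector` and `lsa_diversity` are hypothetical, cosine similarity between bag-of-words vectors stands in for the CIDEr kernel, and the score is assumed to be computed from the singular values σ_i of the m-caption term matrix as -log_m(σ_max / Σσ_i). Under these assumptions the score is 0 when all captions are identical and 1 when no two captions share a word.

```python
from collections import Counter

import numpy as np


def bow_vector(caption, vocab):
    """Hypothetical helper: L2-normalized bag-of-words vector for one caption."""
    counts = Counter(caption.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def lsa_diversity(captions):
    """Hypothetical LSA-style diversity score for a set of m >= 2 captions.

    Assumption: cosine similarity of word counts replaces the CIDEr kernel,
    and the score is -log_m(largest singular value / sum of singular values).
    """
    m = len(captions)
    if m < 2:
        raise ValueError("need at least two captions")
    vocab = sorted({w for c in captions for w in c.lower().split()})
    M = np.stack([bow_vector(c, vocab) for c in captions])  # m x |vocab|
    K = M @ M.T  # kernel (Gram) matrix of pairwise caption similarities
    # Eigenvalues of K are the squared singular values of M; clip tiny
    # negative values caused by floating-point error before the square root.
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    sigma = np.sqrt(eigvals)
    r = sigma.max() / sigma.sum()  # share of the dominant "topic"
    return float(-np.log(r) / np.log(m))
```

Because the singular values are obtained from the eigenvalues of the kernel matrix K, any positive semi-definite caption similarity (such as a CIDEr-based one) could replace the cosine kernel without changing the rest of the computation.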

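The training tradeoff described in the abstract can likewise be sketched as a weighted objective. This form is an assumption, not taken from the paper: a weight `lam` in [0, 1] mixes the cross-entropy loss on a reference caption with a REINFORCE-style loss whose advantage is the CIDEr reward of a sampled caption minus a baseline reward (for example, the score of a greedily decoded caption, as in self-critical training). The function `mixed_caption_loss` is hypothetical, and plain floats replace the tensors through which gradients would flow in a real model.

```python
from typing import Sequence


def mixed_caption_loss(gt_token_logprobs: Sequence[float],
                       sample_logprob: float,
                       sample_reward: float,
                       baseline_reward: float,
                       lam: float) -> float:
    """Hypothetical objective: lam * cross-entropy + (1 - lam) * RL loss."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Cross-entropy on the human reference caption: the negative
    # log-likelihood of its tokens under the model.
    xe_loss = -sum(gt_token_logprobs)
    # REINFORCE with a baseline (assumed): raise the sampled caption's
    # log-probability in proportion to how much its reward beats the baseline.
    advantage = sample_reward - baseline_reward
    rl_loss = -advantage * sample_logprob
    return lam * xe_loss + (1.0 - lam) * rl_loss
```

Setting `lam = 1` recovers pure cross-entropy training and `lam = 0` pure reward optimization; intermediate values interpolate between the two objectives.
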
research
03/26/2020

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models

Image captioning models have been able to generate grammatically correct...
research
02/27/2020

Analysis of diversity-accuracy tradeoff in image captioning

We investigate the effect of different model architectures, training obj...
research
09/08/2020

Towards Unique and Informative Captioning of Images

Despite considerable progress, state of the art image captioning models ...
research
05/28/2022

Variational Transformer: A Framework Beyond the Trade-off between Accuracy and Diversity for Image Captioning

Accuracy and Diversity are two essential metrizable manifestations in ge...
research
05/31/2018

Diverse and Controllable Image Captioning with Part-of-Speech Guidance

Automatically describing an image is an important capability for virtual...
research
08/14/2019

Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process

Although significant progress has been made in the field of automatic im...
research
07/14/2020

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

A wide range of image captioning models has been developed, achieving si...
