Quality Estimation for Image Captions Based on Large-scale Human Evaluations

09/08/2019
by   Tomer Levinboim, et al.
0

Automatic image captioning has improved significantly in the last few years, but the problem is far from being solved. Furthermore, while the standard automatic metrics, such as CIDEr and SPICE cider,spice, can be used for model selection, they cannot be used at inference-time given a previously unseen image since they require ground-truth references. In this paper, we focus on the related problem called Quality Estimation (QE) of image-captions. In contrast to automatic metrics, QE attempts to model caption quality without relying on ground-truth references. It can thus be applied as a second-pass model (after caption generation) to estimate the quality of captions even for previously unseen images. We conduct a large-scale human evaluation experiment, in which we collect a new dataset of more than 600k ratings of image-caption pairs. Using this dataset, we design and experiment with several QE modeling approaches and provide an analysis of their performance. Our results show that QE is feasible for image captioning.

READ FULL TEXT

page 4

page 5

research
09/05/2019

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning

Popular metrics used for evaluating image captioning systems, such as BL...
research
04/04/2023

Cross-Domain Image Captioning with Discriminative Finetuning

Neural captioners are typically trained to mimic human-generated referen...
research
11/21/2019

Reinforcing an Image Caption Generator Using Off-Line Human Feedback

Human ratings are currently the most accurate way to assess the quality ...
research
05/25/2022

Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

Research in massively multilingual image captioning has been severely ha...
research
10/29/2017

Evaluation of Automatic Video Captioning Using Direct Assessment

We present Direct Assessment, a method for manually assessing the qualit...
research
08/29/2019

Aesthetic Image Captioning From Weakly-Labelled Photographs

Aesthetic image captioning (AIC) refers to the multi-modal task of gener...
research
07/20/2023

FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback

Captions are crucial for understanding scientific visualizations and doc...

Please sign up or login with your details

Forgot password? Click here to reset