SPICE: Semantic Propositional Image Caption Evaluation

07/29/2016
by Peter Anderson, et al.

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs, which we call SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., a system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as "which caption-generator best understands colors?" and "can caption-generators count?"
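To make the idea of a metric over semantic propositions concrete, here is a minimal sketch in Python. It assumes each caption has already been reduced to a set of proposition tuples (objects, attributes, relations) and scores a candidate by the F1 of exact tuple matches against the reference. The function name `tuple_f1` and the example tuples are hypothetical and written by hand for illustration. The sketch omits the caption-to-scene-graph parsing step and any synonym matching, so it should not be read as the reference SPICE implementation.

```python
from typing import Iterable, Set, Tuple

# A proposition is a tuple of strings: (object,), (object, attribute),
# or (subject, relation, object). This encoding is an assumption of the sketch.
Proposition = Tuple[str, ...]


def tuple_f1(candidate: Iterable[Proposition],
             reference: Iterable[Proposition]) -> float:
    """F1 over exactly matching proposition tuples (no synonym matching)."""
    cand: Set[Proposition] = set(candidate)
    ref: Set[Proposition] = set(reference)
    if not cand or not ref:
        return 0.0
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)  # share of candidate tuples that are correct
    recall = matched / len(ref)      # share of reference tuples that are covered
    return 2 * precision * recall / (precision + recall)


# Hypothetical hand-written tuples; a real pipeline would obtain them from a parser.
reference = {("girl",), ("table",), ("girl", "young"),
             ("girl", "stand-on", "table")}
candidate = {("girl",), ("table",), ("girl", "sit-at", "table")}

score = tuple_f1(candidate, reference)
print(f"{score:.3f}")
```

Because every tuple is typed by its length, the same matching can be restricted to a single kind of proposition (for example, only attribute tuples), which is one way a question such as "which caption-generator best understands colors?" could be approached.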


