CIDEr: Consensus-based Image Description Evaluation

11/20/2014
by Ramakrishna Vedantam, et al.

Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets, PASCAL-50S and ABSTRACT-50S, that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as part of the MS COCO evaluation server to enable systematic evaluation and benchmarking.

research
07/22/2019

VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions

We address the task of evaluating image description generation systems. ...
research
11/17/2014

Show and Tell: A Neural Image Caption Generator

Automatically describing the content of an image is a fundamental proble...
research
11/10/2015

From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge

In this paper we propose the construction of linguistic descriptions of ...
research
09/21/2016

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Automatically describing the content of an image is a fundamental proble...
research
05/24/2022

Face2Text revisited: Improved data set and baseline results

Current image description generation models do not transfer well to the ...
research
06/05/2018

Mining for meaning: from vision to language through multiple networks consensus

Describing visual data into natural language is a very challenging task,...
research
12/29/2014

Simple Image Description Generator via a Linear Phrase-Based Approach

Generating a novel textual description of an image is an interesting pro...
