Object Hallucination in Image Captioning

09/06/2018
by Anna Rohrbach, et al.

Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground-truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. We analyze how captioning model architectures and learning objectives contribute to object hallucination, explore when hallucination is likely due to image misclassification or language priors, and assess how well current sentence metrics capture object hallucination. We investigate these questions on the standard image captioning benchmark, MSCOCO, using a diverse set of models. Our analysis yields several interesting findings, including that models which score best on standard sentence metrics do not always have lower hallucination and that models which hallucinate more tend to make errors driven by language priors.
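To make the idea of a hallucination rate concrete, the following is a minimal sketch of how one might count objects that a caption mentions but the image labels do not contain. It is an illustration under stated assumptions, not necessarily the metric proposed in the paper: the function name `hallucination_rates`, the pre-tokenized captions, the exact-match lookup against an object vocabulary, and the toy data are all hypothetical.

```python
from typing import List, Set, Tuple


def hallucination_rates(
    captions: List[List[str]],
    image_objects: List[Set[str]],
    vocabulary: Set[str],
) -> Tuple[float, float]:
    """Return (per-object rate, per-caption rate) of hallucinated objects.

    captions:      one token list per image (assumed already tokenized).
    image_objects: the set of object labels truly present in each image.
    vocabulary:    the object words we are able to check (hypothetical).
    """
    mentioned_total = 0
    hallucinated_total = 0
    captions_with_hallucination = 0

    for tokens, present in zip(captions, image_objects):
        # Objects the caption mentions, restricted to the checkable vocabulary.
        mentioned = {t for t in tokens if t in vocabulary}
        # Mentioned objects that are absent from the image labels.
        hallucinated = mentioned - present

        mentioned_total += len(mentioned)
        hallucinated_total += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1

    per_object = hallucinated_total / mentioned_total if mentioned_total else 0.0
    per_caption = captions_with_hallucination / len(captions) if captions else 0.0
    return per_object, per_caption


# Toy example (made-up data): the second caption mentions a dog
# that is not among that image's labels.
toy_captions = [
    ["a", "dog", "on", "a", "bench"],
    ["a", "cat", "and", "a", "dog"],
]
toy_objects = [{"dog", "bench"}, {"cat"}]
toy_vocab = {"dog", "cat", "bench", "clock"}

per_object_rate, per_caption_rate = hallucination_rates(
    toy_captions, toy_objects, toy_vocab
)
```

A real evaluation would also need to map synonyms and plurals to a canonical object label before the lookup, which this sketch omits.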

research
07/04/2022

Are metrics measuring what they should? An evaluation of image captioning task metrics

Image Captioning is a current research task to describe the image conten...
research
10/06/2021

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

The task of image-text matching aims to map representations from differe...
research
09/28/2022

Thinking Hallucination for Video Captioning

With the advent of rich visual representations and pre-trained language ...
research
06/29/2021

Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder

Automatically evaluating the quality of image captions can be very chall...
research
06/07/2022

Improving Image Captioning with Control Signal of Sentence Quality

In the dataset of image captioning, each image is aligned with several c...
research
10/04/2021

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Explaining an image with missing or non-existent objects is known as obj...
research
11/30/2020

Language-Driven Region Pointer Advancement for Controllable Image Captioning

Controllable Image Captioning is a recent sub-field in the multi-modal t...
