Attention Correctness in Neural Image Captioning

05/31/2016
by Chenxi Liu et al.

Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. Despite their popularity, however, the "correctness" of the implicitly learned attention maps has only been assessed qualitatively, by visualizing a handful of examples. In this paper we focus on evaluating and improving the correctness of attention in neural image captioning models. Specifically, we propose a quantitative evaluation metric for the consistency between the generated attention maps and human annotations, using recently released datasets that align regions in images with entities in captions. We then propose novel models with different levels of explicit supervision for learning attention maps during training. The supervision can be strong, when alignment between regions and caption entities is available, or weak, when only object segments and categories are provided. We show on the popular Flickr30k and COCO datasets that supervising attention maps during training consistently improves both attention correctness and caption quality, showing the promise of making machine perception more human-like.
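The two ideas in the abstract — a correctness metric and explicit attention supervision — can be sketched concretely. A minimal reading of the metric is the fraction of attention mass that falls inside the human-annotated region for a caption entity, and a natural strong-supervision loss is the cross-entropy between the generated attention map and a target distribution spread over that region. The function names and the exact target construction below are illustrative assumptions, not the paper's verbatim formulation:

```python
import numpy as np

def attention_correctness(attention_map, region_mask):
    """Fraction of total attention mass inside the annotated region.

    attention_map : 2-D array of non-negative attention weights.
    region_mask   : boolean array of the same shape, True inside the
                    human-annotated region for the caption entity.
    """
    a = np.asarray(attention_map, dtype=float)
    m = np.asarray(region_mask, dtype=bool)
    total = a.sum()
    if total == 0.0:
        return 0.0
    return float(a[m].sum() / total)

def attention_supervision_loss(attention_map, region_mask, eps=1e-8):
    """Cross-entropy between the attention map and a target distribution
    that is uniform over the annotated region (an assumed target choice)."""
    a = np.asarray(attention_map, dtype=float).ravel()
    g = np.asarray(region_mask, dtype=float).ravel()
    g = g / g.sum()                      # normalize target to a distribution
    return float(-(g * np.log(a + eps)).sum())

# Toy example: a 2x2 attention map where the entity occupies the left column.
att = np.array([[0.4, 0.1],
                [0.4, 0.1]])
mask = np.array([[True, False],
                 [True, False]])
print(attention_correctness(att, mask))  # 0.8
```

A perfectly focused attention map would score 1.0 on the metric; supervision during training pushes the map toward the annotated region, which is how the proposed models improve correctness.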

