Understanding Image Captioning Models beyond Visualizing Attention

01/04/2020
by Jiamei Sun, et al.

This paper explains the predictions of image captioning models with attention mechanisms, going beyond visualizing the attention itself. We develop variants of layer-wise relevance backpropagation (LRP) and gradient backpropagation tailored to image captioning with attention. The resulting methods simultaneously provide a pixel-wise image explanation and a linguistic explanation for each word in a caption. We show that, given a word in the caption to be explained, explanation methods such as LRP reveal supporting and opposing pixels as well as supporting and opposing words. We systematically compare the properties of attention heatmaps against those of heatmaps computed with explanation methods such as LRP, Grad-CAM and Guided Grad-CAM. We show that explanation methods (1) correlate with object locations at higher precision than attention, (2) are able to identify object words that are unsupported by image content, and (3) provide guidance to debias and improve the model. Results are reported for image captioning using two different attention models trained on the Flickr30K and MSCOCO2017 datasets. Experimental analyses show the strength of explanation methods for understanding image captioning attention models.
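
To make the notion of supporting and opposing evidence concrete, here is a minimal sketch of the generic LRP epsilon rule applied to a single linear layer. It is not the attention-tailored variant developed in the paper; the function name `lrp_epsilon_linear`, the toy weights, and the inputs are all hypothetical and chosen only for illustration.

```python
def lrp_epsilon_linear(activations, weights, bias, relevance_out, eps=1e-6):
    """Redistribute output relevance onto the inputs of one linear layer.

    Generic LRP-epsilon rule (illustrative only, not the paper's variant):
        R_i = sum_j  a_i * w_ij / (z_j + eps * sign(z_j)) * R_j
    activations:   list of n input values a_i
    weights:       n x m nested list, weights[i][j] links input i to output j
    bias:          list of m biases b_j
    relevance_out: list of m relevance scores R_j
    """
    n, m = len(activations), len(bias)
    relevance_in = [0.0] * n
    for j in range(m):
        # Pre-activation z_j = sum_i a_i * w_ij + b_j
        z = sum(activations[i] * weights[i][j] for i in range(n)) + bias[j]
        # The stabilizer keeps the denominator away from zero, preserving sign
        z += eps if z >= 0 else -eps
        for i in range(n):
            relevance_in[i] += activations[i] * weights[i][j] / z * relevance_out[j]
    return relevance_in


# Hypothetical toy layer: three inputs feeding one output unit.
a = [1.0, 2.0, 0.5]
W = [[0.5], [-0.25], [1.0]]
b = [0.0]
R = lrp_epsilon_linear(a, W, b, relevance_out=[1.0])

# Positive scores mark inputs that support the output, negative ones oppose it.
for i, r in enumerate(R):
    print(f"input {i}: {'supporting' if r > 0 else 'opposing'} ({r:+.3f})")
```

In this toy case the second input receives negative relevance because its weighted contribution pushes against the output. With a zero bias and a small `eps`, the input relevances sum to approximately the output relevance, which is the conservation property that LRP-style methods aim for.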

research
11/29/2021

Neural Attention for Image Captioning: Review of Outstanding Methods

Image captioning is the task of automatically generating sentences that ...
research
11/10/2019

Can Neural Image Captioning be Controlled via Forced Attention?

Learned dynamic weighting of the conditioning signal (attention) has bee...
research
12/08/2018

Attend More Times for Image Captioning

Most attention-based image captioning models attend to the image once pe...
research
05/10/2019

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

In this work, we study the robustness of a CNN+RNN based image captionin...
research
07/26/2018

Rethinking the Form of Latent States in Image Captioning

RNNs and their variants have been widely adopted for image captioning. I...
research
01/04/2022

Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety

There is amazing progress in Deep Learning based models for Image captio...
