Quantifying the amount of visual information used by neural caption generators

10/12/2018
by Marc Tanti, et al.

This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and an omission analysis based on image foils are reported, showing that the extent to which image captioning architectures retain, and are sensitive to, visual information varies with the type of word being generated and with its position in the caption as a whole. We motivate this work in the context of the field's broader goal of achieving more explainable AI.
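
The foil-based analysis can be illustrated with a short sketch. The caption_model function below is a hypothetical stand-in for a trained caption generator (it is not the authors' implementation), and per_word_sensitivity is an assumed helper: for each word in a ground-truth caption, it compares the probability the model assigns to that word when conditioned on the original image versus a mismatched foil image.

    import numpy as np

    # Toy vocabulary; a real captioning model would use thousands of words.
    VOCAB = ["a", "dog", "cat", "on", "the", "grass", "<end>"]

    def caption_model(image_features, prefix):
        """Hypothetical stand-in for a trained caption generator: returns a
        next-word probability distribution given image features and the
        caption prefix (a real model would be a CNN+RNN or similar)."""
        rng = np.random.default_rng(len(prefix))
        logits = rng.normal(size=len(VOCAB)) + image_features[: len(VOCAB)]
        return np.exp(logits) / np.exp(logits).sum()

    def per_word_sensitivity(image, foil, caption):
        """For each position t, measure how much the probability of the
        ground-truth word drops when the true image is swapped for a foil."""
        drops = []
        for t, word in enumerate(caption):
            prefix = caption[:t]
            p_true = caption_model(image, prefix)[VOCAB.index(word)]
            p_foil = caption_model(foil, prefix)[VOCAB.index(word)]
            drops.append(p_true - p_foil)  # large drop => strongly image-dependent word
        return drops

    # Random vectors stand in for CNN embeddings of the original and foil images.
    image = np.random.default_rng(0).normal(size=512)
    foil = np.random.default_rng(1).normal(size=512)
    caption = ["a", "dog", "on", "the", "grass"]
    for word, drop in zip(caption, per_word_sensitivity(image, foil, caption)):
        print(f"{word:>6s}  delta_p = {drop:+.3f}")

Aggregating these per-word probability drops by word type (e.g. part of speech) or by position in the caption gives the kind of sensitivity profile described above.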

Related research

Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators (09/22/2019)
Grounding language to visual relations is critical to various language-a...

Hyperparameter Analysis for Image Captioning (06/19/2020)
In this paper, we perform a thorough sensitivity analysis on state-of-th...

Attend More Times for Image Captioning (12/08/2018)
Most attention-based image captioning models attend to the image once pe...

KENGIC: KEyword-driven and N-Gram Graph based Image Captioning (02/07/2023)
This paper presents a Keyword-driven and N-gram Graph based approach for...

Aligning Linguistic Words and Visual Semantic Units for Image Captioning (08/06/2019)
Image captioning attempts to generate a sentence composed of several lin...

Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models (04/05/2023)
Heatmaps are widely used to interpret deep neural networks, particularly...
