Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention

02/15/2019
by Shalini Ghosh, et al.

In this paper, we present a novel approach to the task of eXplainable Question Answering (XQA), i.e., generating natural language (NL) explanations for the Visual Question Answering (VQA) problem. We generate NL explanations comprising the evidence supporting the answer to a question asked about an image, using two sources of information: (a) annotations of entities in the image (e.g., object labels, region descriptions, relation phrases) generated from the image's scene graph, and (b) the attention map produced by a VQA model when answering the question. We show how combining the visual attention map with the NL representation of relevant scene graph entities, carefully selected using a language model, can yield reasonable textual explanations without the need for any additionally collected data (explanation captions, etc.). We run our algorithms on the Visual Genome (VG) dataset and conduct internal user studies to demonstrate the efficacy of our approach over a strong baseline. We have also released a live web demo showcasing our VQA and textual explanation generation using scene graphs and visual attention.
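The abstract describes selecting scene-graph phrases that both attract VQA attention and read well under a language model. A minimal sketch of that idea (not the authors' code; the function names, box format, and multiplicative scoring are illustrative assumptions) might look like:

```python
# Hedged sketch: rank scene-graph region phrases by how much VQA attention
# falls inside their bounding box, weighted by a language-model fluency score,
# and keep the top-scoring phrases as the textual explanation.

def attention_in_box(attn, box):
    """Sum attention mass inside a region's bounding box.

    attn: 2D grid of attention weights over image cells.
    box:  (r0, c0, r1, c1), inclusive cell coordinates.
    """
    r0, c0, r1, c1 = box
    return sum(attn[r][c]
               for r in range(r0, r1 + 1)
               for c in range(c0, c1 + 1))

def explain(attn, regions, lm_score, top_k=2):
    """Pick the top_k scene-graph phrases as an explanation.

    regions:  list of (phrase, box) pairs from the scene graph.
    lm_score: callable mapping a phrase to a fluency score in [0, 1]
              (stands in for the paper's language-model selection step).
    """
    scored = [(attention_in_box(attn, box) * lm_score(phrase), phrase)
              for phrase, box in regions]
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top_k]]
```

For example, with a 2x2 attention grid concentrated on the top-left cell, a region phrase covering that cell outscores one covering the bottom row, so it is the one returned as evidence.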


Related research

04/05/2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Providing explanations in the context of Visual Question Answering (VQA)...

03/08/2023
Interpretable Visual Question Answering Referring to Outside Knowledge
We present a novel multimodal interpretable VQA model that can answer th...

06/03/2019
Generating Question Relevant Captions to Aid Visual Question Answering
Visual question answering (VQA) and image captioning require a shared bo...

01/27/2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Visual Question Answering (VQA) has attracted attention from both comput...

06/14/2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Paragraph-style image captions describe diverse aspects of an image as o...

01/22/2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Visual question answering requires a deep understanding of both images a...

01/23/2020
Robust Explanations for Visual Question Answering
In this paper, we propose a method to obtain robust explanations for vis...
