VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

03/20/2018
by Qing Li, et al.

Most existing work in visual question answering (VQA) is dedicated to improving the accuracy of predicted answers while disregarding explanations. We argue that the explanation for an answer is as important as, or even more important than, the answer itself, since it makes the question-answering process more understandable and traceable. To this end, we propose a new task, VQA-E (VQA with Explanation), in which computational models are required to generate an explanation along with the predicted answer. We first construct a new dataset and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by exploiting the available captions. We have conducted a user study to validate the quality of the explanations synthesized by our method. We quantitatively show that the additional supervision from explanations not only produces insightful textual sentences that justify the answers, but also improves the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.
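As a rough illustration of the multi-task idea, the sketch below combines an answer-prediction loss with an explanation-generation loss into one training objective. It is not the authors' implementation: the softmax cross-entropy for both terms, the per-token averaging, and the names `joint_loss` and `explanation_weight` are all assumptions made for this example.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Negative log-probability of target_index under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def joint_loss(answer_logits, answer_index,
               explanation_logits, explanation_tokens,
               explanation_weight=1.0):
    """Hypothetical multi-task objective: answer loss plus weighted explanation loss.

    answer_logits: scores over candidate answers.
    explanation_logits: one list of vocabulary scores per generated word.
    explanation_tokens: the reference word index at each step.
    explanation_weight: assumed trade-off factor, not taken from the paper.
    """
    answer_loss = softmax_cross_entropy(answer_logits, answer_index)
    token_losses = [
        softmax_cross_entropy(step_logits, token)
        for step_logits, token in zip(explanation_logits, explanation_tokens)
    ]
    # Average over words so long explanations do not dominate the answer term.
    explanation_loss = sum(token_losses) / len(token_losses) if token_losses else 0.0
    return answer_loss + explanation_weight * explanation_loss
```

Because both terms would be computed from a shared image-question representation, gradients from the explanation term can also shape the features used for answering, which is one plausible reading of why the extra supervision helps answer accuracy.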


