Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

01/25/2023
by Chenxi Whitehouse, et al.

Providing explanations for visual question answering (VQA) has gained much attention in research. However, most existing systems use separate models for predicting answers and providing explanations. We argue that training explanation models independently of the QA model makes the explanations less grounded and limits performance. To address this, we propose a multitask learning approach towards a Unified Model for more grounded and consistent generation of both Answers and Explanations (UMAE). To achieve this, we add artificial prompt tokens to training instances and finetune a multimodal encoder-decoder model on various VQA tasks. In our experiments, UMAE models surpass the prior SOTA answer accuracy on A-OKVQA by 10~15%, show competitive results on OK-VQA, achieve new SOTA explanation scores on A-OKVQA and VCR, and demonstrate promising out-of-domain performance on VQA-X.
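
To make the prompt-token idea concrete, here is a minimal sketch of how text-side training pairs for a multitask setup of this kind might be built. It is an illustration only, not the authors' implementation: the token strings, the `VQAInstance` type, the `build_example` helper, and the "answer because explanation" target format are all assumptions, and the image input is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical task-prompt tokens; the abstract does not specify the real ones.
PROMPT_TOKENS = {
    "answer": "<answer>",
    "answer_explain": "<answer_explain>",
}


@dataclass
class VQAInstance:
    """Text side of one VQA training instance (image omitted in this sketch)."""
    question: str
    answer: str
    explanation: Optional[str] = None


def build_example(inst: VQAInstance, task: str) -> Tuple[str, str]:
    """Return (source_text, target_text) with a task prompt token prepended."""
    if task not in PROMPT_TOKENS:
        raise ValueError(f"unknown task: {task}")
    source = f"{PROMPT_TOKENS[task]} {inst.question}"
    if task == "answer":
        # Answer-only task: the decoder target is just the answer.
        return source, inst.answer
    # Joint task: the target holds the answer followed by the explanation.
    if inst.explanation is None:
        raise ValueError("answer_explain task needs an explanation")
    return source, f"{inst.answer} because {inst.explanation}"


inst = VQAInstance("What is the man holding?", "umbrella", "it is raining")
print(build_example(inst, "answer"))
print(build_example(inst, "answer_explain"))
```

Mixing pairs from both tasks in one training set lets a single encoder-decoder model learn to switch behaviour based on the leading token, which is the general mechanism the abstract describes.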


