Generating Rationales in Visual Question Answering

04/04/2020
by Hammad A. Ayyubi, et al.

Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
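The dual task described above (answer prediction plus rationale generation) can be sketched as a weighted sum of two cross-entropy objectives. This is an illustrative sketch, not the authors' code: the function names, the weighting scheme, and the averaging over rationale tokens are assumptions; the actual ViLBERT and GPT-2 heads produce these distributions from multimodal and textual features respectively.

```python
import math

# Hypothetical sketch of a joint objective: an answer-classification loss
# (ViLBERT-style VQA head) plus a token-level rationale-generation loss
# (GPT-2-style decoder), combined as L_total = L_answer + lam * L_rationale.

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target class under `probs`."""
    return -math.log(probs[target_idx])

def joint_loss(answer_probs, answer_idx, rationale_probs, rationale_ids, lam=1.0):
    # Answer head: a single classification step over the answer choices.
    l_answer = cross_entropy(answer_probs, answer_idx)
    # Rationale head: average per-token NLL over the generated rationale.
    l_rationale = sum(
        cross_entropy(p, t) for p, t in zip(rationale_probs, rationale_ids)
    ) / len(rationale_ids)
    return l_answer + lam * l_rationale

# Toy usage: 4 answer choices and a 3-token rationale over a 5-word vocabulary.
answer_probs = [0.1, 0.7, 0.1, 0.1]
rationale_probs = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
]
loss = joint_loss(answer_probs, 1, rationale_probs, [0, 1, 2], lam=1.0)
```

Backpropagating through both terms at once is what makes the training end-to-end: gradients from the rationale loss flow back into the shared VQA representation, which is how the paper argues commonsense understanding gets injected into the answer model.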

Related research:

- 10/24/2020: Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
- 06/20/2020: Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
- 08/08/2019: From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
- 10/21/2019: Enforcing Reasoning in Visual Commonsense Reasoning
- 02/25/2022: Joint Answering and Explanation for Visual Commonsense Reasoning
- 06/14/2023: Improving Selective Visual Question Answering by Learning from Your Peers
- 08/15/2017: VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
