Interpretable Visual Question Answering via Reasoning Supervision

09/07/2023
by Maria Parelli, et al.

Transformer-based architectures have recently demonstrated remarkable performance in the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer; such annotations are already available in large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the attention distribution guided by the question with the one guided by the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and improve performance, without requiring training on explicit grounding annotations.
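
The alignment idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of the similarity loss, so the cosine-based loss below is an assumption chosen for illustration, not the authors' formulation. The function names and the use of raw per-region scores are likewise hypothetical.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_similarity_loss(question_scores, reasoning_scores):
    """Hypothetical loss: 1 - cosine similarity between two attention
    distributions over the same image regions.

    ASSUMPTION: the source only says "a similarity loss"; cosine
    similarity is used here purely as an example.
    """
    if len(question_scores) != len(reasoning_scores):
        raise ValueError("both inputs must cover the same image regions")
    p = softmax(question_scores)   # attention guided by the question
    q = softmax(reasoning_scores)  # attention guided by the correct reasoning
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


# Example with three image regions: the loss is close to zero when both
# attention maps agree and grows when they focus on different regions.
aligned = attention_similarity_loss([2.0, 0.1, 0.1], [2.0, 0.1, 0.1])
misaligned = attention_similarity_loss([2.0, 0.1, 0.1], [0.1, 0.1, 2.0])
```

In a training setup such a term would presumably be combined with the usual answer-prediction loss, so that the question-guided attention is pulled toward the regions highlighted by the textual justification. How the two terms are weighted is not specified in the abstract.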

research
08/01/2018

Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining

A key aspect of VQA models that are interpretable is their ability to gr...
research
05/25/2022

Guiding Visual Question Answering with Attention Priors

The current success of modern visual reasoning systems is arguably attri...
research
12/21/2020

Object-Centric Diagnosis of Visual Reasoning

When answering questions about an image, it not only needs knowing what ...
research
02/21/2015

Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks

Artificial agents today can answer factual questions. But they fall shor...
research
10/31/2019

TAB-VCR: Tags and Attributes based VCR Baselines

Reasoning is an important ability that we learn from a very early age. Y...
research
04/20/2022

Attention in Reasoning: Dataset, Analysis, and Modeling

While attention has been an increasingly popular component in deep neura...
research
09/06/2018

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Collaborative reasoning for understanding each image-question pair is ve...
