Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining

08/01/2018
by Yundong Zhang, et al.

A key aspect of interpretable VQA models is their ability to ground their answers in relevant regions of the image. Current approaches with this capability rely on supervised learning and human-annotated groundings to train the attention mechanisms inside the VQA architecture. Unfortunately, obtaining human annotations specific to visual grounding is difficult and expensive. In this work, we demonstrate that a VQA architecture can be trained effectively with grounding supervision mined automatically from available region descriptions and object annotations. We also show that a model trained with this mined supervision generates visual groundings that correlate more strongly with manually annotated groundings, while achieving state-of-the-art VQA accuracy.
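To illustrate the general idea of attention supervision, the sketch below adds a grounding term to the VQA objective: the model's attention over candidate image regions is pulled toward a target distribution derived from a mined grounding mask (e.g., regions overlapping a relevant region description or object annotation). This is a minimal illustrative sketch, not the paper's implementation; the function name, the binary-mask representation, and the choice of a KL-divergence penalty are all assumptions for the example.

```python
import numpy as np

def attention_supervision_loss(attn, mined_mask, eps=1e-8):
    """KL divergence between a mined grounding distribution and the
    model's attention over R image regions.

    attn:       attention weights over R regions (non-negative, sums to 1)
    mined_mask: binary mask marking regions covered by the mined
                grounding (from region descriptions / object annotations)
    """
    # Normalize the mined mask into a target attention distribution.
    target = mined_mask / (mined_mask.sum() + eps)
    # KL(target || attn): penalizes attention mass placed outside
    # the mined regions.
    return float(np.sum(target * (np.log(target + eps) - np.log(attn + eps))))

# Toy example with 4 candidate regions; the mined grounding covers
# regions 1 and 2.
mask = np.array([0.0, 1.0, 1.0, 0.0])
attn_good = np.array([0.05, 0.45, 0.45, 0.05])  # attends to mined regions
attn_bad = np.array([0.45, 0.05, 0.05, 0.45])   # attends elsewhere
```

In training, this term would be added (with a weighting coefficient) to the usual answer-classification loss, so that the network learns to answer correctly while attending to the mined regions.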

Related research

02/04/2022: Grounding Answers for Visual Questions Asked by Visually Impaired People
Visual question answering is the task of answering questions about image...

09/07/2023: Interpretable Visual Question Answering via Reasoning Supervision
Transformer-based architectures have recently demonstrated remarkable pe...

07/05/2022: Weakly Supervised Grounding for VQA in Vision-Language Transformers
Transformers for visual-language representation learning have been getti...

05/25/2022: Guiding Visual Question Answering with Attention Priors
The current success of modern visual reasoning systems is arguably attri...

05/11/2021: Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
The problem of grounding VQA tasks has seen an increased attention in th...

06/30/2022: Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
We propose a margin-based loss for vision-language model pretraining tha...

04/12/2020: A negative case analysis of visual grounding methods for VQA
Existing Visual Question Answering (VQA) methods tend to exploit dataset...
