Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA

11/19/2019
by   Badri N. Patro, et al.

In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. Providing direct supervision for attention is challenging. Our key observation is that visual explanations obtained through class activation mappings (specifically Grad-CAM), which are meant to explain the predictions of a network, can serve as a form of supervision. However, because the distributions of attention maps and Grad-CAMs differ, it is not suitable to use the explanations directly as supervision. Instead, we propose a discriminator that aims to distinguish samples of visual explanations from attention maps. Adversarial training of the attention regions as a two-player game between attention and explanation brings the two distributions closer. Significantly, we observe that this form of supervision also produces attention maps that are more closely correlated with human attention, yielding a substantial improvement over baseline stacked attention network (SAN) models and a clear gain in the rank-correlation metric on the VQA task. The method can also be combined with recent MCB-based methods and yields consistent improvements. We further compare against other ways of aligning distributions, such as Correlation Alignment (Coral), Maximum Mean Discrepancy (MMD), and Mean Squared Error (MSE) losses, and observe that the adversarial loss outperforms these alternatives. Visualizations of the results confirm our hypothesis that attention maps improve under this form of supervision.
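To make the comparison baselines concrete, here is a minimal NumPy sketch of the three distribution-matching losses the abstract mentions (MSE, Coral, and a linear-kernel MMD) applied to batches of flattened attention maps and Grad-CAM explanation maps. This is an illustrative sketch, not the paper's implementation: all function names, the map shapes, and the linear-kernel choice for MMD are assumptions; the adversarial loss itself additionally needs a trained discriminator, so it is not shown here.

```python
import numpy as np

def mse_loss(A, E):
    """Mean squared error between attention maps A and explanation maps E."""
    return float(np.mean((A - E) ** 2))

def coral_loss(A, E):
    """Correlation Alignment: squared Frobenius distance between the
    feature covariances of the two batches (Sun & Saenko style scaling)."""
    d = A.shape[1]
    Ca = np.cov(A, rowvar=False)  # covariance over the d map positions
    Ce = np.cov(E, rowvar=False)
    return float(np.sum((Ca - Ce) ** 2) / (4.0 * d * d))

def mmd_loss(A, E):
    """Maximum Mean Discrepancy with a linear kernel: squared distance
    between the batch means of the two distributions."""
    diff = A.mean(axis=0) - E.mean(axis=0)
    return float(diff @ diff)

# Hypothetical batches: 32 attention maps and 32 Grad-CAM maps over a
# 7x7 spatial grid, each flattened to a 49-dimensional vector.
rng = np.random.default_rng(0)
A = rng.random((32, 49))
E = rng.random((32, 49))
```

Each loss is zero when the two batches are identical and grows as the distributions diverge; the paper's adversarial objective replaces these fixed discrepancy measures with a learned discriminator.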

Related research

- Exploring Human-like Attention Supervision in Visual Question Answering (09/19/2017): Attention mechanisms have been widely applied in the Visual Question Ans...
- U-CAM: Visual Explanation using Uncertainty based Class Activation Maps (08/17/2019): Understanding and explaining deep learning models is an imperative task....
- Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering (01/25/2023): Providing explanations for visual question answering (VQA) has gained mu...
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives (06/22/2022): Many past works aim to improve visual reasoning in models by supervising...
- A Study on Multimodal and Interactive Explanations for Visual Question Answering (03/01/2020): Explainability and interpretability of AI models is an essential factor ...
- Learning to Count Objects in Natural Images for Visual Question Answering (02/15/2018): Visual Question Answering (VQA) models have struggled with counting obje...
- Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks (08/10/2021): Human observers engage in selective information uptake when classifying ...
