VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

06/22/2022
by Zhuofan Ying, et al.

Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision for Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that these methods do not meaningfully align model FI with human FI. In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility). Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets in terms of both in-distribution and out-of-distribution accuracy. While past work suggests that the mechanism for improved accuracy is through improved explanation plausibility, we show that this relationship depends crucially on explanation faithfulness (whether explanations truly represent the model's internal reasoning). Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful. Lastly, we show that, surprisingly, RRR metrics are not predictive of out-of-distribution model accuracy when controlling for a model's in-distribution accuracy, which calls into question the value of these metrics for evaluating model reasoning. All supporting code is available at https://github.com/zfying/visfis
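To make the four objectives concrete, the sketch below shows one way such losses could be combined in PyTorch. This is a minimal illustration, not the paper's implementation: the model interface model(feats, question), the 0.5 importance threshold, the Gaussian perturbation of unimportant regions, the input-gradient saliency used as the model FI estimate, and the equal loss weights are all assumptions made here for readability; the actual formulation is in the linked repository.

```python
# Hedged sketch of the four RRR-style objectives described above.
# All names, masking choices, and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def rrr_style_losses(model, feats, question, answer, human_fi,
                     weights=(1.0, 1.0, 1.0, 1.0)):
    """feats: (B, N, D) region features; answer: (B,) class ids;
    human_fi: (B, N) human importance scores in [0, 1]."""
    # Binary mask of human-important regions (threshold is an assumption).
    important = (human_fi > 0.5).float().unsqueeze(-1)        # (B, N, 1)

    # (1) Sufficiency: keep only important regions, still predict correctly.
    logits_suff = model(feats * important, question)
    l_suff = F.cross_entropy(logits_suff, answer)

    # (2) Uncertainty: with important regions removed, push the prediction
    # toward maximum entropy (loss = negative entropy).
    logits_unc = model(feats * (1.0 - important), question)
    probs_unc = F.softmax(logits_unc, dim=-1)
    l_unc = (probs_unc * torch.log(probs_unc + 1e-12)).sum(dim=-1).mean()

    # (3) Invariance: perturbing unimportant regions should not change
    # the prediction (KL between original and perturbed outputs).
    noise = torch.randn_like(feats) * (1.0 - important)
    logits_full = model(feats, question)
    logits_pert = model(feats + noise, question)
    l_inv = F.kl_div(F.log_softmax(logits_pert, dim=-1),
                     F.softmax(logits_full, dim=-1),
                     reduction="batchmean")

    # (4) Plausibility: align a model FI estimate (here, a simple
    # input-gradient saliency) with the human FI annotation.
    feats_g = feats.detach().requires_grad_(True)
    logits = model(feats_g, question)
    score = logits.gather(1, answer.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, feats_g, create_graph=True)[0]
    model_fi = grads.norm(dim=-1)                              # (B, N)
    l_plaus = F.mse_loss(F.softmax(model_fi, dim=-1),
                         F.softmax(human_fi, dim=-1))

    w1, w2, w3, w4 = weights
    return w1 * l_suff + w2 * l_unc + w3 * l_inv + w4 * l_plaus
```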

Related research

03/20/2018 · VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
Most existing works in visual question answering (VQA) are dedicated to ...

01/23/2020 · Robust Explanations for Visual Question Answering
In this paper, we propose a method to obtain robust explanations for vis...

09/08/2018 · Faithful Multimodal Explanation for Visual Question Answering
AI systems' ability to explain their reasoning is critical to their util...

11/09/2022 · Towards Reasoning-Aware Explainable VQA
The domain of joint vision-language understanding, especially in the con...

11/19/2019 · Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
In this paper, we aim to obtain improved attention for a visual question...

04/29/2020 · Towards Transparent and Explainable Attention Models
Recent studies on interpretability of attention distributions have led t...

12/24/2020 · To what extent do human explanations of model behavior align with actual model behavior?
Given the increasingly prominent role NLP models (will) play in our live...
