Self-Critical Reasoning for Robust Visual Question Answering

05/24/2019
by   Jialin Wu, et al.

Deep-learning systems for Visual Question Answering (VQA) tend to capture superficial statistical correlations in the training data because of strong language priors, and consequently fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than those of other competitive answer candidates. The influential regions are determined either from human visual/textual explanations or automatically from just the significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art: 49.5% accuracy using textual explanations and 48.5% using automatically annotated regions.
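The core idea of the objective described above can be sketched as a hinge-style penalty: on the influential image regions, the correct answer's influence score (e.g. a gradient-based sensitivity) should exceed that of a competing answer candidate. The following is a minimal illustrative sketch, not the paper's implementation; the function name, the margin, and the choice of inputs (precomputed per-region influence scores and a binary mask of influential regions) are assumptions.

```python
import numpy as np

def self_critical_penalty(influence_correct, influence_competitor,
                          influential_mask, margin=0.0):
    """Penalize competing answers whose influence on the influential
    regions exceeds that of the ground-truth answer.

    influence_correct    -- per-region influence scores of the correct answer
    influence_competitor -- per-region influence scores of a rival candidate
    influential_mask     -- 1.0 for regions marked influential, else 0.0
    margin               -- optional margin by which the correct answer
                            should dominate (hypothetical knob)
    """
    # Positive gap means the competitor is more influential on a region
    # than the correct answer; we only penalize influential regions.
    gap = influence_competitor - influence_correct + margin
    penalty = np.maximum(gap, 0.0) * influential_mask
    # Average over the influential regions (guard against an empty mask).
    return penalty.sum() / max(influential_mask.sum(), 1.0)

# Toy example with three image regions, two of them influential:
correct = np.array([0.9, 0.1, 0.4])
rival = np.array([0.2, 0.8, 0.5])
mask = np.array([1.0, 0.0, 1.0])
loss = self_critical_penalty(correct, rival, mask)  # 0.05
```

In this toy case only region 2 contributes (the rival beats the correct answer there by 0.1), so the averaged penalty is 0.1 / 2 = 0.05; adding this term to the standard VQA loss pushes the model to ground its correct answers in the right regions.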


Related research

- Improving VQA and its Explanations by Comparing Competing Explanations (06/28/2020)
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering (09/18/2020)
- Robust Explanations for Visual Question Answering (01/23/2020)
- Answer Questions with Right Image Regions: A Visual Attention Regularization Approach (02/03/2021)
- Check It Again: Progressive Visual Question Answering via Visual Entailment (06/08/2021)
- Faithful Multimodal Explanation for Visual Question Answering (09/08/2018)
- Counterfactual Samples Synthesizing for Robust Visual Question Answering (03/14/2020)
