Improving VQA and its Explanations by Comparing Competing Explanations

06/28/2020
by Jialin Wu, et al.

Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are trained only to fit the answer distribution given the question and visual content. As a result, these systems frequently take shortcuts, focusing on simple visual concepts or question priors. This becomes more problematic as questions grow more complex and require more reasoning and commonsense knowledge. To address this issue, we present a novel framework that uses explanations for competing answers to help VQA systems select the correct answer. By training on human textual explanations, our framework builds better representations for the questions and visual content, and then reweights confidences in the answer candidates using explanations that are either generated or retrieved from the training set. We evaluate our framework on the VQA-X dataset, which has more difficult questions with human explanations, achieving new state-of-the-art results on both VQA and its explanations.
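
The reweighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the explanation scores are combined with the answer confidences, so the rule used here (multiply each candidate's confidence by a support score for its explanation, then renormalize) is an assumption, and the function name, parameters, and numbers are all hypothetical rather than taken from the paper.

```python
def rerank_with_explanations(vqa_conf, expl_score, top_k=None):
    """Reweight answer confidences by how well an explanation supports each answer.

    vqa_conf:   dict mapping candidate answer -> original VQA confidence
    expl_score: dict mapping candidate answer -> support score in [0, 1] for the
                explanation (generated or retrieved) attached to that answer
    top_k:      optionally keep only the k most confident candidates

    Assumption: confidence and support are combined by a simple product.
    """
    # Rank candidates by their original confidence, highest first.
    candidates = sorted(vqa_conf, key=vqa_conf.get, reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # A candidate with no explanation score gets zero support.
    weighted = {a: vqa_conf[a] * expl_score.get(a, 0.0) for a in candidates}
    total = sum(weighted.values())

    if total == 0:
        # No explanation supports any candidate: fall back to the original scores.
        weighted = {a: vqa_conf[a] for a in candidates}
        total = sum(weighted.values())
    if total == 0:
        # All confidences are zero as well: return them unnormalized.
        return weighted

    # Renormalize so the reweighted confidences sum to 1.
    return {a: w / total for a, w in weighted.items()}


# Toy example with made-up numbers: the VQA model prefers "yes",
# but the explanation for "no" is much better supported.
vqa_conf = {"yes": 0.5, "no": 0.3, "maybe": 0.2}
expl_score = {"yes": 0.2, "no": 0.9, "maybe": 0.1}

reranked = rerank_with_explanations(vqa_conf, expl_score)
best = max(reranked, key=reranked.get)
print(best)
```

In this toy case the answer with the strongest explanation overtakes the one the VQA model initially preferred, which is the behavior the abstract attributes to comparing competing explanations.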

research
05/24/2019

Self-Critical Reasoning for Robust Visual Question Answering

Visual Question Answering (VQA) deep-learning systems tend to capture su...
research
09/08/2018

Faithful Multimodal Explanation for Visual Question Answering

AI systems' ability to explain their reasoning is critical to their util...
research
11/20/2018

VQA with no questions-answers training

Methods for teaching machines to answer visual questions have made signi...
research
11/30/2018

From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts

Current Visual Question Answering (VQA) systems can answer intelligent q...
research
01/23/2020

Robust Explanations for Visual Question Answering

In this paper, we propose a method to obtain robust explanations for vis...
research
06/13/2018

Learning Visual Knowledge Memory Networks for Visual Question Answering

Visual question answering (VQA) requires joint comprehension of images a...
research
10/29/2018

Do Explanations make VQA Models more Predictable to a Human?

A rich line of research attempts to make deep neural networks more trans...
