Analyzing the Behavior of Visual Question Answering Models

06/23/2016
by Aishwarya Agrawal, et al.

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA). The performance of most models is clustered around 60-70%. In this paper we propose systematic methods to analyze the behavior of these models as a first step towards recognizing their strengths and weaknesses, and identifying the most fruitful directions for progress. We analyze two models, one from each of two major classes of VQA models (with-attention and without-attention), and show the similarities and differences in their behavior. We also analyze the winning entry of the VQA Challenge 2016. Our behavior analysis reveals that despite recent progress, today's VQA models are "myopic" (tend to fail on sufficiently novel instances), often "jump to conclusions" (converge on a predicted answer after 'listening' to just half the question), and are "stubborn" (do not change their answers across images).
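To make the "jump to conclusions" and "stubborn" behaviors concrete, here is a minimal Python sketch of how such probes could be set up. It is not the authors' evaluation code. The `Model` interface, the function names, and `toy_model` are all hypothetical, and the stability score is only one simple choice of measure.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not from the paper): a VQA model is
# any callable that maps (image_id, question words) to an answer string.
Model = Callable[[str, Sequence[str]], str]


def partial_question_agreement(model: Model, image_id: str, question: str) -> List[float]:
    """Probe for "jumping to conclusions".

    For each prefix of the question (first 1 word, first 2 words, ...),
    record 1.0 if the model's answer already equals its answer to the
    full question, and 0.0 otherwise.
    """
    words = question.split()
    full_answer = model(image_id, words)
    return [
        float(model(image_id, words[:k]) == full_answer)
        for k in range(1, len(words) + 1)
    ]


def answer_stability_across_images(model: Model, image_ids: Sequence[str], question: str) -> float:
    """Probe for "stubbornness".

    Returns the fraction of images that receive the single most common
    answer to the same question. A value of 1.0 means the answer never
    changes with the image.
    """
    if not image_ids:
        raise ValueError("image_ids must not be empty")
    words = question.split()
    answers = [model(image_id, words) for image_id in image_ids]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def toy_model(image_id: str, words: Sequence[str]) -> str:
    """Illustrative stand-in that ignores the image entirely."""
    lowered = [w.lower() for w in words]
    if lowered[:1] == ["is"]:
        return "yes"
    if lowered[:2] == ["how", "many"]:
        return "2"
    return "unknown"


if __name__ == "__main__":
    print(partial_question_agreement(toy_model, "img1", "how many dogs are there"))
    print(answer_stability_across_images(toy_model, ["a", "b", "c"], "is it raining"))
```

With the toy model, the prefix probe shows agreement from the second word onward, and the stability score is 1.0 because the image is never consulted. A real study would replace `toy_model` with a trained model and average both probes over a validation set.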


