Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

04/08/2020
by Goonmeet Bajaj, et al.

Visual Question Answering (VQA) systems are tasked with answering natural language questions about a presented image. Current VQA datasets typically contain questions about the spatial relations of objects, object attributes, or the general scene. Researchers have recently recognized the need to better balance such datasets, both to reduce a system's dependence on memorized linguistic features and statistical biases and to promote genuine visual understanding. However, it remains unclear whether latent patterns exist that can be used to quantify and explain these failures. To better quantify our understanding of VQA model performance, we use a taxonomy of Knowledge Gaps (KGs) to identify and tag questions with one or more KG types. Each KG describes the reasoning abilities needed to arrive at a resolution, and failure to resolve a gap indicates the absence of the required reasoning ability. After identifying the KGs for each question, we examine the skew in the distribution of questions across KGs. To reduce this skew, we introduce a targeted question generation model that allows us to generate new types of questions for an image.
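The tagging-and-skew analysis described above can be sketched as follows. The KG labels, example questions, and helper names here are hypothetical illustrations for the general idea, not the paper's actual taxonomy or code:

```python
from collections import Counter

# Hypothetical KG-tagged questions: each question may carry one or more
# Knowledge Gap (KG) tags. The tag names are illustrative only.
tagged_questions = [
    {"q": "What color is the car?", "kgs": ["attribute"]},
    {"q": "Is the cup left of the plate?", "kgs": ["spatial"]},
    {"q": "What is the man holding?", "kgs": ["spatial", "attribute"]},
    {"q": "Why is the child crying?", "kgs": ["commonsense"]},
]

def kg_distribution(questions):
    """Count how many questions carry each KG tag (multi-label counting)."""
    counts = Counter()
    for item in questions:
        counts.update(item["kgs"])
    return counts

def imbalance_ratio(counts):
    """Ratio of most- to least-frequent KG; 1.0 means perfectly balanced."""
    values = counts.values()
    return max(values) / min(values)

dist = kg_distribution(tagged_questions)
print(dist)                  # per-KG question counts
print(imbalance_ratio(dist)) # skew measure: higher means more skewed
```

A targeted question generator would then sample under-represented KG types (here, "commonsense") and produce new questions for an image until the imbalance ratio approaches 1.0.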


Related research

- 06/08/2023: Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering. "Visual question answering (VQA) is a multidisciplinary research problem ..."
- 10/26/2022: What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? "In visual question answering (VQA), a machine must answer a question giv..."
- 07/16/2019: 2nd Place Solution to the GQA Challenge 2019. "We present a simple method that achieves unexpectedly superior performan..."
- 08/08/2019: From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense. "Visual Question Answering (VQA) is a challenging task for evaluating the..."
- 08/17/2019: What is needed for simple spatial language capabilities in VQA? "Visual question answering (VQA) comprises a variety of language capabili..."
- 01/20/2020: SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions. "Existing VQA datasets contain questions with varying levels of complexit..."
- 06/27/2022: Consistency-preserving Visual Question Answering in Medical Imaging. "Visual Question Answering (VQA) models take an image and a natural-langu..."
