From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

08/08/2019
by Difei Gao, et al.

Visual Question Answering (VQA) is a challenging task for evaluating how comprehensively a system understands the world. Existing benchmarks usually focus either on reasoning over vision alone or mainly on knowledge, with relatively simple visual abilities. However, the ability to answer questions that require alternating between inference on the image content and inference on commonsense knowledge is crucial for an advanced VQA system. In this paper, we introduce a VQA dataset that provides more challenging and general questions about Compositional Reasoning on vIsion and Commonsense, named CRIC. To create this dataset, we develop a powerful method to automatically generate compositional questions and rich annotations from both the scene graph of a given image and an external knowledge graph. Moreover, this paper presents a new compositional model that is capable of implementing various types of reasoning functions on the image content and the knowledge graph. Further, we analyze several baselines, the state of the art, and our model on the CRIC dataset. The experimental results show that the proposed task is challenging: the state-of-the-art model obtains 52.26% accuracy and our model obtains 58.38% accuracy.
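
The idea of generating questions from two graphs can be made concrete with a toy sketch. The example below is only an illustration under stated assumptions, not the authors' generation pipeline: the triples, the question template, the `generate_questions` helper, and the `program` annotation format are all hypothetical. It pairs a visual fact from a scene graph with a commonsense fact from a knowledge graph, so that answering the question requires both.

```python
# Toy illustration only: the graphs, template, and helper names below are
# hypothetical and are not taken from the CRIC paper.

# Scene graph: (subject, relation, object) triples describing one image.
scene_graph = [
    ("fork", "on", "table"),
    ("cup", "on", "table"),
    ("dog", "under", "table"),
]

# Knowledge graph: commonsense triples that do not depend on the image.
knowledge_graph = [
    ("fork", "used_for", "eating"),
    ("cup", "used_for", "drinking"),
    ("dog", "is_a", "animal"),
]


def generate_questions(scene, knowledge):
    """Pair each visual fact with a 'used_for' fact about the same object."""
    questions = []
    for subj, rel, obj in scene:
        for k_subj, k_rel, k_obj in knowledge:
            if k_subj == subj and k_rel == "used_for":
                questions.append({
                    "question": f"What is {rel} the {obj} that can be used for {k_obj}?",
                    "answer": subj,
                    # Assumed annotation format: the reasoning steps, in order.
                    "program": [
                        ("find", obj),
                        ("relate", rel),
                        ("filter_knowledge", k_rel, k_obj),
                    ],
                })
    return questions


questions = generate_questions(scene_graph, knowledge_graph)
for q in questions:
    print(q["question"], "->", q["answer"])
```

In this sketch, a question such as "What is on the table that can be used for eating?" cannot be answered from the image alone (which objects are on the table) or from the knowledge graph alone (which objects are used for eating); it needs both steps in sequence.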

research
10/24/2022

VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

There has been a growing interest in solving Visual Question Answering (...
research
12/22/2021

CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

3D scene understanding is a relatively emerging research field. In this ...
research
06/25/2022

From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering

In order to achieve a general visual question answering (VQA) system, it...
research
04/04/2020

Generating Rationales in Visual Question Answering

Despite recent advances in Visual Question Answering (VQA), it remains a ...
research
04/08/2020

Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

Visual Question Answering (VQA) systems are tasked with answering natura...
research
09/23/2019

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

Explanation and high-order reasoning capabilities are crucial for real-w...
research
03/15/2022

Can you even tell left from right? Presenting a new challenge for VQA

Visual Question Answering (VQA) needs a means of evaluating the strength...