Graph Reasoning Networks for Visual Question Answering

07/23/2019
by Dalu Guo, et al.

Attention mechanisms have brought the interaction between language and visual information to the fore in visual question answering (VQA). However, the relationships between words in the question have been underestimated, which makes it hard to answer questions that involve relationships between multiple entities, such as comparison and counting. In this paper, we develop graph reasoning networks to tackle this problem. Two kinds of graphs are investigated: the inter-graph and the intra-graph. The inter-graph transfers features of the detected objects to their related query words, so that the output nodes carry both semantic and factual information. The intra-graph exchanges information among these inter-graph output nodes to amplify implicit yet important relationships between objects. The two kinds of graphs cooperate, so the resulting model can reason about the relationships and dependencies between objects, which enables multi-step reasoning. Experimental results on the GQA v1.1 dataset demonstrate the ability of our method to handle compositional questions about real-world images. We achieve state-of-the-art performance with an overall accuracy of 57.04, with especially strong results on counting questions.
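
The two graph types described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product attention, the concatenation in `inter_graph`, the residual sum in `intra_graph`, and all function names and toy vectors are assumptions chosen only to show how object features might flow to word nodes and then between those nodes.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values):
    # For each query, mix `values` with weights from dot-product similarity to `keys`.
    mixed = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        dim = len(values[0])
        mixed.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return mixed

def inter_graph(words, objects):
    # Assumed form: each word node gathers object features and keeps its own
    # embedding, so the output holds semantic (word) and factual (object) parts.
    gathered = attend(words, objects, objects)
    return [w + g for w, g in zip(words, gathered)]  # list concatenation

def intra_graph(nodes):
    # Assumed form: every node exchanges information with all nodes,
    # added back to the node as a residual.
    exchanged = attend(nodes, nodes, nodes)
    return [[a + b for a, b in zip(n, e)] for n, e in zip(nodes, exchanged)]

# Hypothetical toy inputs: 2 word embeddings and 3 object features, both 2-d.
words = [[1.0, 0.0], [0.0, 1.0]]
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

nodes = inter_graph(words, objects)   # 2 nodes, each 4-d
refined = intra_graph(nodes)          # 2 nodes, each 4-d
```

Stacking several `intra_graph` passes would be one way to approximate the multi-step reasoning the abstract mentions, though the number of steps and the exact update rule used in the paper are not stated here.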


research
07/27/2020

REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

Visual question answering (VQA) is a challenging multi-modal task that r...
research
12/22/2021

CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

3D scene understanding is a relatively emerging research field. In this ...
research
09/06/2021

Improving Numerical Reasoning Skills in the Modular Approach for Complex Question Answering on Text

Numerical reasoning skills are essential for complex question answering ...
research
10/20/2020

Contextual Heterogeneous Graph Network for Human-Object Interaction Detection

Human-object interaction (HOI) detection is an important task for underst...
research
03/18/2023

Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning

Humans have the innate capability to answer diverse questions, which is ...
research
10/29/2018

TallyQA: Answering Complex Counting Questions

Most counting questions in visual question answering (VQA) datasets are ...
research
04/24/2020

Revisiting Modulated Convolutions for Visual Counting and Beyond

This paper targets at visual counting, where the setup is to estimate th...
