Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

11/01/2018
by   Medhini Narasimhan, et al.

Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel 'fact-based' visual question answering (FVQA) task has been introduced recently, along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation. Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer. We observe that a successive process which considers one fact at a time to form a local decision is sub-optimal. Instead, we develop an entity graph and use a graph convolutional network to 'reason' about the correct answer by jointly considering all entities. We show on the challenging FVQA dataset that this leads to an improvement in accuracy of around 7
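The idea of scoring all entities jointly over an entity graph can be illustrated with a small sketch. This is not the authors' architecture: it assumes a standard two-layer graph convolution with symmetric adjacency normalization, random untrained weights, and a toy graph of four entities. The names `normalize_adjacency`, `gcn_layer`, and `score_entities`, along with the feature sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)            # always >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph convolution: mix neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def score_entities(adj, features, w1, w2, w_out):
    """Return one score in (0, 1) per entity, computed jointly over the graph."""
    a_norm = normalize_adjacency(adj)
    h = gcn_layer(a_norm, features, w1)
    h = gcn_layer(a_norm, h, w2)
    logits = (h @ w_out).ravel()
    return 1.0 / (1.0 + np.exp(-logits))

# Toy entity graph (assumption): an edge links two entities that share a fact.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))     # hypothetical per-entity feature vectors
w1 = rng.normal(size=(8, 16)) * 0.1    # untrained weights, illustration only
w2 = rng.normal(size=(16, 16)) * 0.1
w_out = rng.normal(size=(16, 1)) * 0.1

scores = score_entities(adj, features, w1, w2, w_out)
predicted = int(np.argmax(scores))     # index of the highest-scoring entity
print(scores, predicted)
```

Because every entity's score depends on its neighbors through the normalized adjacency matrix, the decision is made over the whole graph at once instead of one fact at a time. In a trained model the weights would be learned and the features would encode the image, the question, and the entity.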

