Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering

03/06/2022
by Mingxiao Li, et al.

Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not present in the given image. It is not only more challenging than regular VQA but also a vital step towards building a general VQA system. Most existing knowledge-based VQA systems process knowledge and image information in the same way, ignoring the fact that the knowledge base (KB) contains complete information about each triplet, whereas the information extracted from the image may be incomplete because relations between two objects can be missing or wrongly detected. In this paper, we propose a novel model named dynamic knowledge memory enhanced multi-step graph reasoning (DMMGR), which performs explicit and implicit reasoning over a key-value knowledge memory module and a spatial-aware image graph, respectively. Specifically, the memory module learns a dynamic knowledge representation and generates a knowledge-aware question representation at each reasoning step. This representation is then used to guide a graph attention operator over the spatial-aware image graph. Our model achieves new state-of-the-art accuracy on the KRVQR and FVQA datasets. We also conduct ablation experiments to demonstrate the effectiveness of each component of the proposed model.
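
The abstract describes the architecture only at a high level. As a rough illustration of the two generic ingredients it names, key-value memory addressing and question-guided attention over a graph, here is a minimal pure-Python sketch. It assumes dot-product scoring, a residual update that adds the memory read-out to the question vector, and a self-loop on every graph node; these choices and all function names are hypothetical and are not the DMMGR formulation from the paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Generic key-value memory addressing: score keys, return weighted values."""
    weights = softmax([dot(query, k) for k in keys])
    return weighted_sum(weights, values)


def guided_graph_attention(query: Vector, nodes: List[Vector],
                           neighbors: List[List[int]]) -> List[Vector]:
    """One query-guided attention pass over a graph (hypothetical form).

    Each node aggregates itself and its neighbors, weighted by how well
    each candidate node matches the knowledge-aware query vector.
    """
    updated = []
    for j, nbrs in enumerate(neighbors):
        candidates = [j] + nbrs  # assumed self-loop
        weights = softmax([dot(query, nodes[k]) for k in candidates])
        updated.append(weighted_sum(weights, [nodes[k] for k in candidates]))
    return updated


def reason(question: Vector, keys: List[Vector], values: List[Vector],
           nodes: List[Vector], neighbors: List[List[int]], steps: int = 2):
    """Alternate memory reads and graph attention for a fixed number of steps."""
    q = list(question)
    h = [list(n) for n in nodes]
    for _ in range(steps):
        # Hypothetical residual update: question += knowledge read-out.
        knowledge = memory_read(q, keys, values)
        q = [a + b for a, b in zip(q, knowledge)]
        h = guided_graph_attention(q, h, neighbors)
    return q, h


if __name__ == "__main__":
    q, h = reason(
        question=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0]],      # e.g. encoded (subject, relation)
        values=[[0.5, 0.5], [0.0, 1.0]],    # e.g. encoded objects
        nodes=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        neighbors=[[1], [0, 2], [1]],
    )
    print(q, h)
```

In this sketch the memory side produces a refined question vector at each step, and only that vector (not the raw question) steers the attention over image-graph nodes, which mirrors the division of labor the abstract describes.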
