Learning Visual Knowledge Memory Networks for Visual Question Answering

06/13/2018
by   Zhou Su, et al.

Visual question answering (VQA) requires joint comprehension of images and natural-language questions; many questions cannot be answered directly or unambiguously from visual content alone, but require reasoning over structured human knowledge with confirmation from the visual content. This paper proposes the visual knowledge memory network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared with existing methods that leverage external knowledge to support VQA, this paper emphasizes two missing mechanisms. The first is a mechanism for integrating visual content with knowledge facts: VKMN handles this by jointly embedding knowledge triples (subject, relation, target) and deep visual features into visual knowledge features. The second is a mechanism for handling the multiple knowledge facts expanded from question-answer pairs: VKMN stores the joint embeddings in a key-value pair structure in the memory network, making it easy to handle multiple facts. Experiments show that the proposed method achieves promising results on both the VQA v1.0 and v2.0 benchmarks, and outperforms state-of-the-art methods on knowledge-reasoning questions.
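The key-value memory read that the abstract describes (question embedding addresses the keys, the answer is assembled from the values) can be sketched roughly as below. This is a minimal illustrative sketch of generic key-value memory addressing, not the paper's implementation; all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memory_read(query, keys, values):
    """One read hop over a key-value memory:
    score each key against the query, normalize the scores into
    addressing weights, and return the weighted sum of the values."""
    scores = keys @ query    # (num_slots,) similarity per memory slot
    attn = softmax(scores)   # addressing weights over the stored facts
    return attn @ values     # (embed_dim,) aggregated memory output

# Toy example: 3 memory slots, each holding a joint
# (knowledge-triple + visual) feature embedding.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))    # key embeddings, one per stored fact
values = rng.normal(size=(3, 4))  # value embeddings, one per stored fact
query = rng.normal(size=4)        # question embedding
out = kv_memory_read(query, keys, values)
print(out.shape)  # (4,)
```

Because each fact occupies its own (key, value) slot, adding more expanded facts only adds rows to `keys` and `values`; the read operation itself is unchanged, which is what makes multiple facts easy to handle.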

Related research

12/03/2017: Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Visual Question Answering (VQA) has attracted much attention since it of...

05/24/2018: R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Recently, Visual Question Answering (VQA) has emerged as one of the most...

11/29/2018: Visual Question Answering as Reading Comprehension
Visual question answering (VQA) demands simultaneous comprehension of bo...

06/28/2020: Improving VQA and its Explanations by Comparing Competing Explanations
Most recent state-of-the-art Visual Question Answering (VQA) systems are...

07/26/2022: LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Visual question answering (VQA) often requires an understanding of visua...

01/22/2021: Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Visual question answering requires a deep understanding of both images a...

05/10/2023: Combo of Thinking and Observing for Outside-Knowledge VQA
Outside-knowledge visual question answering is a challenging task that r...
