Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

12/13/2021
by   Diego Garcia-Olano, et al.

Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge in order to correctly answer a text question about an associated image. Recent work on single-modality text tasks has shown that knowledge injection into pre-trained language models, specifically via entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically study whether and how such methods, applied in a bi-modal setting, can improve an existing VQA system's performance on the KBVQA task. We experiment with two large publicly available VQA datasets: (1) KVQA, which contains mostly rare Wikipedia entities, and (2) OKVQA, which is less entity-centric and more aligned with common sense reasoning. Both lack explicit entity spans, and we study the effect of different weakly supervised and manual methods for obtaining them. Additionally, we analyze how recently proposed bi-modal and single-modal attention explanations are affected by the incorporation of such entity-enhanced representations. Our results show substantially improved performance on the KBVQA task without the need for additional costly pre-training, and we provide insights into when entity knowledge injection helps improve a model's understanding. We provide code and enhanced datasets for reproducibility.
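To make the idea of entity-enhanced knowledge injection concrete, the sketch below shows one common fusion pattern: knowledge graph embeddings for detected entity spans are projected into the model's hidden size and added to the corresponding token embeddings. All names, shapes, and the simple additive fusion here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# Toy dimensions; real models use e.g. 768-d hidden states and
# pre-trained KG embeddings (hypothetical values for illustration).
HIDDEN = 8   # transformer hidden size
KG_DIM = 4   # knowledge graph embedding size

rng = np.random.default_rng(0)
# In a real model this projection is learned; here it is random.
proj = rng.normal(size=(KG_DIM, HIDDEN))

def inject_entities(token_embs, spans, kg_embs):
    """Add projected KG embeddings to tokens covered by entity spans.

    token_embs: (seq_len, HIDDEN) array of token embeddings
    spans: list of (start, end) half-open token index ranges
    kg_embs: (num_entities, KG_DIM) array, one KG embedding per span
    """
    out = token_embs.copy()
    for (start, end), kg in zip(spans, kg_embs):
        out[start:end] += kg @ proj  # fuse entity knowledge into the span
    return out

tokens = rng.normal(size=(6, HIDDEN))
spans = [(1, 3)]                       # e.g. a two-token entity mention
kg = rng.normal(size=(1, KG_DIM))
enhanced = inject_entities(tokens, spans, kg)
print(enhanced.shape)  # (6, 8); only tokens 1-2 differ from the input
```

Note that tokens outside the entity spans pass through unchanged, which is why obtaining accurate spans (weakly supervised or manual, as the paper studies) matters for this style of injection.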


