External Knowledge enabled Text Visual Question Answering

08/22/2021
by Arka Ujjal Dey, et al.

The open-ended question answering task of Text-VQA requires reading and reasoning about the local, often previously unseen, scene-text content of an image to generate answers. In this work, we propose the generalized use of external knowledge to augment our understanding of this scene text. We design a framework to extract, validate, and reason with knowledge using a standard multimodal transformer for vision-language understanding tasks. Through empirical evidence and qualitative results, we demonstrate how external knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities. We achieve results comparable to the state of the art on two publicly available datasets, under the constraints of similar upstream OCR systems and training data.

research
09/15/2021

Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

Integrating outside knowledge for reasoning in visio-linguistic tasks su...
research
02/09/2022

Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?

The task of Outside Knowledge Visual Question Answering (OKVQA) requires...
research
05/31/2019

Scene Text Visual Question Answering

Current visual question answering datasets do not consider the rich sema...
research
12/13/2021

Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task req...
research
07/02/2023

Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data

This paper addresses the ethical concerns arising from the use of unauth...
research
02/11/2023

Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Often, deep network models are purely inductive during training and whil...
research
03/24/2022

Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering

Texts in scene images convey critical information for scene understandin...
