- ICDAR 2019 Competition on Scene Text Visual Question Answering
  This paper presents final results of ICDAR 2019 Scene Text Visual Questi...
- Towards VQA Models that can Read
  Studies have shown that a dominant class of questions asked by visually ...
- Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
  Many visual scenes contain text that carries crucial information, and it...
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
  Text-based visual question answering (VQA) requires to read and understa...
- Answering Questions about Data Visualizations using Efficient Bimodal Fusion
  Chart question answering (CQA) is a newly proposed visual question answe...
- Data Interpretation over Plots
  Reasoning over plots by question answering (QA) is a challenging machine...
- Structured Multimodal Attentions for TextVQA
  Text based Visual Question Answering (TextVQA) is a recently raised chal...
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Image text carries essential information for understanding a scene and performing reasoning. The text-based visual question answering (text VQA) task focuses on visual questions that require reading text in images. Existing text VQA systems generate an answer by selecting from optical character recognition (OCR) texts or a fixed vocabulary; the positional information of the text is underused, and no evidence is provided for the generated answer. This paper therefore proposes a localization-aware answer prediction network (LaAP-Net) to address this challenge. Our LaAP-Net not only generates the answer to the question but also predicts a bounding box as evidence for the generated answer. Moreover, a context-enriched OCR representation (COR) for multimodal fusion is proposed to facilitate the localization task. Our proposed LaAP-Net outperforms existing approaches on three benchmark datasets for the text VQA task by a noticeable margin.
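The abstract describes two coupled outputs: an answer and a bounding box that serves as evidence for it. Below is a minimal sketch of that joint answer-plus-evidence idea, assuming a PyTorch setting. The module name LocalizationAwareHead, the tensor shapes, and the pointer-style copy scores over OCR tokens are illustrative assumptions for clarity, not the authors' released implementation of LaAP-Net or COR.

```python
# Minimal sketch (PyTorch) of joint answer prediction plus evidence localization.
# All names and shapes are hypothetical illustrations of the idea in the abstract.
import torch
import torch.nn as nn


class LocalizationAwareHead(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        # Scores over a fixed answer vocabulary.
        self.vocab_scores = nn.Linear(hidden_dim, vocab_size)
        # Pointer-style scores over per-token OCR features (copy mechanism).
        self.ocr_proj = nn.Linear(hidden_dim, hidden_dim)
        # Evidence branch: regress a normalized [x, y, w, h] bounding box.
        self.bbox_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),
            nn.Sigmoid(),
        )

    def forward(self, fused: torch.Tensor, ocr_feats: torch.Tensor):
        # fused:     (batch, hidden_dim)          joint question/image/OCR feature
        # ocr_feats: (batch, num_ocr, hidden_dim) per-token OCR features
        vocab_logits = self.vocab_scores(fused)
        ocr_logits = torch.einsum("bd,bnd->bn", self.ocr_proj(fused), ocr_feats)
        # Answer is chosen over vocabulary words and OCR tokens jointly.
        answer_logits = torch.cat([vocab_logits, ocr_logits], dim=-1)
        # Bounding box in [0, 1] coordinates, returned as evidence for the answer.
        evidence_box = self.bbox_head(fused)
        return answer_logits, evidence_box


# Usage sketch with random tensors standing in for real fused features.
head = LocalizationAwareHead(hidden_dim=768, vocab_size=5000)
fused = torch.randn(2, 768)
ocr_feats = torch.randn(2, 50, 768)
logits, box = head(fused, ocr_feats)
print(logits.shape, box.shape)  # torch.Size([2, 5050]) torch.Size([2, 4])
```

In this sketch the localization branch shares the fused representation with the answer branch, so supervising the bounding box can also shape the features used for answering, which is one plausible reading of how evidence prediction supports the main task.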