Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering

03/24/2022
by Chengyang Fang, et al.

Text in scene images conveys critical information for scene understanding and reasoning, so text-based visual question answering (TextVQA) requires a model that can both read and reason. However, current TextVQA models are not centered on the text and suffer from several limitations. Because answer prediction lacks semantic guidance, these models are easily dominated by language biases and optical character recognition (OCR) errors. In this paper, we propose a novel Semantics-Centered Network (SC-Net) that consists of an instance-level contrastive semantic prediction module (ICSP) and a semantics-centered transformer module (SCT). Equipped with these two modules, SC-Net can resist language biases and the errors accumulated from OCR. Extensive experiments on the TextVQA and ST-VQA datasets demonstrate the effectiveness of our model. SC-Net outperforms previous work by a noticeable margin and offers a more reasonable approach to the TextVQA task.
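The abstract does not describe how the ICSP module is trained. As a purely hypothetical illustration of what an "instance-level contrastive" objective commonly looks like, the sketch below implements a generic InfoNCE-style loss. A predicted semantic vector is pulled toward the embedding of the correct answer instance and pushed away from other candidate instances. The function names, the use of cosine similarity, and the temperature of 0.1 are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(
    pred: Sequence[float],
    candidates: List[Sequence[float]],
    positive_index: int,
    temperature: float = 0.1,  # assumed value, not from the paper
) -> float:
    """Generic InfoNCE loss (hypothetical stand-in, not SC-Net's exact ICSP).

    Returns -log softmax(similarities / temperature)[positive_index],
    i.e. a cross-entropy over candidate answer instances.
    """
    logits = [cosine(pred, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidate logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[positive_index]


if __name__ == "__main__":
    pred = [1.0, 0.0]                     # hypothetical predicted semantics
    candidates = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical answer embeddings
    print(instance_contrastive_loss(pred, candidates, positive_index=0))
    print(instance_contrastive_loss(pred, candidates, positive_index=1))
```

The loss is small when the predicted vector aligns with the correct candidate and large when it aligns with a distractor. This is the general property a contrastive semantic prediction objective relies on to steer answers toward the right meaning rather than toward frequent answers or noisy OCR tokens.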
