Making the V in Text-VQA Matter

08/01/2023
by Shamanthak Hegde, et al.

Text-based VQA aims to answer questions by reading the text present in images, and it demands a deeper understanding of the relationship between a scene and its text than the standard VQA task. Recent studies have shown that the question-answer pairs in the TextVQA dataset focus heavily on the text in the image while giving little weight to visual features, and that some questions can be answered without understanding the image at all. Models trained on this dataset therefore predict biased answers, since they never learn the visual context. For example, for questions like "What is written on the signboard?", the model always predicts "STOP", effectively ignoring the image. To address these issues, we propose a method to learn visual features (making the V matter in TextVQA) alongside OCR features and question features, using the VQA dataset as external knowledge for Text-based VQA. Specifically, we combine the TextVQA and VQA datasets and train the model on the combined set. This simple yet effective approach strengthens the correlation the model learns between image features and the text present in the image, which leads to better answering of questions. We further evaluate the model on different datasets and compare qualitative and quantitative results.
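As a rough illustration of the combined-dataset training described above, the sketch below merges TextVQA-style samples (which carry OCR tokens) with VQA-style samples (which do not) into a single PyTorch training set. The dataset class, toy samples, and field names are hypothetical placeholders for illustration, not the authors' released code.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class VQAStyleDataset(Dataset):
    """Wraps a list of samples; pure-VQA samples simply carry an empty OCR list."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        return s["image"], s["ocr"], s["question"], s["answer"]


# Hypothetical toy samples standing in for real TextVQA / VQA annotations.
textvqa = VQAStyleDataset([
    {"image": torch.rand(3, 224, 224), "ocr": ["STOP"],
     "question": "What is written on the signboard?", "answer": "stop"},
])
vqa = VQAStyleDataset([
    {"image": torch.rand(3, 224, 224), "ocr": [],
     "question": "What color is the sign?", "answer": "red"},
])

# Concatenating the two datasets yields one loader that mixes text-centric
# and purely visual questions in every epoch.
combined = ConcatDataset([textvqa, vqa])
loader = DataLoader(combined, batch_size=2, shuffle=True,
                    collate_fn=lambda batch: batch)  # keep variable-length OCR lists

for batch in loader:
    for image, ocr, question, answer in batch:
        pass  # a joint image/OCR/question model's forward pass would go here
```

Exposing the model to both kinds of questions in one training stream is the core of the approach: the VQA samples force the model to rely on visual features rather than defaulting to frequent OCR answers.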


