ICDAR 2019 Competition on Scene Text Visual Question Answering

06/30/2019
by Ali Furkan Biten, et al.

This paper presents the final results of the ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that no Visual Question Answering system had addressed to date: the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question/answer pairs, where the answer is always grounded on text instances present in the image. The images are drawn from 7 different public computer vision datasets and cover a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, which require reading the text in a scene and understanding it in the context of the scene to correctly answer a given question. A novel evaluation metric is presented that assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of the participants' results is provided, offering insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone on the path towards more robust and general models that can exploit scene text to achieve holistic image understanding.
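The evaluation metric referred to above is the Average Normalized Levenshtein Similarity (ANLS), which rewards near-miss answers (e.g. minor OCR errors) instead of requiring an exact string match, while zeroing out answers below a similarity threshold. The following is a minimal sketch of how such a score can be computed for a single question; function names and normalization details are illustrative, not the official evaluation code.

```python
# Sketch of ANLS-style scoring (illustrative, not the official evaluator).

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # deletion
                        dp[j - 1] + 1,                  # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]


def anls(prediction: str, ground_truths: list[str], tau: float = 0.5) -> float:
    """Score one answer: best normalized similarity over all ground truths,
    zeroed out when it falls below the threshold tau (0.5 in the challenge)."""
    def similarity(gt: str) -> float:
        p, g = prediction.strip().lower(), gt.strip().lower()
        if not p and not g:
            return 1.0
        return 1.0 - levenshtein(p, g) / max(len(p), len(g))

    best = max(similarity(gt) for gt in ground_truths)
    return best if best >= tau else 0.0
```

The threshold makes the metric behave like accuracy for clearly wrong answers while still giving partial credit for small recognition errors; the per-question scores are averaged over the dataset to obtain the final ANLS.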

