DocVQA: A Dataset for VQA on Document Images

07/01/2020
by Minesh Mathew, et al.

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. We provide a detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension, and report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding the structure of the document is crucial.
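Besides plain accuracy, DocVQA-style evaluation commonly reports Average Normalized Levenshtein Similarity (ANLS), a soft string-matching score that gives partial credit to near-miss answers. Below is a minimal, self-contained Python sketch of ANLS as usually defined (threshold tau = 0.5); the function names and the toy data are illustrative, not taken from the official evaluation toolkit.

    # Minimal sketch of ANLS (Average Normalized Levenshtein Similarity).
    # Function names and example data are illustrative only.

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        if len(a) < len(b):
            a, b = b, a
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def anls(prediction: str, ground_truths: list, tau: float = 0.5) -> float:
        """Score one question: best normalized similarity over all
        acceptable answers, zeroed out below the threshold tau."""
        best = 0.0
        for gt in ground_truths:
            p, g = prediction.strip().lower(), gt.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            s = 1.0 - nl if nl < tau else 0.0
            best = max(best, s)
        return best

    # Dataset-level ANLS is the mean of per-question scores.
    preds = {"q1": "September 1988", "q2": "12,000"}
    gts = {"q1": ["september 1988"], "q2": ["12000", "12,000"]}
    score = sum(anls(preds[q], gts[q]) for q in preds) / len(preds)
    print(f"ANLS: {score:.4f}")

Each question is scored against its best-matching acceptable answer, so multiple valid ground-truth phrasings do not penalize a model; the dataset-level figure is simply the mean over all questions.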


Related research

ICDAR 2021 Competition on Document Visual Question Answering (11/10/2021)
In this report we present results of the ICDAR 2021 edition of the Docum...

VisualMRC: Machine Reading Comprehension on Document Images (01/27/2021)
Recent studies on machine reading comprehension have focused on text-lev...

Visual Question Answering as Reading Comprehension (11/29/2018)
Visual question answering (VQA) demands simultaneous comprehension of bo...

Introspective Distillation for Robust Question Answering (11/01/2021)
Question answering (QA) models are well-known to exploit data bias, e.g....

DUBLIN – Document Understanding By Language-Image Network (05/23/2023)
Visual document understanding is a complex task that involves analyzing ...

BIOMRC: A Dataset for Biomedical Machine Reading Comprehension (05/13/2020)
We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. C...

Learning to Search in Long Documents Using Document Structure (06/09/2018)
Reading comprehension models are based on recurrent neural networks that...
