VisualMRC: Machine Reading Comprehension on Document Images

01/27/2021
by Ryota Tanaka, et al.

Recent studies on machine reading comprehension have focused on text-level understanding but have not yet reached the level of human understanding of the visual layout and content of real-world documents. In this study, we introduce a new visual machine reading comprehension dataset, named VisualMRC, wherein given a question and a document image, a machine reads and comprehends texts in the image to answer the question in natural language. Compared with existing visual question answering (VQA) datasets that contain texts in images, VisualMRC focuses more on developing natural language understanding and generation abilities. It contains 30,000+ pairs of a question and an abstractive answer for 10,000+ document images sourced from multiple domains of webpages. We also introduce a new model that extends existing sequence-to-sequence models, pre-trained with large-scale text corpora, to take into account the visual layout and content of documents. Experiments with VisualMRC show that this model outperformed the base sequence-to-sequence models and a state-of-the-art VQA model. However, its performance is still below that of humans on most automatic evaluation metrics. The dataset will facilitate research aimed at connecting vision and language understanding.


