Towards Complex Document Understanding By Discrete Reasoning

07/25/2022
by   Fengbin Zhu, et al.
4

Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language, which is an emerging research topic for both Natural Language Processing and Computer Vision. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text as well as 16,558 question-answer pairs by extending the TAT-QA dataset. These documents are sampled from real-world financial reports and contain lots of numbers, which means discrete reasoning capability is demanded to answer questions on this dataset. Based on TAT-DQA, we further develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions with corresponding strategies, i.e., extraction or reasoning. Extensive experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, the performance still lags far behind that of expert humans. We expect that our new TAT-DQA dataset would facilitate the research on deep understanding of visually-rich documents combining vision and language, especially for scenarios that require discrete reasoning. Also, we hope the proposed model would inspire researchers to design more advanced Document VQA models in future.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/13/2023

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

Document-based Visual Question Answering examines the document understan...
research
05/03/2023

Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents with Semantic-Oriented Hierarchical Graphs

Discrete reasoning over table-text documents (e.g., financial reports) g...
research
07/31/2023

Workshop on Document Intelligence Understanding

Document understanding and information extraction include different task...
research
11/07/2021

Information Extraction from Visually Rich Documents with Font Style Embeddings

Information extraction (IE) from documents is an intensive area of resea...
research
05/15/2023

Document Understanding Dataset and Evaluation (DUDE)

We call on the Document AI (DocAI) community to reevaluate current metho...
research
04/26/2021

InfographicVQA

Infographics are documents designed to effectively communicate informati...
research
10/04/2016

Tutorial on Answering Questions about Images with Deep Learning

Together with the development of more accurate methods in Computer Visio...

Please sign up or login with your details

Forgot password? Click here to reset