DLUE: Benchmarking Document Language Understanding

05/16/2023
by Ruoxi Xu, et al.

Understanding documents is central to many real-world tasks but remains a challenging problem. There is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison between methods and makes it hard to measure progress in the field. To benchmark document understanding research, this paper summarizes four representative abilities: document classification, document structural analysis, document information extraction, and document transcription. Under this evaluation framework, we propose Document Language Understanding Evaluation (DLUE), a new task suite that covers a wide range of tasks in various forms, domains, and document genres. We also systematically evaluate six well-established transformer models on DLUE and find that, because of lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from solved. Currently no neural architecture dominates all tasks, which points to the need for a universal document understanding architecture.

