A Graphical Approach to Document Layout Analysis

08/03/2023
by Jilin Wang, et al.

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured, machine-readable formats that can then be used for many downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network that is competitive with SOTA models on two challenging DLA datasets while being an order of magnitude smaller than existing models. In particular, the 4-million-parameter GLAM model outperforms the leading 140M+-parameter computer-vision-based model on 5 of the 11 classes in the DocLayNet dataset. A simple ensemble of these two models achieves a new state of the art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making it a favorable engineering choice for DLA tasks.
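The abstract frames layout analysis as segmentation and classification over a graph built from PDF metadata. The sketch below illustrates only the segmentation half, under assumptions that are not taken from the paper: nodes are text boxes with bounding boxes extracted from the PDF, each node is linked to its k nearest neighbours by centre distance, and a hand-written geometric rule (the hypothetical `same_segment`) stands in for the learned edge classifier a GNN such as GLAM would provide. Kept edges are merged with union-find, so each connected component becomes one segment. All names, thresholds, and the neighbour scheme are illustrative.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class TextBox:
    """A text span from PDF metadata with its bounding box (x0, y0, x1, y1); y grows downward."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def build_page_graph(boxes, k=2):
    """Assumed graph construction: link each node to its k nearest neighbours (undirected)."""
    edges = set()
    for i, a in enumerate(boxes):
        ax, ay = a.center()
        dists = []
        for j, b in enumerate(boxes):
            if j != i:
                bx, by = b.center()
                dists.append((hypot(ax - bx, ay - by), j))
        dists.sort()
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def same_segment(a, b, max_gap=5.0):
    """Hypothetical stand-in for a learned edge classifier:
    keep the edge if the boxes overlap horizontally and are vertically close."""
    vertical_gap = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    horizontal_overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    return vertical_gap <= max_gap and horizontal_overlap > 0


def segment(boxes, edges, max_gap=5.0):
    """Union-find over kept edges; each connected component is one segment."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        if same_segment(boxes[i], boxes[j], max_gap):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: a heading followed by a two-line paragraph.
page = [
    TextBox("Introduction", 72, 50, 200, 64),
    TextBox("Document layout analysis is", 72, 100, 300, 112),
    TextBox("the task of detecting content.", 72, 114, 300, 126),
]
edges = build_page_graph(page, k=1)
print(edges)
print(segment(page, edges))
```

Here the heading ends up in its own segment and the two paragraph lines are merged. In a learned model, both the edge decisions and a per-segment class label (e.g., text, title, figure) would be predicted by the network rather than by fixed geometric rules.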


