VTLayout: Fusion of Visual and Text Features for Document Layout Analysis

08/12/2021
by Shoubin Li, et al.

Documents often have complex physical structures, which makes Document Layout Analysis (DLA) challenging. As a pre-processing step for content extraction, DLA has the potential to capture rich information from historical and scientific documents at scale. Although many deep-learning methods from computer vision already detect Figure blocks well, they remain unsatisfactory at recognizing List, Table, Text, and Title blocks. This paper proposes VTLayout, a model that fuses a document's deep visual, shallow visual, and text features to localize and identify blocks of different categories. The model has two stages, and the three feature extractors are built in the second. In the first stage, a Cascade Mask R-CNN model is applied directly to localize all category blocks in a document. In the second stage, the deep visual, shallow visual, and text features are extracted and fused to identify the category of each block. This strengthens block classification on top of an existing localization technique. Experiments on the PubLayNet dataset show that VTLayout's identification capability is superior to that of the most advanced DLA method, with an F1 score as high as 0.9599.
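The second stage described above can be illustrated with a minimal sketch. The abstract does not say how the three feature vectors are fused or which classifier is used, so the concatenation in `fuse`, the single linear layer in `classify`, and all function and variable names here are assumptions made for illustration, not the authors' implementation. The block localization of the first stage (Cascade Mask R-CNN) is taken as given, and the feature vectors are assumed to have been extracted already for one localized block.

```python
import math
from typing import List, Sequence, Tuple

# The five block categories named in the abstract.
CATEGORIES = ["Figure", "List", "Table", "Text", "Title"]


def fuse(deep_visual: Sequence[float],
         shallow_visual: Sequence[float],
         text: Sequence[float]) -> List[float]:
    """Fuse three per-block feature vectors by concatenation.

    Concatenation is an assumed fusion operator; the abstract only
    states that the three feature types are fused.
    """
    return [*deep_visual, *shallow_visual, *text]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> Tuple[str, List[float]]:
    """Score a fused vector with one linear layer (hypothetical classifier).

    `weights` has one row per category; each row has len(fused) entries.
    Returns the most probable category and the full probability list.
    """
    logits = [
        sum(w * x for w, x in zip(row, fused)) + bias
        for row, bias in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs


if __name__ == "__main__":
    # Toy one-dimensional features for a single localized block.
    fused = fuse([0.1], [0.2], [3.0])
    # Toy weights: the third row (Table) responds to the text feature.
    weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
    label, probs = classify(fused, weights, [0.0] * 5)
    print(label, [round(p, 3) for p in probs])
```

In this toy setup the text feature dominates the score, which mirrors the motivation in the abstract: categories such as List, Table, Text, and Title are hard to separate on visual evidence alone, so a text-derived signal can tip the decision.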

