Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation

10/15/2021
by   Penghai Zhao, et al.

Accurate layout analysis without subsequent text-line segmentation remains an ongoing challenge, especially for the Kangyur, a kind of historical Tibetan document featuring many touching components and a mottled background. Layout analysis, which aims to identify the different regions in a document image, is indispensable for subsequent procedures such as character recognition. However, little research has addressed line-level layout analysis, and the existing work fails to handle the Kangyur. To obtain optimal results, we present a fine-grained sub-line level layout analysis approach. First, we introduce an accelerated method for building a dataset that is dynamic and reliable. Second, we enhance SOLOv2 according to the characteristics of the Kangyur. We then feed the enhanced SOLOv2 the prepared annotation file during the training phase. Once the network is trained, instances of text lines, sentences, and titles can be segmented and identified at the inference stage. The experimental results show that the proposed method delivers a decent score of 72.7 on the dataset. In general, this preliminary research provides insights into fine-grained sub-line level layout analysis and supports the viability of SOLOv2-based approaches. We also believe that the proposed methods can be applied to documents in other languages with various layouts.
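
As an illustration of what a "prepared annotation file" for instance segmentation might look like, the sketch below builds a COCO-style record with the three instance types named in the abstract. The abstract does not state the file format, so the COCO-style layout, the category ids, the file name, and the helper names (`polygon_area`, `make_annotation`, `build_dataset`) are all assumptions made for illustration, not the authors' actual pipeline.

```python
import json

# Hypothetical category ids; the abstract names only the three instance types.
CATEGORIES = [
    {"id": 1, "name": "text_line"},
    {"id": 2, "name": "sentence"},
    {"id": 3, "name": "title"},
]


def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def make_annotation(ann_id, image_id, category_id, points):
    """Build one COCO-style instance record from a polygon outline."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # Flattened [x1, y1, x2, y2, ...] polygon, as in COCO-style files.
        "segmentation": [[float(c) for p in points for c in p]],
        # Bounding box as [x, y, width, height].
        "bbox": [float(x_min), float(y_min),
                 float(max(xs) - x_min), float(max(ys) - y_min)],
        "area": polygon_area(points),
        "iscrowd": 0,
    }


def build_dataset(images, annotations):
    """Assemble the top-level annotation dictionary."""
    return {"images": images, "annotations": annotations,
            "categories": CATEGORIES}


# Illustrative page with a single text-line instance (made-up coordinates).
example = build_dataset(
    images=[{"id": 1, "file_name": "page_0001.png",
             "width": 2000, "height": 600}],
    annotations=[make_annotation(
        1, 1, 1, [(100, 50), (1900, 50), (1900, 120), (100, 120)])],
)
annotation_json = json.dumps(example, indent=2)
```

Storing each region as a polygon rather than only a bounding box lets touching or overlapping sub-line regions keep separate outlines, which is the kind of distinction an instance segmentation network is trained to recover.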


