Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

12/15/2019
by Abhishek Prusty, et al.

Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and the presence of dense, irregular layout elements (e.g. text lines, pictures, multiple documents per image), we adapt a Fully Convolutional Deep Neural Network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility, and keeping in mind the non-technical background of domain experts, we also contribute a custom, web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale.


research
08/21/2021

Palmira: A Deep Deformable Network for Instance Segmentation of Dense and Uneven Layouts in Handwritten Manuscripts

Handwritten documents are often characterized by dense and uneven layout...
research
04/18/2020

A Large Dataset of Historical Japanese Documents with Complex Layouts

Deep learning-based approaches for automatic document layout analysis an...
research
10/15/2021

Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation

Accurate layout analysis without subsequent text-line segmentation remai...
research
07/14/2020

Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization

In this paper, we propose an end-to-end trainable framework for restorin...
research
08/21/2021

BoundaryNet: An Attentive Deep Network with Fast Marching Distance Maps for Semi-automatic Layout Annotation

Precise boundary annotations of image regions can be crucial for downstr...
research
09/04/2023

Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models

In this paper, we present a pipeline for image extraction from historica...
