Text Line Segmentation of Historical Documents: a Survey

04/10/2007
by Laurence Likforman-Sulem, et al.

Libraries and various National Archives hold a huge number of historical documents that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
