Historical Document Processing: A Survey of Techniques, Tools, and Trends

02/15/2020
by James P. Philips, et al.

Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from various subfields of computer science, including computer vision, document analysis and recognition, natural language processing, and machine learning, to automatically convert images of ancient manuscripts, letters, diaries, and early printed texts into a digital format usable in data mining and information retrieval systems. Within the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned an increasing volume of their historical document archives, the need to transcribe the full text of these collections has become acute. Because Historical Document Processing spans multiple sub-domains of computer science, the relevant knowledge is scattered across numerous journals and conference proceedings. This paper surveys the major phases of Historical Document Processing, along with the standard algorithms, tools, and datasets used in the field, discusses the results of a literature review, and finally suggests directions for further research.
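To make concrete what "a digital format usable in information retrieval systems" can mean, here is a minimal sketch, not taken from the paper, of an inverted index built over transcribed pages. The page IDs, sample texts, and the function names `build_inverted_index` and `search` are hypothetical, and the naive tokenizer is an assumption; a production system would also need to handle historical spelling variants and recognition errors, and would usually rank results rather than return a plain boolean match.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercased token to the sorted list of page IDs containing it.

    `pages` is a hypothetical dict of page ID -> transcribed text, standing in
    for the output of an OCR/handwriting-recognition step.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Naive tokenizer (an assumption): runs of Unicode word characters.
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the page IDs that contain every query token (boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical transcriptions of three scanned pages.
    transcriptions = {
        "letter_001": "Dear brother, the harvest was poor this year.",
        "diary_017": "Rain again. The harvest is ruined.",
        "ledger_003": "Paid two shillings for seed.",
    }
    idx = build_inverted_index(transcriptions)
    print(search(idx, "harvest"))       # ['diary_017', 'letter_001']
    print(search(idx, "poor harvest"))  # ['letter_001']
```

The index is built once from the transcriptions, so each query only intersects small sets of page IDs instead of rescanning every page's text.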



Related research

Document AI: Benchmarks, Models and Applications (11/16/2021)
Document AI, or Document Intelligence, is a relatively new research topi...

A Survey of Historical Document Image Datasets (03/16/2022)
This paper presents a systematic literature review of image datasets for...

Techniques d'anonymisation tabulaire : concepts et mise en oeuvre (01/08/2020)
In this document, we present a state of the art of anonymization techniq...

Intelligent Document Processing – Methods and Tools in the real world (12/28/2021)
The originality of this publication is to look at the subject of IDP (In...

Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts (06/13/2023)
The increasing availability of digital collections of historical and con...

Processing topical queries on images of historical newspaper pages (02/20/2020)
Historical newspapers are a source of research for the human and social ...

A plea for an upgrade to the digital craft of the historian and digital methodology for discovering the past (11/21/2022)
This essay aims to bid analogue historians assume that digitisation is t...
