Handwriting Classification for the Analysis of Art-Historical Documents

11/04/2020
by Christian Bartz, et al.

Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis, since manual analysis by specialists is often too expensive. In this paper, we focus on the analysis of handwriting in scanned documents from the art-historical archive of the WPI. Because the archive consists of documents written in several languages and lacks annotated training data for the creation of recognition models, we propose the task of handwriting classification as a new step for a handwriting OCR pipeline. Our handwriting classification model labels extracted text fragments, e.g., numbers, dates, or words, based on their visual structure. Such a classification supports historians by highlighting documents that contain a specific class of text without the need to read the entire content. To this end, we develop and compare several deep learning-based models for text classification. In extensive experiments, we show the advantages and disadvantages of our proposed approach and discuss possible usage scenarios on a real-world dataset.


