The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts

12/15/2022
by Lisa Koopmans et al.

Identifying the production dates of historical manuscripts is one of the main goals of paleographers studying ancient documents. Automated methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts, based on the hypothesis that handwriting styles change over time. However, the scarcity of such documents makes it difficult to build robust systems. This article therefore explores the influence of data augmentation on the dating of historical manuscripts. Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts of different collections, including the Medieval Paleographical Scale, early Aramaic manuscripts, and the Dead Sea Scrolls. Results show that training models with augmented data improves the performance of historical manuscript dating by 1 this indicates further enhancement possibilities by considering models specific to the features and the documents' scripts.
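The evaluation setup described in the abstract, k-fold cross-validation with augmented training data, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the augmentation is a hypothetical Gaussian jitter on feature vectors (the title indicates character-level augmentation, which is not reproduced here), the classifier is a simple nearest-centroid stand-in rather than a linear SVM, the toy data and year labels are invented, and augmentation is confined to the training folds as a common precaution against leakage, which the abstract does not confirm. All function names are illustrative.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def augment(vector, rng, noise=0.05):
    """Hypothetical augmentation: jitter a feature vector with Gaussian noise."""
    return [x + rng.gauss(0.0, noise) for x in vector]


def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): one mean vector per date label."""
    groups = {}
    for vec, label in zip(X, y):
        groups.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=5, copies=2, seed=0):
    """Per-fold accuracy; augmented copies are added to training data only."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Augment only the training samples; held-out samples stay untouched.
        for i in train_idx:
            for _ in range(copies):
                X_train.append(augment(X[i], rng))
                y_train.append(y[i])
        model = fit_centroids(X_train, y_train)
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Invented toy data: two well-separated "date" classes with 2-D features.
X = [[0.1 * i, 0.1 * i] for i in range(10)] + \
    [[5.0 + 0.1 * i, 5.0 + 0.1 * i] for i in range(10)]
y = [1300] * 10 + [1400] * 10

folds = k_fold_indices(len(X), 5)
accuracies = cross_validate(X, y, k=5)
print(accuracies)
```

Keeping the held-out fold free of augmented copies means each accuracy value reflects only original samples, so a gain from augmentation cannot come from near-duplicates of test items appearing in training.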
