Transcribing Medieval Manuscripts for Machine Learning

07/15/2022
by Estelle Guéville, et al.

In the early twentieth century, many scholars focused on preparing editions and translations of texts previously available only to the few specialists able to read archaic hands and privileged enough to travel to work with the manuscripts in person. Valuable scholarship in its own right, the preparation of these editions and translations, for particular texts deemed important enough to justify the effort and time, laid the foundation for generations of scholarship in medieval studies. On the other hand, for many materials in historical archival collections, including already digitised collections, medievalists have only had the time to create partial transcriptions, if any at all. Access to textual material from the medieval period has increased greatly in recent years with digitisation, and we are able to imagine many new research projects in decades to come. What challenges do new frontiers of automation in the archives raise with respect to medieval studies, and in particular to the ways we transcribe? In this article, we argue that if medievalists hope to pursue the kinds of analysis that go on in advanced computational research, we will need new kinds of transcriptions, intentionally theorized not only for human reading but also for machine processing. We already have mature methods, such as Optical Character Recognition (OCR), for remediating generations of editions of medieval works, but we can ask ourselves whether these are the kinds of text we want to use for future computational analysis. We suggest instead that one way forward is by going back to the scriptorium.


