Restoring ancient text using deep learning: a case study on Greek epigraphy

10/14/2019
by Yannis Assael, et al.

Ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts, for evidence of the recorded past. However, these texts, "inscriptions", are often damaged over the centuries, and illegible parts of the text must be restored by specialists, known as epigraphists. This work presents Pythia, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Its architecture is carefully designed to handle long-term context information and to deal efficiently with missing or corrupted character and word representations. To train it, we wrote a non-trivial pipeline to convert PHI, the largest digital corpus of ancient Greek inscriptions, to machine-actionable text, which we call PHI-ML. On PHI-ML, Pythia's predictions achieve a 30.1% character error rate, compared to the 57.3% of human epigraphists. Moreover, in 73.5% of cases the ground-truth sequence was among the Top-20 hypotheses of Pythia, which effectively demonstrates the impact of this assistive method on the field of digital epigraphy, and sets the state-of-the-art in ancient text restoration.
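
The abstract reports results with two metrics: a character error rate (CER) and Top-20 accuracy. The sketch below computes both under stated assumptions: CER is taken to be the total Levenshtein distance between predicted and ground-truth restorations divided by the number of ground-truth characters, and a Top-k hit means the exact ground truth appears among the first k ranked hypotheses. The function names are hypothetical, and the paper's own evaluation protocol may differ in detail.

```python
from typing import Sequence

# Assumed metric definitions for illustration; not the paper's evaluation code.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Total edit distance divided by the total number of ground-truth characters."""
    errors = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    total = sum(len(t) for t in targets)
    return errors / total


def top_k_accuracy(hypotheses: Sequence[Sequence[str]], targets: Sequence[str], k: int = 20) -> float:
    """Fraction of examples whose ground truth is among the first k ranked hypotheses."""
    hits = sum(t in list(h)[:k] for h, t in zip(hypotheses, targets))
    return hits / len(targets)


if __name__ == "__main__":
    target = ["αλεξανδρος"]
    print(character_error_rate(["αλεξανδρου"], target))  # one substitution over 10 characters
    ranked = [["αλεξανδρου", "αλεξανδρος"]]
    print(top_k_accuracy(ranked, target, k=1), top_k_accuracy(ranked, target, k=20))
```

In the usage example, the best hypothesis misses the final character, so it counts as one error against ten ground-truth characters, while the correct restoration still counts as a Top-20 hit because it is ranked second. This is why a model can be useful to an epigraphist even when its single best guess is imperfect.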
