Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach

09/09/2021
by Koren Lazar, et al.

We present models that complete missing text in transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Because the tablets have deteriorated, scholars often rely on contextual cues to fill in missing parts of the text manually, a subjective and time-consuming process. We observe that this challenge can be formulated as a masked language modelling task, which is used mostly as a pretraining objective for contextualized language models. We then develop several architectures focusing on Akkadian, the lingua franca of the time. We find that despite data scarcity (1M tokens) we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.
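
The formulation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `[MASK]` symbol, the `x` marker for a broken sign, the toy tokens, and the helper names `mask_missing` and `hit_at_k` are all hypothetical. The scoring function is a generic top-k accuracy, which may differ in detail from the metric used in the paper.

```python
from typing import List, Sequence

MASK = "[MASK]"   # hypothetical mask symbol fed to a masked language model
MISSING = "x"     # hypothetical marker for a broken sign in a transliteration


def mask_missing(tokens: Sequence[str], missing: str = MISSING) -> List[str]:
    """Replace every token marked as missing with the mask symbol."""
    return [MASK if tok == missing else tok for tok in tokens]


def hit_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of gaps whose true token is among the top-k ranked candidates.

    ranked[i] is a candidate list for gap i, best candidate first.
    """
    if len(ranked) != len(gold) or not gold:
        raise ValueError("ranked and gold must be non-empty and of equal length")
    hits = sum(1 for cands, truth in zip(ranked, gold) if truth in cands[:k])
    return hits / len(gold)


# Toy input, invented for illustration only (not taken from any corpus).
line = ["a-na", "x", "be-li-ia"]
masked_line = mask_missing(line)

# Candidate lists a model might return for two gaps, best first (made up).
candidates = [["szar-ri", "be-li"], ["a", "b"]]
truths = ["be-li", "c"]
score_top2 = hit_at_k(candidates, truths, k=2)
score_top1 = hit_at_k(candidates, truths, k=1)
```

In this sketch a gap counts as solved if the true token appears anywhere in the first k suggestions, which matches the use case of a model proposing a short list of completions for an expert to choose from.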

