Ab Antiquo: Proto-language Reconstruction with RNNs

08/07/2019
by Carlo Meloni, et al.

Historical linguists have identified regularities in the process of historic sound change. The comparative method uses those regularities to reconstruct proto-words from observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and must predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform the conventional methods applied to this task so far. Error analysis reveals variability in the ability of neural models to capture different phonological changes, which correlates with the complexity of the changes. Analysis of the learned embeddings reveals that the models learn phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics.
research
04/10/2022

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns

Computational approaches in historical linguistics have been increasingl...
research
05/17/2022

Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings

While a great deal of work has been done on NLP approaches to lexical se...
research
05/27/2020

In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology

This paper investigates the ability of neural network architectures to e...
research
11/16/2022

Neural Unsupervised Reconstruction of Protolanguage Word Forms

We present a state-of-the-art neural approach to the unsupervised recons...
research
10/21/2020

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Most undeciphered lost languages exhibit two characteristics that pose s...
research
06/16/2019

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B

In this paper we propose a novel neural approach for automatic decipherm...
research
11/01/2020

Semantic coordinates analysis reveals language changes in the AI field

Semantic shifts can reflect changes in beliefs across hundreds of years,...
