Global-scale phylogenetic linguistic inference from lexical resources

02/17/2018
by Gerhard Jäger, et al.

Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments, which limits the scope of the approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant worldwide linguistic diversity. First, we estimated Pointwise Mutual Information (PMI) scores between sound classes using weighted sequence alignment and general-purpose optimization. From these scores we computed a dissimilarity matrix over all ASJP word lists; this matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training of an SVM classifier on expert cognacy judgments. Third, we defined two types of binary characters, one based on automatically inferred cognate classes and one based on sound-class occurrences. We report several tests demonstrating the suitability of these characters for character-based phylogenetic inference.
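
As a rough illustration of two of the steps named in the abstract, the following Python sketch computes PMI scores from a list of already-aligned sound-class pairs and encodes cognate classes as binary characters. It is a minimal sketch under stated assumptions, not the authors' implementation: the paper estimates PMI through weighted sequence alignment and optimization, whereas this code simply counts pairs it is handed. The function names `pmi_scores` and `cognate_characters`, the input layouts, and the use of "?" for a missing concept are all assumptions made for the example.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Return PMI for every pair of sound classes observed in alignment.

    aligned_pairs: iterable of (a, b) tuples, one per aligned position.
    This input format is hypothetical; producing the alignments is not shown.
    """
    joint = Counter()
    for a, b in aligned_pairs:
        # Count both orders so the resulting scores are symmetric.
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    total = sum(joint.values())

    # Marginal frequency of each sound class, derived from the joint counts.
    marginal = Counter()
    for (a, _), count in joint.items():
        marginal[a] += count

    scores = {}
    for (a, b), count in joint.items():
        p_joint = count / total
        p_independent = (marginal[a] / total) * (marginal[b] / total)
        scores[(a, b)] = math.log(p_joint / p_independent)
    return scores


def cognate_characters(cognate_classes):
    """Encode cognate classes as presence/absence characters.

    cognate_classes: {language: {concept: class_label}} (hypothetical layout).
    Returns the list of (concept, class_label) characters and, per language,
    a string of '1', '0', or '?' ('?' marks a concept with no entry; this
    missing-data convention is an assumption of the sketch).
    """
    characters = sorted(
        {
            (concept, label)
            for entries in cognate_classes.values()
            for concept, label in entries.items()
        }
    )
    matrix = {}
    for language, entries in cognate_classes.items():
        row = []
        for concept, label in characters:
            if concept not in entries:
                row.append("?")
            else:
                row.append("1" if entries[concept] == label else "0")
        matrix[language] = "".join(row)
    return characters, matrix


if __name__ == "__main__":
    pairs = [("p", "p"), ("p", "b"), ("t", "t"), ("t", "d")]
    print(pmi_scores(pairs))

    toy = {
        "A": {"hand": 1, "eye": 1},
        "B": {"hand": 1, "eye": 2},
        "C": {"hand": 2},
    }
    print(cognate_characters(toy))
```

Sound pairs that never co-occur in the input receive no score here; a fuller treatment would need smoothing or a default penalty for them. The character strings produced by `cognate_characters` are of the kind that character-based phylogenetic software typically accepts after conversion to its input format.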


research
02/16/2017

Fast and unsupervised methods for multilingual cognate clustering

In this paper we explore the use of unsupervised methods for detecting c...
research
04/11/2022

Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

This paper describes a method to enrich lexical resources with content r...
research
03/27/2023

Inference Rules for Binary Predicates in a Multigranular Framework

In a multigranular framework, the two most important binary predicates a...
research
08/09/2023

Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists

We present a cross-linguistic study that aims to quantify vowel harmony ...
research
08/16/2020

Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance

Lexical Similarity (LS) between two languages uncovers many interesting ...
research
04/10/2022

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns

Computational approaches in historical linguistics have been increasingl...
research
01/13/2023

From stage to page: language independent bootstrap measures of distinctiveness in fictional speech

Stylometry is mostly applied to authorial style. Recently, researchers h...
