Syntactic Phylogenetic Trees

07/10/2016
by Kevin Shu, et al.

In this paper we identify several serious problems that arise when syntactic data from the SSWL (Syntactic Structures of the World's Languages) database are used for computational phylogenetic reconstruction. We show that the most naive approach fails to produce reliable linguistic phylogenetic trees. We identify some of the sources of the observed problems and discuss how they may be corrected, at least in part, by using additional information, such as a prior subdivision into language families and subfamilies, and by making better use of the information about ancient languages. We also describe how phylogenetic algebraic geometry can help estimate to what extent the probability distribution at the leaves of the phylogenetic tree obtained from the SSWL data can be considered reliable, by testing it on phylogenetic trees established by other forms of linguistic analysis. In simple examples, we find that, after restricting to smaller language subfamilies and considering only those SSWL parameters that are fully mapped for the whole subfamily, the SSWL data match reliable phylogenetic trees extremely well, according to the evaluation of phylogenetic invariants. This is a promising sign for the use of SSWL data in linguistic phylogenetics.
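
The restriction to fully mapped parameters and the evaluation of an invariant can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy languages `A` to `D`, the parameter names, and the function names are hypothetical, and the choice of invariant (the 3x3 minors of the 4x4 flattening matrix for a four-leaf tree with binary characters, which vanish under the general Markov model when the split matches the tree) is one standard option that the abstract itself does not specify.

```python
from collections import Counter
from itertools import combinations

def fully_mapped(data, languages):
    """Return the parameters that have a recorded 0/1 value for every language."""
    shared = set.intersection(*(set(data[lang]) for lang in languages))
    return sorted(p for p in shared
                  if all(data[lang][p] in (0, 1) for lang in languages))

def leaf_distribution(data, languages, params):
    """Empirical distribution of binary patterns at the leaves, one pattern per parameter."""
    counts = Counter(tuple(data[lang][p] for lang in languages) for p in params)
    return {pattern: n / len(params) for pattern, n in counts.items()}

def flattening(dist):
    """4x4 flattening of a 4-leaf distribution along the split {1,2}|{3,4}."""
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [[dist.get(row + col, 0.0) for col in states] for row in states]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def max_minor(matrix):
    """Largest absolute 3x3 minor; values near zero are consistent with the split."""
    best = 0.0
    for rows in combinations(range(4), 3):
        for cols in combinations(range(4), 3):
            sub = [[matrix[r][c] for c in cols] for r in rows]
            best = max(best, abs(det3(sub)))
    return best

# Hypothetical toy data: None or a missing key marks an unmapped parameter.
toy = {
    "A": {"p1": 1, "p2": 0, "p3": 1, "p4": None},
    "B": {"p1": 1, "p2": 0, "p3": 0, "p4": 1},
    "C": {"p1": 0, "p2": 1, "p3": 1, "p4": 1},
    "D": {"p1": 0, "p2": 1, "p3": 1},
}
languages = ["A", "B", "C", "D"]
params = fully_mapped(toy, languages)
dist = leaf_distribution(toy, languages, params)
score = max_minor(flattening(dist))
```

In this sketch a smaller `score` for one split than for the competing splits would favour the corresponding tree topology. With real data the comparison only becomes meaningful once many fully mapped parameters are available for the subfamily.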


research
12/05/2017

Phylogenetics of Indo-European Language families via an Algebro-Geometric Analysis of their Syntactic Structures

Using Phylogenetic Algebraic Geometry, we analyze computationally the ph...
research
03/12/2019

Topological Analysis of Syntactic Structures

We use the persistent homology method of topological data analysis and d...
research
03/02/2017

Structural Embedding of Syntactic Trees for Machine Comprehension

Deep neural networks for machine comprehension typically utilizes only w...
research
05/12/2020

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

It is commonly believed that knowledge of syntactic structure should imp...
research
03/25/2016

Classifying Syntactic Regularities for Hundreds of Languages

This paper presents a comparison of classification methods for linguisti...
research
06/01/2023

Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers

Recent advancements in pre-trained language models (PLMs) have demonstra...
research
06/09/2023

Progress on Constructing Phylogenetic Networks for Languages

In 2006, Warnow, Evans, Ringe, and Nakhleh proposed a stochastic model (...
