A Probabilistic Generative Model of Linguistic Typology

03/26/2019
by Johannes Bjerva, et al.

In the Principles and Parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, besting several baselines on the task of predicting held-out features. Furthermore, we show that language representations pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
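To make the idea of exponential-family matrix factorisation concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary features, a Bernoulli likelihood with a sigmoid link, plain gradient descent, and synthetic data; the names `fit_logistic_mf`, `rank`, `lr`, and `l2` are hypothetical and chosen only for illustration. Each language and each feature gets a low-dimensional vector, and the probability that a feature is "on" in a language is the sigmoid of their dot product. Only observed cells contribute to the loss, so the fitted model can be queried for held-out cells.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic_mf(Y, mask, rank=2, lr=0.02, steps=500, l2=0.01, seed=0):
    """Fit P(Y[l, f] = 1) = sigmoid(L[l] . F[f]) using observed cells only.

    Y    : (n_langs, n_feats) binary matrix (values in unobserved cells are ignored)
    mask : (n_langs, n_feats) matrix, 1 where Y is observed, 0 where held out
    """
    rng = np.random.default_rng(seed)
    n_langs, n_feats = Y.shape
    L = 0.1 * rng.standard_normal((n_langs, rank))  # language embeddings
    F = 0.1 * rng.standard_normal((n_feats, rank))  # feature embeddings
    eps = 1e-9
    losses = []
    for _ in range(steps):
        P = sigmoid(L @ F.T)
        # Regularised negative log-likelihood over observed cells.
        nll = -np.sum(mask * (Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps)))
        losses.append(nll + 0.5 * l2 * (np.sum(L**2) + np.sum(F**2)))
        G = mask * (P - Y)  # gradient of the NLL with respect to the logits
        grad_L = G @ F + l2 * L
        grad_F = G.T @ L + l2 * F
        L -= lr * grad_L
        F -= lr * grad_F
    return L, F, losses


# Synthetic data (an assumption for illustration, not real typological data):
# 30 "languages", 12 binary "features", generated from 2 latent parameters.
rng = np.random.default_rng(42)
true_L = rng.standard_normal((30, 2))
true_F = rng.standard_normal((12, 2))
Y = (sigmoid(3.0 * true_L @ true_F.T) > 0.5).astype(float)

# Hide roughly 20% of the cells to play the role of held-out features.
mask = (rng.random(Y.shape) < 0.8).astype(float)

L, F, losses = fit_logistic_mf(Y, mask)
probs = sigmoid(L @ F.T)

held_out = mask == 0
if held_out.any():
    acc = np.mean((probs[held_out] > 0.5) == (Y[held_out] > 0.5))
    print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

Because a single latent dimension can influence many feature columns at once, this kind of factorisation captures the covariance between features that the abstract attributes to shared parameters. Numbers obtained on this synthetic matrix say nothing about the accuracy reported in the paper.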

