From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings

02/23/2018
by Johannes Bjerva, et al.

A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structures (WALS). Doing this manually is prohibitively time-consuming, as evidenced in part by the fact that only 100 of the more than 7,000 languages spoken in the world are fully covered in WALS. We learn distributed language representations, which can be used to predict typological properties on a massively multilingual scale. Additionally, quantitative and qualitative analyses of these language embeddings can tell us how language similarities are encoded in NLP models for tasks at different typological levels. The representations are learned in an unsupervised manner alongside tasks at three typological levels: phonology (grapheme-to-phoneme prediction and phoneme reconstruction), morphology (morphological inflection), and syntax (part-of-speech tagging). We consider more than 800 languages and find significant differences in the language representations encoded, depending on the target task. For instance, although Norwegian Bokmål and Danish are typologically close to one another, they are phonologically distant, which is reflected in their language embeddings growing relatively distant in a phonological task. We are also able to predict typological features in WALS with high accuracy, even for unseen language families.
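To make the idea of predicting a typological property from language embeddings concrete, here is a minimal sketch. It assumes a k-nearest-neighbour vote over cosine similarity, which the abstract does not specify as the classifier. The language names (`lang_a` to `lang_d`), the 3-dimensional vectors, and the word-order labels are invented toy values for illustration only, not data or results from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_feature(query, embeddings, labels, k=3):
    """Predict a feature value for `query` by majority vote among the
    k labelled languages whose embeddings are most similar to it."""
    ranked = sorted(
        labels,
        key=lambda lang: cosine(query, embeddings[lang]),
        reverse=True,
    )
    votes = Counter(labels[lang] for lang in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings and feature values (assumptions, not real data).
toy_embeddings = {
    "lang_a": [1.0, 0.0, 0.0],
    "lang_b": [0.8, 0.2, 0.0],
    "lang_c": [0.0, 1.0, 0.0],
    "lang_d": [0.1, 0.9, 0.1],
}
toy_word_order = {
    "lang_a": "SVO",
    "lang_b": "SVO",
    "lang_c": "SOV",
    "lang_d": "SOV",
}

# Embedding of a language whose feature value is unknown.
unseen = [0.9, 0.1, 0.0]
prediction = predict_feature(unseen, toy_embeddings, toy_word_order, k=3)
```

In this sketch the unseen language is assigned the feature value held by the majority of its nearest neighbours in embedding space. Any classifier that maps an embedding to a feature value could take the place of the vote.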


