Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures

04/30/2020
by Alexander Gutkin, et al.

The use of linguistic typological resources in natural language processing has been steadily gaining popularity. Typological information, often combined with distributed language representations, has been observed to yield significantly more powerful models. While typological representations from various resources have mostly been used to condition models, relatively little attention has been paid to predicting the features in these resources from input data. In this paper we investigate whether the linguistic features in the World Atlas of Language Structures (WALS) can be reliably inferred from multilingual text. Such a predictor can be used to infer structural features for a language never observed in training data. We frame the task as multi-label classification: predicting a set of non-mutually-exclusive, extremely sparse, multi-valued labels (WALS features). We construct a recurrent neural network predictor based on byte embeddings and convolutional layers and test its performance on 556 languages, providing analysis by linguistic type, macro-area, language family and individual feature. We show that some features from various linguistic types can be predicted reliably.
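
The task framing in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the feature names (`word_order`, `adj_noun_order`), the helper functions, and the choice to mask unattested features out of the loss are all assumptions made for illustration. The sketch shows only how text might be turned into byte ids for a byte-embedding layer, and how a loss over several multi-valued features could skip features with no recorded value for a language.

```python
import math

def text_to_bytes(text):
    """Encode text as UTF-8 byte ids (0-255), the input unit for a byte embedding."""
    return list(text.encode("utf-8"))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_cross_entropy(logits_by_feature, labels):
    """Mean cross-entropy over the features that have a recorded value.

    logits_by_feature: hypothetical feature name -> list of scores, one per value.
    labels: feature name -> index of the recorded value, or None if unattested.
    Masking unattested features is an assumption of this sketch.
    """
    losses = []
    for name, logits in logits_by_feature.items():
        target = labels.get(name)
        if target is None:
            continue  # sparse label: no value recorded for this language
        losses.append(-math.log(softmax(logits)[target]))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical scores for two features; only one has a recorded value.
example_logits = {"word_order": [0.0, 0.0, 0.0], "adj_noun_order": [1.0, -1.0]}
example_labels = {"word_order": 1, "adj_noun_order": None}
example_loss = masked_cross_entropy(example_logits, example_labels)
```

Each feature gets its own softmax because its values are mutually exclusive, while the features themselves are predicted jointly, which is one way to read "non-mutually-exclusive multi-valued labels".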


research
11/15/2017

Tracking Typological Traits of Uralic Languages in Distributed Language Representations

Although linguistic typology has a long history, computational approache...
research
03/25/2016

Classifying Syntactic Regularities for Hundreds of Languages

This paper presents a comparison of classification methods for linguisti...
research
10/21/2022

What do Large Language Models Learn beyond Language?

Large language models (LMs) have rapidly become a mainstay in Natural La...
research
02/23/2018

From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings

A core part of linguistic typology is the classification of languages ac...
research
01/19/2023

Language Embeddings Sometimes Contain Typological Generalizations

To what extent can neural network models learn generalizations about lan...
research
01/29/2018

Geospatial distributions reflect rates of evolution of features of language

Different structural features of human language change at different rate...
