Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

04/28/2017
by Ehsaneddin Asgari, et al.
We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., a corpus that contains an order of magnitude more languages than the parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting, to the best of our knowledge, the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: we only require that a linguistic feature be overtly marked in a few of the thousands of languages, as opposed to requiring that it be marked in all languages under investigation.
