Fast and unsupervised methods for multilingual cognate clustering

02/16/2017
by Taraka Rama, et al.

In this paper, we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing the similarity between two words. We tested our online systems on sixteen geographically dispersed language groups of the world and show that the Online PMI (Pointwise Mutual Information) system outperforms an HMM-based system and two linguistically motivated systems: LexStat and ALINE. Our results suggest that a PMI system trained in an online fashion can be used by historical linguists for fast and accurate identification of cognates in less well-studied language families.
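To make the notion of PMI-based segment similarity concrete, the following is a minimal sketch of how PMI scores could be estimated from a list of already aligned sound segment pairs. It is not the authors' implementation: the function name `pmi_scores`, the input format, and the use of pooled segment frequencies as marginals are assumptions made for illustration, and the online EM training and the alignment step described in the abstract are omitted.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Estimate PMI for sound segment pairs (illustrative sketch only).

    aligned_pairs: list of (segment_a, segment_b) tuples taken from
    word alignments. This input format is a hypothetical choice.
    """
    pair_counts = Counter()
    seg_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1
        # Assumption: marginals are pooled over both alignment positions.
        seg_counts[a] += 1
        seg_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_segs = sum(seg_counts.values())

    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs
        p_a = seg_counts[a] / total_segs
        p_b = seg_counts[b] / total_segs
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Toy aligned pairs, invented for illustration (not data from the paper).
toy_pairs = [("p", "p"), ("p", "b"), ("t", "t"), ("t", "t")]
toy_scores = pmi_scores(toy_pairs)
```

A positive score means the two segments are aligned more often than their individual frequencies would predict, which is the kind of evidence a word similarity function can then sum over an alignment.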


