Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance

by Tafseer Ahmed, et al.

Lexical Similarity (LS) between two languages reveals many linguistic insights, such as genetic relationship, mutual intelligibility, and the use of one language's vocabulary in another. LS can be evaluated by various methods. This paper presents one such method, Phonetic Edit Distance (PED), which compares letters softly using the articulatory features associated with them. The system converts each word into its International Phonetic Alphabet (IPA) transcription and then converts each IPA symbol into its set of articulatory features. The resulting lists of feature sets are then compared using the proposed method. As an example, PED gives an edit distance of 0.82 between the German word vater and the Persian word pidar, and 0.93 between the Hebrew word shalom and the Arabic word salaam, whereas, for comparison, their IPA-based edit distances are 4 and 2, respectively. Experiments are performed with six languages (Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu). We extracted part-of-speech-wise word lists from the Universal Dependencies corpora and evaluated the LS for every pair of languages. With the proposed approach, we find genetic affinity, similarity, and borrowings/loanwords among these languages despite their script differences and sound variation phenomena.
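The idea of a soft, feature-based substitution cost can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FEATURES` table is a small hand-picked, hypothetical inventory, and the substitution cost (one minus the Jaccard similarity of two feature sets) is an assumption chosen for illustration, since the abstract does not specify the feature inventory or the cost function. For the same reason, the sketch is not expected to reproduce the 0.82 and 0.93 values quoted above.

```python
# Hypothetical articulatory feature sets for a few IPA symbols.
# The paper's real inventory is not given in the abstract; this is illustrative.
FEATURES = {
    "p": {"consonant", "bilabial", "plosive", "voiceless"},
    "b": {"consonant", "bilabial", "plosive", "voiced"},
    "f": {"consonant", "labiodental", "fricative", "voiceless"},
    "t": {"consonant", "alveolar", "plosive", "voiceless"},
    "d": {"consonant", "alveolar", "plosive", "voiced"},
    "s": {"consonant", "alveolar", "fricative", "voiceless"},
    "ʃ": {"consonant", "postalveolar", "fricative", "voiceless"},
    "a": {"vowel", "open", "front", "unrounded"},
    "i": {"vowel", "close", "front", "unrounded"},
}


def substitution_cost(a, b):
    """Soft cost in [0, 1]: 0 for identical symbols, 1 for no shared features.

    Assumption: cost = 1 - Jaccard similarity of the two feature sets.
    Symbols missing from FEATURES fall back to a hard cost of 1.
    """
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0
    return 1.0 - len(fa & fb) / len(fa | fb)


def phonetic_edit_distance(source, target):
    """Levenshtein-style dynamic program with soft substitution costs.

    `source` and `target` are sequences of IPA symbols (a string works when
    every symbol is a single character). Insertions and deletions cost 1.
    """
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete s
                curr[j - 1] + 1.0,                      # insert t
                prev[j - 1] + substitution_cost(s, t),  # soft substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # /p/ and /b/ differ only in voicing, so the pair costs less than 1.
    print(phonetic_edit_distance("pat", "bat"))
```

Under these assumptions, replacing a sound with a close articulatory neighbour (for example, a voiceless plosive with its voiced counterpart) costs a fraction of a full edit, whereas a plain IPA-based edit distance counts every mismatch as 1.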

