Linguistic Classification using Instance-Based Learning

12/02/2020
by Priya S. Nayak, et al.

Traditionally, linguists have organized the languages of the world into language families modelled as trees. In this work we take a contrarian approach and question the tree-based model, which is rather restrictive. For example, the affinity that Sanskrit independently shares with languages across the Indo-European family is better illustrated by a network model. The same holds for the inter-relationships between languages in India, which are better discovered than assumed. To enable such discovery, we use instance-based learning techniques to assign language labels to words. We vocalize each word and then classify it using our custom linguistic distance metric, which measures the word against training sets that carry language labels. We construct these training sets from word clusters, assigning a language label and a category label to each cluster. We further use clustering coefficients as a quality metric for our research. We believe this work has the potential to open new directions in linguistics. We have limited the present study to the major languages of India. The work can be further strengthened by applying AdaBoost for classification, coupled with the structural equivalence concepts of social network analysis.
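The instance-based classification step can be sketched as a k-nearest-neighbour vote. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper's custom linguistic distance metric is not specified here, so plain Levenshtein edit distance stands in for it. Words are assumed to be already vocalized into a common Latin transliteration. The training pairs and the labels `A`/`B` are hypothetical toy data, and the names `edit_distance`, `classify`, and `TRAINING` are illustrative only.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (ASSUMPTION: a stand-in for the paper's custom metric)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def classify(word: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label a vocalized word by majority vote among its k nearest training words."""
    # sorted() is stable, so ties keep training order; neighbours are nearest-first.
    neighbours = sorted(training, key=lambda item: edit_distance(word, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a vote tie, most_common() keeps first-seen order, i.e. the nearer label wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (vocalized word, language label).
TRAINING = [
    ("paani", "A"), ("pani", "A"), ("panee", "A"),
    ("neer", "B"), ("neeru", "B"), ("nir", "B"),
]

if __name__ == "__main__":
    print(classify("paanee", TRAINING))
    print(classify("neeri", TRAINING))
```

Because k-NN is the textbook instance-based learner, it serves as a baseline here. Replacing `edit_distance` with a phonetically weighted metric, for example one that charges less for substitutions between similar sounds, would bring the sketch closer to the custom metric the abstract describes.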
