On Language Clustering: A Non-parametric Statistical Approach

09/14/2022
by Anagh Chattopadhyay, et al.

Any approach aimed at characterizing and quantifying a particular phenomenon must rely on robust statistical methodologies for data analysis. With this in mind, this study presents statistical approaches that can be employed in nonparametric, nonhomogeneous data frameworks and examines their application to natural language processing and language clustering. It also discusses the uses of nonparametric approaches in linguistic data mining and processing. The idea of data depth allows for a centre-outward ordering of points in any dimension, which yields a nonparametric multivariate statistical analysis that requires no distributional assumptions. Historical language categorisation relies on the concept of hierarchy to organise and cluster languages into subfamilies. In this regard, the study presents a novel approach to language family structuring based on nonparametric methods derived from a typological structure of words in various languages, which is then converted into a Cartesian framework using multidimensional scaling (MDS). This statistical-depth-based architecture makes it possible to apply data-depth-based methodologies for robust outlier detection, which helps in understanding the categorisation of borderline languages and allows existing classification systems to be re-evaluated. Other depth-based approaches are applied to unsupervised and supervised clustering. The paper therefore provides an overview of procedures that can be applied to nonhomogeneous language classification systems in a nonparametric framework.
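The pipeline outlined in the abstract (pairwise dissimilarities, an MDS embedding, then a depth-based centre-outward ordering) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which MDS variant or depth function is used, so classical (Torgerson) MDS and Mahalanobis depth are assumed here as simple stand-ins. The function names `classical_mds` and `mahalanobis_depth` are hypothetical, and the toy distance matrix is made up for illustration rather than taken from any language data.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points from a symmetric distance matrix via classical (Torgerson) MDS."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives caused by rounding.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def mahalanobis_depth(points):
    """Mahalanobis depth of each row: 1 / (1 + squared Mahalanobis distance to the mean)."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    sq_dist = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return 1.0 / (1.0 + sq_dist)


# Toy data (assumption): five items close together and one far away,
# standing in for pairwise dissimilarities between six languages.
toy_coords = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [6.0, 6.0]]
)
dist = np.linalg.norm(toy_coords[:, None, :] - toy_coords[None, :, :], axis=-1)

embedding = classical_mds(dist, n_components=2)
depth = mahalanobis_depth(embedding)

# Centre-outward ordering: deepest (most central) item first.
ordering = np.argsort(depth)[::-1]
# The item with the lowest depth is the outlier candidate.
outlier_candidate = int(np.argmin(depth))
print(ordering, outlier_candidate)
```

Because Mahalanobis depth is affine invariant, the arbitrary rotation or reflection of the MDS embedding does not change the ordering. Note that it depends on the sample mean and covariance and is therefore not itself robust; a robust depth such as halfspace (Tukey) depth would be closer to the spirit of the abstract but needs more code.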
