A data-based classification of Slavic languages: Indices of qualitative variation applied to grapheme frequencies

04/14/2015
by Michaela Koscová et al.

Ord's graph is a simple graphical method for displaying frequency distributions of data, or theoretical distributions, in the two-dimensional plane. Its coordinates are ratios of the first three moments, either empirical or theoretical. We present a modification of Ord's graph based on ratios of indices of qualitative variation. This modification makes the graph applicable to categorical data as well. In addition, the indices are normalized to take values between 0 and 1, which makes it possible to compare data sets divided into different numbers of categories. Both the original and the modified graph are used to display grapheme frequencies in eleven Slavic languages. Because the original Ord's graph requires numbers to be assigned to the categories, graphemes were ordered by decreasing frequency. The data come from parallel corpora: we work with grapheme frequencies from a Russian novel and its translations into ten other Slavic languages. Cluster analysis is then applied to the graph coordinates. While the original graph yields results that are not linguistically interpretable, the modification reveals meaningful relations among the languages.
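The abstract does not spell out the formulas, so the following is only a minimal sketch under stated assumptions. It assumes Ord's usual coordinates I = m2/m1 and S = m3/m2 (m1 the mean, m2 and m3 the second and third central moments), with ranks 1, 2, ..., K serving as the numeric values of the graphemes ordered by decreasing frequency. For the categorical side it assumes two common normalized indices of qualitative variation, the normalized Gini-Simpson index and normalized Shannon entropy; these are not necessarily the indices, or the ratios of indices, used in the paper. All function names are hypothetical.

```python
from math import log


def ord_coordinates(counts):
    """Return Ord's (I, S) = (m2/m1, m3/m2) for a list of category counts.

    Assumption: categories are ranked by decreasing frequency and the ranks
    1, 2, ..., K are used as their numeric values. S is undefined (division
    by zero) if all observations fall into a single category.
    """
    ranked = sorted(counts, reverse=True)
    n = sum(ranked)
    ranks = range(1, len(ranked) + 1)
    # Mean rank, weighted by frequency.
    m1 = sum(r * c for r, c in zip(ranks, ranked)) / n
    # Second and third central moments of the rank distribution.
    m2 = sum((r - m1) ** 2 * c for r, c in zip(ranks, ranked)) / n
    m3 = sum((r - m1) ** 3 * c for r, c in zip(ranks, ranked)) / n
    return m2 / m1, m3 / m2


def normalized_gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2), rescaled to [0, 1] by K / (K - 1)."""
    k, n = len(counts), sum(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    return k / (k - 1) * (1 - sum((c / n) ** 2 for c in counts))


def normalized_entropy(counts):
    """Shannon entropy divided by its maximum log(K), so it lies in [0, 1]."""
    k, n = len(counts), sum(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    # Zero-frequency categories contribute nothing (0 * log 0 is taken as 0).
    h = -sum((c / n) * log(c / n) for c in counts if c > 0)
    return h / log(k)
```

Because both indices are divided by their maximum attainable value for K categories, alphabets of different sizes yield directly comparable numbers, which is the property the abstract emphasizes; Ord's (I, S), by contrast, depend on the numbers assigned to the categories.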
