Language discrimination and clustering via a neural network approach

07/15/2015
by Angelo Mariano, et al.

We classify twenty-one Indo-European languages starting from written text. We use neural networks to define a distance between languages, construct a dendrogram from these distances, and analyze the ultrametric structure that emerges. Four or five subgroups of languages are identified, depending on where the dendrogram is "cut"; the cut is chosen with an entropic criterion. The results and the method are discussed.
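
The distance, dendrogram, ultrametric, and cut steps described above can be illustrated with the minimal sketch below. It rests on several assumptions that are not taken from the paper. The five "languages" A to E and their pairwise distances are hypothetical toy values, not distances produced by a neural network. The tree is built with average linkage (UPGMA), which the abstract does not specify. The cut height is chosen by hand, not by the paper's entropic criterion. The sketch shows that the cophenetic distances read off such a tree are ultrametric: for any three leaves, the two largest pairwise distances are equal, which is equivalent to the strong triangle inequality d(a, c) <= max(d(a, b), d(b, c)).

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering of the leaves in `labels`.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (cophenetic, merges): cophenetic[frozenset({a, b})] is the height
    at which a and b first join in the tree; merges lists (height, cluster).
    """
    clusters = [(name,) for name in labels]  # each cluster is a tuple of leaves
    cdist = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
             for a, b in combinations(labels, 2)}
    cophenetic, merges = {}, []
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: cdist[frozenset(p)])
        h = cdist[frozenset({x, y})]
        for a in x:
            for b in y:
                cophenetic[frozenset({a, b})] = h  # these leaves join at height h
        merged = x + y
        clusters = [c for c in clusters if c not in (x, y)]
        for c in clusters:
            # Average linkage: size-weighted mean of the distances to x and y.
            cdist[frozenset({merged, c})] = (
                len(x) * cdist[frozenset({x, c})] + len(y) * cdist[frozenset({y, c})]
            ) / len(merged)
        clusters.append(merged)
        merges.append((h, merged))
    return cophenetic, merges


def is_ultrametric(labels, d, tol=1e-9):
    """True if, in every triple, the two largest distances are equal."""
    for a, b, c in combinations(labels, 3):
        s = sorted([d[frozenset({a, b})], d[frozenset({b, c})], d[frozenset({a, c})]])
        if s[2] - s[1] > tol:
            return False
    return True


def cut(labels, cophenetic, height):
    """Group leaves whose cophenetic distance is <= height.

    Because cophenetic distances are ultrametric, this relation is
    transitive, so comparing against each group's first member suffices.
    """
    groups = []
    for name in labels:
        for g in groups:
            if cophenetic[frozenset({name, g[0]})] <= height:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical toy data: placeholder labels, not the paper's languages.
langs = ["A", "B", "C", "D", "E"]
raw = {("A", "B"): 1, ("A", "C"): 4, ("B", "C"): 5, ("D", "E"): 2}
dist = {frozenset(p): raw.get(p, 6) for p in combinations(langs, 2)}

coph, merges = upgma(langs, dist)
print(is_ultrametric(langs, dist))  # False: the triple A, B, C has distances 1, 4, 5
print(is_ultrametric(langs, coph))  # True
print(cut(langs, coph, 5))          # [['A', 'B', 'C'], ['D', 'E']]
print(cut(langs, coph, 3))          # [['A', 'B'], ['C'], ['D', 'E']]
```

Moving the cut height changes the number of groups, which mirrors how the abstract's "four or five subgroups" depend on where the dendrogram is cut; a principled rule such as the paper's entropic criterion would replace the hand-picked heights used here.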
