Estimation of the Frequency of Occurrence of Italian Phonemes in Text

01/14/2021
by Javi Arango, et al.

The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes of the Italian language, plus their geminated consonant counterparts, based on four selected written texts. Since no comparable dataset was found in the previous literature, the present analysis may serve as a reference for future studies. The four textual sources were: Come si fa una tesi di laurea: le materie umanistiche by Umberto Eco, I promessi sposi by Alessandro Manzoni, a recent article in Corriere della Sera (a popular Italian daily newspaper), and In altre parole by Jhumpa Lahiri. The sources were chosen to represent varied genres, subject matter, time periods, and writing styles. The results, which included an analysis of variance, showed that, for all four sources, the frequencies of occurrence reached relatively stable values after about 6,000 phonemes (approx. 1,250 words), varying by less than 0.025 both for each single source and as an average across sources.
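
The stabilization check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes the text has already been transcribed into a sequence of phoneme symbols, and the function names (running_frequencies, stable_after), the snapshot step of 1,000 phonemes, and the reading of 0.025 as an absolute change in relative frequency are all assumptions made for illustration (the abstract does not state the unit of that threshold).

```python
from collections import Counter


def running_frequencies(phonemes, step=1000):
    """Return a list of (n, {phoneme: relative frequency}) taken every `step` phonemes.

    `phonemes` is assumed to be an already-transcribed sequence of phoneme symbols.
    """
    counts = Counter()
    snapshots = []
    for i, p in enumerate(phonemes, start=1):
        counts[p] += 1
        if i % step == 0:
            snapshots.append((i, {k: v / i for k, v in counts.items()}))
    return snapshots


def stable_after(snapshots, tol=0.025):
    """Return the smallest sample size from which every later snapshot-to-snapshot
    change in any phoneme's relative frequency stays below `tol` (hypothetical
    criterion), or None if the frequencies never settle."""
    stable_n = None
    for (n_prev, f_prev), (_, f_cur) in zip(snapshots, snapshots[1:]):
        keys = set(f_prev) | set(f_cur)
        change = max(abs(f_cur.get(k, 0.0) - f_prev.get(k, 0.0)) for k in keys)
        if change < tol:
            if stable_n is None:
                stable_n = n_prev
        else:
            stable_n = None  # a later jump resets the candidate point
    return stable_n
```

Running the two functions on each source separately, and then on the pooled sequence, would give one stabilization point per source and one for the combined sample.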
