In narrative texts punctuation marks obey the same statistics as words

04/04/2016
by Andrzej Kulig, et al.

From a grammatical point of view, the role of punctuation marks in a sentence is formally defined and well understood. In semantic analysis, punctuation also plays a crucial role as a means of avoiding ambiguity of meaning. The situation is different in statistical analyses of language samples, where the decision whether to include or neglect punctuation marks is seen as rather arbitrary and is at present left to the researcher's preference. The objective of this work is to shed some light on this problem by answering the question of whether punctuation marks may be treated as ordinary words and whether they should be included in any analysis of word co-occurrences. We already know from our previous study (S. Drożdż et al., Inf. Sci. 331 (2016) 32-44) that full stops, which determine the length of sentences, are the main carrier of long-range correlations. Now we extend that study: we analyze the statistical properties of the most common punctuation marks in a few Indo-European languages, investigate their frequencies, locate them accordingly in the Zipf rank-frequency plots, and study their role in word-adjacency networks. We show that, from a statistical viewpoint, punctuation marks reveal properties that are qualitatively similar to those of the most frequent words, such as articles, conjunctions, pronouns, and prepositions. This holds for both the Zipfian analysis and the network analysis. We also show that adding the punctuation marks to the Zipf plots, which are normally described by the Zipf-Mandelbrot distribution, largely restores the power-law Zipfian behaviour for the most frequent items.
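As a rough illustration of the two analyses named in the abstract, the sketch below treats punctuation marks as ordinary tokens, builds a rank-frequency list, and computes node degrees in an undirected word-adjacency network. It is a minimal sketch and not the authors' pipeline: the set of punctuation marks, the tokenizing regular expression, the function names, and the sample sentence are all assumptions made for this example. For reference, the Zipf-Mandelbrot form is usually written as f(r) ∝ 1/(r + c)^α, and it reduces to the pure Zipf power law when c = 0.

```python
import re
from collections import Counter

# Assumption: these marks count as standalone tokens; the paper's set may differ.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?|[.,;:!?]")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def rank_frequency(tokens):
    """Return (rank, token, count) tuples, most frequent first."""
    counts = Counter(tokens)
    # Ties are broken alphabetically so the ordering is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ordered, start=1)]


def adjacency_degrees(tokens):
    """Degree of each token in an undirected word-adjacency network,
    where two tokens are linked if they occur next to each other."""
    neighbours = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    return {tok: len(nbrs) for tok, nbrs in neighbours.items()}


# Hypothetical toy input; a real analysis would use a full narrative text.
sample = "The cat sat, and the dog sat. The cat ran, and the dog ran."
tokens = tokenize(sample)
ranks = rank_frequency(tokens)
degrees = adjacency_degrees(tokens)

for rank, tok, n in ranks:
    print(rank, repr(tok), n, degrees[tok])
```

On a real corpus, plotting count against rank on log-log axes, once with and once without the punctuation tokens, is one way to inspect how the marks sit among the most frequent words.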

