Quantifying origin and character of long-range correlations in narrative texts

12/29/2014
by Stanislaw Drozdz, et al.

In natural language, short sentences are considered efficient for communication. However, a text composed exclusively of such sentences looks technical and reads as boring. A text composed of long sentences, on the other hand, demands significantly more effort to comprehend. Studying the characteristics of sentence length variability (SLV) in a large corpus of world-famous literary texts shows that an appealing and aesthetic optimum lies somewhere in between and involves a self-similar, cascade-like alternation of sentences of various lengths. A related quantitative observation is that the power spectra S(f) of SLV characterized in this way universally develop a convincing 1/f^β scaling with an average exponent β = 1/2, close to what has previously been identified in musical compositions and in brain waves. An overwhelming majority of the studied texts simply obey such fractal attributes, but especially spectacular in this respect are hypertext-like "stream of consciousness" novels. In addition, these novels appear to develop structures characteristic of irreducibly interwoven sets of fractals called multifractals. In the present context, scaling of S(f) implies the existence of long-range correlations in texts, and the appearance of multifractality indicates that these correlations also carry a nonlinear component. A distinct role of full stops in inducing the long-range correlations is evidenced by the fact that the above quantitative characteristics of the long-range correlations manifest themselves in the variation of full-stop recurrence times along texts, and thus in SLV, but to a much lesser degree in the recurrence times of the most frequent words. In the latter case, the nonlinear correlations, and thus multifractality, disappear completely for all the texts considered. At the same time, however, when treated as one extra word, full stops appear to obey the Zipfian rank-frequency distribution.
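The spectral measurement described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the tokenization regex, the plain periodogram, and the least-squares fit of log S(f) against log f over the whole frequency range are choices made here for illustration (the paper may, for instance, restrict the fitting range or smooth the spectrum), and all function names are hypothetical.

```python
import re
import numpy as np

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens and full stops.

    Assumption (not from the paper): words are runs of \\w characters,
    and every '.' is kept as a separate token, i.e. "one extra word".
    """
    return re.findall(r"\w+|\.", text.lower())

def recurrence_times(tokens: list[str], target: str) -> np.ndarray:
    """Distances (in tokens) between consecutive occurrences of `target`.

    For target '.', each value is the sentence length in words plus one.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == target]
    return np.diff(positions).astype(float)

def power_spectrum(series: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Periodogram S(f) of the mean-removed series, with f = 0 excluded."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    s = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    f = np.fft.rfftfreq(len(x))  # frequency in cycles per event (e.g. per sentence)
    return f[1:], s[1:]

def estimate_beta(series: np.ndarray) -> float:
    """Estimate beta in S(f) ~ 1/f^beta as minus the log-log regression slope.

    Assumption: an unweighted fit over all positive frequencies.
    """
    f, s = power_spectrum(series)
    keep = s > 0  # guard against log(0)
    slope, _intercept = np.polyfit(np.log(f[keep]), np.log(s[keep]), 1)
    return float(-slope)

if __name__ == "__main__":
    sample = "One two three. Four five. Six."
    print(recurrence_times(tokenize(sample), "."))  # [3. 2.]
```

Applied to a full novel, `recurrence_times(tokenize(text), ".")` yields the SLV series, and `estimate_beta` gives the exponent whose average the abstract reports as about 1/2; replacing `"."` with a frequent word such as `"the"` gives the word-recurrence comparison. As a sanity check, white noise should give β near 0 and a random walk β of roughly 2. Detecting multifractality would require a separate technique (e.g. a multifractal fluctuation analysis), which this sketch does not attempt.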
