Heaps' law and Heaps functions in tagged texts: Evidences of their linguistic relevance

01/07/2020
by Andrés Chacoma, et al.

We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or “tags,” namely, nouns, verbs, and others), and analyze the progressive appearance of new words of each tag along each individual text. While the power-law relation prescribed by Heaps' law is satisfactorily fulfilled by total vocabulary sizes and text lengths, the appearance of new words in each text is on the whole well described by the average of random shufflings of the text, which does not obey a power law. Deviations from this average, however, are statistically significant and show a systematic trend across the corpus. Specifically, they reveal that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags are shown to add systematically distinct contributions to this tendency, with verbs and others being respectively more and less retarded than the mean trend, and nouns following instead this overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps' law, a feature that is still in need of extensive assessment.
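The comparison described above, between the growth of vocabulary along a text and the average over random shufflings of the same text, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(word, tag)` input format, the number of shufflings, and the use of the mean difference as a summary of the deviation are all assumptions made for the example.

```python
import random


def heaps_function(tagged_tokens, tag=None):
    """Vocabulary growth curve along a text.

    tagged_tokens: list of (word, tag) pairs in reading order (assumed format).
    Returns a list V where V[n-1] is the number of distinct words seen among
    the first n tokens. If `tag` is given, only words carrying that tag are
    counted, while n still runs over all tokens.
    """
    seen = set()
    growth = []
    for word, word_tag in tagged_tokens:
        if tag is None or word_tag == tag:
            seen.add(word)
        growth.append(len(seen))
    return growth


def shuffled_average(tagged_tokens, tag=None, n_shuffles=200, seed=0):
    """Average growth curve over random shufflings of the same tokens."""
    rng = random.Random(seed)
    work = list(tagged_tokens)
    totals = [0] * len(work)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(heaps_function(work, tag)):
            totals[i] += v
    return [t / n_shuffles for t in totals]


def mean_deviation(tagged_tokens, tag=None, n_shuffles=200, seed=0):
    """Mean of (actual curve - shuffled average) over all positions.

    A negative value means new words appear later in the actual text
    than in the shuffled average (a hypothetical summary statistic).
    """
    actual = heaps_function(tagged_tokens, tag)
    average = shuffled_average(tagged_tokens, tag, n_shuffles, seed)
    return sum(a - b for a, b in zip(actual, average)) / len(actual)


if __name__ == "__main__":
    text = [("the", "other"), ("cat", "noun"), ("sat", "verb"),
            ("the", "other"), ("cat", "noun"), ("ran", "verb")]
    print(heaps_function(text))              # all words
    print(heaps_function(text, tag="verb"))  # verbs only
    print(mean_deviation(text))
```

Both curves necessarily end at the same value, since shuffling preserves the total vocabulary; any difference lies in how the new words are distributed along the text.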


