Corrections of Zipf's and Heaps' Laws Derived from Hapax Rate Models

07/24/2023
by Łukasz Dębowski, et al.

This article introduces corrections to Zipf's and Heaps' laws based on systematic models of the hapax rate. The derivation rests on two assumptions. The first is the standard urn model, which predicts that the marginal frequency distributions of shorter texts look as if word tokens had been sampled blindly from a given longer text. The second is that the hapax rate is a simple function of the text size. Four such functions are discussed: the constant model, the Davis model, the linear model, and the logistic model. The logistic model is shown to yield the best fit.
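The urn-model assumption can be illustrated with a minimal Python sketch: draw tokens blindly (without replacement) from a longer text and measure the hapax rate of the resulting shorter sample. The function names `hapax_rate`, `urn_subsample`, and `hapax_rate_curve` are hypothetical. The definition of the hapax rate used here, hapaxes divided by the number of distinct word types, is an assumption, since the abstract does not state the normalization. The four parametric models named above are not reproduced because their formulas are not given here.

```python
import random
from collections import Counter


def hapax_rate(tokens):
    """Share of distinct word types that occur exactly once.

    Assumed definition: hapaxes / number of types. The abstract does not
    spell out the normalization, so treat this as illustrative.
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


def urn_subsample(tokens, size, seed=0):
    """Draw `size` tokens without replacement from the longer text.

    This mimics the urn model: the shorter text is a blind sample
    of tokens from the longer one.
    """
    rng = random.Random(seed)
    return rng.sample(tokens, size)


def hapax_rate_curve(tokens, sizes, seed=0):
    """Empirical hapax rate of urn subsamples at each requested size."""
    return [(n, hapax_rate(urn_subsample(tokens, n, seed))) for n in sizes]
```

Calling `hapax_rate_curve(tokens, sizes)` on a tokenized text gives an empirical curve of hapax rate against sample size, which is the kind of relationship that a simple function of text size would be fitted to.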
