Corrections of Zipf's and Heaps' Laws Derived from Hapax Rate Models

07/24/2023
by Łukasz Dębowski, et al.

The article introduces corrections to Zipf's and Heaps' laws based on systematic models of the hapax rate. The derivation rests on two assumptions. The first is the standard urn model, which predicts that marginal frequency distributions for shorter texts look as if word tokens were sampled blindly from a given longer text. The second posits that the hapax rate is a simple function of the text size. Four such functions are discussed: the constant model, the Davis model, the linear model, and the logistic model. It is shown that the logistic model yields the best fit.
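
The urn model assumption can be illustrated with a minimal sketch. It assumes that "sampled blindly" means drawing tokens without replacement from the longer text, so that each word's count in the shorter sample is hypergeometric. The function name `expected_types_and_hapaxes` and the toy text are hypothetical and do not come from the article, and reading the hapax rate as the share of word types that occur exactly once is likewise an assumption made here.

```python
from collections import Counter
from math import comb


def expected_types_and_hapaxes(tokens, n):
    """Expected number of types and of hapaxes in a sample of n tokens
    drawn without replacement from `tokens` (hypergeometric urn model).

    Hypothetical helper for illustration; not taken from the article.
    """
    N = len(tokens)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and len(tokens)")
    if n == 0:
        return 0.0, 0.0
    freqs = Counter(tokens).values()
    total = comb(N, n)  # number of equally likely samples of size n
    # A word with frequency f is absent with probability C(N-f, n) / C(N, n).
    types = sum(1 - comb(N - f, n) / total for f in freqs)
    # It occurs exactly once with probability f * C(N-f, n-1) / C(N, n).
    hapaxes = sum(f * comb(N - f, n - 1) / total for f in freqs)
    return types, hapaxes


# Toy "longer text" of 7 tokens (made up for this example).
text = "a b a c a b d".split()
for n in (1, 3, 7):
    v, h = expected_types_and_hapaxes(text, n)
    # Assumed definition: hapax rate = expected hapaxes / expected types.
    print(n, round(v, 3), round(h, 3), round(h / v, 3))
```

Evaluating the function over a range of sample sizes gives an urn-model prediction of how vocabulary size and hapax count grow with text length, which is the kind of curve a hapax rate model would be compared against.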
