Estimation of Zipf parameter by means of a sequence of counts of different words
We study a probabilistic model of text in which word probabilities decrease according to a discrete power-law distribution. We construct a class of strongly consistent and asymptotically normal estimators of the distribution's parameter, based on the sequence of counts of different words. We then construct a countsbridge, a process that converges to a centered Gaussian process, and use it to test the hypothesis that a text conforms to this probabilistic model. The test is of omega-squared type.
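To make the "sequence of counts of different words" concrete, here is a minimal Python sketch. It is not the class of estimators constructed in the paper. It assumes a particular parameterization, with the probability of the i-th word proportional to i^(-1/theta) for 0 < theta < 1, under which the number of different words among the first n words grows roughly like n^theta. The function names, the truncated vocabulary size, and the simple log-ratio estimate are all illustrative assumptions.

```python
import math
import random
from itertools import accumulate


def sample_text(n, theta, vocab_size=50_000, seed=0):
    """Draw n word indices i.i.d. with P(word i) proportional to i**(-1/theta).

    Assumption: the vocabulary is truncated at vocab_size so that the
    weights can be tabulated; the model in the abstract has no such cutoff.
    """
    rng = random.Random(seed)
    cum = list(accumulate(i ** (-1.0 / theta) for i in range(1, vocab_size + 1)))
    return rng.choices(range(1, vocab_size + 1), cum_weights=cum, k=n)


def distinct_counts(words):
    """Return [R_1, ..., R_n], where R_k is the number of different words
    among the first k words of the text."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def log_ratio_estimate(counts):
    """Illustrative estimate of theta: log2(R_n / R_{n//2}).

    If R_n grows like n**theta, doubling the text length multiplies the
    count by about 2**theta. Requires at least two words.
    """
    n = len(counts)
    return math.log2(counts[n - 1] / counts[n // 2 - 1])


if __name__ == "__main__":
    text = sample_text(n=20_000, theta=0.5)
    counts = distinct_counts(text)
    print("different words:", counts[-1])
    print("log-ratio estimate of theta:", log_ratio_estimate(counts))
```

On a real text, `distinct_counts` can be applied directly to a list of tokens. The log-ratio estimate uses only two points of the count sequence, so it is noisy; it is meant only to show how the growth rate of the counts carries information about the parameter.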