Estimation of Zipf parameter by means of a sequence of counts of different words

11/03/2017
by Mikhail Chebunin, et al.

We study a probabilistic model of text in which probabilities of words decrease in accordance with a discrete power distribution. We construct a class of strongly consistent and asymptotically normal estimators based on a sequence of counts of different words. Then we construct a countsbridge, a process which converges to a centered Gaussian process and serves to test a hypothesis of correspondence between a text and this probabilistic model. The test is of omega-squared type.
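
To make the central statistic concrete, the following minimal sketch computes the sequence of counts of different words, that is, the number of distinct words seen among the first n words of a text, for each n. The function names, the truncation of the vocabulary to a finite size, and the simulation parameters are all illustrative assumptions and do not come from the paper. The final log-ratio is only a naive Heaps-type estimate of the reciprocal of the power exponent; it is not the class of estimators the authors construct.

```python
import math
import random


def distinct_word_counts(words):
    """Return [R_1, ..., R_n], where R_k is the number of distinct
    words among the first k words of the sequence."""
    seen = set()
    counts = []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts


def sample_power_law_text(n, exponent, vocab_size, seed=0):
    """Draw n i.i.d. word ranks with P(rank = i) proportional to i**(-exponent).

    Assumption: the vocabulary is truncated at vocab_size for simulation,
    whereas a discrete power-law model has infinitely many words.
    """
    rng = random.Random(seed)
    ranks = list(range(1, vocab_size + 1))
    weights = [r ** (-exponent) for r in ranks]
    return rng.choices(ranks, weights=weights, k=n)


def naive_log_ratio(counts):
    """Naive estimate log(R_n) / log(n) of the growth exponent of R_n.

    Illustrative only: this is not the estimator class from the paper.
    """
    n = len(counts)
    return math.log(counts[-1]) / math.log(n)


if __name__ == "__main__":
    text = sample_power_law_text(n=50_000, exponent=1.5, vocab_size=1_000_000)
    counts = distinct_word_counts(text)
    print("distinct words:", counts[-1])
    print("naive log-ratio estimate:", naive_log_ratio(counts))
```

On a simulated text the log-ratio should land in the rough vicinity of the reciprocal of the exponent, but it converges slowly, so a single finite sample may be noticeably off.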
