Estimation of Zipf parameter by means of a sequence of counts of different words

11/03/2017
by Mikhail Chebunin, et al.

We study a probabilistic model of text in which probabilities of words decrease in accordance with a discrete power distribution. We construct a class of strongly consistent and asymptotically normal estimators based on a sequence of counts of different words. Then we construct a countsbridge, a process which converges to a centered Gaussian process and serves to test a hypothesis of correspondence between a text and this probabilistic model. The test is of omega-squared type.
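The idea of estimating the parameter from counts of different words can be illustrated with a small simulation. The sketch below is not the authors' class of estimators. It assumes a parametrization in which the i-th word has probability proportional to i^(-1/θ) with 0 < θ < 1, truncates the vocabulary to a finite size for sampling, and uses one simple log-ratio estimator built on the known growth of the number of different words, roughly like n^θ, in such models. The function names `sample_zipf_text`, `distinct_counts`, and `estimate_theta` are hypothetical and introduced only for illustration.

```python
import math
import random


def sample_zipf_text(n, theta, vocab_size, rng):
    """Draw n word indices with P(word i) proportional to i**(-1/theta).

    Assumption: the vocabulary is truncated to vocab_size words so that
    the distribution can be sampled with the standard library.
    """
    weights = [i ** (-1.0 / theta) for i in range(1, vocab_size + 1)]
    return rng.choices(range(1, vocab_size + 1), weights=weights, k=n)


def distinct_counts(words):
    """Return [R_1, ..., R_n]: R_k is the number of different words among the first k."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def estimate_theta(counts):
    """Log-ratio estimate log2(R_n / R_{n/2}).

    Assumption: R_n grows roughly like n**theta, so doubling the text
    length multiplies the count of different words by about 2**theta.
    """
    n = len(counts)
    return math.log2(counts[n - 1] / counts[n // 2 - 1])


rng = random.Random(0)  # fixed seed for reproducibility
text = sample_zipf_text(n=200_000, theta=0.5, vocab_size=100_000, rng=rng)
counts = distinct_counts(text)
theta_hat = estimate_theta(counts)
print(f"different words: {counts[-1]}, estimated theta: {theta_hat:.3f}")
```

With the true value set to 0.5, the printed estimate should land near 0.5, up to sampling noise. The estimator uses only the sequence of counts, not the individual word frequencies, which is the kind of statistic the abstract refers to.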


