Variation of word frequencies in Russian literary texts

03/01/2015
by Vladislav Kargin, et al.

We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent 0.62. This shows that rarer words have relatively greater frequency volatility (i.e., "burstiness"). We estimate several latent factor models to investigate the structure of the word frequency distribution. The dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of the latent factors.
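The power law above says that, across words, sd ≈ c · mean^β with β ≈ 0.62. Because β < 1, the coefficient of variation sd/mean ≈ c · mean^(β − 1) grows as the mean frequency falls, which is what "relatively larger volatility for rarer words" means. The sketch below shows one way such an exponent could be estimated: an ordinary least-squares fit on the log-log scale. This is an assumption for illustration, not necessarily the authors' procedure. The helper names (`ols_slope`, `volatility_exponent`) and the constants `C = 0.02` and `BETA = 0.62` are hypothetical, and the data are synthetic, built to follow the law exactly.

```python
import math
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def volatility_exponent(freqs_by_word):
    """Estimate beta in sd(f) ~ c * mean(f)**beta across words.

    freqs_by_word maps each word to its relative frequency in each text
    (one value per text, at least two texts). Words with zero mean or zero
    spread are skipped because their logarithm is undefined.
    """
    log_means, log_sds = [], []
    for freqs in freqs_by_word.values():
        mean = statistics.fmean(freqs)
        sd = statistics.stdev(freqs)  # sample standard deviation across texts
        if mean > 0 and sd > 0:
            log_means.append(math.log(mean))
            log_sds.append(math.log(sd))
    return ols_slope(log_means, log_sds)


# Hypothetical synthetic data (not from the paper) built so that
# sd = C * mean**BETA exactly: each word's frequencies in three texts
# are mean - sd, mean, mean + sd (sample mean = mean, sample sd = sd).
BETA, C = 0.62, 0.02
synthetic = {}
for i, mean in enumerate([1e-4, 1e-3, 1e-2]):
    sd = C * mean ** BETA
    synthetic[f"word{i}"] = [mean - sd, mean, mean + sd]

beta_hat = volatility_exponent(synthetic)
print(round(beta_hat, 6))  # 0.62

# Because BETA < 1, the coefficient of variation sd/mean = C * mean**(BETA - 1)
# is larger for the rarer words.
for word, freqs in synthetic.items():
    print(word, round(statistics.stdev(freqs) / statistics.fmean(freqs), 3))
```

On real corpora the points would scatter around the fitted line rather than lie on it, so the estimated slope would come with sampling error. A log-log fit is used here only because it turns the power law into a straight line whose slope is β.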
