Rank-frequency distribution of natural languages: a difference of probabilities approach

11/23/2018
by Germinal Cocho, et al.

The time variation of the rank k of words in six Indo-European languages is obtained from Google Books data. For low ranks the languages behave differently, possibly because of syntactic rules, whereas for k > 50 the law of large numbers predominates. The dynamics of k are described stochastically through a master equation governing the time evolution of its probability density, which is approximated by a Fokker-Planck equation that is solved analytically. The difference between the data and the asymptotic solution is identified with the transient solution, and good agreement is obtained.
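
As an illustration of the quantity being studied, the following is a minimal sketch of how a rank trajectory k(t) could be built from yearly word counts. It is not the authors' pipeline: the function name `rank_by_year`, the toy counts, and the alphabetical tie-breaking rule are all assumptions made for this example, and real Google Books data would need to be loaded and cleaned separately.

```python
from typing import Dict, List


def rank_by_year(counts: Dict[int, Dict[str, int]]) -> Dict[str, List[int]]:
    """Return, for each word, its rank k in every year (1 = most frequent).

    `counts` maps year -> {word: count}. Only words present in every year
    are ranked, so each trajectory has one entry per year.
    """
    years = sorted(counts)
    # Keep only the words observed in all years (an assumption of this sketch).
    words = set.intersection(*(set(counts[y]) for y in years))
    trajectories: Dict[str, List[int]] = {w: [] for w in words}
    for y in years:
        # Sort by descending count; break ties alphabetically (assumed rule).
        ordered = sorted(words, key=lambda w: (-counts[y][w], w))
        for k, w in enumerate(ordered, start=1):
            trajectories[w].append(k)
    return trajectories


# Invented toy counts, for illustration only.
toy_counts = {
    1900: {"the": 500, "of": 300, "and": 280, "law": 12},
    1950: {"the": 520, "of": 290, "and": 310, "law": 15},
    2000: {"the": 510, "of": 305, "and": 300, "law": 9},
}
ranks = rank_by_year(toy_counts)
```

Each list in `ranks` is one word's rank over the sorted years; in the toy data "of" and "and" swap places in 1950 while "the" stays at rank 1. Trajectories of this kind are the raw series whose probability density a stochastic model for k would describe.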

