
Markov Chain Monte Carlo for generating ranked textual data

10/13/2022
by Roy Cerqueti, et al.

This paper addresses a central theme in applied statistics and information science: the assessment of the stochastic structure of rank-size laws in text analysis. We rank the words in a corpus by frequency, in descending order. The starting point is that ranked data generated in linguistic contexts can be viewed as realisations of a discrete-state Markov chain whose stationary distribution follows a discretisation of the best-fitted rank-size law. The methodological toolkit is Markov Chain Monte Carlo, specifically the Metropolis-Hastings algorithm. The theoretical framework is applied to the rank-size analysis of the hapax legomena occurring in the speeches of the US Presidents. We report a large number of statistical tests supporting the consistency of our methodological proposal. In pursuit of our aims, we also offer arguments that hapaxes are rare ("extreme") events resulting from memoryless-like processes. Moreover, we show that the considered sample has the stochastic structure of a Markov chain of order one. Importantly, we discuss the versatility of the method, which we consider suitable for deriving similar results in other applied science contexts.
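As a rough illustration of the idea in the abstract, the sketch below runs a Metropolis-Hastings chain on a finite set of ranks so that its stationary distribution is a discretised rank-size law. The abstract does not say which rank-size law is fitted or which proposal is used, so everything specific here is an assumption made for illustration: a Zipf-type law with weights proportional to r^(-alpha), a symmetric nearest-neighbour proposal, and the hypothetical names `zipf_weights`, `metropolis_hastings_ranks` and `empirical_frequencies`. It is not the authors' implementation.

```python
import random


def zipf_weights(n_ranks, alpha):
    """Unnormalised target weights w(r) = r**(-alpha) for r = 1..n_ranks.

    Assumption: a Zipf-type law stands in for the "best-fitted rank-size law".
    """
    return [r ** -alpha for r in range(1, n_ranks + 1)]


def metropolis_hastings_ranks(weights, n_steps, seed=0):
    """Simulate a Metropolis-Hastings chain on ranks 1..len(weights).

    The proposal is a symmetric +/-1 step, so the acceptance probability
    reduces to min(1, w(proposal) / w(current)). Proposals outside the
    rank range are rejected and the chain stays where it is.
    """
    rng = random.Random(seed)
    n_ranks = len(weights)
    rank = 1  # arbitrary starting state
    chain = []
    for _ in range(n_steps):
        proposal = rank + rng.choice((-1, 1))
        if 1 <= proposal <= n_ranks:
            accept_prob = min(1.0, weights[proposal - 1] / weights[rank - 1])
            if rng.random() < accept_prob:
                rank = proposal
        chain.append(rank)
    return chain


def empirical_frequencies(chain, n_ranks):
    """Fraction of steps the chain spent at each rank."""
    counts = [0] * n_ranks
    for r in chain:
        counts[r - 1] += 1
    return [c / len(chain) for c in counts]


if __name__ == "__main__":
    weights = zipf_weights(n_ranks=10, alpha=1.0)
    total = sum(weights)
    chain = metropolis_hastings_ranks(weights, n_steps=200_000, seed=1)
    freqs = empirical_frequencies(chain, n_ranks=10)
    # Compare the target probabilities with the simulated frequencies.
    for r, (w, f) in enumerate(zip(weights, freqs), start=1):
        print(f"rank {r:2d}  target {w / total:.3f}  simulated {f:.3f}")
```

Because each new state depends only on the current one, a chain built this way is of order one by construction. The normalising constant of the rank-size law is never needed, since only ratios of weights enter the acceptance step.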


10/07/2021

Curved Markov Chain Monte Carlo for Network Learning

We present a geometrically enhanced Markov chain Monte Carlo sampler for...
08/05/2021

On rank statistics of PageRank and MarkovRank

An important statistic in analyzing some (finite) network data, called P...
10/02/2017

sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

This paper introduces the R package sgmcmc; which can be used for Bayesi...
05/09/2019

A joint text mining-rank size investigation of the rhetoric structures of the US Presidents' speeches

This work presents a text mining context and its use for a deep analysis...
12/04/2018

Bridging trees for posterior inference on Ancestral Recombination Graphs

We present a new Markov chain Monte Carlo algorithm, implemented in soft...
02/13/2013

Bayesian Learning of Loglinear Models for Neural Connectivity

This paper presents a Bayesian approach to learning the connectivity str...
10/11/2018

Efficient estimation of autocorrelation spectra

The performance of Markov chain Monte Carlo calculations is determined b...