Empirical Analysis of Zipf's Law, Power Law, and Lognormal Distributions in Medical Discharge Reports

03/30/2020
by Juan C. Quiroz, et al.

Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language in which word frequency follows a discrete power law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequencies, fitting power law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power law) provided superior fits to the data. Results show that discharge reports are best fit by the truncated power law and lognormal distributions. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power law and lognormal probability priors.
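The pipeline described in the abstract (tokenise, count token frequencies, fit a power law, compare against an alternative) can be illustrated with a minimal sketch. The abstract does not name the software or the exact estimator the authors used, so everything below is an assumption: the function names are hypothetical, the tokeniser is a simple regular expression, the frequencies are treated as continuous values for the maximum-likelihood fits, and only the exponential alternative is compared. The toy text is illustrative and is not MIMIC-III data.

```python
import math
import re
from collections import Counter


def token_frequencies(text):
    """Lowercase the text, split it into alphanumeric tokens, and count them."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def fit_power_law(values, xmin=1.0):
    """Continuous power law p(x) = ((alpha-1)/xmin) * (x/xmin)^(-alpha), x >= xmin.

    Returns the maximum-likelihood exponent and the log-likelihood.
    Assumption: discrete counts are approximated as continuous values.
    """
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum == 0:
        raise ValueError("all values equal xmin; the exponent is undefined")
    alpha = 1.0 + len(tail) / log_sum
    loglik = sum(
        math.log((alpha - 1.0) / xmin) - alpha * math.log(v / xmin) for v in tail
    )
    return alpha, loglik


def fit_exponential(values, xmin=1.0):
    """Shifted exponential p(x) = lam * exp(-lam * (x - xmin)), x >= xmin.

    Returns the maximum-likelihood rate and the log-likelihood.
    """
    tail = [v for v in values if v >= xmin]
    excess = sum(v - xmin for v in tail)
    if excess == 0:
        raise ValueError("all values equal xmin; the rate is undefined")
    lam = len(tail) / excess
    loglik = sum(math.log(lam) - lam * (v - xmin) for v in tail)
    return lam, loglik


if __name__ == "__main__":
    # Hypothetical toy text standing in for a discharge report.
    toy = "the patient was admitted the patient was discharged the end"
    counts = list(token_frequencies(toy).values())

    alpha, ll_power = fit_power_law(counts)
    lam, ll_exp = fit_exponential(counts)

    # Positive ratio favours the power law, negative favours the exponential.
    # No significance test is applied in this sketch.
    print("alpha:", alpha)
    print("log-likelihood ratio (power law vs exponential):", ll_power - ll_exp)
```

A fuller analysis along the lines of the abstract would repeat the comparison for the lognormal, stretched exponential, and truncated power law distributions, choose the lower cutoff from the data rather than fixing it at 1, and test whether each log-likelihood ratio is statistically significant.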


