Probability-turbulence divergence: A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions

08/30/2020
by   P. S. Dodds, et al.

Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies. For comparing the type frequency distributions of two systems, or of one system with itself at different points in time (a facet of allotaxonometry), a wide range of probability divergences is available. Here, we introduce and explore 'probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions. We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency. We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for 'exclusive types', which are types that appear in only one system. We explore comparisons of example distributions taken from literature, social media, and ecology. We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, L^(p) norms, the Sørensen-Dice coefficient (the F_1 statistic), and the Hellinger distance. We discuss similarities with the generalized entropies of Rényi and Tsallis, and the diversity indices (or Hill numbers) from ecology. We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.
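
To make two of the special cases named above concrete, the following is a minimal Python sketch of the standard Hellinger distance and Sørensen-Dice coefficient for two categorical distributions that each contain an exclusive type. It does not implement probability-turbulence divergence itself, whose definition is given in the full text rather than in this abstract. The function names and the toy word counts are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def normalize(counts):
    """Turn raw type counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def hellinger(p, q):
    """Standard Hellinger distance; a type missing from one system has probability 0."""
    types = set(p) | set(q)
    s = sum((sqrt(p.get(t, 0.0)) - sqrt(q.get(t, 0.0))) ** 2 for t in types)
    return sqrt(s / 2)


def sorensen_dice(p, q):
    """Sørensen-Dice coefficient on the sets of types present in each system."""
    a = {t for t, v in p.items() if v > 0}
    b = {t for t, v in q.items() if v > 0}
    if not a and not b:
        return 1.0  # convention chosen here for two empty systems
    return 2 * len(a & b) / (len(a) + len(b))


def exclusive_types(p, q):
    """Types that appear in only one of the two systems."""
    a = {t for t, v in p.items() if v > 0}
    b = {t for t, v in q.items() if v > 0}
    return a - b, b - a


# Hypothetical toy counts: 'whale' and 'ship' are exclusive types.
system_1 = normalize({"the": 6, "of": 3, "whale": 1})
system_2 = normalize({"the": 5, "of": 3, "ship": 2})

print(hellinger(system_1, system_2))
print(sorensen_dice(system_1, system_2))
print(exclusive_types(system_1, system_2))
```

The Hellinger distance responds to every change in probability, while the Sørensen-Dice coefficient only registers which types are present, so the two sit at opposite ends of the sensitivity range that a tunable divergence is meant to span.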


