Estimating the Entropy of Linguistic Distributions

04/04/2022
by Aryaman Arora, et al.

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language. However, entropy must typically be estimated from observed data because researchers do not have access to the underlying probability distribution that gives rise to these data. While entropy estimation is a well-studied problem in other fields, there is not yet a comprehensive exploration of the efficacy of entropy estimators for use with linguistic data. In this work, we fill this void, studying the empirical effectiveness of different entropy estimators for linguistic distributions. In a replication of two recent information-theoretic linguistic studies, we find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators. Finally, we end our paper with concrete recommendations for entropy estimation depending on distribution type and data availability.
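Estimator choice matters because the naive plug-in (maximum-likelihood) estimator systematically underestimates entropy when samples are small relative to the vocabulary. The size of that bias depends on sample size and on how heavy the distribution's tail is, so comparisons across datasets can be distorted. The following is a minimal Python sketch, not the paper's code or data. It assumes a hypothetical Zipfian vocabulary as a stand-in for word frequencies and compares the plug-in estimate against the Miller-Madow correction, one classical bias fix. All function names are illustrative.

```python
import math
import random
from collections import Counter

def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy in bits, from raw counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def miller_madow_entropy(counts):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2N), converted to bits."""
    n = sum(counts)
    k = sum(1 for c in counts if c > 0)  # number of categories actually observed
    return plugin_entropy(counts) + (k - 1) / (2 * n * math.log(2))

def zipf_probs(vocab_size, s=1.0):
    """Hypothetical Zipfian 'word' distribution: p(rank r) proportional to 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def true_entropy(probs):
    """Exact Shannon entropy in bits of a known distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    probs = zipf_probs(vocab_size=10_000)
    print(f"true entropy: {true_entropy(probs):.3f} bits")
    for n in (100, 1_000, 10_000, 100_000):
        # Draw n word tokens (as rank indices) from the known distribution.
        sample = rng.choices(range(len(probs)), weights=probs, k=n)
        counts = list(Counter(sample).values())
        print(f"N={n:>6}  plug-in={plugin_entropy(counts):.3f}  "
              f"Miller-Madow={miller_madow_entropy(counts):.3f}")
```

For small N, the plug-in value should fall well below the true entropy. On a heavy-tailed distribution like this one, Miller-Madow should narrow the gap somewhat without closing it, which is why more sophisticated estimators are worth comparing.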
