Revisiting Entropy Rate Constancy in Text

05/20/2023
by Vivek Verma, et al.

The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel and Charniak (2002), who proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel and Charniak (2002) with neural language models and fail to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages, and discuss implications for the uniform information density hypothesis and for linguistic theories of efficient communication more broadly.
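To make the measured quantity concrete, the following minimal sketch averages per-token surprisal by sentence position within a document, which is the kind of curve such an analysis inspects. It is an illustration under stated assumptions, not the authors' pipeline: an add-one-smoothed unigram model stands in for the n-gram and neural language models mentioned above, and the names `train_unigram` and `mean_surprisal_by_position` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_unigram(train_docs):
    """Fit an add-one-smoothed unigram model (a stand-in for a real language model).

    train_docs: list of documents, each a list of sentences, each a list of tokens.
    Returns a function mapping a token to its smoothed probability.
    """
    counts = Counter(tok for doc in train_docs for sent in doc for tok in sent)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def mean_surprisal_by_position(docs, prob):
    """Average per-token surprisal in bits, grouped by 1-based sentence index."""
    sums = defaultdict(float)
    token_counts = defaultdict(int)
    for doc in docs:
        for position, sent in enumerate(doc, start=1):
            for tok in sent:
                sums[position] += -math.log2(prob(tok))
                token_counts[position] += 1
    return {pos: sums[pos] / token_counts[pos] for pos in sorted(sums)}
```

Plotting the returned values against sentence position shows whether the estimated per-token surprisal rises, falls, or stays flat over a document. With a stronger model, `prob` would be replaced by a conditional probability given the preceding context.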


