Revisiting the Uniform Information Density Hypothesis

09/23/2021
by Clara Meister, et al.

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal. While its implications for language production have been well explored, the hypothesis potentially makes predictions about language comprehension and linguistic acceptability as well. Further, it is unclear how uniformity in a linguistic signal – or lack thereof – should be measured, and over which linguistic unit, e.g., the sentence or language level, this uniformity should hold. Here we investigate these facets of the UID hypothesis using reading time and acceptability data. While our reading time results are generally consistent with previous work, they are also consistent with a weakly super-linear effect of surprisal, which would be compatible with UID's predictions. For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability. We then explore multiple operationalizations of UID, motivated by different interpretations of the original hypothesis, and analyze the scope over which the pressure towards uniformity is exerted. The explanatory power of a subset of the proposed operationalizations suggests that the strongest trend may be a regression towards a mean surprisal across the language, rather than the phrase, sentence, or document – a finding that supports a typical interpretation of UID, namely that it is the byproduct of language users maximizing the use of a (hypothetical) communication channel.
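
To make the two kinds of operationalization mentioned in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the authors' exact formulation: the function names, the exponent `k`, the toy surprisal values, and the stand-in "language-level" mean are all assumptions. The first function sums surprisal raised to a power `k` (with `k > 1` giving a super-linear cost); the second measures squared deviation from a reference mean, which can be the sentence's own mean or a mean taken over a wider scope.

```python
from statistics import fmean


def superlinear_cost(surprisals, k=1.25):
    """Sum of per-token surprisal raised to the power k.

    With k == 1 only total surprisal matters; with k > 1 an uneven
    distribution of the same total costs more. The default k is an
    arbitrary illustrative value, not an estimate from the paper.
    """
    return sum(s ** k for s in surprisals)


def variance_about(surprisals, mean=None):
    """Mean squared deviation of surprisal from a reference mean.

    If no mean is given, the sequence's own mean is used (sentence-level
    scope). Passing a mean computed elsewhere, e.g. over a whole corpus,
    gives a wider-scope variant.
    """
    if mean is None:
        mean = fmean(surprisals)
    return fmean([(s - mean) ** 2 for s in surprisals])


# Two hypothetical sentences with the same total surprisal (12 bits).
uniform = [3.0, 3.0, 3.0, 3.0]
peaked = [1.0, 1.0, 1.0, 9.0]

# Hypothetical language-level mean surprisal, made up for illustration.
language_mean = 5.0

print(superlinear_cost(uniform, k=2), superlinear_cost(peaked, k=2))
print(variance_about(uniform), variance_about(peaked))
print(variance_about(uniform, mean=language_mean))
```

Under a linear cost (`k=1`) the two toy sentences are indistinguishable, while any `k > 1` or either variance measure penalizes the peaked one. The last call shows that a perfectly flat sentence can still deviate from a wider-scope mean, which is the distinction the abstract draws between sentence-level and language-level uniformity.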

research
05/20/2023

Revisiting Entropy Rate Constancy in Text

The uniform information density (UID) hypothesis states that humans tend...
research
05/15/2021

A Cognitive Regularizer for Language Modeling

The uniform information density (UID) hypothesis, which posits that spea...
research
03/01/2023

On uniformly consistent tests

Necessary and sufficient conditions of uniform consistency are explored....
research
03/02/2021

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

There is an ongoing debate in the NLP community whether modern language ...
research
04/11/2022

Uniform Complexity for Text Generation

Powerful language models such as GPT-2 have shown promising results in t...
research
06/02/2021

Lower Perplexity is Not Always Human-Like

In computational psycholinguistics, various language models have been ev...
research
08/05/2020

Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

We test the hypothesis that the extent to which one obtains information ...
