Optimal coding and the origins of Zipfian laws

06/04/2019
by   Ramon Ferrer-i-Cancho, et al.

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding, under an arbitrary coding scheme, and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding under so-called non-singular coding as well, a scheme where unique segmentation is not guaranteed but each code stands for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost-cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws.
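The logarithmic growth of word length with frequency rank can be illustrated with a minimal sketch. It is not the authors' code: the function names and the two-letter alphabet "ab" are hypothetical choices made for illustration, and the sketch assumes that types are already sorted by decreasing frequency, so that the type of rank i simply receives the i-th shortest string. Because every type gets a distinct string, the code is non-singular, and no shorter assignment of distinct strings exists.

```python
from itertools import count, islice, product


def shortest_strings(alphabet):
    """Yield every non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def optimal_nonsingular_code(n_types, alphabet):
    """Give the i-th shortest string to the type of frequency rank i.

    Assumption: ranks run from 1 (most frequent) to n_types.
    """
    return list(islice(shortest_strings(alphabet), n_types))


def predicted_length(rank, alphabet_size):
    """Smallest l such that at least `rank` strings of length <= l exist.

    Computed with integers to avoid rounding problems in logarithms.
    """
    length, available = 1, alphabet_size
    while available < rank:
        length += 1
        available += alphabet_size ** length
    return length


# Hypothetical example: 10 types coded over the alphabet {a, b}.
codes = optimal_nonsingular_code(10, "ab")
lengths = [len(code) for code in codes]
```

Under these assumptions, the number of strings of length at most l grows geometrically with l, so the length assigned to rank i grows roughly as the logarithm of i in the base given by the alphabet size, and `predicted_length` matches the lengths produced by the explicit enumeration.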

research
05/04/2016

Compression and the origins of Zipf's law for word frequencies

Here we sketch a new derivation of Zipf's law for word frequencies based...
research
06/30/2021

Zipf's laws of meaning in Catalan

In his pioneering research, G. K. Zipf formulated a couple of statistica...
research
12/17/2017

Benford's Law and First Letter of Word

A universal First-Letter Law (FLL) is derived and described. It predicts...
research
03/17/2023

Direct and indirect evidence of compression of word lengths. Zipf's law of abbreviation revisited

Zipf's law of abbreviation, the tendency of more frequent words to be sh...
research
09/06/2021

The asymptotic joint distribution of the largest and the smallest singular values for random circulant matrices

In this manuscript, we study the limiting distribution for the joint law...
research
11/13/2019

Enumerative Data Compression with Non-Uniquely Decodable Codes

Non-uniquely decodable codes can be defined as the codes that cannot be ...
research
03/06/2018

Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts Representing Human and Artificial Languages

We demonstrate that large texts, representing human (English, Russian, U...
