How (Non-)Optimal is the Lexicon?

by Tiago Pimentel, et al.

The mapping of lexical meanings to wordforms is a major feature of natural languages. While usage pressures might assign short words to frequent meanings (Zipf's law of abbreviation), the need for a productive and open-ended vocabulary, local constraints on sequences of symbols, and various other factors all shape the lexicons of the world's languages. Despite their importance in shaping lexical structure, the relative contributions of these factors have not been fully quantified. Taking a coding-theoretic view of the lexicon and making use of a novel generative statistical model, we define upper bounds for the compressibility of the lexicon under various constraints. Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes. We find that (compositional) morphology and graphotactics can sufficiently account for most of the complexity of natural codes – as measured by code length.
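To make the coding-theoretic framing concrete, the sketch below compares the frequency-weighted mean wordform length of a toy corpus against a Shannon-style lower bound computed from unigram frequencies. It is only an illustration under simplifying assumptions and is not the paper's generative model or its bounds: the function names `mean_code_length` and `entropy_bound`, the toy sentence, and the choice of a unigram entropy bound over the observed alphabet are all hypothetical.

```python
import math
from collections import Counter


def mean_code_length(tokens):
    """Frequency-weighted average wordform length, in characters per token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum(n * len(word) for word, n in counts.items()) / total


def entropy_bound(tokens, alphabet_size):
    """Unigram entropy divided by log2(alphabet size).

    This is the classical lower bound on the expected length (in symbols)
    of a uniquely decodable code over an alphabet of the given size.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    entropy_bits = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return entropy_bits / math.log2(alphabet_size)


# Toy corpus (an assumption for illustration, not data from the paper).
tokens = "the cat saw the dog and the cat ran".split()
alphabet_size = len(set("".join(tokens)))  # distinct characters observed

observed = mean_code_length(tokens)
bound = entropy_bound(tokens, alphabet_size)

print(f"observed mean length: {observed:.3f} characters per token")
print(f"entropy-based bound:  {bound:.3f} characters per token")
```

The gap between the two numbers is a crude proxy for how far a lexicon sits from an unconstrained optimal code. Constraints such as morphology and graphotactics, which the abstract identifies as the main sources of that gap, are not modelled here.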

