How (Non-)Optimal is the Lexicon?

04/29/2021
by Tiago Pimentel, et al.

The mapping of lexical meanings to wordforms is a major feature of natural languages. While usage pressures might assign short words to frequent meanings (Zipf's law of abbreviation), the need for a productive and open-ended vocabulary, local constraints on sequences of symbols, and various other factors all shape the lexicons of the world's languages. Despite their importance in shaping lexical structure, the relative contributions of these factors have not been fully quantified. Taking a coding-theoretic view of the lexicon and making use of a novel generative statistical model, we define upper bounds for the compressibility of the lexicon under various constraints. Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes. We find that (compositional) morphology and graphotactics can sufficiently account for most of the complexity of natural codes – as measured by code length.

