The statistical trade-off between word order and word structure - large-scale evidence for the principle of least effort

by Alexander Koplenig, et al.

Languages employ different strategies to transmit structural and grammatical information. For example, in languages like Mandarin Chinese or Vietnamese, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words, whereas word ordering is much less restricted in languages such as Inupiatun or Quechua, which (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in more than 1,100 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information, and vice versa. In addition, we find that, despite differences in the way information is expressed, there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of a book, the less informative its word structure, and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways, while, on the other hand, content-related and stylistic features are statistically encoded in very similar ways.




