Universality and diversity in word patterns

08/23/2022
by David Sánchez, et al.

Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence. The existence of syntactic rules induces correlations among neighboring words. Further, words are not evenly distributed but approximately follow a power law, since terms with purely semantic content appear much less often than terms that specify grammatical relations. Using an ordinal pattern approach, we present an analysis of lexical statistical connections for eleven major languages. We find that the diverse ways in which languages express word relations give rise to unique pattern distributions. Remarkably, we find that these relations can be modeled with a Markov model of order 2 and that this result holds for all the studied languages. Furthermore, fluctuations of the pattern distributions allow us to determine the historical period in which a text was written and its author. Taken together, these results emphasize the relevance of time series analysis and information-theoretic methods for understanding statistical correlations in natural languages.
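To make the idea of an ordinal pattern distribution concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how words are converted into a numerical series, so the sketch assumes each word is replaced by its frequency rank within the text, and it assumes patterns of length 3. The function names `rank_series` and `ordinal_pattern_distribution` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def rank_series(tokens):
    """Map each token to its frequency rank (1 = most frequent).

    Assumption: frequency rank is used as the numerical value of a word.
    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    return [rank[w] for w in tokens]


def ordinal_pattern_distribution(series, order=3):
    """Relative frequency of each ordinal pattern of length `order`.

    A pattern is the permutation of positions that sorts the window in
    ascending order. Windows containing repeated values are skipped
    (an assumption; other tie-handling conventions exist).
    """
    counts = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        if len(set(window)) < order:
            continue  # skip windows with ties
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {
        p: (counts[p] / total if total else 0.0)
        for p in permutations(range(order))
    }


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat"
    series = rank_series(text.split())
    for pattern, freq in ordinal_pattern_distribution(series).items():
        print(pattern, round(freq, 3))
```

Under these assumptions, comparing the resulting six-pattern distribution across texts is one way to look for the language-specific and author-specific differences the abstract describes. For a real corpus the series would be far longer than this toy sentence.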

research
01/09/2020

The empirical structure of word frequency distributions

The frequencies at which individual words occur across languages follow ...
research
07/05/2018

Zipf's law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation

Zipf's law has been found in many human-related fields, including langua...
research
10/05/2015

Stochastic model for phonemes uncovers an author-dependency of their usage

We study rank-frequency relations for phonemes, the minimal units that s...
research
09/10/2022

Subdiffusive semantic evolution in Indo-European languages

How do words change their meaning? Although semantic evolution is driven...
research
02/12/2020

Unsupervised Separation of Native and Loanwords for Malayalam and Telugu

Quite often, words from one language are adopted within a different lang...
research
07/30/2015

Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language

We develop the information-theoretical concepts required to study the st...
research
07/29/2019

A Mathematical Model for Linguistic Universals

Inspired by chemical kinetics and neurobiology, we propose a mathematica...
