Population size predicts lexical diversity, but so does the mean sea level - why it is important to correctly account for the structure of temporal data

by Alexander Koplenig, et al.

In order to demonstrate why it is important to correctly account for the serially dependent structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between the population size of a country and the lexical diversity of its primary language. We show that this relationship is the result of a misspecified model that does not take the temporal aspect of the data into account, by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Several recently published studies present surprising links between economic, cultural, political and (socio-)demographic variables on the one hand and cultural or linguistic characteristics on the other, but seem to suffer from exactly this problem. We therefore explain the cause of the misspecification and show that it has profound consequences. We demonstrate how a simple transformation of the time series can often solve problems of this type and argue that evaluating the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
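The mechanism behind such spurious relationships can be illustrated with simulated data. The sketch below does not reproduce the authors' data or models; it assumes that both variables behave like random walks with an upward drift, which is one common way trending series are modelled. It generates two series completely independently of each other, so any correlation between them is spurious. First differencing (working with year-to-year changes instead of levels) is used here as one example of a "simple transformation"; the abstract does not specify which transformation the authors apply. The helper names `pearson`, `random_walk_with_drift` and `first_difference` are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def random_walk_with_drift(n, drift, rng):
    """Generate x_t = x_{t-1} + drift + e_t with e_t ~ N(0, 1), starting at 0."""
    series, level = [], 0.0
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def first_difference(series):
    """Return the changes x_t - x_{t-1}; the result has one element fewer."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(42)
# Two independently generated series; think of them as hypothetical
# stand-ins for, e.g., population size and the global mean sea level.
x = random_walk_with_drift(200, drift=1.0, rng=rng)
y = random_walk_with_drift(200, drift=1.0, rng=rng)

# Correlating the levels mostly picks up the shared upward drift.
r_levels = pearson(x, y)
# Correlating the changes removes the drift and compares the actual shocks.
r_diffs = pearson(first_difference(x), first_difference(y))
print(f"correlation of levels:            {r_levels:.2f}")
print(f"correlation of first differences: {r_diffs:.2f}")
```

In levels, the common upward trend typically yields a correlation close to 1 even though the two series are unrelated by construction; after differencing, the correlation is typically close to 0. In applied work, a formal check for trending (non-stationary) behaviour, such as a unit-root test, would normally precede the choice of transformation.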

