Population size predicts lexical diversity, but so does the mean sea level - why it is important to correctly account for the structure of temporal data

by Alexander Koplenig, et al.

To demonstrate why it is important to correctly account for the serially dependent structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between the population size of a country and the lexical diversity of its primary language. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Several recent studies present surprising links between economic, cultural, political and (socio-)demographic variables on the one hand and cultural or linguistic characteristics on the other, but seem to suffer from exactly this problem. We therefore explain the cause of the misspecification and show that it has profound consequences. We demonstrate how a simple transformation of the time series can often solve problems of this type and argue that evaluating the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
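The core problem can be illustrated with a minimal sketch on simulated data. The abstract does not say which transformation the authors apply, so first differencing (year-to-year changes) is used here as one common choice, purely as an assumption. The series names, drift values and noise levels are hypothetical and are not taken from the paper's data. Two independently generated trending series correlate strongly in levels, while their year-to-year changes should show little correlation.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def first_difference(series):
    """Year-to-year changes: series[t] - series[t - 1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def trending_series(n, drift, noise_sd, rng):
    """Random walk with drift; each value depends on the previous one."""
    level = 0.0
    values = []
    for _ in range(n):
        level += drift + rng.gauss(0.0, noise_sd)
        values.append(level)
    return values


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_years = 200

# Hypothetical stand-ins: the two series are generated independently,
# so there is no real relationship between them.
population = trending_series(n_years, drift=1.0, noise_sd=0.5, rng=rng)
sea_level = trending_series(n_years, drift=0.3, noise_sd=0.2, rng=rng)

# Correlating the raw levels mostly picks up the shared upward trend.
r_levels = pearson(population, sea_level)

# Correlating the changes removes the trend before comparing the series.
r_changes = pearson(first_difference(population), first_difference(sea_level))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

Differencing is only one option, and a weak correlation of changes does not by itself settle whether a relationship is plausible; the sketch only shows why a high correlation between two trending series carries little evidential weight.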






