Neutral evolution and turnover over centuries of English word popularity

03/30/2017
by Damian Ruck, et al.

Here we test neutral models against the evolution of English word frequency and vocabulary at the population scale, as recorded in annual word frequencies from three centuries of English-language books. Against these data, we test both static and dynamic predictions of two neutral models, including the relation between corpus size and vocabulary size, frequency distributions, and turnover within those frequency distributions. Although a commonly used neutral model fails to replicate all these emergent properties at once, we find that a modified two-stage neutral model does replicate the static and dynamic properties of the corpus data. This two-stage model is meant to represent a relatively small corpus (population) of English books, analogous to a 'canon', sampled by an exponentially increasing corpus of books in the wider population of authors. More broadly, this model, a smaller neutral model within a larger neutral model, could represent those situations where mass attention is focused on a small subset of the cultural variants.
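The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`neutral_step`, `two_stage_neutral`, `turnover`), the parameter values (canon size, innovation rate `mu`, growth rate), and the exact copying and sampling rules are all assumptions chosen for illustration. It assumes a fixed-size "canon" that evolves by random copying with occasional innovation, and a yearly corpus of exponentially growing size that samples words from that canon.

```python
import random
from collections import Counter


def neutral_step(population, mu, next_id, rng):
    """One generation of neutral copying (assumed rule): each slot copies a
    random word from the previous generation, or with probability mu
    introduces a brand-new word."""
    new_population = []
    for _ in range(len(population)):
        if rng.random() < mu:
            new_population.append(next_id)  # innovation: a never-seen word
            next_id += 1
        else:
            new_population.append(rng.choice(population))  # unbiased copy
    return new_population, next_id


def two_stage_neutral(canon_size=500, mu=0.01, generations=50,
                      corpus_start=200, growth=0.03, seed=0):
    """Hypothetical two-stage model: a fixed-size canon evolves neutrally,
    and each year a corpus of exponentially growing size samples from it.
    All default values are illustrative, not taken from the paper."""
    rng = random.Random(seed)
    canon = list(range(canon_size))  # start with all-distinct words
    next_id = canon_size
    yearly_counts = []
    for t in range(generations):
        canon, next_id = neutral_step(canon, mu, next_id, rng)
        corpus_size = int(corpus_start * (1 + growth) ** t)
        corpus = [rng.choice(canon) for _ in range(corpus_size)]
        yearly_counts.append(Counter(corpus))  # word -> frequency that year
    return yearly_counts


def turnover(prev_counts, curr_counts, top_n):
    """Number of words entering the top-n list between two consecutive years."""
    prev_top = {w for w, _ in prev_counts.most_common(top_n)}
    curr_top = {w for w, _ in curr_counts.most_common(top_n)}
    return len(curr_top - prev_top)


if __name__ == "__main__":
    counts = two_stage_neutral()
    for year in (0, 24, 49):
        corpus_size = sum(counts[year].values())
        vocab_size = len(counts[year])
        print(year, corpus_size, vocab_size)
    print("top-10 turnover, final year:", turnover(counts[-2], counts[-1], 10))
```

From the per-year counters one can read off the three kinds of quantities the abstract mentions: vocabulary size against corpus size, the word frequency distribution, and turnover within a top-n list.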


