I Introduction
A language is a typical complex system and is characterised by well-known language-independent statistical laws, such as Zipf's law and Heaps' law altmann2016statistical . In this study, we investigate dynamical statistical properties of languages by using massive databases related to word usage that have developed over the past 10 years. In particular, we focus on the stability, or the slowness of change, in the usage of already popular words from the viewpoint of diffusion on a complex system, and we show that logarithmic diffusion (i.e. very slow diffusion or change) is approximately observed in common across several languages and media.
Diffusion on complex systems, which is an attractive research topic in physics and complex-system science, has been studied extensively both theoretically and empirically and has been applied to various systems, such as biological or social systems. Diffusion on complex systems is basically characterised by the mean-squared displacement (MSD). The vast majority of studies reported that the MSD grows asymptotically according to the power law,
$\langle \Delta x^2(t) \rangle \propto t^{\alpha}. \qquad (1)$
In the case of $\alpha = 1$, the diffusion corresponds to normal diffusion, such as the diffusion of particles in water, which is modelled using a random walk. In other cases, it is known as anomalous diffusion; in particular, it is termed subdiffusion for $\alpha < 1$ and superdiffusion for $\alpha > 1$. Many complex systems have been shown to exhibit this power-law type of anomalous diffusion in diverse areas, such as physics, chemistry, geophysics, biology, and economics metzler2000random ; da2014ultraslow . In theoretical studies, anomalous diffusion is explained using the correlation of random noise (e.g. a random walk in disordered media) bouchaud1990anomalous , an infinite variance of jumps (e.g. a Lévy flight) bouchaud1990anomalous ; metzler2000random , a power-law waiting time (e.g. a continuous-time random walk) bouchaud1990anomalous ; metzler2000random ; burov2011single , and a long memory (e.g. a fractional random walk) lowen2005fractal ; burov2011single . Another class of anomalous diffusion is predicted by theories in which the MSD grows logarithmically,
$\langle \Delta x^2(t) \rangle \propto \log^{\gamma}(t). \qquad (2)$
This type of diffusion is known as “ultraslow diffusion”. One of the best-known examples, which was also the first to be discovered, is diffusion in a disordered medium (known as Sinai diffusion for $\gamma = 4$) sinai1983limiting . Thereafter, other types of models that explain ultraslow diffusion have also been proposed, such as the continuous-time random walk (CTRW) with waiting times generated by a logarithmic-form probability density function godec2014localisation , the CTRW with waiting times generated by a power-form probability density function together with the excluded-volume effect sanders2014severe , temporal changes of diffusion coefficients bodrova2015ultraslow , spatial changes of diffusion coefficients cherstvy2013population , and fractional dynamics eab2011fractional .
Although many theoretical studies of ultraslow diffusion have been reported, we were unable to find empirical examples thereof. A rare example of diffusion related to the logarithmic function, which is similar to but different from the “ultraslow diffusion” defined by Eq. 2, is the mobility of humans measured by mobile phone data. In that study, by using both data and models, the authors showed that the MSD grows logarithmically or becomes saturated (i.e. the MSD grows no faster than a logarithm). This diffusion is mainly explained by the CTRW and preferential-return (to home) effects song2010modelling . Diffusion resembling this very slow diffusion was also observed in the mobility of monkeys, and the authors maintained that it may be explained by the heterogeneity of the space, as in Sinai's model boyer2011non . Note that logarithmic “relaxation” phenomena, known as “ageing”, are observed in many systems, such as paper crumpling matan2002crumpling and grain compaction richard2005slow .
We investigate the stability, or the dynamics of usage, of already popular words. In other words, we focus only on the dynamics of the “mature phase” in the life trajectory of words (consisting of an “infant phase”, an “adolescent phase”, and a “mature phase”) petersen2012statistical ; gerlach2013stochastic . A pioneering study of the stability and variation of language from the viewpoint of dynamical statistical properties was given by Erez Lieberman et al. lieberman2007quantifying . In that study, the regularisation of English verbs (i.e., the change from irregular to regular verbs) over the past 1200 years was investigated, and the 0.5th power law of the regularisation rate as a function of word frequency (i.e., higher-frequency words undergo fewer changes, or are more stable) was noted. That study quantified the stability of language on a historical timescale (i.e., from 100 to 1000 years). In contrast, our study focuses on stability on a shorter timescale (i.e., from 1 day to 10 years). Note that some findings relating to the dynamics or properties of words in the “infant phase”, the “adolescent phase”, or the total life trajectory (i.e. from birth to death) were obtained by using the Google Ngram data corpus (which gives word frequencies in printed books from 1520 to 2000) michel2011quantitative ; petersen2012statistical ; gerlach2013stochastic . In these studies, the authors found statistical properties such as that the typical time to reach the “adolescent phase” is about 20 or 30 years, that the MSD exhibits superdiffusion, and that the dynamics are related to Yule's, Simon's, and Gibrat's processes and to preferential attachment petersen2012statistical ; gerlach2013stochastic .
Note that physicists have studied linguistic phenomena using the concepts of complex systems link1 , such as competitive dynamics abrams2003linguistics , statistical laws altmann2016statistical , complex networks cong2014approaching , phase transitions, and information theory i2005zipf . In this paper, in order to precisely quantify “the stability” or “the speed of change” of the usage of already popular words (i.e. the mature phase), we measure the MSD by using actual data and introduce a time-evolution model of word frequencies for it. In addition, we clarify the dynamics behind this diffusion.
Firstly, we investigate the MSD of the time series of word counts for three types of actual data: (i) newspapers for 10 years (Japanese), (ii) blogs for 5 years (Japanese) and (iii) Wikipedia page views for 2 years (English, French, Chinese and Japanese). This approach enabled us to observe an ultraslow-like diffusion for all data (Figs. 2 and 3).
Secondly, we discuss the relation between the empirical results and the random walk model with power-law forgetting given by Eq. 10, which is related to the fractional Langevin equation and can essentially explain the ultraslow diffusion.
Thirdly, we introduce a model of word counts sampled from the Poisson process (Eq. 27) whose rate is generated by the previously mentioned random walk model (Eq. 10), in order to connect the ultraslow diffusion explained by Eq. 10 with peculiar properties of word-count data, such as discreteness (i.e. counts take values in $\{0, 1, 2, \cdots\}$). In addition, we show that the model can consistently reproduce the following empirical dynamical statistical properties (Fig. 5):

Mean squared displacement [MSD],
$\mathrm{MSD}_j(L) = \langle (u_j(t+L) - u_j(t))^2 \rangle_t, \qquad (3)$
Power spectral density [PSD] (periodogram),
$S_j(f) = \frac{1}{T} \left| \sum_{t=1}^{T} u_j(t) e^{-2 \pi i f t} \right|^2, \qquad (4)$
Probability density function [PDF] (histogram),
$P_j(x) = \frac{\#\{ t : x \le u_j(t+1) - u_j(t) < x + \delta \}}{T \cdot \delta}, \qquad (5)$
where $u_j(t)$ is the word count scaled by the database size at the date $t$, defined in Section II.2; $T$ is the last date of observation; $L$ is a time lag (a positive integer, $1 \le L \le T$); $f$ is a (spectral) frequency; $\delta$ is the bin size of the histogram; $x$ represents the value of a bin of the histogram; and $\#\{ \cdot \}$ means the number of elements of the set $\{ \cdot \}$.
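As a concrete reading of these three definitions, the statistics can be estimated from a daily count series with a few lines of NumPy. This is an illustrative sketch, not the paper's actual pipeline: the function names are invented, and a white-noise series stands in for a real normalised word-count series.

```python
import numpy as np

def msd(u, lag):
    """Temporal mean-squared displacement at a given lag (Eq. 3-style)."""
    d = u[lag:] - u[:-lag]
    return float(np.mean(d ** 2))

def periodogram(u):
    """Power spectral density estimated by the periodogram (Eq. 4-style)."""
    n = len(u)
    spec = np.abs(np.fft.rfft(u)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)   # frequencies in units of 1/days
    return freqs[1:], spec[1:]          # drop the zero-frequency term

def diff_histogram(u, bin_size):
    """Empirical PDF (histogram) of the one-day differences (Eq. 5-style)."""
    d = np.diff(u)
    edges = np.arange(d.min(), d.max() + bin_size, bin_size)
    counts, edges = np.histogram(d, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts

# Toy example: ten years of daily "counts" drawn as white noise.
rng = np.random.default_rng(0)
u = rng.normal(size=3650)
```

For white noise, the MSD is flat (about twice the variance at every lag) and the periodogram is flat; the logarithmic growth discussed in this paper only appears for correlated series such as Eq. 10.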
Finally, we conclude with a discussion.
II Data set
We analysed the daily time series of the word counts in newspapers (Japanese), blogs (Japanese), and Wikipedia page views (English, French, Chinese, and Japanese).
II.1 Data sources
Newspapers
We obtained the time series of daily word appearances in nationwide Japanese newspapers by using the “Shinbun trend in NIKKEI Telecom” database provided by Nikkei Inc. Using this database, we obtained the daily number of articles containing a keyword from 80 newspapers published in Japan between Jan. 2005 and Sep. 2017 nikkei . Note that if an article contains a focused keyword more than once (e.g. for the keyword “dog”: “There is a dog. The dog is big.”), the system counts it as one article. We used the top 10,000 words in frequency order as keywords. We referred to the pages entitled “Wiktionary:Frequency lists” in Wiktionary wikitionaly to obtain the rank of the word frequency.
Blogs
We obtained the time series of the daily number of articles containing a keyword in nationwide Japanese blogs by using a large database of Japanese blogs (“Kuchikomi@kakaricho”) provided by Hottolink, Inc. This database contains 3 billion articles of Japanese blogs, covering 90% of Japanese blogs from Nov. 2006 to Dec. 2012 RD_base . Note that, as with the newspaper data, if one article contains a focused keyword more than once, the system counts it as one article. We used 1,771 basic adjectives and 60,476 nouns from ipadic as keywords idadic .
Wikipedia page views
We obtained daily Wikipedia page views using the Pageview API, a public API developed and maintained by the Wikimedia Foundation. This API provides analytical data about article page views (i.e. the number of page loads) of Wikipedia. By inputting an article title as a keyword (e.g. “dog”), a time period (e.g. from 1st Jan. 2017 to 30th Nov. 2017), and a language (e.g. English Wikipedia) to the API, we can obtain a time series of count data on how many times people visited the focused article (e.g. the number of loads or page views of the “dog” page in the English Wikipedia) per day during the given time period. Although Wikipedia page views, unlike the newspaper and blog data, are not appearances of a keyword in documents, they are often used in the same way to investigate daily changes in the concern about a keyword (or article). We obtained the data of the English, French, Chinese, and Japanese pages from Jul. 2015 to Sep. 2017 wikipedia_pageview . We used the top 10,000 words in frequency order as keywords with respect to each language RD_base . To obtain the rank of the word frequency, we referred to the pages entitled “Wiktionary:Frequency lists” in Wiktionary, as in the case of the newspaper data wikitionaly .
II.2 Normalised time series of word appearances
We define herein as follows the notation of the time series of the word counts $F_j(t)$ and the normalised word counts $u_j(t)$:

$F_j(t)$ $(t = 0, 1, \cdots, T; \; j = 1, 2, \cdots, W)$ is the raw daily count of the $j$-th word within the nationwide dataset (Fig. 1(a)), where $T$ is the last date of observation and $W$ is the number of observed keywords.
Concretely speaking, for the newspaper and blog data, $F_j(t)$ corresponds to the daily number of articles containing the $j$-th keyword in the database. For the Wikipedia page view data, it corresponds to the daily page views of the article entitled the $j$-th keyword (how many times people visited the focused article).

$u_j(t) = F_j(t)/m(t)$ is the time series of the daily count normalised by the temporal scale of the database $m(t)$ (Fig. 1(c)).
$u_j(t)$ corresponds to the intrinsic time variation of the $j$-th word, separated from the effects of deviations in the scale of the database (Figs. 1(b) and (c)). The scale of the database $m(t)$ almost corresponds to the (normalised) total number of articles (i.e. the temporal database size) for the newspaper and blog data. For the Wikipedia data, it is conceivable that $m(t)$ almost corresponds to the (normalised) total temporal number of users of Wikipedia in a focused language ($m(t)$ does not correspond to the size or number of articles of the Wikipedia of a focused language itself).
$m(t)$ is estimated herein by the ensemble median of the number of words at time $t$, as described in Appendix A. We assume that $m(0) = 1$ for normalisation (Fig. 1(b)).
III Data analysis: Ultraslow-like diffusion in the empirical data
We next calculate the MSD of the actual data. We use the following temporal MSD for the data analysis,
$\mathrm{MSD}_j(L) = \frac{1}{T - L + 1} \sum_{t=0}^{T-L} (u_j(t+L) - u_j(t))^2, \qquad (6)$
where $L$ is the time lag (e.g. $L = 7$ corresponds to a weekly difference, $L = 30$ to almost a monthly difference, and $L = 365$ to a yearly difference). Thus, the MSD quantifies how much the counts of the focused keyword change in $L$ days. Note that this statistic is meaningful only when the difference $u_j(t+L) - u_j(t)$ is stationary. The normalised counts $u_j(t)$ (i.e. the counts normalised by the scaled database size) sampled from our mathematical model (described subsequently) do not contradict this condition (Appendix F), and the majority of the corresponding empirical data approximately satisfy this condition, although the raw counts $F_j(t)$ do not always satisfy it because of effects such as the increasing database size (Fig. 1).
Fig. 2 shows examples of the MSDs of typical words for the Japanese newspapers (a), Japanese blogs (b), and English Wikipedia page views (c). The results in these figures confirm that the growth of all the MSDs is essentially approximated by the logarithmic function,
$\mathrm{MSD}_j(L) \approx a_j + b_j \log(L). \qquad (7)$
Next, we verify the validity of the above result by calculating the ensemble median of the (temporal) scaled MSD, using all words with a large frequency in the respective databases. If we assume Eq. 7, the scaled MSD has a word-independent curve,
(8)
where $\mathrm{Med}_{L'}[\cdot]$ is the temporal median over the set of lags $\{1, 2, \cdots, L_{max}\}$ and $L_{max}$ is the maximum lag which we use to make a graph. Thus, we can take the ensemble over words, and the ensemble median obeys the logarithmic function,
(9)
where $\mathrm{Med}_{j \in \Omega}[\cdot]$ is the median over the word set $\Omega$ and $|\Omega|$ is the size of the set. We take the median over the set of words whose mean frequency is over 30, $\Omega = \{ j : \langle F_j(t) \rangle_t \ge 30 \}$. We exclude words with a small mean because they have relatively low signal-to-noise ratios (see Eq. 30). Figs. 3(a)-(f) show that the logarithmic curve is approximately observed for all data sets, namely newspapers, blogs, and Wikipedia content (English, French, Chinese, and Japanese). Here, because there are words with a non-negligible weekly or annual cycle, the raw ensemble MSD also has these cycles (grey dots or grey thin lines). Thus, we can observe the logarithmic curve by using the 365-day moving median, which cancels these cycles. Note that by replacing the ensemble median with the ensemble mode in Eq. 9, we can also obtain essentially the same logarithmic diffusion. This logarithmic diffusion is not in conflict with our intuition that languages are basically stable but change constantly.
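The 365-day moving median used above to cancel the weekly and annual cycles can be sketched as follows. This is an illustrative, naive implementation (the toy MSD-versus-lag curve, a logarithmic trend plus a weekly oscillation, is invented for demonstration):

```python
import numpy as np

def moving_median(y, window=365):
    """Centred moving median; windows are clipped at the boundaries."""
    half = window // 2
    out = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out[i] = np.median(y[lo:hi])
    return out

# Toy MSD-vs-lag curve: logarithmic trend plus a 7-day cycle.
lags = np.arange(1, 1500)
raw = np.log(lags) + 0.3 * np.sin(2 * np.pi * lags / 7)
smoothed = moving_median(raw, window=365)
```

Because the 365-day window contains many full periods of the weekly (and, for real data, annual) oscillation, the median tracks the underlying logarithmic trend while the cycles are suppressed.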
IV Model
This section explains the properties of word counts by the combination of two probabilistic models: (i) the random walk model with power-law forgetting and (ii) the random diffusion model (i.e., a kind of Poisson point process). The random walk model describes the latent concern about the focused word, and it can essentially explain the ultraslow diffusion. The random diffusion model, in turn, expresses the connection between the latent concern described by the abovementioned random walk model and the observable word counts $F_j(t)$ or $u_j(t)$. Here, we first introduce and discuss the random walk model, and we then introduce the word-count model, which is the combination of the random walk model and the random diffusion model.
IV.1 Model: Relation with the random walk
Here, we present the extent to which the empirical results correspond with a random walk with power-law forgetting, which is one of the most representative standard explanations of anomalous diffusion in previous studies. This approach is also equivalent to the fractional-dynamics approach (in our case, the fractional Langevin equation approach).
The random walk model with power-law forgetting is given by
$X_j(t) = \sum_{s=0}^{t-1} \frac{\eta_j(t-s)}{(s+1)^{\beta}}, \qquad (10)$
where $\beta \ge 0$ is a constant used to characterise the forgetting speed and $\eta_j(t)$ is independent and identically distributed noise whose mean is zero and whose standard deviation is $\sigma_j$; that is, we can write $\eta_j(t) = \sigma_j \xi_j(t)$, where $\xi_j(t)$ is independent and identically distributed noise whose mean is zero and whose standard deviation is 1. This model is an extension of the normal random walk model; namely, the model corresponds to the random walk for $\beta = 0$ and to steady IID noise in the limit of large $\beta$. For the time series of word counts, the model is interpreted by considering that the social concern about the $j$-th word at time $t$, $X_j(t)$, is determined, in the case of $\beta = 0$, by the summation of the outer shocks received until time $t$. In the case of $\beta > 0$, $X_j(t)$ (i.e. the social concern) is determined by both the abovementioned summation effects and the effect of forgetting shocks in a power-law manner.
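A minimal simulation of this model makes the role of the forgetting exponent concrete: $\beta = 0$ recovers the ordinary random walk (MSD growing linearly in the lag), while $\beta = 0.5$ yields MSD growth that is very slow and roughly logarithmic. The function names, lag grid, and ensemble size below are illustrative choices, not taken from the paper:

```python
import numpy as np

def forgetting_walk(T, beta, sigma=1.0, rng=None):
    """X(t) = sum_{s=0}^{t-1} eta(t-s) / (s+1)^beta, cf. the model above."""
    if rng is None:
        rng = np.random.default_rng(0)
    eta = sigma * rng.normal(size=T)
    kernel = np.arange(1, T + 1, dtype=float) ** (-beta)
    # The kernel-weighted sum of past shocks is exactly a convolution.
    return np.convolve(eta, kernel)[:T]

def ensemble_msd(beta, T=512, lags=(4, 256), n_runs=200):
    """Temporal MSD at the given lags, averaged over independent runs."""
    rng = np.random.default_rng(1)
    acc = {L: 0.0 for L in lags}
    for _ in range(n_runs):
        X = forgetting_walk(T, beta, rng=rng)
        for L in lags:
            acc[L] += float(np.mean((X[L:] - X[:-L]) ** 2))
    return {L: v / n_runs for L, v in acc.items()}

msd_rw = ensemble_msd(beta=0.0)    # ordinary random walk
msd_log = ensemble_msd(beta=0.5)   # power-law forgetting, beta = 1/2
```

Increasing the lag by a factor of 64 multiplies the random-walk MSD by roughly 64, while the $\beta = 0.5$ MSD grows only by a small factor, consistent with logarithmic-like diffusion.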
From Appendix B, the MSD of this model is calculated as
$\mathrm{MSD}^{(X)}_j(L) = \langle (X_j(t+L) - X_j(t))^2 \rangle \qquad (11)$
$\propto \begin{cases} L^{1-2\beta} & (0 \le \beta < 1/2) \\ \log(L) & (\beta = 1/2) \\ \mathrm{const.} & (\beta > 1/2). \end{cases} \qquad (12)$
This formula implies that $\beta = 1/2$ corresponds to our empirical results, that is, the logarithmic-like diffusion.
We also verify the validity of the model by comparing the power spectral density (PSD) between the data and the model. The PSD of the model Eq. 10 is approximated by
$S^{(X)}_j(f) \approx \sigma_j^2 \, S_{\mathrm{ARFIMA}(0,d,0)}(f) \qquad (13)$
$\approx \sigma_j^2 \{ 2 \sin(\pi f) \}^{-2d}, \qquad (14)$
where $d = 1 - \beta$. We use herein the formula of the PSD of $\mathrm{ARFIMA}(0,d,0)$ granger1980introduction , by which our model is approximated (Appendix D), and the empirical PSD of a time series is calculated as the periodogram,
$S(f) = \frac{1}{T} \left| \sum_{t=1}^{T} x(t) e^{-2 \pi i f t} \right|^2, \qquad (15)$
where $f$ is the frequency [1/days]. ARFIMA is the abbreviation for the autoregressive fractionally integrated moving average model, which is a well-known time-series model in the field of statistics that describes a time series with long memory burnecki2014algorithms . $S_{\mathrm{ARFIMA}(0,d,0)}(f)$ is defined by Eq. D3. For $f \ll 1$, this formula is also approximated by
$S^{(X)}_j(f) \propto f^{-2(1-\beta)}. \qquad (16)$
Thus, for $\beta = 1/2$ the power spectrum is approximated by the simple power law, $S^{(X)}_j(f) \propto f^{-1}$.
Because the concern of a word, $X_j(t)$, is not directly observed from the actual word-count data, as mentioned (see Section IV.2), we alternatively use the normalised power spectrum of the word counts $u_j(t)$,
(17)
(18)
(19)
where $f_{min}$ is the minimum frequency in the observation, and we used the assumption
$S^{(u)}_j(f) \approx a'_j S^{(X)}_j(f) + b'_j, \qquad (20)$
where $a'_j$ and $b'_j$ are constants depending on the word $j$. Hence, we can obtain the information of $S^{(X)}_j(f)$ from the observable $S^{(u)}_j(f)$, which we estimate by using a periodogram in this study. The validity of this assumption is discussed in Section IV.2. Figs. 3(g)-(i) show the ensemble median of the normalised power spectrum of the word counts given by Eq. 19 over the word sets,
(21)
(22)
where, for the data analysis, we take the median over the set of words whose mean is above 30. The results in these figures confirm that Eq. 22 is in agreement with the statistic of the actual data given by Eq. 21 for all data sets.
In addition, in order to check the plausibility of $\beta = 0.5$, we estimate the forgetting exponent $\beta$ directly from the data with respect to individual words. Herein, we use the model described by Eq. 10 and Eq. 27 (outlined subsequently); details of the estimation method are provided in Appendix C. Fig. 4 shows the histogram of the estimated $\beta$ for the newspaper data, blog data, and Wikipedia data. This figure confirms that the mode of the estimated $\beta$ takes a value of approximately 0.5 for all datasets.
IV.1.1 Relation to the fractional dynamics
Here, we address the relation between the fractional dynamics and the random walk model. From Appendix D, the continuous version of Eq. 10 corresponds to the fractional Langevin equation, which is an extension of the Langevin equation eab2011fractional ; magdziarz2007fractional ,
${}_0D^{1-\beta}_t X_j(t) = \hat{\eta}_j(t), \qquad (23)$
where, on the condition that $\beta = 0$, this equation is the normal Langevin equation. Here, the Riemann-Liouville fractional derivative operator of order $q$ $(0 < q < 1)$ is defined by
${}_0D^{q}_t g(t) = \frac{1}{\Gamma(1-q)} \frac{d}{dt} \int_0^t \frac{g(s)}{(t-s)^{q}} ds. \qquad (24)$
This operator satisfies ${}_0D^{q_1}_t {}_0D^{q_2}_t = {}_0D^{q_1+q_2}_t$. For example, in the case of $q = 1/3$, the operator is the $1/3$rd power of the derivative operator; that is, three operations of ${}_0D^{1/3}_t$ mean one normal derivative,
${}_0D^{1/3}_t {}_0D^{1/3}_t {}_0D^{1/3}_t g(t) = \frac{d g(t)}{dt}. \qquad (25)$
Therefore, in the case of the word counts, namely $\beta = 1/2$, we can obtain the half-order fractional Langevin equation,
${}_0D^{1/2}_t X_j(t) = \hat{\eta}_j(t), \qquad (26)$
where ${}_0D^{1/2}_t$ is the half-derivative operator, ${}_0D^{1/2}_t {}_0D^{1/2}_t = \frac{d}{dt}$. Thus, the properties of the word-count time series are right-in-the-middle dynamics between IID noise (zero-order differentiation) and the normal random walk (first-order differentiation).
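The half-derivative operator can be checked numerically with the Grünwald-Letnikov approximation of the Riemann-Liouville derivative: applying $D^{1/2}$ to $f(t) = t$ gives $2\sqrt{t/\pi}$, and applying $D^{1/2}$ to that result recovers $df/dt = 1$. The sketch below is illustrative, not part of the paper's calculations:

```python
import numpy as np

def gl_weights(q, n):
    """Grunwald-Letnikov weights w_k = (-1)^k * binom(q, k), via recursion."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (q + 1.0) / k)
    return w

def gl_derivative(f, t, q, h=1e-3):
    """Approximate the Riemann-Liouville derivative D^q f(t), lower limit 0,
    by the Grunwald-Letnikov sum h^{-q} * sum_k w_k f(t - k h)."""
    n = int(round(t / h)) + 1
    w = gl_weights(q, n)
    grid = t - h * np.arange(n)        # sample points t, t-h, ..., down to ~0
    return float(np.sum(w * f(grid))) / h ** q

# D^{1/2} t = 2*sqrt(t/pi); a second half-derivative gives d/dt t = 1.
half = gl_derivative(lambda s: s, 1.0, 0.5)
full = gl_derivative(lambda s: 2.0 * np.sqrt(np.abs(s) / np.pi), 1.0, 0.5)
```

The second value converges more slowly because the integrand behaves like $\sqrt{t}$ near the origin, but both agree with the semigroup property ${}_0D^{1/2}_t {}_0D^{1/2}_t = d/dt$ stated above.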
IV.2 Model of word counts
In the previous section, we confirmed that the logarithmic diffusion in word counts can be essentially explained by the random walk with power-law forgetting given by Eq. 10. However, this random walk model cannot explain all the statistical properties of word counts observed in this paper. For example, it cannot explain (i) the discreteness of the raw word counts and (ii) the word-dependent constants in Eq. 7 and in Eq. 20. Thus, lastly, we discuss the connection between the essential dynamics of the concern about a word given by Eq. 10 (i.e. the latent value $X_j(t)$) and the time series of word counts $F_j(t)$ or $u_j(t)$ (i.e. the observed values).
Here, we use the random diffusion model (RD model) introduced in PhysRevLett.100.208701 ; PhysRevE.87.012805 ; sano2009 ; RD_base to sample $F_j(t)$ or $u_j(t)$. The RD model is a kind of point process, which can be deduced from a simple model of the writing activity of independent bloggers RD_base .
In this model, the values $F_j(t)$ are sampled from the Poisson distribution whose rate (or intensity) function is determined by a random variable or a stochastic process (i.e. a doubly stochastic Poisson process lowen2005fractal ). In the case of blogs, the rate function is connected to the latent concern about the word. Particularly, the RD model is given by RD_base
$F_j(t) \sim \mathrm{Poisson}(\lambda_j(t)), \qquad (27)$
and its rate function of the Poisson distribution, denoted by $\lambda_j(t)$, is determined by the following definition of the product:
$\lambda_j(t) = r_j \cdot m(t) \cdot X_j(t) \cdot \{ 1 + \rho_j \, \omega_j(t) \}, \qquad (28)$
where

$r_j$ is the scale of the $j$-th word, namely the temporal mean of the $j$-th word, which we estimate as the mean of the raw word counts of the data, $r_j = \langle F_j(t) \rangle_t$.

$\rho_j$ is the magnitude (i.e. the standard deviation) of the ensemble fluctuation, which may be related to the magnitude of the heterogeneity of bloggers RD_base .

$\omega_j(t)$ is the normalised ensemble fluctuation, which is sampled from a system-dependent random variable with mean 0, standard deviation 1, and parameters that characterise the distribution.
Note that in the previous study Ref. RD_base , we estimated the latent value directly from the data by using the moving average for the data analysis, or used a simplifying assumption for the analytical calculation. Thus, we could not discuss the properties of the dynamics as such in Ref. RD_base (that model only describes the fluctuation when the dynamics of the latent value are given). However, in this study we introduce the time-evolution model given by Eq. 10, enabling us to calculate the basic dynamics of already popular words (the model describes not only the fluctuation but also the dynamics).
Also note that, in the actual data, the word counts are confined between zero and the size of the database (i.e. the total number of articles for the newspaper and blog data, and the total number of Wikipedia users for the Wikipedia data). However, our model does not consider this limitation, and this problem is substantially negligible in our situation. The main reasons for this are as follows:

The time evolution is very slow (i.e. logarithmic diffusion) on the condition that the initial value is of order 1 and the time step is finite (a maximum of approximately 10 years); hence, $X_j(t)$ walks around 1. Cases taking a negative value (for which the rate function and Eq. 27 become meaningless) or a very large value (related to the limitation by the total number of articles) were practically never sampled.

Almost all words have a temporal mean of the word counts that is too small to be affected by the limitation by the total number of articles.
Although in our case we were almost able to avoid these constraint problems without special treatment, in general situations, such as an infinite time step ($t \to \infty$), we may have to extend the model to describe these constraints explicitly.
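To make the sampling procedure concrete, the following sketch draws daily counts from the doubly stochastic Poisson process driven by the power-law-forgetting walk. It is a simplified illustration: the product form of the rate, the Gaussian stand-in for the normalised noncentral-t noise, and all parameter values are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 3650            # ten years of daily data
r_j = 50.0          # scale (temporal mean count) of the word -- illustrative
rho_j = 0.1         # magnitude of the ensemble fluctuation   -- illustrative
sigma_j = 0.01      # strength of the outer shocks            -- illustrative
beta = 0.5          # forgetting exponent

# Latent concern: power-law-forgetting walk started around 1.
eta = sigma_j * rng.normal(size=T)
kernel = np.arange(1, T + 1, dtype=float) ** (-beta)
X = 1.0 + np.convolve(eta, kernel)[:T]

# Rate function (product form assumed here): scale times latent concern
# times an ensemble-fluctuation factor; Gaussian omega is a placeholder
# for the normalised noncentral-t noise used in the paper.
omega = rng.normal(size=T)
lam = r_j * X * (1.0 + rho_j * omega)
lam = np.clip(lam, 0.0, None)        # a Poisson rate must be non-negative

# Observable daily counts: doubly stochastic Poisson sampling.
F = rng.poisson(lam)
```

Because the latent walk stays near 1 over this finite horizon, the clipping step is essentially never active, which mirrors the argument above that the boundedness constraints can be neglected in practice.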
Fig. 5 compares the statistical properties of the word-count time series between the empirical data and the numerical simulation of the RD model driven by the random walk model with power-law forgetting, given by Eq. 27 and Eq. 10, with respect to the (i) MSD, (ii) PSD, and (iii) (temporal) probability density function (PDF) of the differences (see Eq. 5). The results in these figures confirm that the numerical simulations are almost in accordance with the empirical observations in the newspaper, blog, and Wikipedia data, respectively.
In these simulations, the noise terms are sampled from the normalised noncentral t-distribution. The normalised noncentral t-distribution, of which the mean is zero and the standard deviation is 1, is the shifted and scaled noncentral t-distribution. The noncentral t-distribution is a skewed heavy-tail distribution: the tail parameter determines the heaviness of the tail, and the noncentrality parameter determines the skewness of the distribution. On the condition that the noncentrality parameter is zero, the noncentral t-distribution corresponds to the (normal, no-skew) t-distribution. The details of this distribution are given in Appendix E. In the figure, we use the word-dependent or system-dependent parameters, namely the mean frequency (scale) of the word counts $r_j$, the speed of the diffusion (or the mean strength of outer shocks) $\sigma_j$, and the magnitude of the ensemble fluctuation $\rho_j$ (maybe related to the heterogeneity of bloggers), fitted for “Tachiba (i.e. position or standpoint in English)” in the newspaper data, for “Sanada (a well-known Japanese family name)” in the blog data, and for “Handle” in the English Wikipedia data.
Lastly, we show the relation between the RD model and the word-dependent constants in Eq. 7 and in Eq. 20. From Appendix F, the (mean of the temporal) MSD of $u_j(t)$ is written as
(29)
(30)
where $\Psi$ is the digamma function; this curve is shown by the magenta thick dash-dotted lines in Figs. 5(d)-(f).
In addition, from Appendix G, the power spectral density of $u_j(t)$ is written as
(31)
and this curve is shown by the magenta thick dash-dotted lines in Figs. 5(g)-(i).
We also verified that the model cannot reproduce the statistical properties of the empirical data on the condition that $\beta$ does not take a value around 0.5. Fig. 6 shows the results in which we compare the empirical data with numerical simulations for different $\beta$ (the speeds of forgetting) and database sizes $m(t)$: (i) the IID noise, (ii) the simple random walk model ($\beta = 0$), and (iii) the case where the database size is constant. This figure confirms that the IID noise and the random walk model ($\beta = 0$) cannot reproduce the empirical properties. In addition, the time dependence of $m(t)$ is not essential in reproducing the empirical properties (see the panels in the second line).
Note that the model given by Eq. 27 and Eq. 10 can also explain the “fluctuation scaling”, which is known as another property of word counts on social media such as blogs eisler2008fluctuation ; sano2009 ; RD_base . The relation between the empirical fluctuation scaling and the model given by Eq. 27 and Eq. 10 will be discussed in our next paper.
V Conclusion and discussion
In this paper, from the viewpoint of diffusion in complex systems, we investigated the stability of the time series of word counts of already popular words (i.e. “mature phase” words) in some nationwide language data sets (newspaper articles, blog articles, and Wikipedia page views).
Firstly, by analysing the data, we commonly observed a logarithmic-like diffusion (i.e. an ultraslow-like diffusion) in word counts across the different data sets. Although ultraslow diffusion has been studied extensively by using theories or mathematical models, few empirical observations have been reported. Moreover, this observed logarithmic-like diffusion is not in conflict with the intuition that languages are basically stationary but change constantly. This intuition may be related to the empirical studies of the stability of word-count statistics: (i) more frequent words change more slowly lieberman2007quantifying ; gerlach2016similarity , and (ii) some observations implied a small stable core (kernel) vocabulary, as distinguished from the many other words used for specific communications that are not shared by all people ferrer2001two ; ferrer2017origins .
Secondly, we showed that the logarithmic diffusion of word counts is essentially explained by the random walk model with power-law forgetting. This random walk model corresponds to the fractional Langevin equation, which is a typical mathematical model in previous theoretical studies of anomalous diffusion. The speed of forgetting, characterised by the power-law exponent $\beta = 0.5$ in Eq. 10, has the following meanings:

the border (or threshold) between the stationary and the nonstationary dynamics (Eq. 12), and

right-in-the-middle dynamics between IID noise and the normal random walk (Eq. 23),
(32)
which are summarised in Table 1.
Thirdly, we confirmed that the RD model driven by the random walk model with power-law forgetting, given by Eqs. 10 and 27, can almost reproduce the empirical properties of the time series of typical words (Fig. 5): (i) the MSD, (ii) the PSD, and (iii) the PDF.
Although our model can explain the dynamical properties of the word-count time series, our framework cannot explain the model parameter $\beta = 0.5$ in Eq. 10. This special value, $\beta = 0.5$, which is the threshold between steady and unsteady dynamics, is observed detail-independently (i.e. independently of words, languages, and media), as far as we investigated. Thus, clarifying the origin of this parameter may provide a clue to understanding the fundamental dynamical and memory properties of human systems or societies as complex systems.
In micro-level studies, namely studies of single documents, the power law of the forgetting process, with word-dependent exponents distributed approximately around 0.5, is used to explain the empirical stretched-exponential distribution of the recurrence distance of words (e.g., for the phrase “This cat is big. That cat is small.”, the recurrence distance of “cat” is 4) altmann2009beyond . This quantitative similarity of the power law of forgetting dynamics between the data of micro-level (single document) and macro-level (nationwide collective behaviour datasets) studies might provide important suggestions for understanding the origin of the 0.5th exponent obtained in our macro-level study from micro-level human behaviour.
Acknowledgements
The authors would like to thank Hottolink, Inc. for providing the data. This work was supported by JSPS KAKENHI, Grant Number JP17K13815.
References
 (1) E. G. Altmann and M. Gerlach, Creativity and Universality in Language (Springer, Basel, 2016), pp. 7–26.
 (2) R. Metzler and J. Klafter, Physics Reports 339, 1 (2000).
 (3) M. A. A. da Silva, G. M. Viswanathan, and J. C. Cressoni, Physical Review E 89, 052110 (2014).
 (4) J.-P. Bouchaud and A. Georges, Physics Reports 195, 127 (1990).
 (5) S. Burov, J.H. Jeon, R. Metzler, and E. Barkai, Physical Chemistry Chemical Physics 13, 1800 (2011).
 (6) S. B. Lowen and M. C. Teich, Fractal-Based Point Processes (John Wiley & Sons, Hoboken, USA, 2005), Vol. 366.
 (7) Y. G. Sinai, Theory of Probability & Its Applications 27, 256 (1983).
 (8) A. Godec et al., Journal of Physics A: Mathematical and Theoretical 47, 492002 (2014).
 (9) L. P. Sanders et al., New Journal of Physics 16, 113050 (2014).
 (10) A. S. Bodrova, A. V. Chechkin, A. G. Cherstvy, and R. Metzler, New Journal of Physics 17, 063038 (2015).
 (11) A. G. Cherstvy and R. Metzler, Physical Chemistry Chemical Physics 15, 20220 (2013).
 (12) C. H. Eab and S. C. Lim, Physical Review E 83, 031136 (2011).
 (13) C. Song, T. Koren, P. Wang, and A.L. Barabási, Nature Physics 6, 818 (2010).
 (14) D. Boyer, M. C. Crofoot, and P. D. Walsh, Journal of The Royal Society Interface rsif20110582 (2011).
 (15) K. Matan, R. B. Williams, T. A. Witten, and S. R. Nagel, Physical Review Letters 88, 076101 (2002).
 (16) P. Richard et al., Nature Materials 4, 121 (2005).
 (17) A. M. Petersen, J. Tenenbaum, S. Havlin, and H. E. Stanley, Scientific reports 2, (2012).
 (18) M. Gerlach and E. G. Altmann, Physical Review X 3, 021006 (2013).
 (19) E. Lieberman et al., Nature 449, 713 (2007).
 (20) J.-B. Michel et al., Science 331, 176 (2011).
 (21) E. G. Altmann and M. Gerlach, Physicists’ papers on natural language from a complex systems viewpoint, http://www.pks.mpg.de/mpidoc/sodyn/physicistlanguage/.
 (22) D. M. Abrams and S. H. Strogatz, Nature 424, 900 (2003).
 (23) J. Cong and H. Liu, Phys Life Rev. 11, 598 (2014).
 (24) R. F. i Cancho, Eur. Phys. J. B 47, 449 (2005).
 (25) Nikkei Inc. and Nikkei Business Publications, Inc., Shinbun trend (web system), http://ntrend.nikkei.co.jp/.
 (26) Wiktionary:Frequency lists, https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists.
 (27) H. Watanabe, Y. Sano, H. Takayasu, and M. Takayasu, Physical Review E 94, 052317 (2016).
 (28) A. Masayuki and M. Yuji, User’s manual (ipadic), http://chasen.naist.jp/snapshot/ipadic/ipadic/doc/ipadicja.pdf, 2003.
 (29) Analytics/AQS/Pageviews, https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews.
 (30) C. W. Granger and R. Joyeux, Journal of time series analysis 1, 15 (1980).
 (31) K. Burnecki and A. Weron, Journal of Statistical Mechanics: Theory and Experiment 2014, P10036 (2014).
 (32) M. Magdziarz and A. Weron, Studia Math 181, 47 (2007).
 (33) S. Meloni, J. GómezGardeñes, V. Latora, and Y. Moreno, Phys. Rev. Lett. 100, 208701 (2008).
 (34) Y. Sano et al., Phys. Rev. E 87, 012805 (2013).
 (35) Y. Sano, K. K. Kaski, and M. Takayasu, in Proc. Complex ’09 (Springer, Berlin, Germany, 2009), No. 2, pp. 195–198.
 (36) Z. Eisler, I. Bartos, and J. Kertesz, Adv. Phys. 57, 89 (2008).
 (37) M. Gerlach, F. FontClos, and E. G. Altmann, Phys. Rev. X 6, 021009 (2016).
 (38) R. Ferrer i Cancho and R. V. Solé, Journal of Quantitative Linguistics 8, 165 (2001).
 (39) R. Ferrer i Cancho and M. S. Vitevitch, arXiv preprint arXiv:1801.00168 (2017).
 (40) E. G. Altmann, J. B. Pierrehumbert, and A. E. Motter, PLOS one 4, e7678 (2009).
 (41) M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with formulas, graphs, and mathematical tables (Dover Publications, New York, USA, 1964), Vol. 55.
 (42) http://functions.wolfram.com/HypergeometricFunctions/.
 (43) N. Johnson, Continuous univariate distributions (Wiley, New York, USA, 1994).
Appendix A Estimation of the normalised scale of the database from the data
We estimate the normalised scale of the database, $m(t)$, such as the total number of blogs, by using the moving median as follows:

We create a set consisting of the indexes of words $j$ such that $F_j(t)$ takes a value larger than a threshold.

We estimate $m(t)$ as the ensemble median over this set.

For every time $t$, we calculate $m(t)$ using step 2.
Here, we use only words with large counts in step 1 because we neglect the discreteness. In step 2, we apply the median because of its robustness to outliers.
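The steps above can be sketched as follows. This is one concrete (assumed) reading of the procedure, in which each high-count word is normalised by its day-0 value before the ensemble median is taken; the function name, threshold handling, and toy data are illustrative.

```python
import numpy as np

def estimate_scale(F, threshold=30):
    """Estimate the normalised database scale m(t) as the ensemble median,
    over words that always exceed the threshold, of each word's counts
    normalised to its day-0 value (details assumed for illustration)."""
    F = np.asarray(F, dtype=float)
    big = F.min(axis=1) > threshold        # keep only high-count words
    ratios = F[big] / F[big][:, :1]        # normalise each word by day 0
    m = np.median(ratios, axis=0)          # robust ensemble median per day
    return m / m[0]                        # so that m(0) = 1

# Toy data: a common database-size trend shared by all words, plus
# Poisson sampling noise around word-dependent scales.
rng = np.random.default_rng(0)
trend = np.linspace(1.0, 2.0, 365)         # database doubles over a year
base = rng.uniform(50, 500, size=200)      # per-word scales
F = rng.poisson(base[:, None] * trend[None, :])
m_hat = estimate_scale(F)
```

The median recovers the shared trend even though individual words fluctuate, which is the robustness-to-outliers property motivating step 2.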
Appendix B Mean square displacement of the power-law forgetting process
We calculate the MSD of the following power-law forgetting process given by Eq. D2,
(B1) 
where
(B2) 
and
(B3) 
is an arbitrary constant. The MSD can be calculated as
(B4)  
(B5)  
(B6) 
where
and
We calculate the three terms in Eq. (B6), respectively. The first term of Eq. (B6) is given by
where $\zeta(\cdot, \cdot)$ is the Hurwitz zeta function and $\Psi$ is the digamma function. The second term of Eq. (B6) is given by
For large $L$, using the general formulae
(B13)
and
(B14)
we can obtain the corresponding approximations.
Lastly, we calculate the third term of Eq. (B6). Using the Euler-Maclaurin formula abramowitz1964handbook ,