Unraveling the dynamics of growth, aging and inflation for citations to scientific articles from specific research fields

08/18/2017
by K. W. Higham et al.
Victoria University of Wellington

We analyze the time evolution of citations acquired by articles from journals of the American Physical Society (PRA, PRB, PRC, PRD, PRE and PRL). The observed change over time in the number of papers published in each journal is considered an exogenously caused variation in citability that is accounted for by a normalization. The appropriately inflation-adjusted citation rates are found to be separable into a preferential-attachment-type growth kernel and a purely obsolescence-related (i.e., monotonically decreasing as a function of time since publication) aging function. Variations in the empirically extracted parameters of the growth kernels and aging functions associated with different journals point to research-field-specific characteristics of citation intensity and knowledge flow. Comparison with analogous results for the citation dynamics of technology-disaggregated cohorts of patents provides deeper insight into the basic principles of information propagation as indicated by citing behavior.

1 Introduction

Peer-reviewed publications in scientific journals and patents issued by the national patent offices both serve to codify and document knowledge advances. To delineate clearly the reported scientific (technological) progress that has been achieved by the authors (inventors), citations to prior work (art) are necessary. While the detailed mechanisms and motivations governing the use of citations to scientific articles (Garfield & Sher, 1963; Garfield, 2006; Bornmann & Daniel, 2008) are generally different from those applying to patents (Jaffe et al., 2000; Hall et al., 2002; Cotropia et al., 2013; Jaffe & de Rassenfosse, 2017), all citing behavior is widely believed to be indicative of, at least some kind of, knowledge flow or information transfer. Furthermore, both for scientific articles and patents, citations are considered to be a (more or less noisy) proxy measure of impact (Griliches, 1990; Hall et al., 2005; von Wartburg et al., 2005; Garfield, 2006; Lane, 2010). This has motivated the quantitative study of citations, especially their distributions across suitably defined cohorts (Price, 1965; Seglen, 1992; Redner, 1998, 2005; Valverde et al., 2007; Radicchi et al., 2008; Stringer et al., 2010; Vieira & Gomes, 2010; Radicchi & Castellano, 2011; Waltman et al., 2012; Golosovsky, 2017; Sheridan & Onodera, 2017), as well as the dynamics of how citations are acquired over time (Price, 1976; Avramescu, 1979; Glänzel, 2004; Redner, 2005; Simkin & Roychowdhury, 2007; Csárdi et al., 2007; Valverde et al., 2007; Golosovsky & Solomon, 2012; Scharnhorst et al., 2012; Wang et al., 2013; Della Briotta Parolo et al., 2015; Colavizza & Franceschet, 2016; Pan et al., 2016; Golosovsky & Solomon, 2017; Higham et al., 2017; Yin & Wang, 2017). 
The ultimate goal of such investigation is the establishment of a basic generative model that captures the fundamental mechanisms governing citation dynamics and can thus reproduce the empirically observed time evolution and general statistical properties of citation accrual. Ideally, a properly validated model would be applicable to inform rational science and innovation policies (Lane, 2010).

Recent progress towards realistic, and potentially predictive, descriptions of citation dynamics (Redner, 2005; Csárdi et al., 2007; Valverde et al., 2007; Golosovsky & Solomon, 2012; Wang et al., 2013; Pan et al., 2016; Golosovsky & Solomon, 2017; Higham et al., 2017) has capitalized on advances in complex-network theory (Albert & Barabási, 2002; Dorogovtsev & Mendes, 2002; Newman, 2003). In particular, the concept of preferential attachment (PA) (Barabási & Albert, 1999; Dorogovtsev et al., 2000; Krapivsky & Redner, 2001) governing the rate at which citations are distributed has been very influential almost from the beginning (Price, 1976). However, the fruitful application of PA to understand citation behavior is predicated on the understanding of two other basic temporal influences: obsolescence and overall growth of research fields. Here we understand obsolescence to be reflected in the tendency for the citation rate to articles or patents to decay over time because of their reduced relevance for ongoing knowledge generation. Acting in parallel to the basic trend towards obsolescence, the overall growth of research fields provides another important mechanism that influences the rate at which citations are acquired. Empirical studies have observed a steady increase over time in the production of scientific articles (Price, 1965; Sinatra et al., 2015) and patents (Hall et al., 2002). As every article and patent will generally have to cite the knowledge stock that is current at the time of their creation, an increase in article and patent production will likely lead to an increase in the rate at which prior work is cited. The need for a careful disentangling of obsolescence and citation inflation due to growth was discussed early on, both for scientific articles (Egghe & Rousseau, 2000) and patents (Hall et al., 2002). 
The most widely adopted method to address growth consists of introducing normalization factors based on citation counts (Radicchi et al., 2008; Radicchi & Castellano, 2011; Wang et al., 2013; Yin & Wang, 2017), which is partly a result of the desire to find robust bibliometric impact measures for individual authors or institutions.

Here our motivation is different. We are interested in characterizing the intrinsic dynamics of knowledge generation and propagation that can be revealed by citation behavior if purely exogenous factors such as changes in article and patent productivity are appropriately accounted for. Our approach is inspired by its success in the context of patent-citation dynamics (Higham et al., 2017) and also a recent study (Šubelj & Fiala, 2017) where normalization by the number of articles published per year led to the observation of universal citation distributions for a large body of articles from physics and computer science, respectively. Furthermore, exponential growth was used as one ingredient in a successful network-model simulation of citations to scientific articles (Wang et al., 2013). Similar to our previous work on patents (Higham et al., 2017), we analyze citations within different research fields/subfields of physics as defined by the scope of individual journals published by the American Physical Society. The obtained journal-specific characteristics for the PA mechanism and obsolescence function are indicative of special features associated with knowledge generation and propagation in different physics-researcher communities.

2 Data, methods and results

The bibliometric and citation data set used in our work is provided by the American Physical Society (APS) and, in its entirety, consists of article metadata and citation pairs dating back to 1893 (American Physical Society, 2017). The subset of this data set that we focus on here comprises the cohorts of articles published in the year 2000 in the research-field-specific APS journals Physical Review A, B, C, D, and E (from this point onwards abbreviated as PRA, PRB, etc.), as well as the APS’s multidisciplinary-physics letters journal Physical Review Letters (PRL). (Footnote 1: Our choice of this particular year constitutes a compromise between capitalising on the increase over time in publication rates to maximise the article-cohort sizes and, at the same time, keeping a large-enough time window for articles to garner citations and facilitate the reliable observation of citation growth and obsolescence.) Citation rates are measured using all citation pairs whereby the cited article in one of these cohorts is linked to a citing article published in the years 2000–2015 in any APS journal. Table 1 provides an overview of the journal-specific article cohorts, with citation-number totals and other relevant citation-related statistical information. For all of the specialized journals (i.e., PRA–E), the fraction of citations originating from articles published in the same journal is quite high, justifying our use of these journals as representative of different research fields. As expected, this is not the case for the multidisciplinary letters journal PRL, which we include in our study as a benchmark for useful comparison.

Journal                                          PRA     PRB     PRC     PRD     PRE     PRL
Number of articles published in 2000             1,458   4,994   863     2,049   2,255   3,123
Total number of citations accrued by cohort      17,005  49,417  7,959   26,072  14,675  70,876
Fraction of journal self-citations               0.74    0.80    0.82    0.91    0.70    0.23
Total number of inflation-adjusted citations     12,927  44,401  7,389   19,776  11,423  59,083
Mean inflation-adjusted citations per article    8.87    8.89    8.56    9.65    5.07    18.92
Median inflation-adjusted citations per article  4.21    4.71    5.01    4.59    2.81    9.99
Table 1: Summary statistics for the APS-journal-article citation data set. Article cohorts comprise all articles published in a given journal in the year 2000. We analyse citations to these from other APS-journal articles published up until the end of the year 2015. The fraction of journal self-citations quantifies the number of citations originating from articles in the same journal where the cited article was published. In addition to listing the total number of citations accrued by each cohort, we also give the total of inflation-adjusted citations that is obtained by summing the citation counts that have been scaled to control for variations in citability due to changes in the numbers of articles published at different times. For reference, the mean and median numbers of inflation-adjusted citations per article are also provided for each APS-journal cohort.
Figure 1: Variation of inflation-adjusted citation values for individual APS journals over the 15-year period starting in 2000. We calculated the citation-value parameter $v_j(t)$ defined in Eq. (1) for $t_0$ = January 2000 and $\Delta t$ = 3 months. Hence, the value of a citation made in an article from journal $j$ published at time $t$ corresponds to the ratio of the number of articles published in that journal in the first quarter of the year 2000 to the number of articles published in that same journal in the 3-month period starting at $t$. To smooth short-term temporal fluctuations, we plot here a time average of $v_j(t)$ (defined using the Heaviside step function). Most journals exhibit a long-term trend of citation inflation due to the overall increasing rate at which articles are published. PRL and PRB are notable exceptions due to recent changes in their editorial policies (Meystre, 2013; Molenkamp, 2013).

To be able to separate the various mechanisms that together determine the rate at which citations are gained by scientific articles, we first devise a procedure to account for temporal variations in citability arising from purely exogenous driving forces. Both the number of articles produced and the average number of citations made by each article vary over time (Radicchi & Castellano, 2011; Wang et al., 2013; Sinatra et al., 2015; Pan et al., 2016). In the long term, the combined effect of these factors causes a citation inflation that can mask the trend of obsolescence. In this work, we consider the changing rate at which articles are published an exogenous factor, as such changes in research productivity can be expected to be largely determined by the availability of resources, general policy decisions, or other influences that do not reflect the utility of prior knowledge. In contrast, any change in the average number of citations made by each article is indicative of the need to cite more or less of the currently relevant knowledge and, thus, is intrinsic to the information ecosystem. Based on this philosophy, we ‘deflate’ the value of incoming citations to each journal-specific cohort in each 3-month period such that the value of a citation to a particular article in a particular quarter is scaled by the ratio of the total number of articles published in the citing journal in the first quarter of the year 2000 to the total number of articles published in the same journal in the quarter in question. That is, if there were twice as many PRE articles published in the third quarter of the year 2010 as there were in the first quarter of the year 2000, then citations given by articles published in the former period would be given a value of 0.5 to reflect the higher chance of attaining a citation from that journal due to extrinsic growth in article-publication rates.
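Figure 1 illustrates the time evolution of the thus-defined value $v_j(t)$ for citations originating from different APS journals $j$. For greater clarity, its explicit mathematical expression is

$$v_j(t) = \frac{N_j(t_0, t_0 + \Delta t)}{N_j(t, t + \Delta t)} , \tag{1}$$

where $N_j(t, t + \Delta t)$ is the number of articles published in journal $j$ in the time interval $[t, t + \Delta t]$, with $t_0$ marking the start of the year 2000 and $\Delta t$ = 3 months.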
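The deflation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `citation_values`, the `(year, quarter)` keys, and the toy article counts are hypothetical and are not taken from the APS data set or the authors' code.

```python
from collections import Counter

def citation_values(article_quarters, reference_quarter):
    """Return the deflated value of a citation for each quarter.

    article_quarters: iterable of (year, quarter) tuples, one entry per
        article published in a single citing journal (hypothetical input).
    reference_quarter: the (year, quarter) used as the baseline.
    """
    counts = Counter(article_quarters)
    baseline = counts[reference_quarter]
    if baseline == 0:
        raise ValueError("no articles published in the reference quarter")
    # Value = (articles in reference quarter) / (articles in citing quarter)
    return {quarter: baseline / n for quarter, n in counts.items()}

# Toy data: twice as many articles in 2010 Q3 as in 2000 Q1.
articles = [(2000, 1)] * 100 + [(2010, 3)] * 200
values = citation_values(articles, (2000, 1))
print(values[(2010, 3)])  # 0.5
```

With these toy counts, a citation made in the third quarter of 2010 is worth half a citation made in the reference quarter, matching the worked example in the text.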

After citations have been inflation-adjusted according to the procedure described above, all citations to articles in our cohort are assigned a time $t$ corresponding to the time lag between the publication of the cited article and that of the citing article. In order to model the dependence of the citation rate on both time and the number of accrued citations, we bin citations by time of arrival, where each bin has a width $\Delta t$. We therefore observe two time series: the number of citations each article has accrued by time $t$, denoted by $k$, and the number of citations each article gains in the next period $\Delta t$, denoted by $\Delta k$. Our further analysis will be based on the assumption that the rate at which individual articles gain citations is a function of both $k$ and $t$,

$$\frac{dk}{dt} = \lambda(k, t) . \tag{2}$$

As an empirical measure for the citation rate $\lambda(k, t)$, we use the average citation rate for the group of articles whose accrued citations fall within a given interval of $k$ values, i.e., the average number $\langle \Delta k \rangle$ of citations gained during $\Delta t$ by the articles in that group. In this manner, we obtain a matrix of citation rates for each journal with every entry corresponding to the average rate of citation to the group of articles published in that journal in the year 2000 with $k$ citations at time $t$. We therefore place each article in a particular bin based on its accrued citations, $k$, at the end of each time period. As described above, the binning in the time dimension is simply a linear scale where each bin has the width $\Delta t$. In the $k$ dimension, logarithmic binning is used. This means each ‘$k$-bin’ has the same width on a logarithmic scale, which turns out to be appropriate for the observed functional form for the $k$ dependence of $\lambda(k, t)$. In the implementation of this binning, we first introduce a threshold set at the 99th-percentile level of accrued citations by the end of the year 2015 (by journal). Once an article gains more than this number of citations, it is excluded from our measurements. This is done because there are not enough data in each bin above this threshold to measure citation rates accurately, and the large variance introduced by these data points would negatively affect the measurement of our model parameters.
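A minimal Python sketch of the binning step follows. It assumes each observation is a triple (time-bin index, accrued citations, citations gained in the next period); the helper names, the number of bins per decade, and the separate bin for articles with no citations yet are illustrative assumptions rather than the authors' actual choices. The 99th-percentile cutoff is omitted for brevity.

```python
import math
from collections import defaultdict

def log_bin_index(k, bins_per_decade=4):
    """Index of the equal-width logarithmic bin containing k >= 1."""
    return math.floor(math.log10(k) * bins_per_decade)

def average_rates(records, bins_per_decade=4):
    """Average citations gained per (time bin, citation bin).

    records: iterable of (time_bin, k_accrued, delta_k) triples,
        one per article and time bin (hypothetical input format).
    Returns a dict {(time_bin, k_bin): mean delta_k}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for time_bin, k, delta_k in records:
        # Assumption of this sketch: uncited articles get their own bin (-1),
        # because k = 0 cannot be placed on a logarithmic scale.
        k_bin = log_bin_index(k, bins_per_decade) if k >= 1 else -1
        sums[(time_bin, k_bin)] += delta_k
        counts[(time_bin, k_bin)] += 1
    return {key: sums[key] / counts[key] for key in sums}

records = [(0, 4, 2.0), (0, 5, 4.0), (0, 50, 10.0), (1, 0, 1.0)]
rates = average_rates(records)
print(rates)
```

Because inflation-adjusted citation counts are generally non-integer, the bin index is computed from the value of `k` itself rather than from a lookup table of integer counts.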

Figure 2: Determining the time ($t$) and citation-number ($k$) dependences of the citation rate $\lambda(k, t)$, using the average number of additional citations gained in the time interval $\Delta t$, denoted by $\langle \Delta k \rangle$, as its empirical measure. Data shown here are for articles published in PRD in the year 2000. (a) Symbols show the time dependence of $\langle \Delta k \rangle$, with $\Delta t$ on the scale of months, for articles from two $k$ bins identified by their logarithmic-scale midpoints. The curves are fits of the aging function $A(t)$ from Eq. (4a) to the data. (b) Symbols show the dependence of $\langle \Delta k \rangle$, with $\Delta t$ = 1 year, on $k$ at two fixed times, including $t$ = 12 years. Curves show fits of the PA expression from Eq. (4b), with the additive constant fixed to unity, to the empirical citation rate.

The simultaneous influences of knowledge-diffusion-driven growth and obsolescence-related decay on the citation rate can be captured by postulating the functional form (Dorogovtsev & Mendes, 2000; Zhu et al., 2003; Csárdi et al., 2007; Valverde et al., 2007; Golosovsky & Solomon, 2012)

$$\lambda(k, t) = A(t)\, f(k) , \tag{3}$$

where $A(t)$ is a purely time-dependent aging function, and the growth kernel $f(k)$ embodies the PA mechanism. That the rate at which individual articles gain citations is indeed of the separable form (3) is nontrivial and needs to be tested. To this end, we have fitted the observed $t$ and $k$ dependences of the citation rate for the article cohorts from a given journal and find that the observations are best described by the functional forms

$$A(t) = A_0 \exp(-t/\tau) , \tag{4a}$$
$$f(k) = k^{\alpha} + \beta . \tag{4b}$$

Here $\tau$ is the obsolescence time, $A_0$ sets the overall scale of the aging function, $\alpha$ is the PA exponent, and $\beta$ is an additive constant. Figure 2 shows examples of the performed fits. (Footnote 2: All fitting is completed using nonlinear logarithmic regressions from which parameters and their variances are determined.) A possible alternative form of $f(k)$ is discussed in Appendix A.
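To make the fitting step concrete, the following Python sketch fits both factors on a logarithmic scale using synthetic data. It assumes that the exponential aging function is an amplitude times exp(-t / obsolescence time) and that the PA kernel is k raised to an exponent plus a constant fixed to one. The function names, the simple ternary search, and the synthetic parameter values are illustrative choices only and do not reproduce the authors' regression code or their variance estimates.

```python
import math

def fit_exponential_aging(times, rates):
    """Least-squares fit of log(rate) = log(A0) - t / tau; returns (A0, tau)."""
    logs = [math.log(r) for r in rates]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -1.0 / slope

def fit_pa_exponent(ks, rates, lo=0.5, hi=2.0, steps=60):
    """Fit log(rate) = c + log(k**alpha + 1) by ternary search over alpha.

    The offset c is eliminated analytically for each trial alpha, and the
    search range [lo, hi] is an assumption of this sketch.
    """
    logs = [math.log(r) for r in rates]

    def sum_squared_error(alpha):
        model = [math.log(k ** alpha + 1.0) for k in ks]
        c = sum(y - m for y, m in zip(logs, model)) / len(ks)
        return sum((y - m - c) ** 2 for y, m in zip(logs, model))

    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sum_squared_error(m1) < sum_squared_error(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Synthetic, noise-free data generated from known parameter values.
times = list(range(3, 16))
aging_rates = [2.0 * math.exp(-t / 5.0) for t in times]
a0, tau = fit_exponential_aging(times, aging_rates)

ks = [1, 2, 5, 10, 20, 50]
kernel_rates = [0.3 * (k ** 1.2 + 1.0) for k in ks]
alpha = fit_pa_exponent(ks, kernel_rates)
print(round(a0, 6), round(tau, 6), round(alpha, 4))
```

Working on a logarithmic scale turns the exponential fit into an ordinary linear regression, while the kernel fit remains nonlinear in the exponent and therefore needs a one-dimensional search.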

If the separability of the citation rate into $t$- and $k$-dependent factors according to Eq. (3) holds, then the empirically extracted values of the kernel parameters $\alpha$ (the PA exponent) and $\beta$ (the additive constant) should be independent of the fixed times $t$ at which fits to Eq. (4b) have been performed. Likewise, fitted values of the obsolescence time $\tau$ should not depend on $k$. To determine whether this is the case, we fit $f(k)$ for all possible values of $t$ and observe the measured values of $\alpha$ in order to check for any systematic changes with time. (Footnote 3: In contrast to the previously considered case of technology-specific patent-citation data (Higham et al., 2017), the APS-journal-specific citation data are not large enough to enable an accurate measurement of $\beta$. Motivated by our observation that $\beta$ is approximately unity for all journals, we henceforth fix $\beta = 1$. This allows for the accurate measurement of $\alpha$, whose exact value is much more relevant than that of $\beta$ in determining the structure and evolution of the citation network.) We then perform the same procedure with $A(t)$ across all $k$-value bins to check for any systematic changes in $\tau$. As illustrated in Fig. 3, we indeed find that fitted values simply fluctuate around a stable mean, thus verifying empirically the separability of the citation rate in accordance with Eq. (3) and with the functional forms for the aging function and PA kernel given in (4a) and (4b), respectively. Table 2 lists the parameter values and their uncertainties that have been extracted for each individual APS-journal cohort as weighted arithmetic averages and their 95% confidence intervals from fitted values such as those shown in Fig. 3 for PRD, where weights are the inverses of the variance for each fitted value. The average for $\alpha$ does not include measurements at the earliest times, as there is not enough spread in $k$ for accurate measurement of this parameter at small times. Results for the amplitude $A_0$ of the aging function are calculated from the fitted values for the product $A_0 f(k)$ at fixed $k$ using the previously measured value of $\alpha$.
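The inverse-variance weighting described above can be sketched as follows. The confidence interval here uses the standard error of an inverse-variance weighted mean together with a normal-approximation factor of 1.96. That choice is an assumption of this sketch, since the text does not state how the authors computed their 95% intervals, and the function name and the numbers are purely illustrative.

```python
import math

def weighted_mean_ci(values, variances, z=1.96):
    """Inverse-variance weighted mean and half-width of its confidence interval.

    values: fitted parameter values (e.g. one per time or citation bin).
    variances: variance of each fitted value, used as inverse weights.
    z: normal quantile for the interval (1.96 for 95%, an assumption here).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    # Standard error of the weighted mean under independent estimates.
    half_width = z * math.sqrt(1.0 / total)
    return mean, half_width

# Illustrative numbers only, not taken from the paper.
mean, half_width = weighted_mean_ci([1.0, 3.0], [1.0, 3.0])
print(mean, half_width)
```

Fitted values with large variances contribute little to the average, which is why noisy fits at sparsely populated bins do not dominate the reported parameters.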

Figure 3: Demonstrating separability of the citation rate into a purely $t$-dependent aging function and a $k$-dependent PA growth kernel, as expressed in Eq. (3). Data shown pertain to articles published in PRD in the year 2000. (a) Values of the obsolescence time $\tau$ extracted from fits of the empirical citation rate to the functional form $A(t)$ from Eq. (4a) for different fixed $k$. (b) Values for the exponent $\alpha$ derived from fits of the empirical citation rate to the form (4b) for the PA growth kernel $f(k)$, assuming the additive constant to be unity. Circles are the fitted parameter values, solid lines indicate their weighted averages, and the black dashed (red dotted) curves show the 95% confidence intervals for fit-parameter values (their weighted averages).

To maximise the accuracy of fit-determined parameters, it turns out to be useful to adjust the logarithmic bin size in the $k$ dimension and the time interval $\Delta t$ in order to optimize the measurement resolution in the variable we are fitting. For example, when fitting the aging function $A(t)$ for the groups of articles in particular fixed-$k$ bins, we can more accurately represent the data by slightly increasing the size of the $k$ bins such that more articles are included in each individual fitting procedure for $A(t)$ at fixed $k$, while at the same time decreasing $\Delta t$ for greater time resolution. Due to the finite range of empirically available $k$ values, this means there are fewer fixed-$k$ fitting procedures and thus fewer measurements of the aging-function parameters; however, we do not require a large number of measurements to detect any systematic change in these parameters with $k$. The opposite is true when fitting the PA kernel $f(k)$. Based on such considerations, we have chosen a shorter $\Delta t$ (on the scale of months) when fitting $A(t)$ and $\Delta t$ = 1 year when fitting $f(k)$.

Year Journal PRA PRB PRC PRD PRE PRL
2000
[yrs]
[yrs]
1989-91
[yrs]
[yrs]
Table 2: Measured values for the parameters characterising preferential attachment (the exponent $\alpha$) and obsolescence (the obsolescence time $\tau$ and the aging-function amplitude $A_0$), extracted from analysing citations to articles published in APS journals in the year 2000. For comparison, results obtained from performing the same analysis on citations to articles published in these journals during the three-year period 1989–1991 are also given. Uncertainties represent 95% confidence intervals. Specially marked values of the PA (obsolescence) parameters indicate cases where the set of averaged values exhibited a weak residual dependence on time (number of accrued citations).

As is apparent from Fig. 2(a), the exponential form for the aging function given in Eq. (4a) turns out to provide a good fit to the data only after an initial short period following publication. We have therefore limited our fitting of parameters for the aging function to this range. While other, generally more complicated, functional forms such as shifted power laws or stretched-exponential functions can be fitted over the whole range of time for our data, the validation of separability becomes extremely ambiguous with these three-parameter aging models, because the variances in the measured parameter values turn out to be very large relative to the magnitude of the parameters themselves. The full-range fit thus comes at the expense of meaningful parameter estimations. In contrast, the two-parameter model of exponential aging enables reliable parameter determination and accurately represents the data except for a short period after publication.

It would be very interesting to investigate systematically the quality of separability and determine the values for parameters in the aging function and the PA-growth kernel characterizing the citation dynamics of articles published in different years. However, cohorts of articles published much earlier than the year 2000 are generally smaller and have accrued fewer total citations, leading to larger statistical uncertainties. To improve the statistics and facilitate at least a basic comparison, we aggregated and analyzed the citation data for articles published in individual APS journals in the three-year period 1989–1991. For these earlier-published articles, too, citations obtained up until 2015 were included in our analysis. Thus the time range over which aging was observable for these articles is about ten years longer than for the year-2000 article cohorts, and some caution needs to be exercised in any direct comparison between the extracted obsolescence times for articles from the two time periods. The results obtained from fits to the PA growth kernel from Eq. (4b), with the additive constant fixed to unity, and to the aging function from Eq. (4a), fitted beyond the initial 4-year period, are also given in Table 2. Short-time deviations from exponential aging were found to persist over a longer initial time period for the article cohorts published during 1989–1991 than for the year-2000 cohorts, necessitating the later starting time for the aging fits. Because of this, and the systematically larger obsolescence time scale found for the earlier-published articles, the reliable extraction of the obsolescence time $\tau$ required including all available data for citations acquired up until 2015, i.e., for 25 years after publication. Note that the observed longer period for deviations from exponential aging at short times for the earlier-published articles is consistent with their larger $\tau$ values, as we have generally found these two time scales to be linked (Higham et al., 2017).

A citation rate of the separable form Eq. (3), with an aging function that can, in principle, depend also on the publication time , gives rise to a distribution function for citations to articles published at the same time that can be expressed most generally as (Higham et al., 2017)

(5)

Here

(6)

and encodes a fully general initial condition for the distribution. Figure 4 shows a comparison between empirical data for for the cohort of articles published in PRD in the year 2000 and the theoretical prediction obtained from Eq. (5) with the parameter values for aging and PA-driven growth given in Table 2 for this cohort. We used the expression

(7)

which is the result obtained from Eq. (6) with Eq. (4a) as the aging function, rescaled by the cohort-specific factor to account for the observed short-term deviations from exponential aging (Higham et al., 2017). To minimize the impact of having arbitrarily fixed in our fitting procedure, we took the empirically observed distribution of citations at year as . The agreement between theory and data is excellent, except for large at small where the influence of deviations from exponential aging at short times cannot be properly quantified by our model based on the parameter  (Higham et al., 2017).

Figure 4: Distribution function for citations to articles published in PRD in the year 2000, plotted as a function of the number of citations $k$ for two fixed times $t$ after publication. Symbols show the empirical data. Curves were calculated from the general expression Eq. (5) with parameters given in Table 2 for the cohort of articles published in PRD in the year 2000 and the empirically observed citation distribution at a fixed reference time taken as the initial condition. The form (7) was used. The disagreement between theory and data seen for large $k$ at small $t$ is due to the systematic deviation from exponential aging occurring at short times [see Fig. 2(a)].

Knowledge of the single-cohort distribution function in principle also allows one to derive the distribution function for citations in the aggregated article network comprising cohorts published over a range of times:

(8)

Citation distributions for large collections of articles whose publication times span long time intervals have been the subject of intense recent interest (Redner, 2005; Stringer et al., 2010; Radicchi & Castellano, 2011; Waltman et al., 2012; Šubelj & Fiala, 2017; Yin & Wang, 2017; Sheridan & Onodera, 2017). One particular question such studies have aimed to answer is how the empirical distributions compare with the form of stationary citation distributions arising within the framework of relevant network models. Analyzing this issue for articles within separate research fields could be an interesting direction for future research (Stringer et al., 2010).

3 Discussion

Gaining a full understanding of the dynamics of citation accrual by scientific articles and patents has been hampered by the need to account for various closely entangled, and sometimes mutually counteracting, influences. On a basic qualitative level, it can be surmised that diffusion of knowledge drives the accumulation of citations by articles or patents that report useful new information (Jaffe & Trajtenberg, 1999; Bornmann & Daniel, 2008). The ability to accurately model, and potentially optimise, the dynamics of knowledge diffusion should be facilitated by the increased availability of high-quality citation data, if only the additional mechanisms affecting real-world citing behaviour could be reliably identified and accounted for. The results of our present study constitute a step in this direction.

One factor that is as intrinsic to the scientific (or invention) ecosystem as knowledge diffusion is the process of obsolescence, i.e., the tendency for previously codified information and methods to become less relevant over time for continued progress of knowledge generation. Conceptually, obsolescence can be understood as an aging process. However, both in the real world and in idealised network-model descriptions, any existing intrinsic aging dynamics can be masked by exogenous influences that effectively contribute to aging or counterbalance it. For example, in the generic network model applied to citing behaviour (Price, 1976; Barabási & Albert, 1999) where a new node is added at each time step and distributes a number of citations to existing nodes according to a PA mechanism, the linear-in-time growth of the overall network induces a purely structural aging process. The associated aging function is given by (Albert & Barabási, 2002) , where is the time at which the node has been added (corresponding to the publication time for an article or patent). In the real world, the rapid increase over time in the overall number of published articles has similarly been perceived as a structural cause of aging (Della Briotta Parolo et al., 2015). On the other hand, an increase in the rate of production for articles (patents) that cite relevant prior knowledge can boost the citation rate for older articles (patents). In fact, the combination of increased publication activity and an on-average increased number of citations made per article (patent) has been seen to cause a citation inflation that partially compensates aging effects (Hall et al., 2002; Wang et al., 2013; Pan et al., 2016).
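The structural aging effect in the generic network model mentioned above can be explored with a small simulation. The sketch below assumes the simplest variant, in which each new node makes a single citation and the probability of citing an existing node is proportional to its citation count plus one. The function name, the seed, and the network size are arbitrary illustrative choices, and the quadratic run time limits it to small networks.

```python
import random

def simulate_pa_network(n_nodes, seed=0):
    """Grow a citation network by linear preferential attachment.

    One node is added per time step and cites one existing node, chosen
    with probability proportional to (citations + 1). Returns the list of
    citation counts, ordered by the time each node was added.
    """
    rng = random.Random(seed)
    citations = [0]  # the first node starts without citations
    for _ in range(1, n_nodes):
        weights = [k + 1 for k in citations]
        target = rng.choices(range(len(citations)), weights=weights, k=1)[0]
        citations[target] += 1
        citations.append(0)  # the new node enters uncited
    return citations

counts = simulate_pa_network(2000)
tenth = len(counts) // 10
early_mean = sum(counts[:tenth]) / tenth
late_mean = sum(counts[-tenth:]) / tenth
print(early_mean, late_mean)
```

Comparing the mean citation count of the earliest and latest tenth of nodes gives a rough impression of how network growth alone favors older nodes, independently of any intrinsic obsolescence.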

Our present approach is designed to carefully disentangle the mechanism of PA-driven citation accumulation from the effects of aging and inflation. The litmus test for having achieved this goal is provided by the absence of residual time dependences in PA-related parameters, especially the exponent $\alpha$, that have been extracted from fits of the empirical citation rate [see Fig. 3(b)]. Furthermore, the parameters governing obsolescence-related aging should be found to be independent of the number of citations, as is indeed the case [Fig. 3(a)]. To be able to demonstrate this clear separation, we needed to focus our analysis on specific research fields in physics, as defined by the scope of individual APS journals, and account for citation inflation by the journal-specific scaling factor defined in Eq. (1). In contrast, previous studies that did not separate articles by research fields and did not account for citation inflation found significant monotonic increases over time in the extracted values of PA-related parameters (Golosovsky & Solomon, 2012). We observe some deviations from full separability of the empirical citation rate into independent aging and PA parts [Eq. (3)] for the cases of PRB (see results presented in Appendix B) and PRL, which are both journals that publish articles from a much broader and more heterogeneous range of physics subfields, and even from neighboring disciplines such as chemistry and mathematics, than the other four journals. Our results suggest that the dynamics of knowledge diffusion and the intrinsic obsolescence of knowledge are research-field-specific.

We identify the obsolescence-related aging function to be an exponential function of time since publication in the long term, as expressed in Eq. (4a). This observation broadly agrees with recent studies of larger scientific-article cohorts (Della Briotta Parolo et al., 2015; Pan et al., 2016), although we find a slightly shorter value for the obsolescence time scale in our research-field-specific analysis and also observe some variation of this parameter between the different fields. In particular, the subfields associated with PRC and PRD appear to be more slowly changing than those covered by the other three specialised journals, whose obsolescence times turn out to be about as short as that of the multidisciplinary journal PRL. While not fully conclusive because of the overall scale of uncertainties in the extracted values, this observation is consistent with expectations based on known characteristics of the research fields covered by PRC and PRD, especially a dependence on the long-term development of large-scale equipment run by very large consortia of researchers.

Deviations from exponential-in-time aging occur at short times, within the first 2–3 years after publication for the articles from 2000 [see Fig. 2(a)], with more citations accumulated per unit time than expected from an extrapolation of the long-term exponential-aging trend. A similar excess of citations arriving in a short time period after publication was also observed for patents (Higham et al., 2017). The apparent universality of this short-term enhancement suggests a common origin related to knowledge-flow dynamics. This origin should be clarified by systematic further studies of how the excess number of citations depends on citation number (deviations from exponential aging are generally found to be smaller in cohorts of more highly cited articles or patents), and of possible systematic variations across different research fields or technology categories. Interestingly, the relative magnitude of the short-term deviation from exponential aging exhibited by low- to medium-cited scientific-article cohorts is consistently observed to be larger, by at least a factor of two, than for the comparable cohorts of patents.
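One simple way to quantify such a short-term enhancement is the relative excess of the observed aging function over the exponential extrapolated back from long times. The sketch below assumes the amplitude and obsolescence time of the long-term trend are already known; the function name and the synthetic values are illustrative only.

```python
import numpy as np

def short_term_excess(t, aging, a0, tau, t_short):
    """Relative excess over the extrapolated exponential for t <= t_short."""
    t = np.asarray(t, dtype=float)
    aging = np.asarray(aging, dtype=float)
    mask = t <= t_short
    extrapolated = a0 * np.exp(-t[mask] / tau)
    return aging[mask] / extrapolated - 1.0

t = np.arange(1, 16)
a0, tau = 2.0, 6.0                    # assumed long-term fit parameters
aging = a0 * np.exp(-t / tau)
aging[0] *= 1.5                       # synthetic 50% excess in year 1
aging[1] *= 1.2                       # synthetic 20% excess in year 2

excess = short_term_excess(t, aging, a0, tau, t_short=3)
print(excess)
```

Comparing this quantity between cohorts with different citation numbers, or between articles and patents, would be one way to make the comparisons described above quantitative.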

Fits of the PA-mechanism growth kernel given in Eq. (4b) to the data, with the remaining kernel parameter held fixed, yield values of the exponent that vary moderately across the different APS-journal cohorts and are generally consistent with a superlinear dependence on the number of citations (exponent larger than one). Because the data set is relatively small compared, e.g., with that of our previous patent-citation study (Higham et al., 2017), the functional form of Eq. (4b) yields only marginally better fits than the alternative form given in Eq. (9), which has also been applied previously to scientific-article citation data (Golosovsky & Solomon, 2012) but was ruled out for patents (Higham et al., 2017). See the more detailed discussion in Appendix A and the results presented in Fig. 5 and Table 3. The values of the exponent obtained using the two alternative PA-kernel forms are essentially the same.

The aggregation of citation data for articles published in individual APS journals over the three-year period 1989–1991 made them amenable to the same type of analysis performed on the year-2000 article cohorts. Interestingly, we found the same qualitative features and even some of the same quantitative results for the citation dynamics of the earlier-published articles as for those from the year 2000. In particular, PA-driven growth of citations is clearly and robustly exhibited, with values for the exponent essentially the same as for the respective year-2000 APS-journal cohorts. Detailed results are given in Table 2. Aging was again found to be described by an exponential function of time in the long run for the cohorts of articles published during 1989–1991, but with prominent deviations from exponential aging at short times that persist over a longer initial time period (4 years) than for the cohorts of articles from 2000. As a result, reliable extraction of the obsolescence time associated with the long-term exponential-aging behavior required fitting all the available data up to the end of the 25-year citation history of the 1989–1991 articles. The obsolescence times thus obtained are consistently larger, by about 2 years, than the values we found for the articles published in the same journal in 2000 using the data from their shorter (only 15-year-long) citation history. At the same time, the obsolescence times of the older articles show the same trends for variation across journals (i.e., PRC and PRD again appearing to be associated with more slowly changing subfields). The juxtaposition of parameters describing PA and aging of citations to articles published ten years apart already provides an interesting snapshot of temporal variations in journal-article citation dynamics. There is scope to perform similarly suitable aggregations over multi-year periods to analyze even older APS-journal-article cohorts. Alternatively, larger-scale studies of potential time variations of citation-dynamics parameters within different physics subfields would require going beyond the APS-journal data set, thus creating the need to define association with a subfield either based on a broadly adopted classification scheme such as PACS or via a more fine-grained version of a previously employed topic-specific analysis of citation patterns (Sinatra et al., 2015).

4 Conclusions

We have analyzed the time evolution of citation data for articles published in six different APS journals in 2000 to gain insight into research-field-specific characteristics of knowledge-flow dynamics. Unlike previous studies, we have accounted for citation inflation arising from temporal variations in the rate of publication of articles in the individual journals by a normalization factor. We demonstrate separability of the empirical citation rate for most journal-article cohorts into a purely citation-number-dependent part that reflects a preferential-attachment-driven growth mechanism and a purely time-dependent aging part that is an exponential function of time in the long term. Deviations from full separability that are observed for PRB and, to a lesser extent, PRL are smaller in magnitude than in previous studies where articles were not separated by research fields (Golosovsky & Solomon, 2012), suggesting that such deviations are likely caused by the underlying heterogeneity of scientific communities publishing in these two journals.
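The idea of an inflation adjustment can be sketched as follows, under the assumption that the normalization simply rescales yearly citation counts by the journal's publication volume relative to a reference year. The exact scaling factor used in this work is defined in the main text and may differ; the function name and all numbers below are hypothetical.

```python
import numpy as np

def deflate_citations(citations_per_year, papers_per_year, reference_index=0):
    """Divide yearly citations by publication volume relative to a reference year."""
    citations = np.asarray(citations_per_year, dtype=float)
    papers = np.asarray(papers_per_year, dtype=float)
    scaling = papers / papers[reference_index]   # assumed form of the scaling factor
    return citations / scaling

# Synthetic journal: output grows by 50% while raw citations grow in proportion
papers = np.array([1000, 1100, 1250, 1500])
citations = np.array([200, 220, 250, 300])

adjusted = deflate_citations(citations, papers)
print(adjusted)
```

In this toy case the entire growth in raw citations is attributed to the larger number of citing papers, so the adjusted series is flat.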

As exogenously caused variations in citability (‘inflation’) are accounted for within our approach, the observed characteristics of the aging function should be dominated by the dynamics of obsolescence for knowledge within the specific research fields. This provides a window into the dynamics of scientific progress within these fields, as the time scale for obsolescence is commonly associated with the rate at which the knowledge frontier advances. In particular, we were able to identify more slowly changing research fields (those associated with PRC and PRD) compared to the rest, which have an obsolescence time similar to that of the multidisciplinary journal PRL. Furthermore, application of our analysis to cohorts of APS-journal articles published in the period 1989–1991 revealed their obsolescence times to be about 2 years longer than those of the year-2000 cohorts from the same journal, while following the same basic trend in the variation across different journal cohorts. However, deviations from the well-understood exponential obsolescence behavior occur over a longer initial time period for the articles published during 1989–1991, which required us to include the data from the period 15–25 years after their publication to reliably extract the obsolescence time. This makes a direct comparison with the articles published in 2000 somewhat difficult, as no citation data beyond 15 years after publication are available for the latter. More detailed analysis is needed to exclude other potential systematic influences that affect the obsolescence of articles 15–25 years after their publication. If the observation of faster obsolescence for the younger article cohorts were to be robustly substantiated, then it could signify an overall accelerating pace at which science advances in all subfields of physics. Such an acceleration might, for example, be rooted in changing patterns of exploration (focus on broad and general versus deep and specific), but evolving citation practices or reading habits can also affect the obsolescence time scale (Egghe & Rousseau, 2000).

Although the obsolescence-induced aging is accurately described by an exponential function for intermediate to long times after publication, an excess of citations above the extrapolated exponential behavior occurs within the first 2–3 (4–5) years after publication for the articles from 2000 (1989–1991), with stronger deviations occurring for the article cohorts with smaller numbers of accrued citations. More systematic study is needed to determine the origin of these deviations, but their particular features point to the existence of a special knowledge-propagation mechanism that is effective at short times.

Even though the mechanisms and motivations determining the citing behavior of academics and inventors have been identified to be quite different (Bornmann & Daniel, 2008; Cotropia et al., 2013; Jaffe & de Rassenfosse, 2017), the results of our present study turn out to be strikingly similar, both qualitatively and quantitatively, to those found previously in an analysis of the inflation-adjusted citation dynamics for patents granted in 1998 within specific technology categories (Higham et al., 2017). In particular, the values of the exponent characterizing preferential-attachment-type growth vary over the same basic range of magnitude between the different article and patent cohorts. The obsolescence time is observed to be just slightly longer for the patents than for the APS-journal articles, whereas the magnitude of the citation-rate enhancement over the extrapolated exponential-aging behavior in the short term is systematically larger for the cohorts of articles than for the comparable patent cohorts. Our ability to provide a more detailed comparison between the citation dynamics of patents and scientific articles is hampered by the more limited accuracy of the parameter values extracted in the present study, which is due to the smaller size (by roughly an order of magnitude both in total numbers of accrued citations and in total numbers of citable items) of the article cohorts in comparison with the patent cohorts. Further studies of research-field-specific trends in article-citation dynamics will need to utilize larger data sets that have reliably tagged outputs from different research fields. Demonstrating the separability of the appropriately inflation-adjusted empirical citation rate for these larger cohorts into a purely citation-number-dependent growth part and an obsolescence-induced aging part will be a crucial first step toward obtaining, and meaningfully comparing, relevant parameters.

Questions to be addressed by future studies include the relevance of memory effects in the citation dynamics that can cause deviations from preferential-attachment-type growth. This aim will also require use of larger data sets as, e.g., autocorrelations were previously observed to be significant only for the very highly cited ones among scientific articles (Golosovsky & Solomon, 2012). We do not expect such effects to be relevant for our present analysis where articles were excluded once their citation count reached the 99th percentile for their respective journal-article cohort. Another interesting issue that could be explored concerns the functional form of the steady-state distribution of citations to articles within a given specialized research field (Stringer et al., 2010). Whether and how citation inflation is accounted for may crucially influence the observed properties of such distributions (Radicchi & Castellano, 2011; Waltman et al., 2012; Yin & Wang, 2017; Šubelj & Fiala, 2017). Further systematic investigation of this question could inform ongoing discussions about the consistency of PA-driven growth models with empirically observed static properties of citation networks (Golosovsky, 2017; Sheridan & Onodera, 2017).

Appendix A Results from fitting an alternative functional form for the preferential-attachment kernel

Journal PRA PRB PRC PRD PRE PRL
[years]
Table 3: Measured values of the exponent, and those derived for the obsolescence time, obtained from fitting the alternative functional form of Eq. (9) for the PA kernel to the citation data for articles published in APS journals in the year 2000. For the specially marked values, the set of averaged values exhibited a weak residual dependence on time. Note the only slightly larger uncertainties for the exponent values in comparison with results given in Table 2.

A number of functional forms have been utilised to characterise superlinear preferential attachment, most of which converge to a pure power law in the limit of large citation numbers. We have adopted one such form, given in Eq. (4b), to model preferential attachment in this work. An alternative form of preferential attachment incorporates a constant shift into the argument,

f(k) = (k + k_0)^{\alpha}    (9)

We have also fitted this functional form to the empirical citation rates in order to test whether it provides a better fit, holding the shift fixed in analogy to our methodology in the main body of this work. The results obtained from the alternative-fit analysis are summarised in Table 3 and Fig. 5. While uncertainties in the measured values of the exponent for the various journals turn out to be slightly larger than those obtained by fitting to Eq. (4b) (see results presented in Table 2), there is little quantitative difference between the exponent values from the two models.
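The comparison of two kernel forms can be illustrated with a small fitting sketch. It assumes a sum-type kernel `k**alpha + beta` as a stand-in for Eq. (4b) and a shifted-argument kernel `(k + k0)**alpha` for the alternative, with `beta` and `k0` both fixed to 1; these forms, the symbol names, the grid-search fitting method and the synthetic data are assumptions made for illustration and do not reproduce the fits of this study.

```python
import numpy as np

def kernel_sum(k, alpha, beta=1.0):
    """Assumed sum-type kernel: power law plus a fixed constant."""
    return k**alpha + beta

def kernel_shift(k, alpha, k0=1.0):
    """Shifted-argument kernel: constant shift inside the power law."""
    return (k + k0)**alpha

def fit_exponent(k, rate, kernel, alphas=np.linspace(0.5, 2.0, 1501)):
    """Grid search over alpha; the amplitude is solved by linear least squares."""
    best = None
    for alpha in alphas:
        f = kernel(k, alpha)
        amplitude = np.dot(f, rate) / np.dot(f, f)
        sse = np.sum((rate - amplitude * f)**2)
        if best is None or sse < best[2]:
            best = (alpha, amplitude, sse)
    return best

k = np.arange(1, 41)                  # citation numbers
rate = 0.05 * (k**1.15 + 1.0)         # synthetic, noise-free citation rate

alpha_sum, amp_sum, sse_sum = fit_exponent(k, rate, kernel_sum)
alpha_shift, amp_shift, sse_shift = fit_exponent(k, rate, kernel_shift)
print(alpha_sum, alpha_shift, sse_sum, sse_shift)
```

Because both kernels approach the same power law for large citation numbers, the two fitted exponents come out close to each other even though only one form matches the synthetic data exactly.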

Figure 5: Exponent extracted from fitting a PA growth kernel of the form given in Eq. (9) to the empirical citation rate for articles published in PRD in 2000. Circles are the fit values, the solid line is their weighted average, and black dashed (red dotted) curves indicate 95% confidence intervals for the fit values (the weighted average). Note the extremely small differences with the results shown in Fig. 2(b).
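For readers who wish to reproduce a weighted average with a confidence band of the kind shown in such figures, a minimal sketch follows. It assumes inverse-variance weights and a normal-approximation 95% interval (1.96 standard errors); the weighting scheme actually used for the figures is not specified here, and the input numbers are made up.

```python
import numpy as np

def weighted_average(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = 1.0 / sigmas**2
    mean = np.sum(weights * values) / np.sum(weights)
    stderr = np.sqrt(1.0 / np.sum(weights))
    return mean, stderr

# Hypothetical exponent values from fits at different times, with 1-sigma errors
alpha_fits = np.array([1.10, 1.14, 1.12, 1.16, 1.13])
alpha_sigmas = np.array([0.04, 0.02, 0.02, 0.04, 0.02])

mean, stderr = weighted_average(alpha_fits, alpha_sigmas)
ci95 = (mean - 1.96 * stderr, mean + 1.96 * stderr)
print(mean, stderr, ci95)
```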

Appendix B Observed deviations from full separability of the empirical citation rate: Case of PRB

Figure 6: Illustration of how deviations from separability of the empirical citation rate are manifested. Data shown pertain to articles published in PRB in the year 2000. Circles are the fitted parameter values, solid lines indicate their weighted averages, and the black dashed (red dotted) curves show the 95% confidence intervals for fit-parameter values (their weighted averages). (a) Values of the obsolescence time extracted from fits of the empirical citation rate to the functional form from Eq. (4a) for different fixed numbers of citations. In contrast to the case shown in Fig. 3(a), a systematic trend for the values to increase as a function of the number of citations is exhibited here. (b) Values for the exponent derived from fits of the empirical citation rate to the form (4b) for the PA growth kernel, with the remaining kernel parameter held fixed. In comparison with Fig. 3(b), the exponent shows a systematic dependence on time.

The separation of the empirical citation rate into independent factors describing long-term exponential aging and PA-driven citation accumulation, as expressed in Eq. (3), was demonstrated for (most of) the APS-journal-article cohorts by observing relevant fit parameters for the aging function (the PA growth kernel) to be independent of the number of citations (the time since publication). The general quality of the demonstrated separation is illustrated in Fig. 3 using the data for articles published in PRD. A deviation from separability was observed for PRL, where the extracted values for the exponent exhibited a trend to increase as a function of time, varying between 1.07 and 1.17 over our study’s 15-year period. However, the most drastic violation of separability occurred for PRB, where both the obsolescence time and the exponent showed residual dependences on the number of citations and on time, respectively.
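A residual dependence of a fit parameter on time can be flagged, for instance, by a weighted straight-line fit whose slope is compared with its own uncertainty. The sketch below is one such check under assumed Gaussian errors; the two-sigma threshold, the function names and the synthetic series are hypothetical. The rising series merely mimics the range 1.07 to 1.17 quoted above and is not the measured PRL data.

```python
import numpy as np

def weighted_slope(x, y, sigma):
    """Slope of a weighted least-squares line and its standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float)**2
    x_mean = np.sum(w * x) / np.sum(w)
    sxx = np.sum(w * (x - x_mean)**2)
    slope = np.sum(w * (x - x_mean) * y) / sxx
    return slope, np.sqrt(1.0 / sxx)

def has_residual_trend(x, y, sigma, n_sigma=2.0):
    """True if the slope differs from zero by more than n_sigma standard errors."""
    slope, err = weighted_slope(x, y, sigma)
    return bool(abs(slope) > n_sigma * err)

t = np.arange(1, 16)                              # years since publication
sigma = np.full(t.size, 0.02)                     # assumed 1-sigma fit errors
alpha_flat = 1.12 + 0.005 * (-1.0)**t             # scatter without a trend
alpha_rising = 1.07 + 0.10 * (t - 1) / 14.0       # rises from 1.07 to 1.17

trend_flat = has_residual_trend(t, alpha_flat, sigma)
trend_rising = has_residual_trend(t, alpha_rising, sigma)
print(trend_flat, trend_rising)
```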

Figure 6 shows the results for PRB. The observed increase of the exponent as a function of time is slower than, but still of roughly the same order of magnitude as, that found in studies where articles were not disaggregated by research field and inflation was not accounted for (Golosovsky & Solomon, 2012). The increasing advantage of more highly cited articles in attracting further citations at a higher rate could reflect the greater importance of autocorrelations in the citation dynamics of PRB and PRL articles. Alternatively, a greater heterogeneity in terms of research field and stronger multidisciplinary influences from fields outside physics that characterize both PRB and PRL could be the cause. Support for this conclusion is also provided by the results of a related patent-citation study (Higham et al., 2017) where deviations from separability of the empirical citation rate also occurred for the more heterogeneous technology categories.

References

  • Albert & Barabási (2002) Albert, R., & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97.
  • American Physical Society (2017) American Physical Society (2017). APS data sets for research. URL: http://journals.aps.org/datasets.
  • Avramescu (1979) Avramescu, A. (1979). Actuality and obsolescence of scientific literature. Journal of the American Society for Information Science, 30, 296–303.
  • Barabási & Albert (1999) Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
  • Bornmann & Daniel (2008) Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64, 45–80.
  • Colavizza & Franceschet (2016) Colavizza, G., & Franceschet, M. (2016). Clustering citation histories in the Physical Review. Journal of Informetrics, 10, 1037–1051.
  • Cotropia et al. (2013) Cotropia, C. A., Lemley, M. A., & Sampat, B. (2013). Do applicant patent citations matter? Research Policy, 42, 844–854.
  • Csárdi et al. (2007) Csárdi, G., Strandburg, K. J., Zalányi, L., Tobochnik, J., & Érdi, P. (2007). Modeling innovation by a kinetic description of the patent citation system. Physica A, 374, 783–793.
  • Della Briotta Parolo et al. (2015) Della Briotta Parolo, P., Pan, R. K., Ghosh, R., Huberman, B. A., Kaski, K., & Fortunato, S. (2015). Attention decay in science. Journal of Informetrics, 9, 734–745.
  • Dorogovtsev & Mendes (2000) Dorogovtsev, S. N., & Mendes, J. F. F. (2000). Evolution of networks with aging of sites. Physical Review E, 62, 1842–1845.
  • Dorogovtsev & Mendes (2002) Dorogovtsev, S. N., & Mendes, J. F. F. (2002). Evolution of networks. Advances in Physics, 51, 1079–1187.
  • Dorogovtsev et al. (2000) Dorogovtsev, S. N., Mendes, J. F. F., & Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical Review Letters, 85, 4633–4636.
  • Egghe & Rousseau (2000) Egghe, L., & Rousseau, R. (2000). Aging, obsolescence, impact, growth, and utilization: Definitions and relations. Journal of the American Society for Information Science, 51, 1004–1017.
  • Garfield (2006) Garfield, E. (2006). The history and meaning of the journal impact factor. Journal of the American Medical Association, 295, 90–93.
  • Garfield & Sher (1963) Garfield, E., & Sher, I. H. (1963). New factors in the evaluation of scientific literature through citation indexing. American Documentation, 14, 195–201.
  • Glänzel (2004) Glänzel, W. (2004). Towards a model for diachronous and synchronous citation analyses. Scientometrics, 60, 511–522.
  • Golosovsky (2017) Golosovsky, M. (2017). Power-law citation distributions are not scale-free. Physical Review E, 96, 032306.
  • Golosovsky & Solomon (2012) Golosovsky, M., & Solomon, S. (2012). Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters, 109, 098701.
  • Golosovsky & Solomon (2017) Golosovsky, M., & Solomon, S. (2017). Growing complex network of citations of scientific papers: Modeling and measurements. Physical Review E, 95, 012324.
  • Griliches (1990) Griliches, Z. (1990). Patent statistics as economic indicators: A survey. Journal of Economic Literature, 28, 1661–1707.
  • Hall et al. (2005) Hall, B. H., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. The RAND Journal of Economics, 36, 16–38.
  • Hall et al. (2002) Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2002). The NBER patent citations data file: Lessons, insights and methodological tools. In A. B. Jaffe, & M. Trajtenberg (Eds.), Patents, Citations, and Innovations: A Window on the Knowledge Economy (p. 403). Cambridge, MA: MIT Press.
  • Higham et al. (2017) Higham, K. W., Governale, M., Jaffe, A. B., & Zülicke, U. (2017). Fame and obsolescence: Disentangling growth and aging dynamics of patent citations. Physical Review E, 95, 042309.
  • Jaffe & de Rassenfosse (2017) Jaffe, A. B., & de Rassenfosse, G. (2017). Patent citation data in social science research: Overview and best practices. Journal of the Association for Information Science and Technology, 68, 1360–1374.
  • Jaffe & Trajtenberg (1999) Jaffe, A. B., & Trajtenberg, M. (1999). International knowledge flows: Evidence from patent citations. Economics of Innovation and New Technology, 8, 105–136.
  • Jaffe et al. (2000) Jaffe, A. B., Trajtenberg, M., & Fogarty, M. S. (2000). Knowledge spillovers and patent citations: Evidence from a survey of inventors. American Economic Review, 90, 215–218.
  • Krapivsky & Redner (2001) Krapivsky, P. L., & Redner, S. (2001). Organization of growing random networks. Physical Review E, 63, 066123.
  • Lane (2010) Lane, J. (2010). Let’s make science metrics more scientific. Nature, 464, 488–489.
  • Meystre (2013) Meystre, P. (2013). Editorial: Review changes. Physical Review Letters, 111, 180001.
  • Molenkamp (2013) Molenkamp, L. W. (2013). Editorial: Scope and standards of PRB. Physical Review B, 87, 170001.
  • Newman (2003) Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.
  • Pan et al. (2016) Pan, R. K., Petersen, A. M., Pammolli, F., & Fortunato, S. (2016). The memory of science: Inflation, myopia, and the knowledge network. preprint arXiv:1607.05606.
  • Price (1976) Price, D. de Solla (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.
  • Price (1965) Price, D. J. de Solla (1965). Networks of scientific papers. Science, 149, 510–515.
  • Radicchi & Castellano (2011) Radicchi, F., & Castellano, C. (2011). Rescaling citations of publications in physics. Physical Review E, 83, 046116.
  • Radicchi et al. (2008) Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105, 17268–17272.
  • Redner (1998) Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. European Physical Journal B, 4, 131–134.
  • Redner (2005) Redner, S. (2005). Citation statistics from 110 years of Physical Review. Physics Today, 58, 49–54.
  • Scharnhorst et al. (2012) Scharnhorst, A., Börner, K., & van den Besselaar, P. (Eds.) (2012). Models of Science Dynamics. Berlin: Springer.
  • Seglen (1992) Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43, 628–638.
  • Sheridan & Onodera (2017) Sheridan, P., & Onodera, T. (2017). A preferential attachment paradox: How does preferential attachment combine with growth to produce networks with log-normal in-degree distributions? preprint arXiv:1703.06645.
  • Simkin & Roychowdhury (2007) Simkin, M. V., & Roychowdhury, V. P. (2007). A mathematical theory of citing. Journal of the American Society for Information Science and Technology, 58, 1661–1673.
  • Sinatra et al. (2015) Sinatra, R., Deville, P., Szell, M., Wang, D., & Barabási, A.-L. (2015). A century of physics. Nature Physics, 11, 791–796.
  • Stringer et al. (2010) Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2010). Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. Journal of the American Society for Information Science and Technology, 61, 1377–1385.
  • Šubelj & Fiala (2017) Šubelj, L., & Fiala, D. (2017). Publication boost in Web of Science journals and its effect on citation distributions. Journal of the Association for Information Science and Technology, 68, 1018–1023.
  • Valverde et al. (2007) Valverde, S., Solé, R. V., Bedau, M. A., & Packard, N. (2007). Topology and evolution of technology innovation networks. Physical Review E, 76, 056118.
  • Vieira & Gomes (2010) Vieira, E., & Gomes, J. (2010). Citations to scientific articles: Its distribution and dependence on the article features. Journal of Informetrics, 4, 1–13.
  • von Wartburg et al. (2005) von Wartburg, I., Teichert, T., & Rost, K. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34, 1591–1607.
  • Waltman et al. (2012) Waltman, L., van Eck, N. J., & van Raan, A. F. J. (2012). Universality of citation distributions revisited. Journal of the American Society for Information Science and Technology, 63, 72–77.
  • Wang et al. (2013) Wang, D., Song, C., & Barabási, A.-L. (2013). Quantifying long-term scientific impact. Science, 342, 127–132.
  • Yin & Wang (2017) Yin, Y., & Wang, D. (2017). The time dimension of science: Connecting the past to the future. Journal of Informetrics, 11, 608–621.
  • Zhu et al. (2003) Zhu, H., Wang, X., & Zhu, J.-Y. (2003). Effect of aging on network structure. Physical Review E, 68, 056121.