Exploring the Role of Interdisciplinarity in Physics: Success, Talent and Luck

01/11/2019 ∙ by Alessandro Pluchino, et al.

Although interdisciplinarity is often touted as a necessity for modern research, the evidence on the relative impact of sectorial versus interdisciplinary science is qualitative at best. In this paper we leverage the bibliographic data set of the American Physical Society to quantify the role of interdisciplinarity in physics, and that of talent and luck in achieving success in scientific careers. We analyze a period of 30 years (1980-2009), tagging papers and their authors by means of the Physics and Astronomy Classification Scheme (PACS), to show that some degree of interdisciplinarity is quite helpful to reach success, measured either by the number of articles or by the citation score. We also propose an agent-based model of the publication-reputation-citation dynamics that reproduces the trends observed in the APS data set. On the one hand, the results highlight the crucial role of randomness and serendipity in real scientific research; on the other, they shed light on a counter-intuitive effect indicating that the most talented authors are not necessarily the most successful ones.


Role of Interdisciplinarity in the APS data set

The APS data set is a corpus of articles published in Physical Review Letters, Physical Review and Reviews of Modern Physics, dating back to 1893 (see Supplementary Information S1 for more details about this section). It contains data on all citing-cited pairs of articles in which one paper cites another within the collection, together with basic metadata and sub-disciplinary classifications for each article. The data were preprocessed and cleaned to remove duplicated authors and affiliations, using Jaccard similarity in combination with Locality Sensitive Hashing. For the aims of our analysis we only consider the period 1980-2009 and the articles of the authors who published their first paper between 1975 and 1985 and at least three PACS classified papers. Their total scientific production in this interval consists of 89949 PACS classified articles, which received a total of 1329374 citations from the other articles in the data set. The Physics and Astronomy Classification Scheme (PACS) is a hierarchical partitioning of the whole spectrum of subject matter in physics, astronomy, and related sciences, in use since 1975. We limit our attention to the 10 most general PACS classes, each corresponding to a broad disciplinary field. Information about the classes present in each article is available in the APS data set.

Figure 1: APS data set. Distributions of the total number of papers published, during their entire careers, by the authors of the three groups with increasing levels of interdisciplinarity. A power-law curve with slope equal to -2.3 is also reported for comparison (dashed line).

For a given author $i$, the total number $N_i$ of different PACS classes appearing in her publications during her entire career can certainly be considered a global indicator of the multidisciplinarity of her work. However, since the classes do not appear simultaneously in all the papers of author $i$, it is also interesting to consider the average number $\langle n_i \rangle$ of classes simultaneously present in her articles. From the APS data it results that […]. Multiplying these two factors, we finally obtain the index

$$ I_i = N_i \, \langle n_i \rangle \qquad (1) $$

which we propose as a more robust indicator of interdisciplinarity.

Accordingly, we can divide the authors into three groups of comparable size and increasing level of interdisciplinarity:

- Level 1 group: authors with $I \le 3$ (low interdisciplinarity level);

- Level 2 group: authors with $3 < I \le 6$ (medium interdisciplinarity level);

- Level 3 group: authors with $I > 6$ (high interdisciplinarity level).
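As a minimal sketch of the grouping step, the function below maps an index $I$ to one of the three levels. It assumes the separation values 3 and 6 reported in the Supplementary Information and treats them as inclusive upper bounds, which is our assumption; the function name `interdisciplinarity_level` is hypothetical.

```python
def interdisciplinarity_level(index, low=3.0, high=6.0):
    """Map an interdisciplinarity index I to level 1, 2 or 3.

    Assumption: inclusive upper bounds, i.e. I <= 3 -> level 1,
    3 < I <= 6 -> level 2, I > 6 -> level 3.
    """
    if index < 1:
        raise ValueError("I is at least 1 for any author with a PACS classified paper")
    if index <= low:
        return 1
    if index <= high:
        return 2
    return 3


# Hypothetical indices for a few authors.
for value in (1.0, 2.5, 3.0, 4.2, 6.67, 12.0):
    print(value, interdisciplinarity_level(value))
```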

The first goal of this study is to investigate whether these different degrees of interdisciplinarity are correlated with the scientific impact of the active researchers of the APS data set, evaluated through both the number of papers and the number of citations cumulated during their careers.

Figure 2: APS data set. Distributions of the total number of citations cumulated, during their entire careers, by the authors of the three groups with increasing levels of interdisciplinarity. A power-law curve with slope -1.8 is also reported for comparison (dashed line).

Fig.1 shows the distributions of the total number of papers published by the authors belonging to the three groups (levels 1, 2 and 3), plotted in red, green and blue, respectively. The interdisciplinarity level is strongly and positively correlated with the productivity of the authors: authors with a higher level of interdisciplinarity are more productive. In addition, the tails of the three distributions follow a power-law behavior, with a slope equal to -2.3 (dashed line). A similar behavior is visible in Fig.2, where the distributions of the total number of citations received by the authors of the three groups during their entire careers are plotted with the same colors. In this case too, the interdisciplinarity level seems to play an important role in the scientific success of the researchers. Again, the tails of the three distributions follow a power-law behavior, but with a different slope, equal to -1.8.
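The slopes above come from comparison with dashed power-law curves. One standard way to estimate such a tail exponent from raw counts, shown here only as an illustration and not necessarily the procedure behind Figs. 1 and 2, is the continuous maximum-likelihood estimator $\hat\alpha = 1 + n / \sum_i \ln(x_i / x_{min})$ for $p(x) \propto x^{-\alpha}$, $x \ge x_{min}$, whose log-log slope is $-\hat\alpha$. The sketch checks it on synthetic data drawn with $\alpha = 2.3$; the value of $x_{min}$ is an assumption, and for integer counts such as papers or citations the continuous formula is only an approximation.

```python
import math
import random


def powerlaw_exponent_mle(values, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail values equal x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, rng):
    """Inverse-transform sampling from a continuous power law with exponent alpha > 1."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(42)
synthetic = sample_powerlaw(2.3, x_min=10.0, n=100_000, rng=rng)
alpha_hat = powerlaw_exponent_mle(synthetic, x_min=10.0)
print(f"estimated log-log slope: {-alpha_hat:.2f}")
```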

| Group | Authors | Papers | $\langle P \rangle$ | Citations | $\langle C \rangle$ |
|---|---|---|---|---|---|
| Level 1 | 2445 | 18832 | 7.70 | 230448 | 94.25 |
| Level 2 | 2511 | 35892 | 14.29 | 515635 | 205.35 |
| Level 3 | 2347 | 50947 | 21.71 | 843292 | 359.31 |

Table 1: Some characteristic numbers of our APS sample regarding the 89949 published papers and their 1329374 citations over the three interdisciplinarity groups. Here $\langle P \rangle$ = average papers per author and $\langle C \rangle$ = average citations per author. A paper is counted in more than one group if it is coauthored by researchers belonging to different groups, so the sums of the numbers of papers and of citations exceed 89949 and 1329374, respectively.
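The per-author averages in Table 1 follow directly from the raw counts, and the same numbers show how much multiple counting the group-wise sums contain. A short check:

```python
# Rows of Table 1: (level, authors, papers, citations).
table1 = [
    (1, 2445, 18832, 230448),
    (2, 2511, 35892, 515635),
    (3, 2347, 50947, 843292),
]

for level, authors, papers, citations in table1:
    print(f"level {level}: <P> = {papers / authors:.2f}, <C> = {citations / authors:.2f}")

# Papers co-authored across groups are counted once per group, so these sums
# exceed the 89949 distinct papers and the 1329374 distinct citations.
total_authors = sum(row[1] for row in table1)
total_papers = sum(row[2] for row in table1)
total_citations = sum(row[3] for row in table1)
print(total_authors, total_papers, total_citations)  # 7303 105671 1589375
```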
Figure 3: APS data set. The fraction of researchers who have collected, during their careers, a number of citations greater than an increasing threshold $C_{th}$ is reported as a function of $C_{th}$. The three curves, corresponding to different interdisciplinarity levels, lie inside a range limited by two q-exponential functions with the same value of the entropic index $q$ and different values of the scale parameter, as reported.

In Table 1 we report the total number of papers and citations for each interdisciplinarity group, together with the corresponding averages per author ($\langle P \rangle$ and $\langle C \rangle$, respectively). The results support the hypothesis that the index $I$ captures a real effect encoded in the APS data set, i.e. the beneficial role of interdisciplinarity in enhancing both the productivity and the scientific impact of the examined authors.

To better appreciate the differences between the three interdisciplinarity groups in terms of citation scores, it is convenient to consider, for each group, the fraction of authors who have collected, during their whole careers, a number of citations greater than an increasing threshold $C_{th}$. In Fig.3 this cumulative citation distribution is plotted as a function of the threshold $C_{th}$ for the three interdisciplinarity groups. The fraction of highly interdisciplinary researchers (level 3) is generally greater than the fraction of those of level 2 and level 1, in particular for low and intermediate threshold values. Above a certain threshold, a tendency towards a mixing of the three groups is visible; in any case, authors of level 3 always stay above those of level 1. It is also interesting to notice that the three curves lie inside a range limited by two q-exponential functions (Tsallis) with the same value of the entropic index $q$ and different values of the scale parameter, see Fig.3.
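A sketch of the two quantities behind Fig.3: the fraction of authors above a citation threshold $C_{th}$, and a decaying Tsallis q-exponential $e_q(-x/\kappa)$, where $e_q(y) = [1 + (1-q)y]^{1/(1-q)}$ and $e_1(y) = e^{y}$. The name $\kappa$ for the scale parameter and the sample citation totals are hypothetical; the fitted values of Fig.3 are not reproduced here.

```python
import math


def fraction_above(citations, threshold):
    """Fraction of authors whose total citations exceed the threshold C_th."""
    if not citations:
        raise ValueError("empty sample")
    return sum(1 for c in citations if c > threshold) / len(citations)


def q_exponential_decay(x, q, kappa):
    """Tsallis e_q(-x / kappa) for x >= 0; reduces to exp(-x / kappa) when q == 1."""
    if q == 1.0:
        return math.exp(-x / kappa)
    base = 1.0 + (q - 1.0) * x / kappa
    if base <= 0.0:  # only possible for q < 1, where e_q has compact support
        return 0.0
    return base ** (-1.0 / (q - 1.0))


# Hypothetical citation totals for six authors.
sample = [0, 5, 12, 40, 150, 900]
for c_th in (0, 10, 100, 1000):
    print(c_th, round(fraction_above(sample, c_th), 3),
          round(q_exponential_decay(c_th, 1.5, 100.0), 3))
```

For $q > 1$ the curve decays as a power law at large thresholds, which is why q-exponentials can bracket heavy-tailed cumulative distributions like those of Fig.3.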

Figure 4: A simplified depiction of the initial state of the agent-based model. For clarity, the figure represents only a subset of the individuals, but the simulations considered the whole cohort of researchers active in the 30-year period taken into account in the analysis of the APS data set. The 'world' is a 2D box with periodic boundary conditions.

The Agent-Based Model

Next, we turned to the development of an agent-based model able to reproduce, under constraints based on APS real data, the publication-reputation-citation dynamics which generates the observed behavior of our cohort of scholars (see Supplementary Information S2 for more details about this section).

We start by choosing the initial setup of the model so as to take into account some of the real features of the authors considered in the APS data set. In Fig.4 we show the 2D model world, where the APS authors (agents depicted as silhouettes) are assigned a random position, which stays fixed during the simulations. Each simulation lasts 30 years, with a time step of 1 year. The agents are divided into the three groups of level 1 (in red), level 2 (in green) and level 3 (in blue), with sizes 2445, 2511 and 2347 respectively, according to the value of their real interdisciplinarity index $I$. Each agent $i$ is further characterized by the following variables:

- a fixed talent $T_i$ (intelligence, skill, endurance, hard work, …), which is a real number randomly extracted at the beginning of the simulation from a Gaussian distribution with mean $m_T$ and standard deviation $\sigma_T$;

- an array $\mathbf{P}_i(t)$, whose elements are the papers published by author $i$ up to time $t$; the size $P_i(t)$ of the array gives the total number of papers published at that time;

- an array $\mathbf{C}_i(t)$, whose elements $c_{ij}(t)$ are the citations received by each paper present in $\mathbf{P}_i(t)$ at time $t$ (thus $\mathbf{P}_i$ and $\mathbf{C}_i$ have the same size); $C_i(t) = \sum_j c_{ij}(t)$ is the total number of citations received by all these papers at that time;

- an array $\mathbf{R}_i$ whose ten elements are the reputation levels reached by the author in each of the PACS classes; these reputation levels are bounded real numbers, which increase as a function of the number of papers published in the corresponding disciplinary fields.

The virtual world also contains event-points which, unlike the authors, move randomly during the simulations. They are colored with different shades of magenta, one for each of the ten PACS classes, and the relative abundance of points belonging to a given class is fixed in agreement with the APS data set (their total number was also calibrated on the real data). Events represent random opportunities, ideas, encounters, intuitions, serendipitous events, etc., which can periodically occur to a given individual along her career, triggering a research line in one or more fields represented by the corresponding PACS classes. In this respect, each author owns a "sensitivity circle" representing the spatial extent of her sensitivity to the event-points. The radius of these circles differs among the three interdisciplinarity groups and is determined through a calibration with real data. Each author, depending on her own group, is sensitive only to the event-points corresponding to the "fertile" PACS classes present in her PACS array, i.e. the classes appearing in her publications (we call these points "special" points for that author).
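Because the world has periodic boundary conditions, deciding whether an event-point lies inside a sensitivity circle requires the minimum-image distance across the box edges. The sketch below assumes a square box of side `size` and encodes PACS classes as integers 0-9; the names `EventPoint`, `periodic_distance` and `special_points` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EventPoint:
    x: float
    y: float
    pacs_class: int  # broad PACS class as an integer 0-9 (hypothetical encoding)


def periodic_distance(ax, ay, bx, by, size):
    """Minimum-image Euclidean distance in a square box of side `size` with periodic boundaries."""
    dx = abs(ax - bx) % size
    dy = abs(ay - by) % size
    return math.hypot(min(dx, size - dx), min(dy, size - dy))


def special_points(author_xy, radius, pacs_array, points, size):
    """Event-points inside the sensitivity circle whose class belongs to the author's PACS array."""
    ax, ay = author_xy
    return [p for p in points
            if p.pacs_class in pacs_array
            and periodic_distance(ax, ay, p.x, p.y, size) <= radius]


points = [EventPoint(98.0, 2.0, 4),   # close to the author across the boundary
          EventPoint(2.0, 2.0, 7),    # close, but class 7 is not in the author's array
          EventPoint(50.0, 50.0, 1)]  # fertile class, but far away
found = special_points((1.0, 1.0), radius=5.0, pacs_array={1, 4}, points=points, size=100.0)
print([p.pacs_class for p in found])  # [4]
```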

The publication-reputation-citation dynamics of this community of authors is quite simple.

- Every year, a check is performed over all the authors to verify what type and how many special event-points fall inside their "sensitivity circles". If, for a given author $i$ at year $t$, one or more special points are in her circle, her talent $T_i$ is compared with a random real number $r$ uniformly drawn in $[0,1]$.

- If $r < T_i$ (i.e. with probability equal to her talent), the number $P_i(t)$ of her published papers becomes equal to

(2)

where an integer quantity is randomly extracted from a Gaussian distribution with given mean and standard deviation. The increment is thus proportional to the number of papers already published by author $i$ during the previous year (the two coefficients, equal for all the agents, are fixed through a calibration with the APS data).

- All the newly published papers are added to the array $\mathbf{P}_i$, and each of them is characterized by the PACS classes corresponding to the event-points that fell in the sensitivity circle at year $t$. Thanks to these new publications, author $i$ also increases her reputation in each of the disciplinary fields corresponding to their PACS classes (i.e. the corresponding elements of the array $\mathbf{R}_i$ are updated).

- Finally, based on the total number of published papers $P_i(t)$ and on her reputation at time $t$, author $i$ yearly updates the elements $c_{ij}$ ($j = 1, \dots, P_i(t)$) of her citations array with the following rule:

(3)

In other words, her $j$-th paper gains $\Delta c_{ij}(t)$ citations at time $t$, depending on both its previous citation score and the average reputation of the author in the disciplinary fields corresponding to the PACS classes present in the paper. Therefore, the overall increase in citations for author $i$ at time $t$ will be

$$ \Delta C_i(t) = \sum_{j=1}^{P_i(t)} \Delta c_{ij}(t) \qquad (4) $$
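The yearly loop above can be sketched as follows for a single author. Since the exact forms of Eqs. (2) and (3) are not given here, the paper-count increment and the per-paper citation gain are passed in as caller-supplied functions; the placeholder rules `one_paper` and `toy_gain`, the reputation step `rep_step` and the cap of reputation at 1.0 are all hypothetical and are not the authors' calibrated rules. Only the talent test ($r < T_i$) and the sum of Eq. (4) follow the text directly.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Author:
    talent: float                                          # T_i, fixed for the whole run
    papers: list = field(default_factory=list)             # each paper: set of PACS classes
    citations: list = field(default_factory=list)          # c_ij for each paper j
    reputation: list = field(default_factory=lambda: [0.0] * 10)  # one entry per PACS class


def yearly_step(author, special_classes, rng, new_paper_count, citation_gain, rep_step=0.01):
    """Advance one author by one year and return Delta C_i(t), as in Eq. (4).

    special_classes: PACS classes of the special points inside the sensitivity circle.
    new_paper_count(author, rng) -> int      stands in for Eq. (2) (hypothetical).
    citation_gain(c_ij, avg_rep, rng) -> int stands in for Eq. (3) (hypothetical).
    """
    classes = set(special_classes)
    # Publication: only if special points were found, with probability equal to T_i.
    if classes and rng.random() < author.talent:
        for _ in range(new_paper_count(author, rng)):
            author.papers.append(set(classes))
            author.citations.append(0)
            for c in classes:  # hypothetical reputation update, capped at 1.0
                author.reputation[c] = min(1.0, author.reputation[c] + rep_step)
    # Citations: every paper is updated yearly from its score and the average reputation.
    delta_total = 0
    for j, paper_classes in enumerate(author.papers):
        avg_rep = sum(author.reputation[c] for c in paper_classes) / len(paper_classes)
        gain = citation_gain(author.citations[j], avg_rep, rng)
        author.citations[j] += gain
        delta_total += gain
    return delta_total


def one_paper(author, rng):
    """Placeholder for Eq. (2): publish exactly one paper."""
    return 1


def toy_gain(c_ij, avg_rep, rng):
    """Placeholder for Eq. (3): gain grows with past citations and reputation."""
    return int(c_ij * avg_rep) + (1 if rng.random() < avg_rep else 0)


rng = random.Random(7)
author = Author(talent=0.7)
for year in range(30):
    yearly_step(author, special_classes=[1, 4], rng=rng,
                new_paper_count=one_paper, citation_gain=toy_gain)
print(len(author.papers), sum(author.citations))
```

Passing the update rules as functions keeps the skeleton of the dynamics separate from the calibrated details, so the real Eqs. (2) and (3) could be plugged in without touching the loop.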

At the end of the simulation, i.e. at the final time step, a generic author $i$ will have cumulated a certain number of papers $P_i$ and a certain number of citations $C_i$, depending on her ability to exploit the opportunities offered by the random occurrence of event-points within her sensitivity circle. Since this ability is parameterized by the talent, the final success of the researchers (in terms of published papers and cumulated citations) is influenced by both talent and luck (serendipitous events).

Figure 5: Model Simulation. Distributions of the total number of papers published, during their simulated careers, by the authors of the three groups with increasing levels of interdisciplinarity. A power-law curve with slope -2.3 is also reported for comparison (dashed line).

Numerical results

In this section we verify whether our model is able to capture the stylized facts already observed in the APS data set, with particular regard to the role of interdisciplinarity. Before going on, note that the agent-based model allows one to compute, at the end of a simulation, the average number of PACS classes present in the publications of a given author $i$. This gives the dynamical counterpart $\langle n_i \rangle^{sim}$ of the real parameter $\langle n_i \rangle$ (the average number of PACS classes per paper). By multiplying this parameter by the real $N_i$ (the total number of distinct PACS classes of the author), it is possible to update the interdisciplinarity index of Eq.1, which was assigned at the beginning of the simulation on the basis of the real APS data, thus obtaining the new (simulated) index

$$ I_i^{sim} = N_i \, \langle n_i \rangle^{sim} \qquad (5) $$

This index, which quantifies the effective interdisciplinarity level reached by each author at the end of a simulation, in turn allows us to update the membership of the authors in the three interdisciplinarity groups, which now become the simulated level 1, level 2 and level 3 groups.

As a function of these groups, i.e. of the three corresponding interdisciplinarity levels, in Fig.5 we plot the distributions of the total number of papers published by the authors at the end of a typical simulation. We adopt the same colors as in Fig.1: red, green and blue, respectively. The plot shows that the proposed model reproduces the same kind of behavior observed for the real APS data set: again, the degree of interdisciplinarity seems to have a strong positive correlation with the productivity of the authors, and the tails of the three distributions follow a power-law trend with the same slope of -2.3 found for the APS data set.

An analogous agreement with the APS data can be observed in Fig.6, where we show the distributions of the total number of citations cumulated by the authors of the three groups during the simulation of their careers. In this case too, the scientific impact seems strongly correlated with the interdisciplinary propensity of the researchers. Moreover, the tails of the three distributions follow the same power-law behavior observed for the APS data (see Fig.2), with a slope of -1.8.

Figure 6: Model Simulation. Distributions of the total number of citations cumulated, during their simulated careers, by the authors of the three groups with increasing levels of interdisciplinarity. A power-law curve with slope -1.8 is also reported for comparison (dashed line).
Figure 7: Model Simulation. The fraction of researchers who have collected, during their simulated careers, a number of citations greater than an increasing threshold $C_{th}$ is reported as a function of $C_{th}$. The three curves, corresponding to different interdisciplinarity levels, lie inside a range limited by two q-exponential functions with the same value of the entropic index $q$ and different values of the scale parameter.
Figure 8: The total number of citations of the researchers of the three interdisciplinarity levels as function of the total number of papers they published in their careers. The figures show that the agent-based model numerical simulations are able to reproduce the positive correlation between these two quantities in all the three groups.

It is also interesting to plot, as in the previous section, the fraction of authors of the three interdisciplinarity groups who have collected, during their whole simulated careers, a number of citations greater than an increasing threshold $C_{th}$. Fig.7 shows that the fraction of highly interdisciplinary researchers (level 3) is greater than the fraction of those of level 2 for all values of $C_{th}$, and the latter is in turn always greater than the fraction of level 1 authors. A mixing of the three groups is visible only for very high values of the citation score. As in Fig.3, the three curves fall inside a range limited by two q-exponential functions with a value of the entropic index $q$ very similar to that obtained for the APS data.

Finally, the panels of Fig.8 show the positive correlation between the total number of citations and the total number of papers, for both the APS data set and the agent-based model simulation, in each of the three interdisciplinarity levels. The agreement between real and simulated data is remarkable. The only feature visible in the APS data that the model does not reproduce is the presence of authors who published just a few papers (even fewer than 10) but received a very high number of citations. Such an occurrence, characterizing in particular interdisciplinarity level 1, is probably unpredictable since, as shown in a recent study Sinatra16, scientists have the same chance of publishing their biggest hit at any moment in their career, and even less productive authors have a chance of publishing highly cited papers.

Summarizing, the simulations performed with our agent-based model correctly reproduce the main stylized facts observed in the analysis of the APS data set, confirming that the level of interdisciplinarity plays an important role in determining the scientific success of an author during her academic career. It is important to stress that calibrating the model with the real data makes the output of a single simulation run very robust, despite the differences due to the random initial setup of several model parameters (such as the initial placement of the authors in the world, their talent distribution and the position/movement of the PACS event-points). This means that these results can be considered quite general and well representative of the model's behavior (we checked that they do not change when performing ensemble averages over many runs).

Having established the agreement between real and modeled data, we turn to the analysis of variables that cannot be observed directly in the real data. For example, one could wonder whether the most successful authors in the three interdisciplinarity groups are also the most talented ones. In a recent numerical study on the causes of success in life Pluchino18, it has been shown that individual talent is necessary but not sufficient to become rich or to climb the social ladder: luck plays a fundamental role, and very often moderately gifted but very lucky people surpass highly talented yet unlucky individuals.

We now show that this counterintuitive feature also holds in the scientific context addressed here. We performed 10 replica runs of our agent-based model, with the same calibration based on real APS data, but with different distributions of talent among the authors and different initial positions for both the agents and the event-points. In Fig.9 we plot the final number of papers (left column) and the final number of citations (right column) cumulated by each author of the three interdisciplinarity groups over all the simulations, as a function of their talent. The results indicate that very talented people (for example, researchers with a talent far above the mean) are very rarely the most successful ones, regardless of the interdisciplinarity group they belong to. Rather, their paper or citation scores often stay quite low. On the other hand, scientists with a talent just above the mean usually cumulate a considerable number of papers and citations. In other words, the most successful authors are almost always scientists with a medium-high level of talent, rather than the most talented ones. This happens because (i) talent needs lucky opportunities (chances, random meetings, serendipity) to exploit its potential, and (ii) very talented scientists are much less numerous than moderately talented ones, since talent is normally distributed in the population. Therefore, it is much easier to find a moderately gifted and lucky researcher than a very talented and lucky one.
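Point (ii) is simple arithmetic on the normal distribution. Measuring talent in units of $\sigma_T$ from the mean $m_T$, the sketch below compares the share of the population just above the mean (taken here, as an illustrative choice, to be between $m_T$ and $m_T + \sigma_T$) with the share more than two standard deviations above it, the threshold used later in this section for highly talented authors.

```python
import math


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Talent measured in units of sigma_T from the mean m_T.
just_above_mean = normal_cdf(1.0) - normal_cdf(0.0)  # m_T < T < m_T + sigma_T
highly_talented = 1.0 - normal_cdf(2.0)              # T > m_T + 2 sigma_T
print(f"{just_above_mean:.3f} {highly_talented:.4f} {just_above_mean / highly_talented:.1f}")
# about 0.341, 0.0228 and 15.0
```

Roughly fifteen times more people sit just above the mean than beyond two standard deviations, so even a modest success rate in the larger pool produces many more successful individuals.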

Figure 9: Papers and citations vs talent, collected over 10 replica runs of the same numerical simulation. Each circle in the figures represents the total number of papers (left column) or the total number of citations (right column) cumulated by each author of the three interdisciplinarity groups in each of the 10 runs, reported as a function of the corresponding talent. These plots clearly indicate that the most successful individuals are almost never the most talented ones.
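| Group | $\langle P \rangle$ | HT/MG ratio (papers) | $\langle C \rangle$ | HT/MG ratio (citations) |
|---|---|---|---|---|
| Level 1 | 30 | 0.06 | 191 | 0.08 |
| Level 2 | 49 | 0.08 | 297 | 0.11 |
| Level 3 | 82 | 0.10 | 505 | 0.12 |

Table 2: Details about the percentage of moderately gifted (MG) and highly talented (HT) authors whose publications or citations exceed the respective averages, for each of the three groups with increasing interdisciplinarity level.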

It is also interesting to note that this effect is more pronounced for authors with a low interdisciplinarity level and progressively decreases as the degree of interdisciplinarity increases. To address this last point quantitatively, let us define as moderately gifted (MG) the authors with a talent around the mean, and as highly talented (HT) those with a talent more than two standard deviations above the mean. Let us also call $\langle P \rangle$ and $\langle C \rangle$ the average values of, respectively, the final number of papers and the final number of citations cumulated by the authors inside each of the three groups. Table 2 shows that these averages increase with the interdisciplinarity level, highlighting a positive correlation between scientific success and interdisciplinarity analogous to the one already observed for the same quantities in the APS data set (Table 1, columns $\langle P \rangle$ and $\langle C \rangle$). On the other hand, the percentage of highly talented scientists with a final number of papers $P > \langle P \rangle$ or a final number of citations $C > \langle C \rangle$, relative to the same percentage for the moderately gifted ones, also increases with the interdisciplinarity level. This is shown by the HT/MG ratios between the two percentages, which increase from 0.06 to 0.10 (papers) and from 0.08 to 0.12 (citations) going from level 1 to level 3.

Conclusions

In conclusion, in this paper we have shown, through both a statistical analysis of the APS data set and a comparison with the numerical results of an agent-based model calibrated on the real data, that broadening the scope of one's research by mixing different fields tends to bring more rewards to scientists, since their productivity and their scientific impact increase with their level of interdisciplinarity. Moreover, averaging over several runs with different initial distributions of talent among the authors, we have also shown that, very often, moderately gifted researchers reach a higher level of scientific success than very talented ones, simply because they have had more opportunities or were luckier. However, the interdisciplinarity level seems to slightly dampen this effect, since a higher level of interdisciplinarity increases the probability of success of highly talented individuals relative to moderately talented ones. Given the generality of the APS data set, we expect our findings to remain valid beyond the considered case study and beyond physics itself.

Acknowledgements

A.P. and A.R. acknowledge financial support by the project "Linea di intervento 2" of the Department of Physics and Astronomy Ettore Majorana of the University of Catania.

I Supplementary Information

i.1 S1. APS Data Set Analysis

We give here additional details about the methodologies used to mine the American Physical Society (APS) data set from which we obtained the results described in the Main Paper (MP).

The APS data set consists of all the publications of the American Physical Society from 1893 to 2013. Each publication is represented by a JSON file storing information about authors, their affiliations, the journal and the PACS codes or keywords associated with the paper. The database contains more than 550000 publications. A critical aspect of the APS data set is its noise due to lexical heterogeneity. Lexical heterogeneity occurs when the tuples have identically structured fields across databases, but the data use different representations to refer to the same real-world object. In our case, authors and affiliations are stored using different conventions in each JSON file. Therefore, the same author or affiliation can be represented in different formats (e.g. Mark John Smith, Mark J. Smith or Smith M.). Based on this consideration, two records can be considered equivalent if they are semantically equal. The similarity between records is computed by metrics which measure semantic equivalence through a score. Record pairs with high similarity scores (above a specified threshold) are treated as duplicates.

In addition to the accuracy of classifying record pairs into matches and mismatches, the central issue is the speed of the comparisons. Indeed, cleaning such data before use is a mandatory step to remove redundant and noisy information that would affect the reliability of further analysis. To remove duplicate entries we decided to compare two strings (e.g. author names or affiliations) using q-grams qgrams in combination with Jaccard similarity tan2013data. The Jaccard similarity of two sets $A$ and $B$ is defined as $J(A,B) = |A \cap B| / |A \cup B|$, ranging from 0 to 1. In practice, we extracted from each string the q-grams of length 2 ($q = 2$, for both authors and affiliations), and we considered two authors to be the same when their Jaccard similarity is greater than a threshold set equal to 0.6. Similarly, two affiliations were declared to be the same if their similarity is greater than the threshold 0.66. These two thresholds were established empirically on a sample of the APS data set by minimizing the ratio of false negatives (same author/affiliation, but the two are considered different) and false positives (different authors/affiliations, but considered the same).
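A minimal sketch of the q-gram/Jaccard comparison with $q = 2$ and the thresholds quoted above. The original code was written in PHP; this Python version is only an illustration, and the normalization step (lower-casing, replacing dots with spaces, collapsing whitespace) is our assumption, since the text does not describe one. The function names are hypothetical.

```python
def qgrams(text, q=2):
    """Set of contiguous character q-grams of a normalized string (assumed normalization)."""
    s = " ".join(text.lower().replace(".", " ").split())
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


AUTHOR_THRESHOLD = 0.6
AFFILIATION_THRESHOLD = 0.66


def same_author(name1, name2):
    """Declare two author strings duplicates when their bigram Jaccard similarity exceeds 0.6."""
    return jaccard(qgrams(name1), qgrams(name2)) > AUTHOR_THRESHOLD


def same_affiliation(aff1, aff2):
    """Declare two affiliation strings duplicates when their similarity exceeds 0.66."""
    return jaccard(qgrams(aff1), qgrams(aff2)) > AFFILIATION_THRESHOLD


print(round(jaccard(qgrams("Mark John Smith"), qgrams("Mark J. Smith")), 3))  # 0.667
print(same_author("Mark John Smith", "Mark J. Smith"))  # True
print(same_author("Mark J. Smith", "Smith M."))         # False
```

With these choices "Mark John Smith" and "Mark J. Smith" share 10 of 15 distinct bigrams, while "Smith M." falls well below the threshold because plain bigram overlap does not account for reordered name parts.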

Due to the large number of authors and affiliations, we experienced a computational bottleneck caused by the quadratic time needed to perform all possible pairwise comparisons. To make this cleaning step feasible, we implemented the similarity computation in combination with Locality Sensitive Hashing (LSH) cohen2001finding. LSH is a hashing-based algorithmic technique that quickly identifies pairs of similar objects without comparing all of them directly. Using this technique we were able to reduce the computational effort from quadratic to linear. All the code was developed in PHP and the data, once cleaned, were stored in the relational database MySQL (v. 5.1). Further manipulation and analysis of the cleaned data were done using the R language.

The measure of the level of interdisciplinarity of the authors (in the discussion we also refer to them as 'researchers') is based on the APS's PACS ('Physics and Astronomy Classification Scheme'). This scheme is a hierarchical partition of the publications into research areas of physics. Any PACS code has four hierarchical levels of increasing specificity: a first and a second digit composing a two-digit number, another two-digit number and a string of characters (e.g. 14.70.Bh). In particular, we work with the least specific hierarchical level, made up of the ten research areas, each corresponding to one of the ten different first digits (0, 1, …, 9; or equivalently 00, 10, …, 90) of the first two-digit number in the PACS code:

00 - GP : General Physics

10 - EPF : Physics of Elementary Particles and Fields

20 - NP : Nuclear Physics

30 - AMP : Atomic and Molecular Physics

40 - EOAHCF : Electromagnetism, Optics, Acoustics, Heat Transfer, Classical Mechanics, and Fluid Dynamics

50 - GPE : Physics of Gases, Plasmas, and Electric Discharges

60 - CM:SMT : Condensed Matter: Structural, Mechanical and Thermal Properties

70 - CM:EEMO : Condensed Matter: Electronic Structure, Electrical, Magnetic, and Optical Properties

80 - IPR : Interdisciplinary Physics and Related Areas of Science and Technology

90 - GAA : Geophysics, Astronomy, and Astrophysics

Since the APS database covers only the physics domain, this choice is driven by our aim of identifying an actual interdisciplinary attitude in the researchers' production. Any published paper can have one or more PACS codes assigned to it and, according to our choice, we assign different PACS codes to a paper only if these codes differ in the first digit; otherwise, we merge them into a single code. In this way we assign to each paper a number of PACS codes equal to the number of different broad (least specific) areas related to it. It follows that only PACS classified papers are considered.
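A sketch of the collapsing rule just described: each PACS code is reduced to the first digit of its leading two-digit number, and codes sharing that digit are merged. The regular expression assumes codes formatted like 14.70.Bh; the sample codes other than 14.70.Bh are only illustrative, and `broad_classes` is a hypothetical name.

```python
import re

# The ten broad PACS areas, keyed by first digit.
BROAD_CLASSES = {
    0: "GP", 1: "EPF", 2: "NP", 3: "AMP", 4: "EOAHCF",
    5: "GPE", 6: "CM:SMT", 7: "CM:EEMO", 8: "IPR", 9: "GAA",
}

_LEADING_DIGIT = re.compile(r"\s*(\d)\d\.")


def broad_classes(pacs_codes):
    """Collapse a paper's PACS codes onto the set of distinct first digits (0-9)."""
    classes = set()
    for code in pacs_codes:
        match = _LEADING_DIGIT.match(code)
        if match:  # codes that do not look like "dd.dd..." are skipped
            classes.add(int(match.group(1)))
    return classes


paper_codes = ["14.70.Bh", "12.38.Aw", "25.75.Nq"]
classes = broad_classes(paper_codes)
print(sorted(classes), [BROAD_CLASSES[c] for c in sorted(classes)])  # [1, 2] ['EPF', 'NP']
```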

Figure 10: (Left Panel) An example of calculation of the interdisciplinarity index for an imaginary author who published 6 papers. (Right Panel) Histogram of the interdisciplinarity index $I$ for the researchers considered in our study. The three interdisciplinarity levels are represented with different colors: red (level 1), green (level 2) and blue (level 3). The bar for $I$ between $k-1$ and $k$ represents the number of researchers with $k-1 < I \le k$. In particular, the first two bars contain only researchers with $I = 1$ and $I = 2$, respectively.

i.1.1 S1.1 Researchers Classification

Having at our disposal the PACS coded areas of all the papers, we may use them to define an index that quantifies the variety of disciplines (areas) covered by the scientific production of any researcher. This variety is two-fold: a researcher may explore many different areas one by one, i.e. producing papers on many different PACS codes but with only one code assigned at a time; or she may explore few different areas but jointly, i.e. producing papers with several codes assigned together. In other words, a researcher's production can be interdisciplinary either because of the total number of areas it covers, or because of the average number of areas jointly covered by one of its typical papers. As will become evident, apart from an obvious constraint, these two degrees of interdisciplinarity are independent of each other. This observation led us to define an interdisciplinarity index for the researcher as

$$ I = \langle n \rangle \, N $$

where $\langle n \rangle$ is the average number of different PACS codes jointly present in each paper of the considered author and $N$ is the total number of different PACS codes present in all the papers of the same author. One can also assign to each researcher an array containing all the PACS numbers present in her papers. The constraint mentioned above is simply the condition $\langle n \rangle \le N$ for any researcher. In fact, the maximum number of PACS codes assignable to a paper is five so, at least in principle, the maximum value of $I$ is 50, with $\langle n \rangle = 5$ and $N = 10$. In practice, for our data set, the maximum value found for $I$ is 23. In the left panel of Fig.10 an example of calculation of the interdisciplinarity index for a hypothetical author is presented. This author has published 6 papers, with PACS classes (1-6, 4, 1, 4-8, 6-8-1, 1), respectively. The corresponding PACS array is thus $\{1, 4, 6, 8\}$, so that $N = 4$ and $\langle n \rangle = (2+1+1+2+3+1)/6 = 10/6 \approx 1.67$. Therefore, her interdisciplinarity index is $I = 4 \times 10/6 \approx 6.67$.
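The worked example can be checked with a few lines of Python. Representing each paper as a set of broad PACS classes is our choice, `interdisciplinarity_index` is a hypothetical name, and exact fractions avoid rounding.

```python
from fractions import Fraction


def interdisciplinarity_index(papers):
    """Return (I, <n>, PACS array) for one author, with I = <n> * N.

    papers: list of sets, each holding the broad PACS classes (first digits) of one paper.
    """
    if not papers or any(len(p) == 0 for p in papers):
        raise ValueError("every paper must carry at least one PACS class")
    pacs_array = set().union(*papers)                           # distinct classes, N = its size
    avg_n = Fraction(sum(len(p) for p in papers), len(papers))  # <n>
    return avg_n * len(pacs_array), avg_n, sorted(pacs_array)


# The hypothetical author of Fig.10 (left panel): six papers.
example = [{1, 6}, {4}, {1}, {4, 8}, {6, 8, 1}, {1}]
index, avg_n, pacs_array = interdisciplinarity_index(example)
print(pacs_array, avg_n, index, float(index))  # [1, 4, 6, 8] 5/3 20/3 6.666666666666667
```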

Once the interdisciplinarity index has been calculated for each researcher, we distributed all the 7303 authors resulting from the filtering procedure explained below into three groups of different interdisciplinarity level (see right panel of Fig.10):

  • Level 1 (index between 1 and 3): 2445 researchers of low interdisciplinarity level

  • Level 2 (index between 3 and 6): 2511 researchers of medium interdisciplinarity level

  • Level 3 (index above 6): 2347 researchers of high interdisciplinarity level

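The separation values between the levels have been chosen so that the three groups have comparable sizes; for the set of researchers used here, the best values turned out to be 3 and 6, if we want them to be easy-to-remember integer numbers. Note that in level 1 the index cannot take values in the open interval (1,2), because the number of different PACS codes of an author can never be smaller than her average number of codes per paper: an author with a single PACS code has index 1, while an author with two or more different codes has index at least 2.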
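The classification itself reduces to two thresholds. In this sketch the values 3 and 6 come from the text, whereas placing the boundary values 3 and 6 in the lower level is our own assumption, since the text does not specify it; `interdisciplinarity_level` is a hypothetical helper name.

```python
from collections import Counter

LOW_THRESHOLD = 3.0   # boundary between level 1 and level 2 (from the text)
HIGH_THRESHOLD = 6.0  # boundary between level 2 and level 3 (from the text)


def interdisciplinarity_level(index):
    """Map an interdisciplinarity index to level 1, 2 or 3.

    Assumption of this sketch: boundary values go to the lower level
    (index <= 3 -> level 1, 3 < index <= 6 -> level 2, index > 6 -> level 3).
    """
    if index < 1:
        raise ValueError("the index is always >= 1")
    if index <= LOW_THRESHOLD:
        return 1
    if index <= HIGH_THRESHOLD:
        return 2
    return 3


# Hypothetical indices for a handful of authors
indices = [1.0, 2.5, 3.0, 4.2, 6.67, 12.0]
print(Counter(interdisciplinarity_level(i) for i in indices))
```

With real data, changing the boundary convention only moves the authors whose index equals exactly 3 or 6, so the group sizes would shift only slightly.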

Figure 11: (Left Panel) The active researchers considered in the APS data set analysis, see text. (Right Panel) Time evolution, year by year, of the number of still active researchers. A linear decrease is found from 1987 to 2002, with 165 researchers leaving per year, on average. After 2002 a cut-off sets in, possibly due to the researchers' ages. About 28% of them are still active at the end of the thirty years.

The 7303 researchers on which we have conducted our analysis are those remaining after a filtering procedure designed to study the researchers' careers properly over a period of thirty years, from 01/01/1980 to 31/12/2009. The first requirement is that a researcher must have published her first paper between 01/01/1975 and 31/12/1985 (see the left panel of Fig.11). This ensures that all the researchers in the set started their careers within a fairly short period, so that a possible premature end of a researcher's activity is unlikely to be due to her age: unless someone started publishing at an advanced age, which is a rather remote possibility, all the researchers in the set have comparable ages. Moreover, the PACS classification was implemented from 1975 onwards, so we can only refer to papers published from that year. The second requirement is that a researcher must have published a minimum number of (PACS classified) papers, which we set to 3. The third and last requirement is related to the way in which the raw APS database at our disposal has been cleaned (explained in detail in the specific section).

Briefly, each author's name has been given an author identification code, and the same code has been assigned to different names if they were similar enough. We refer to the names associated with the same author code as aliases of that author. We ruled out the author codes with more than one associated alias: indeed, we realized that two aliases referred to two actually different authors (who, unfortunately, had similar names) too often, which led us to overestimate the productivity and the impact of the single author code to which they were assigned. These three requirements filtered the database, leaving us with 7303 author codes, corresponding to the 7303 actually different researchers on which we have performed our analysis.
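The three filtering requirements can be summarized as a single predicate over author records. The sketch below is illustrative only: the `AuthorRecord` type, its fields and the function `keep_author` are hypothetical, and the actual cleaning pipeline (name disambiguation with Jaccard similarity and Locality Sensitive Hashing) is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date

FIRST_PAPER_START = date(1975, 1, 1)   # first requirement: start of window
FIRST_PAPER_END = date(1985, 12, 31)   # first requirement: end of window
MIN_PACS_PAPERS = 3                    # second requirement


@dataclass(frozen=True)
class AuthorRecord:
    """Hypothetical author record produced by name disambiguation."""
    code: str             # author identification code
    aliases: frozenset    # distinct name strings mapped to this code
    first_paper: date     # date of the first published paper
    n_pacs_papers: int    # number of PACS-classified papers


def keep_author(a: AuthorRecord) -> bool:
    """True if the author passes the three filtering requirements."""
    started_in_window = FIRST_PAPER_START <= a.first_paper <= FIRST_PAPER_END
    productive_enough = a.n_pacs_papers >= MIN_PACS_PAPERS
    single_alias = len(a.aliases) == 1   # third requirement: no multi-alias codes
    return started_in_window and productive_enough and single_alias


# Toy records, invented for illustration only
authors = [
    AuthorRecord("A1", frozenset({"A. Author"}), date(1979, 5, 2), 12),
    AuthorRecord("A2", frozenset({"B. Author", "B. Autor"}), date(1981, 3, 9), 40),
    AuthorRecord("A3", frozenset({"C. Author"}), date(1990, 1, 15), 25),
    AuthorRecord("A4", frozenset({"D. Author"}), date(1984, 7, 1), 2),
]
print([a.code for a in authors if keep_author(a)])  # ['A1']
```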

Looking at the last published paper of each researcher, we found an approximately linear decrease in time of the number of active researchers, apart from a late cut-off. Starting with all the 7303 researchers active in 1985, we end up with 2041 of them still active in 2009 (Fig.11, right panel).

S1.2 Scientific Impact Analysis

The scientific production in the period 1980-2009 of the 7303 selected researchers consists of 89949 (PACS classified) papers. Their distribution differs slightly among the three interdisciplinarity classes, see the left panel of Fig.12. All of them show long tails of a few dozen researchers with exceptional productivity but, in general, interdisciplinarity seems to have a positive influence on the average productivity of a scientist. Some examples of the growth of scientific production during single excellent careers in the three classes are shown in the right panel of Fig.12, where the cumulative number of papers is reported as a function of time.

Figure 12: (Left Panel) Papers distribution for the three defined classes of interdisciplinarity, each represented with a different color: red (level 1), green (level 2) and blue (level 3). Beyond about 150 published papers per researcher, the distribution becomes a tail with poor statistics. (Right Panel) Examples of scientific production in some excellent careers for the three interdisciplinarity classes.
class     authors   papers    PpA   avg. PpA (st. dev.)
level 1      2445    18832   7.70   15.38 (37.22)
level 2      2511    35892  14.29   29.35 (67.18)
level 3      2347    50947  21.71   27.30 (42.26)
Table 3: Statistical indicators of the 89949 published papers over the three defined classes of interdisciplinarity. A paper is counted in more than one class if it is coauthored by researchers belonging to different classes, so the sum of the reported numbers of papers exceeds 89949. A positive correlation between scientific production and interdisciplinarity level is found: the number of papers per researcher (PpA = papers/authors) increases quite strongly as the interdisciplinarity level grows.
PACS   Level 1                         Level 2                         Level 3
area   papers          researchers     papers          researchers     papers          researchers
00     609 (3.23%)     231 (9.45%)     3244 (9.04%)    1032 (41.10%)   12397 (24.33%)  1705 (72.65%)
10     4892 (25.98%)   782 (31.98%)    7466 (20.80%)   849 (33.81%)    6056 (11.89%)   743 (31.66%)
20     5989 (31.80%)   811 (33.17%)    6703 (18.68%)   794 (31.62%)    4700 (9.23%)    699 (29.78%)
30     1488 (7.90%)    277 (11.33%)    3006 (8.38%)    685 (27.28%)    6430 (12.62%)   1216 (51.81%)
40     305 (1.62%)     128 (5.24%)     1715 (4.78%)    528 (21.03%)    7265 (14.26%)   1348 (57.44%)
50     720 (3.82%)     197 (8.06%)     1064 (2.96%)    220 (8.76%)     1813 (3.56%)    503 (21.43%)
60     1518 (8.06%)    475 (19.43%)    6101 (17.00%)   1213 (48.31%)   14159 (27.79%)  1790 (76.27%)
70     5232 (27.78%)   744 (30.43%)    15013 (41.83%)  1317 (52.45%)   21461 (42.12%)  1790 (76.27%)
80     93 (0.49%)      68 (2.78%)      1361 (3.79%)    696 (27.72%)    5612 (11.02%)   1447 (61.65%)
90     236 (1.25%)     98 (4.01%)      1121 (3.12%)    367 (14.62%)    1867 (3.66%)    460 (19.60%)
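Table 4: Distribution of the researchers of each interdisciplinarity level and of their papers over the ten PACS coded areas. Percentages are relative to the total number of papers (or researchers) of the corresponding level; since a paper can carry several PACS codes and a researcher can work in several areas, the percentages of each level add up to more than 100%.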

Table 3 confirms the positive correlation between scientific production and interdisciplinarity level. Comparing the number of papers per author (PpA) with the (real) average number of papers per author (avg. PpA), we also find a stronger presence of coauthoring in the level 1 and level 2 classes than in the level 3 class. This is mainly because a lower percentage of level 3 researchers took part in large scientific collaborations, compared with the other two classes.
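The coauthoring argument can be made quantitative with the figures of Table 3. PpA counts each paper once per class, while avg. PpA counts it once for every selected coauthor belonging to that class, so their ratio can be read as the average number of same-class selected coauthors per paper. This reading is our own interpretation; the snippet below only redoes the arithmetic on the published values.

```python
# Figures from Table 3: level -> (authors, distinct papers, avg. PpA)
TABLE_3 = {
    1: (2445, 18832, 15.38),
    2: (2511, 35892, 29.35),
    3: (2347, 50947, 27.30),
}


def coauthoring_ratio(authors, papers, avg_ppa):
    """avg. PpA / PpA: mean number of same-class selected authors per paper."""
    ppa = papers / authors  # each paper counted once per class
    return avg_ppa / ppa


for level, row in TABLE_3.items():
    authors, papers, _ = row
    print(f"level {level}: PpA = {papers / authors:.2f}, "
          f"ratio = {coauthoring_ratio(*row):.2f}")
```

The ratio comes out close to 2 for levels 1 and 2 and close to 1.26 for level 3, consistent with the weaker coauthoring of the most interdisciplinary class.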

Figure 13: (Left Panel) Citations distribution for the three defined classes of interdisciplinarity, each represented with a different color: red (level 1), green (level 2) and blue (level 3). (Right Panel) The same careers shown in Fig.12 are here addressed in terms of time evolution of scientific impact.
class     authors   papers   citations      CpA   avg. CpA (st. dev.)
level 1      2445    18832      230448    94.25   217.52 (598.48)
level 2      2511    35892      515635   205.35   458.44 (1121.24)
level 3      2347    50947      843292   359.31   479.07 (997.75)
Table 5: Statistical indicators of the citations received by the authors and their papers for each of the three defined classes of interdisciplinarity. The distribution of these citations differs slightly from class to class (Fig.13, left panel). A positive correlation between scientific impact, in terms of citations received, and interdisciplinarity level is found: the number of citations per author (CpA = citations/authors) rises as the interdisciplinarity level increases.

By looking closely at the production of the most productive researchers (those in the long tails of Fig.12), one finds that all of them did research in particle and nuclear physics. More precisely, they took part in large scientific experiments (e.g. the BABAR, CLEO and CDF collaborations) during the 2000s. These large collaborations of hundreds of scientists give their participants high productivity rates, up to 60-70 published papers a year, a goal out of reach for the small research groups working in other areas. As shown by the composition of the three interdisciplinarity classes in terms of the ten PACS coded areas (see Table 4), most of the researchers in our set who are involved in these large collaborations belong to the level 2 class, which explains the heavier tail found for this class compared with the other two (Fig.12).

These indicators clearly underestimate the real productivity of the researchers, but it must be kept in mind that they refer only to (PACS coded) publications in APS journals and that the actual number of active researchers decreased over the thirty years, as shown in Fig.11.

The 89949 (PACS classified) published papers of the set received a total of 1329374 citations within the APS system in the period 1980-2009. From the point of view of the 7303 researchers, considered as independent, they received a total of 2807368 citations in the same period. These citations are distributed similarly among the researchers of each of the three interdisciplinarity classes, as shown in the left panel of Fig.13. Also in this case, as previously shown for paper production, we find a positive correlation between scientific impact, in terms of citations received, and interdisciplinarity level (Table 5). Finally, in the right panel of Fig.13, the growth of the number of citations accumulated by the same excellent careers considered in Fig.12 is reported as a function of time. Notice that the best score in terms of published papers does not necessarily imply the best score in terms of scientific impact, and vice versa.
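The two citation totals can be cross-checked against Table 5. In the sketch below, summing avg. CpA times the number of authors over the three classes should give back the researcher-level total up to rounding, and the ratio between the researcher-level and paper-level totals is, under our reading, the citation-weighted average number of selected researchers credited for each citation.

```python
# Figures from Table 5: level -> (authors, citations of the class papers, avg. CpA)
TABLE_5 = {
    1: (2445, 230448, 217.52),
    2: (2511, 515635, 458.44),
    3: (2347, 843292, 479.07),
}
PAPER_CITATIONS = 1329374    # citations received by the 89949 papers
AUTHOR_CITATIONS = 2807368   # same citations, credited to every selected author

for level, (authors, citations, _) in TABLE_5.items():
    print(f"level {level}: CpA = {citations / authors:.2f}")

# avg. CpA x authors, summed over the classes, should match AUTHOR_CITATIONS
reconstructed = sum(a * avg for a, _, avg in TABLE_5.values())
print(round(reconstructed), AUTHOR_CITATIONS)

# average number of selected researchers credited for each citation
print(round(AUTHOR_CITATIONS / PAPER_CITATIONS, 2))
```

The reconstructed total differs from 2807368 only by a few units, which is compatible with the two-decimal rounding of avg. CpA in Table 5.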

Figure 14: An example of initial setup for our simulations.

As a final curiosity, beyond these excellent careers, let us mention some other authors belonging to the three interdisciplinarity groups of our data set. In the level 1 group one finds mainly scientists who have worked in nuclear physics, such as W. Alberico, U. Lynen, Y.T. Oganessian and W. Trautmann. In the level 2 group, on the other hand, one finds scientists who worked in various fields, from chaos theory to gravitational waves, or from quantum information to cosmology, for example C. Grebogi, D. Deutsch, K. Wilson, J.E. Jaffe, L. Smolin, P.C.W. Davies and G. Pizzella. Finally, in the most interdisciplinary group one finds mainly statistical or condensed matter physicists, scientists working on complex networks and dynamical systems, and also cosmologists or string theorists with broad interests (P. Bak, A. Coniglio, K. Kaneko, M. Mezard, S. Havlin, D. Sornette, G. Parisi, J. Barrow and B. Greene).

S2. The agent-based model

Let us now address some details of the agent-based model with which we were able to successfully replicate the stylized facts of the APS data set. The model was implemented in NetLogo (26), a powerful programmable multi-agent environment particularly suitable for simulating the dynamical behavior of complex systems.

S2.1 Initial setup of the model

In Fig.14 we show the NetLogo "world" as it appears at the beginning of a generic simulation. It is a square metric space, made of patches, where the various agents live and move. The two main categories of agents of our model are visible, randomly distributed around the world: researchers, with a person-like shape, and PACS event-points, with a point-like shape. Both types of agents are active elements of the environment, able to interact with each other.

Figure 15: (Left Panel) Individual parameters characterizing each simulated author. (Right Panel) Normal distribution of talent among the agents, with its mean indicated by a dashed vertical line and the values one standard deviation away from the mean indicated by two dotted vertical lines. This distribution does not change during a single simulation run.

In the figure only a few individuals are shown, for better visualization, but all the simulations include all the 7303 active researchers, as in the APS data set. These researchers do not move during a simulation and are divided into three groups according to the real values of their interdisciplinarity index. Therefore, we have 2445 individuals in the level 1 group (in red), 2511 in the level 2 group (in green) and 2347 in the level 3 group (in blue). During a single simulation run, we let these researchers publish papers and collect citations with a periodicity of one year, for a total time interval of 30 years, in analogy with the real time period addressed in the APS data set. A first evident approximation of the model is that we keep the total number of active authors constant over the years, while we know that their number does decrease, as shown in Fig.11. This implies an overestimation of the total number of papers published by several authors but, as already stated, we are interested in capturing the main stylized facts of the APS data set, not the single details (which, of course, would be impossible to reproduce).

Each simulated author is characterized not only by the variables read from the APS data set (see Sec. S1.1), but also by other individual parameters shown in the left panel of Fig.15 and described in the MP. In particular, each researcher is assigned a fixed talent (intelligence, skill, …), randomly extracted at the beginning of each simulation run from a truncated Normal distribution with fixed mean and standard deviation (see the right panel of Fig.15). All the other individual parameters start from a null value at t = 0 and increase in time during the simulation following appropriate dynamical rules.
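The talent assignment can be sketched with rejection sampling from a Normal distribution. The truncation interval [0, 1] is our assumption (talent is later compared with a random real number), and the mean 0.6 and standard deviation 0.1 used below are placeholder values, not the parameters of the simulations reported here; `sample_talents` is a hypothetical helper.

```python
import random


def sample_talents(n_agents, mean, std, low=0.0, high=1.0, seed=None):
    """Draw one fixed talent per agent from Normal(mean, std) truncated to [low, high].

    Truncation is done by rejection: draws outside the interval are discarded.
    The interval [0, 1] is an assumption of this sketch.
    """
    rng = random.Random(seed)
    talents = []
    while len(talents) < n_agents:
        t = rng.gauss(mean, std)
        if low <= t <= high:      # keep only values inside the truncation interval
            talents.append(t)
    return talents


# Placeholder parameters (hypothetical, not the values used in the paper)
talents = sample_talents(7303, mean=0.6, std=0.1, seed=42)
print(min(talents), max(talents), sum(talents) / len(talents))
```

Rejection sampling is simple and exact for a truncated Normal; it is efficient here because the interval contains almost all of the probability mass.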

As we will show in the next subsection, other global parameters need to be introduced in the model and calibrated through the comparison with the real APS data.

Figure 16: Left panel: histogram of the number of event-points for each PACS class, set according to its relative percentage abundance in the APS data set. Right panel: a zoom of Fig.14, where only three researchers, one for each of the three interdisciplinarity levels, are shown with their colors: red (level 1), green (level 2) and blue (level 3). Around them, some moving events are visible, represented as points of different shades of magenta. Each shade corresponds to a given PACS class of the APS data set, numbered from 0 (darkest) to 9 (brightest), as also shown in Fig.14. The relative percentage of event-points of each class is different and corresponds to the real one. The sensitivity circle of each of the three researchers is also visible; its radius decreases with increasing interdisciplinarity level (see text).

S2.2 Calibration of the model

The first global parameters that need to be calibrated concern the PACS event-points present in the NetLogo world. These points are colored with different shades of magenta (see Fig.14), one for each PACS class, and move randomly around the world during a simulation run with a frequency much higher than that of the simulation time step, which in our model corresponds to 1 year. In particular, each point moves 2 patches in a random direction 73 times during each time step, i.e. once every 5 days (365/73 = 5).
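The motion of a single event-point over one time step can be sketched as follows. The step length (2 patches) and the 73 moves per year come from the text; the world size of 100 patches is a hypothetical placeholder, and treating the world as a torus (coordinates wrap around) is our assumption, based on NetLogo's default topology rather than on the model description.

```python
import math
import random

STEPS_PER_YEAR = 73   # moves per time step (1 year), i.e. one move every 5 days
STEP_LENGTH = 2.0     # patches travelled at each move
WORLD_SIZE = 100.0    # hypothetical side of the square world, in patches


def move_event_point(x, y, rng):
    """Advance one event-point through one simulated year of random moves.

    The world is treated as a torus (coordinates wrap around), an
    assumption of this sketch.
    """
    for _ in range(STEPS_PER_YEAR):
        heading = rng.uniform(0.0, 2.0 * math.pi)   # random direction
        x = (x + STEP_LENGTH * math.cos(heading)) % WORLD_SIZE
        y = (y + STEP_LENGTH * math.sin(heading)) % WORLD_SIZE
    return x, y


rng = random.Random(1)
print(move_event_point(50.0, 50.0, rng))
print(365 / STEPS_PER_YEAR)  # days between two moves -> 5.0
```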

As explained in the MP, in our model the PACS event-points represent opportunities, ideas, encounters, intuitions, serendipity events, etc., which can occur periodically and randomly to a given researcher along her career. The relative abundance of points belonging to each PACS class is fixed according to the APS data set, as shown in the histogram in the left panel of Fig.16 (for example, PACS code 70 is the most represented, while PACS code 90 is the least represented). The total number of these points is one of the global parameters that have to be calibrated.
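One way to split a given total number of event-points among the ten PACS classes is proportional allocation with largest-remainder rounding, sketched below. As weights we use the paper counts of Table 4 summed over the three levels; this is only a rough proxy chosen by us (papers coauthored across classes are counted more than once), not necessarily the abundance used in the model, and the total of 1000 points is a hypothetical value.

```python
import math

# Papers per PACS area and level (level 1, level 2, level 3), from Table 4.
# Summing over levels is a rough proxy for the relative abundance (assumption).
PAPERS_BY_AREA = {
    "00": (609, 3244, 12397), "10": (4892, 7466, 6056),
    "20": (5989, 6703, 4700), "30": (1488, 3006, 6430),
    "40": (305, 1715, 7265), "50": (720, 1064, 1813),
    "60": (1518, 6101, 14159), "70": (5232, 15013, 21461),
    "80": (93, 1361, 5612), "90": (236, 1121, 1867),
}


def allocate_points(weights, total):
    """Split `total` event-points among classes proportionally to `weights`,
    using largest-remainder rounding so that the counts add up to `total`."""
    norm = sum(weights.values())
    quotas = {k: total * w / norm for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total - sum(counts.values())
    # hand the remaining points to the classes with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


weights = {area: sum(row) for area, row in PAPERS_BY_AREA.items()}
points = allocate_points(weights, total=1000)  # hypothetical total number of points
print(points)
print(max(points, key=points.get), min(points, key=points.get))  # 70 90
```

With this proxy, area 70 receives the most points and area 90 the fewest, in line with the ordering stated in the text.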

The dynamical rules of the model, presented in detail in the MP, assume that researchers, during their careers, are exposed to events and ideas that could trigger research lines, with the consequent production of articles, in one or more disciplinary fields according to the PACS numbers associated with each event-point. A given researcher, depending on her interdisciplinarity index, is sensitive only to the points corresponding to the numbers present in her PACS array; let us call these points 'special' for that researcher. Every year, a check is performed over all the researchers to verify which and how many event-points fall inside their "sensitivity circles", which represent the extent of their sensitivity to special points and therefore influence the publication dynamics. The right panel of Fig.16 shows a zoom of the world, where three researchers, one from each of the three interdisciplinarity groups, are reported together with their sensitivity circles. The radii of these circles, one for each group, are three further parameters that have to be calibrated through comparison with real data.

Figure 17: (Left panel) An example of the dynamical rules for the publication of papers, see text. (Right panel) The total number of event-points and the radius of the sensitivity circles for the three groups of authors can be chosen by looking at the agreement between the simulated group averages of the number of different PACS codes per paper and the corresponding values obtained from the APS data set, see text.

In the left panel of Fig.17 we show an example of the publication dynamics for a generic author k. Let us consider the special PACS points randomly falling inside the sensitivity circle of k at a given time t. In this example, among the four PACS numbers (1, 4, 6, 8) present in the author's PACS array (real data), only three (1, 4, 8) fall inside the circle. We can therefore define a temporary array containing these numbers. At this point, as explained in the MP, the researcher compares her talent with a random real number r. Let us suppose that her talent is greater than r: in this case, the number of her published papers increases by an integer quantity randomly extracted from a Normal distribution whose mean and standard deviation depend on her past production through two factors. These factors are two further global parameters that have to be determined by comparison with real data. Notice that the two factors are fixed in time and common to all the authors, while the resulting mean and standard deviation differ from author to author and vary in time, since they depend on her past production. Finally, all the new publications are characterized by the PACS numbers contained in the temporary array. In the example of Fig.17 the extracted quantity is 3, thus three new papers are added to the author's papers array, each characterized by the same three PACS numbers (1, 4, 8); in practice, a copy of the temporary array is saved for each paper.

The rationale behind these rules is twofold. On the one hand, each researcher exploits the opportunities offered by the event-points falling in her sensitivity circle with a probability proportional to her talent, i.e. more talented authors have a greater a-priori probability of publishing new papers. On the other hand, the periodic increment in the number of publications is a constant fraction of the papers already published, i.e. the larger the number of existing publications at a given time step, the larger the number of new publications at the next one (a sort of Matthew effect). Of course, several approximations with respect to reality have been made here. In particular, we assign the same PACS numbers to all the new papers published by an author in a given year, and we do not consider coauthoring (each paper has a single author). This latter approximation contributes to an excess of published papers at the end of a simulation, but this is not a problem, since we are only interested in reproducing the stylized fact represented by the shape of the papers distribution.
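A hedged sketch of one yearly publication step follows; it is our reading of the rules above, not the NetLogo code of the model. Assumptions, all labeled in the code as well: r is uniform in [0, 1); the Normal distribution has mean `alpha * n_papers` and standard deviation `beta * n_papers`, following the "constant fraction" (Matthew effect) rationale; and at least one paper is produced when the talent test succeeds, a hypothetical floor that lets an author start from zero papers. The values `alpha=0.2` and `beta=0.05` are placeholders, and all function names are hypothetical.

```python
import math
import random


def special_pacs_in_circle(author_xy, radius, pacs_array, points):
    """Return (number of special points, sorted distinct PACS) inside the circle.

    `points` is a list of (x, y, pacs) tuples; a point is 'special' for the
    author if its PACS class belongs to her PACS array.
    """
    ax, ay = author_xy
    hits = [p for (x, y, p) in points
            if p in pacs_array and math.hypot(x - ax, y - ay) <= radius]
    return len(hits), sorted(set(hits))


def publish(talent, n_papers, pacs_hit, alpha, beta, rng):
    """One yearly publication step under the assumptions stated above."""
    if not pacs_hit or talent <= rng.random():   # talent test: succeed if talent > r
        return []
    mean, std = alpha * n_papers, beta * n_papers  # assumed proportional to past output
    n_new = max(1, round(rng.gauss(mean, std)))    # hypothetical floor of one paper
    # every new paper carries a copy of the same temporary PACS array
    return [list(pacs_hit) for _ in range(n_new)]


# Example in the spirit of Fig.17: PACS array (1, 4, 6, 8), only 1, 4, 8 fall inside
rng = random.Random(0)
points = [(1, 1, 1), (2, 0, 4), (0, 3, 8), (1, 2, 1), (10, 10, 6), (1, 1, 3)]
n_special, hit = special_pacs_in_circle((0, 0), 5.0, {1, 4, 6, 8}, points)
print(n_special, hit)  # 4 [1, 4, 8]
new_papers = publish(talent=0.9, n_papers=10, pacs_hit=hit,
                     alpha=0.2, beta=0.05, rng=rng)
print(len(new_papers), new_papers[:1])
```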

Figure 18: Comparison between the papers distribution obtained from the APS data set (open circles) and the one obtained from the model simulation with the calibrated parameters (full circles). The two distributions show a power-law behavior with the same exponent (slope -2.3 in the log-log plot).

In order to choose the values of the global parameters introduced above, i.e. the total number of event-points, the radii of the sensitivity circles and the two publication factors, we ran several simulation tests with different combinations of these parameters and compared the numerical results with the real APS data.

First, for each of the three groups we considered the average, over all its authors, of the average number of different PACS codes simultaneously present in their publications at the end of the simulation (i.e. at t = 30 years), and compared it with the analogous real value. It turned out that these averages depend strongly on both the total number of event-points and the radii of the sensitivity circles. The best agreement with the APS data was obtained with a specific total number of event-points and with radii decreasing from the level 1 to the level 3 group, as shown in the right panel of Fig.17. The decreasing radius of the sensitivity circles for increasing interdisciplinarity level can also be justified by the fact that the probability for a given researcher to find special event-points inside her sensitivity circle increases with the number of different PACS codes in her array, and therefore with her interdisciplinarity index: if we adopted the same circle size for the three groups, we would introduce a bias in favor of authors with medium and, in particular, high interdisciplinarity level.

Second, we chose the values of the two publication factors by comparing the simulated distribution of all the published papers (without distinction among interdisciplinarity levels) with the real one extracted from the APS data set. It turned out that the calibrated choice of these factors produced a simulated papers distribution with a power-law behavior with the same slope (-2.3) as the real one (see Fig.18). Notice that, thanks to the constraints imposed by the calibration, these first results are very robust and do not depend on the details of the initial conditions of the simulations, i.e. they depend neither on the particular realization of the distribution of talent among the agents nor on the initial random positions of the agents and the event-points.
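The text does not state how the slope of the distributions was measured. A common way to estimate a power-law exponent is the maximum-likelihood (Hill-type) estimator in the continuous approximation of Clauset, Shalizi and Newman, sketched below as a substitute technique, not as the authors' procedure. We assume that the reported slope refers to a density p(x) ~ x^(-2.3) and that the lower cutoff `x_min` is chosen by the user; for integer data such as paper counts the continuous approximation is only reasonable when `x_min` is not too small.

```python
import math
import random


def powerlaw_exponent_mle(samples, x_min):
    """Maximum-likelihood estimate of gamma for p(x) ~ x**(-gamma), x >= x_min.

    Continuous approximation: gamma = 1 + n / sum(ln(x / x_min)),
    with standard error (gamma - 1) / sqrt(n).
    """
    tail = [x for x in samples if x >= x_min]
    if len(tail) < 2:
        raise ValueError("not enough samples in the tail")
    log_sum = sum(math.log(x / x_min) for x in tail)
    gamma = 1.0 + len(tail) / log_sum
    std_err = (gamma - 1.0) / math.sqrt(len(tail))
    return gamma, std_err


# Synthetic check: paretovariate(a) has density a / x**(a + 1) for x >= 1,
# so a = 1.3 corresponds to gamma = 2.3.
rng = random.Random(7)
synthetic = [rng.paretovariate(1.3) for _ in range(50000)]
gamma, err = powerlaw_exponent_mle(synthetic, x_min=1.0)
print(f"gamma = {gamma:.3f} +/- {err:.3f}")
```

The same estimator could be applied to both the simulated and the real distributions, so that the two exponents are compared with a common method.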

Figure 19: (Left Panel) An example of the dynamical rules regulating the increase of the reputation of an author in the fields corresponding to the PACS numbers present in each new publication, see text. (Right Panel) Comparison between the citations distribution obtained from the APS data set (open circles) and the one obtained from the model simulation with the calibrated reputation parameters (full circles). The two distributions show a power-law behavior with the same exponent.

Let us finally address the calibration of the citation dynamics of our model. As we have just seen, every year all the researchers have the chance to increase their number of publications. With each new paper, the author also increases her reputation in each of the disciplinary fields indicated by the PACS numbers associated with that paper. As explained in the MP, each of the 10 elements of the reputation array is a real number within a fixed interval, representing the reputation level reached by the researcher in the corresponding disciplinary field at time t (see the left panel of Fig.19 for an example).

A plausible approximation to account for the behavior of the reputation level of a generic author in a given field at time t is to consider it as a semi-linear function of the number of papers published in that field at time t. In other words, we assume that the reputation level varies with the number of papers according to the function

where two global parameters, again, have to be calibrated with the real data, and a further quantity is the abscissa of the inflection point.

Since, following the publication/citation dynamical rules explained in the MP, the total number of citations reached by an author at a given time depends on both her citation score and her reputation array (Matthew effect), the choice of the two reputation parameters influences the citations distribution obtained at the end of a simulation (i.e. at t = 30 years). Through several simulation tests with different combinations of these parameters, we found values (see the left panel of Fig.19) that produce a simulated overall citations distribution (without distinction among interdisciplinarity levels) overlapping the analogous distribution obtained from the APS data set, with a power-law behavior of the same slope (see the right panel of Fig.19). Again, the constraints imposed by the comparison with real data make these simulation results very robust and substantially independent of the initial conditions.

In conclusion, as a last point, we notice that, as observed in the MP, the calibrated agreement between the simulated and real group averages of the number of PACS codes per paper does not ensure, of course, that the individual value of each agent-author at the end of a simulation coincides with the one initially assigned to her. Since the number of different PACS codes of each author is kept fixed during the simulation, this also implies that the initial (real) value of her interdisciplinarity index can differ from the one obtained at the end of the simulation. As a consequence, after a given simulation run, all the authors have to be reassigned, following the same rules described in Sec. S1.1, to the three interdisciplinarity groups before calculating the corresponding papers and citations distributions (such as those shown in the MP). The number of authors in each of these new groups is not exactly the same as in the corresponding original group, but typically the differences are small. With respect to the original sizes shown in Table 3, in the simulation results presented in the MP one of the new groups slightly increased its number of members, another slightly decreased it, while the third remained relatively unchanged.

References

  • (1) R. Van Noorden, Nature 525 (2015) 306.
  • (2) R. Rylance, Nature 525 (2015) 313.
  • (3) G.E.A. Solomon, S. Carley, A.L. Porter, PLoS ONE 11(4) (2016) e0152637.
  • (4) National Academy of Sciences, Engineering and Medicine, Keck Futures Initiative (NAKFI) Grants (2009). Available: http://www.keckfutures.org/grants/index.html.
  • (5) R.K. Pan, S. Sinha, K. Kaski, J. Saramaki, Nature Scientific Reports 2 (2012) 551.
  • (6) R. Sinatra, P. Deville, M. Szell, D. Wang, A-L. Barabasi, Nature Physics 11 (2015) 791.
  • (7) M. Bonaventura, V. Latora, V. Nicosia, P. Panzarasa, arXiv:1712.07910.
  • (8) R.N. Mantegna, H.E. Stanley, Introduction to Econophysics: Correlations and Complexity in Finance, Cambridge University Press (2000).
  • (9) D. Helbing, Quantitative Sociodynamics, Springer (2010).
  • (10) C. Castellano, S. Fortunato, V. Loreto, Reviews of Modern Physics 81 (2009) 591.
  • (11) F. Schweitzer, Physics Today 71 (2018) 40.
  • (12) Editorial, The subtle success of a complex mindset, Nature Physics 14 (2018) 1149.
  • (13) M. Cristelli, A. Gabrielli, A. Tacchella, G. Caldarelli, L. Pietronero, PLoS ONE 8(8) (2013) e70726, doi:10.1371/journal.pone.0070726.
  • (14) G. Cimini, A. Gabrielli, F. Sylos Labini, PLoS ONE 9(12) (2014) e113470.
  • (15) F. Tria, V. Loreto, V.D.P. Servedio, S.H. Strogatz, Nature Scientific Reports 4 (2014) 5890.
  • (16) D. Wang, C. Song, A-L. Barabasi, Science 342 (2013) 127.
  • (17) R. Sinatra, D. Wang, P. Deville, C. Song, A-L. Barabasi, Science 354 (2016) 1359.
  • (18) S. Fortunato et al., Science 359 (2018) eaao0185.
  • (19) A. Pluchino, A.E. Biondo, A. Rapisarda, Advances in Complex Systems 21 (2018) 1850014, and refs. therein.
  • (20) R.H. Frank, Success and Luck: Good Fortune and the Myth of Meritocracy, Princeton University Press (2016).
  • (21) K. Börner, W.B. Rouse, P. Trunfio, H.E. Stanley, Proceedings of the National Academy of Sciences 115(50) (2018) 12573-12581, DOI: 10.1073/pnas.1818750115.
  • (22) C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World, Springer (2009).
  • (23) L. Gravano, P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan, D. Srivastava, in Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001), pp. 491-500, ISBN 1-55860-804-4, URL http://dl.acm.org/citation.cfm?id=645927.672200.
  • (24) P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining (2013).
  • (25) E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J.D. Ullman, C. Yang, IEEE Transactions on Knowledge and Data Engineering 13 (2001) 64.
  • (26) U. Wilensky, NetLogo (1999), http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.