
Feature analysis of multidisciplinary scientific collaboration behaviors: A case study on PNAS

by   Zheng Xie, et al.
National University of Defense Technology
Shanghai Jiao Tong University

The features of collaboration behaviors are often considered to differ from discipline to discipline. Meanwhile, collaboration across disciplines is a prominent feature of modern scientific research and has incubated several interdisciplines, such as sustainability science. The features of collaborations within and among the disciplines of biological, physical and social sciences are analyzed based on 52,803 papers published in the multidisciplinary journal PNAS from 1999 to 2013. In terms of similarities, the data show similar transitivity and assortativity of collaboration behaviors, and the same distribution type for collaborators per author and for papers per author. In terms of interactions, the data show that a considerable proportion of authors engage in interdisciplinary research, and that the more collaborators and papers an author has, the more likely the author is to pursue interdisciplinary research. The analysis of paper contents illustrates that the development of each science category has a long-run equilibrium relationship with the development of typical research paradigms and transdisciplinary disciplines. Hence, those unified methodologies can be viewed as grounds for the interactions.


1 Introduction

Natural and social sciences provide methodical approaches to study, predict and explain natural phenomena and sociality (human behaviors and psychological states), respectively [Weingart]. The specialization of knowledge in these sciences forms various disciplines. Meanwhile, to solve problems whose solutions are beyond the scope of a single discipline, researchers need to integrate data, techniques, concepts, and theories from several disciplines [Cooper, Hurd, National, Hirsch]. Interactions between disciplines incubate several interdisciplines, blur the boundary between natural and social sciences, and produce many important scientific breakthroughs [Liu, Siedlok, Gooch].

Studying collaboration patterns within and across disciplines or sciences contributes to understanding the diversity of cooperative behaviors and the fusion modes of knowledge. Papers of multidisciplinary journals provide an informative and reliable platform for such a study, because scholarly communication in natural and social sciences relies mainly on papers [Lariviere2, Moody, Glanzel, Hicks]. Here we investigated these patterns based on 52,803 papers published in Proceedings of the National Academy of Sciences (PNAS) over the years 1999-2013. The dataset spans three science categories: social sciences and the two principal sub-sciences of natural sciences, viz. biological and physical sciences.

Collaboration relationships can be expressed by graphs, termed coauthorship networks. Hence the patterns can be studied from a network perspective. Coauthorship networks from different scientific fields share specific similarities, such as partial transitivity of coauthorship, homophily on the number of collaborators, and a right-skewed distribution of collaborators per author [Newman1, Newman4, Barab, Newman0, Sarigol, Xie3, Tomasello]. These commonalities also appear in the collaboration networks of the three author sets (which come from the three science categories of PNAS respectively). We looked further into the rules and reasons behind these commonalities. We found that the distribution of collaborators per author and that of papers per author follow the same distribution type: a mixture of a generalized Poisson distribution and a power law. We provided a possible explanation for the distribution type and these commonalities through the diversity of authors' abilities to attract collaborations.

A range of previous works discussed quantitative indexes of interdisciplinarity for sciences [Braun, Porter1, Levitt2], for disciplines [Porter, Chen, Rafols, Abramo], for universities [Bordons], for journals [Leydesdorff, Zhang], and for research teams [Lungeanu]. Some works addressed the correlation between interdisciplinarity and scientific impact [Lariviere, Lariviere1, Rinia, Wang] (e.g. the ability to attract citations [Levitt1, Levitt, Chen1]). Building on the general ideas of these references, we studied the interdisciplinary activities of PNAS through the co-occurrence of disciplines on papers, and through indexes calculated from the co-occurrence, such as Rao-Stirling diversity [Stirling] and betweenness centrality [Leydesdorff2007].

We further studied the collaboration patterns across disciplines, and found that a considerable proportion of authors and papers in physical and social sciences are involved in interdisciplinary research. The multidisciplinary coauthorship network extracted from the data has a giant component, which contains more than 88%, 80% and 71% of the authors in biological, physical and social sciences respectively. A considerable number of authors contribute to the formation of the giant component. The contributions of author activity and productivity to this formation increase over time. The high extent of interdisciplinarity shown by the case study might not be representative of general collaboration patterns, because authors may submit more interdisciplinary work to multidisciplinary journals than to domain-specific ones.

This report is structured as follows: the data processing is described in Section 2; the similarities and interactions are analyzed in Section 3; and the discussion and conclusion are drawn in Section 4.

2 The Data

2.1 Reason for using the data

The case study involves two concepts, namely multidisciplinarity (researchers from different disciplines study within their own disciplines) and interdisciplinarity (study beyond disciplinary boundaries) [Besselaar]. Multidisciplinarity can be viewed as a combination of disciplines, and interdisciplinarity as a merging of them. A multidisciplinary journal whose scope covers natural and social sciences can be used to analyze the interactions between science categories. Such a journal can also be used to compare the collaboration patterns of multiple disciplines and find similarities. PNAS publishes high quality research papers, and provides reliable discipline information for those papers. The journal also provides a high quality data platform for analyzing worldwide collaboration patterns, because nearly half of its papers come from authors outside the United States.

The multidisciplinary journals Science, Nature and Nature Communications do not provide discipline information for their papers. Journal of the Royal Society Interface focuses on cross-disciplinary research at the interface between the physical and life sciences, but does not involve social sciences. Our analysis is restricted to PNAS, which brings limitations to our findings. For example, scholarly communication in social sciences relies not only on papers, but also on books [Glanzel, Hicks]. Hence the results obtained must be carefully interpreted as being the patterns of researchers who publish papers in the chosen journal. However, due to the influence and representativeness of PNAS, the case study could contribute to understanding aspects of multidisciplinary and interdisciplinary collaboration patterns.

2.2 Discipline information

Most papers of the dataset have been classified into three first-class disciplines (biological, physical, and social sciences) and 39 second-class disciplines (Table 1). Interdisciplinary papers are classified into several disciplines. The data contain 43,304 biological papers (including 3,957 papers of biophysics), which account for 82.01% of the total. The data also contain 5,987 physical papers and 1,310 social papers. There are 2,961 interdisciplinary papers belonging to more than one of the second-class disciplines, which account for 5.61% of the total. The large differences among discipline proportions do not reflect a preference of PNAS. In reality, far more researchers work in natural sciences (especially biological sciences) than in social sciences [Kagan]. There are 1,842 papers that are classified only into first-class disciplines. For these papers, the second-class discipline is regarded as missing here, whereas in our previous work [Xie4] it was regarded as the same as the first-class discipline. Hence the data in Table 1 differ from those in Reference Xie4.

Based on the discipline information of papers, we constructed a network to express the relationship between the first-class and the second-class disciplines (Fig. 1), where two disciplines are connected if they are the first-class and the second-class disciplines of a paper. We can also construct a network to express the interactions between the second-class disciplines (Fig. 2), where each node is a discipline and two nodes are connected if there is a paper belonging to them simultaneously. These networks could evolve with the discipline information of newly published papers. So using the latest data, one may have a more comprehensive view.
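The construction of the weighted network between second-class disciplines can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names and the toy records are hypothetical, and each paper is assumed to be available as a set of second-class discipline labels.

```python
from collections import Counter
from itertools import combinations


def discipline_network(papers):
    """Return edge weights {(a, b): number of papers shared by disciplines a and b}.

    `papers` is an iterable of sets of discipline labels, one set per paper
    (an assumed input format).
    """
    weights = Counter()
    for disciplines in papers:
        # Each unordered pair of disciplines on one paper adds 1 to that edge.
        for a, b in combinations(sorted(disciplines), 2):
            weights[(a, b)] += 1
    return weights


def node_degrees(weights):
    """Return (degree, weighted degree) of every discipline that has an edge."""
    degree, weighted_degree = Counter(), Counter()
    for (a, b), w in weights.items():
        for node in (a, b):
            degree[node] += 1
            weighted_degree[node] += w
    return degree, weighted_degree


# Hypothetical toy records, not taken from the PNAS data.
toy_papers = [
    {"Chemistry", "Biophysics"},
    {"Chemistry", "Biophysics"},
    {"Chemistry", "Applied mathematics"},
    {"Ecology"},  # single-discipline paper: contributes no edge
]
toy_weights = discipline_network(toy_papers)
toy_degree, toy_weighted_degree = node_degrees(toy_weights)
```

In this sketch a paper with a single discipline adds no edge, which mirrors the rule that two nodes are connected only if a paper belongs to both of them simultaneously.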

Figure 1: The relationship between the first-class and the second-class disciplines. The network is built based on the discipline information of papers in PNAS 1999-2013. Two disciplines are connected if they are the first-class and the second-class disciplines of a paper. The node size indicates node degree.
Figure 2: Interactions between the second-class disciplines. The weighted network is built based on the discipline information of interdisciplinary papers in PNAS 1999-2013. Edge width indicates edge weight: the number of interdisciplinary papers between two connected disciplines.
Agricultural science 22 226 9 20 3.19
Anthropology 114 556 24 110 40.02
Applied biological science 135 767 9 134 1.79
Applied mathematics 191 380 27 182 49.39
Applied physical science 309 816 26 299 29.14
Astronomy 3 50 3 3 0.13
Biochemistry 333 6,303 19 327 16.96
Biophysics 359 3,957 16 359 7.91
Biophysics and computational biology 468 1,532 11 467 7.95
Cell biology 135 3,717 18 130 12.71
Chemistry 1,003 8,645 26 1,003 49.73
Computer science 77 101 17 70 9.50
Developmental biology 33 1,525 12 30 1.66
Earth, atmospheric, and planetary sciences 78 243 9 77 1.58
Ecology 162 1,084 15 162 10.00
Economic science 94 171 21 94 20.88
Engineering 217 392 19 217 13.85
Environmental science 184 695 20 183 25.44
Evolution 233 2,274 22 216 25.81
Genetics 103 2,664 20 97 12.68
Geology 137 285 10 136 2.79
Geophysics 23 175 7 23 1.51
Immunology 43 3,070 10 38 1.45
Mathematics 18 561 11 17 3.36
Medical science 181 4,784 20 170 14.01
Microbiology 92 2,812 17 89 11.85
Neurobiology 16 1,003 9 16 0.87
Neuroscience 290 4,398 16 280 12.00
Pharmacology 26 594 4 26 0.08
Physics 229 4,818 22 227 18.24
Physiology 33 1,116 12 32 5.82
Plant biology 27 1,700 12 27 4.62
Political science 7 17 5 7 0.54
Population biology 27 166 11 26 4.04
Psychological and cognitive science 160 487 16 159 5.09
Psychology 83 449 12 83 3.62
Statistics 90 146 20 85 19.34
Sustainability science 123 399 11 120 7.66
Systems biology 36 159 11 36 1.80

The number of papers and the number of interdisciplinary papers of a discipline are counted based on the discipline information provided by PNAS. The degree, weighted degree, and betweenness centrality of a discipline are calculated based on the weighted network in Fig. 2.

Table 1: Specific indexes of the second-class disciplines in PNAS 1999–2013.

2.3 Coauthorship

Identifying ground-truth authors, termed author name disambiguation, is an important, time-consuming, but necessary procedure of coauthorship analysis. Several methods use the information of the names provided on papers (e.g. initial based methods [Milojevic2]). The dominant misidentification of initial based methods is caused by merging two or more different authors into one. Hence, it deflates the number of unique authors, and inflates the size of the giant component relative to the ground truth. Requiring additional information (e.g. email address) helps to reduce merging errors, but makes the information harder to collect.
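The merging error of initial based methods can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, names are assumed to be written as "Given Surname", and the example names are invented.

```python
from collections import defaultdict


def initial_key(full_name):
    """Reduce a name to first initial plus surname, e.g. 'Wei Zhang' -> 'W Zhang'.

    Assumes the given name comes first and the surname last.
    """
    parts = full_name.split()
    return f"{parts[0][0]} {parts[-1]}"


def merged_groups(names, key):
    """Return the groups of distinct names that collapse onto the same key."""
    groups = defaultdict(set)
    for name in names:
        groups[key(name)].add(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Invented example names.
toy_names = ["Wei Zhang", "Wen Zhang", "Anna Smith"]
toy_merged = merged_groups(toy_names, initial_key)
```

Here two distinct full names collapse onto one identity under the initial based key, which is the kind of merging that deflates the number of unique authors.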

In PNAS 1999-2013, most authors provide their full first name, so the names given on papers are used directly to identify authors. Identifying authors by surname and the initial of the first given name, however, generates many merging errors [Kim1]. The proportion of authors who provide only such a name in the data is 2.9%, and the proportion of these authors who also publish more than one paper is 0.3%. Meanwhile, even full names still produce merging errors if some authors provide exactly the same name. Chinese names were found to account for much of the name repetition [Kim1]. We calculated the proportion of names with a given name of fewer than six characters and a surname among the 100 most common Chinese surnames. The proportion of these authors in the data is 2.7%, and that of these authors who also publish more than one paper is 1.1%. The small values of these four proportions show that the impact of name repetition is limited. These proportions for specific subsets of the data are listed in Table 2.

Dataset              Initial only  Initial only, >1 paper  Chinese name  Chinese name, >1 paper
PNAS 1999-2013       2.9%          0.3%                    2.7%          1.1%
Biological sciences  2.7%          0.2%                    2.7%          1.1%
Physical sciences    4.8%          0.4%                    4.4%          0.9%
Social sciences      2.3%          0.1%                    2.2%          0.3%
Biophysics           4.1%          0.3%                    4.0%          1.0%
Interdiscipline      2.6%          0.1%                    3.6%          0.6%

The first two columns are the proportion of the authors providing only the initial of their first given name and their surname, and that of these authors further conditioned on publishing more than one paper, respectively. The last two columns are the proportion of the authors with a surname among the major 100 Chinese surnames and a given name of fewer than six characters, and that of these authors further conditioned on publishing more than one paper, respectively.

Table 2: Specific statistical indexes of the analyzed networks.

The method adopted here will split one author into two or more if the author does not provide their name consistently. Splitting underestimates the giant component size and the indexes used as evidence for the universality of interdisciplinary research. Hence the results in Subsections 3.5 and 3.6 can be regarded as conservative. In addition, the inaccuracy caused by the adopted method does not change the ground-truth distribution type of collaborators per author or that of papers per author [Kim1].

3 Data analysis

3.1 Network properties

Coauthorship is an n-ary relation, hence it can be expressed by a hypergraph, a generalization of a graph in which an edge (termed a hyperedge) can join any number of nodes. Authors are represented as nodes, and the author group of each paper (paper team) as a hyperedge. We can then extract a coauthorship network from the hypergraph as a simple graph, where an edge is formed between every two nodes in each hyperedge, and multiple edges are treated as one. The terms “degree” and “hyperdegree” of a node express the number of collaborators and the number of papers of an author respectively.
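The extraction of a simple coauthorship graph from paper teams, together with degree and hyperdegree, can be sketched as follows. The function name and the toy teams are hypothetical; the sketch assumes each paper is given as a collection of already disambiguated author identifiers.

```python
from itertools import combinations


def coauthorship_network(paper_teams):
    """Build a simple coauthorship graph from hyperedges (paper teams).

    Returns (neighbors, degree, hyperdegree), where
    degree = number of distinct collaborators and
    hyperdegree = number of papers of each author.
    """
    neighbors = {}    # author -> set of collaborators
    hyperdegree = {}  # author -> number of papers
    for team in paper_teams:
        team = set(team)
        for author in team:
            neighbors.setdefault(author, set())
            hyperdegree[author] = hyperdegree.get(author, 0) + 1
        # Connect every two authors of the paper; using sets means that
        # repeated collaborations are stored as a single edge.
        for a, b in combinations(team, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {author: len(nbrs) for author, nbrs in neighbors.items()}
    return neighbors, degree, hyperdegree


# Hypothetical toy teams.
toy_teams = [("A", "B", "C"), ("A", "B"), ("C", "D"), ("E",)]
toy_neighbors, toy_degree, toy_hyperdegree = coauthorship_network(toy_teams)
```

In the toy data, authors A and B coauthor twice but are joined by one edge, so A has degree 2 and hyperdegree 2, while the solo author E has degree 0 and hyperdegree 1.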

The data show that the average paper team size of biological sciences (6.624) and that of physical sciences (5.254) are larger than that of social sciences (4.634). This size relation fits the reality that research teams are usually larger in natural sciences and smaller in social sciences [Kagan]. Now let us consider the coauthorship networks of the papers in specific disciplines or science categories. All of these networks are highly clustered and assortative, and their average shortest path lengths scale as the logarithms of their numbers of nodes (NN and AP in Table 3). These properties do not mean that all of the networks are small-world. The network of social sciences is an exception: it has no component containing more than 10% of its authors. However, this does not mean that research in social sciences is carried out in isolation. In fact, 71.5% of the authors in social sciences belong to the giant component of the coauthorship network generated by the whole data. Therefore, analyzing the collaborations of authors restricted to a single discipline has limitations, and we carried out the analysis in the environment of all disciplines.

Network              NN       NE         GCC    AC     AP     PG
PNAS 1999-2013       202,664  1,225,176  0.881  0.230  6.422  0.868
Biological sciences  184,872  1,150,362  0.881  0.232  6.364  0.880
Physical sciences    24,766   101,166    0.933  0.452  10.89  0.455
Social sciences      5,121    18,786     0.946  0.683  6.574  0.087
Biophysics           13,480   48,012     0.905  0.177  7.665  0.636
Interdiscipline      13,680   53,588     0.951  0.558  9.397  0.093

The indexes are the number of nodes (NN), the number of edges (NE), the global clustering coefficient (GCC), the degree assortativity coefficient (AC), the average shortest path length (AP), and the node proportion of the giant component (PG). The AP values of the first two networks are approximated by sampling 400,000 pairs of nodes.

Table 3: Specific statistical indexes of the analyzed networks.
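Approximating the average shortest path length by sampling node pairs, as done for the two largest networks in Table 3, can be sketched as below. This is an assumed procedure for illustration, not the authors' code: pairs are drawn uniformly at random, distances come from a breadth-first search, and disconnected pairs are skipped.

```python
import random
from collections import deque


def bfs_distance(neighbors, source, target):
    """Shortest path length between two nodes, or None if they are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def sampled_average_path(neighbors, n_pairs, seed=0):
    """Average distance over randomly sampled connected pairs of distinct nodes."""
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    total = found = 0
    for _ in range(n_pairs):
        u, v = rng.sample(nodes, 2)
        dist = bfs_distance(neighbors, u, v)
        if dist is not None:  # skip pairs in different components
            total += dist
            found += 1
    return total / found if found else float("nan")


# Hypothetical toy graphs: a complete graph on four nodes and a path A-B-C.
toy_complete = {n: {m for m in "ABCD" if m != n} for n in "ABCD"}
toy_path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
```

For the complete toy graph every sampled pair is at distance 1, so the estimate equals 1 regardless of the seed; on real data the estimate varies with the sample.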

3.2 Degree and hyperdegree

We computed degree and hyperdegree on the whole data (not restricted to a single science category), and observed the degree distributions and hyperdegree distributions of the three author sets (which come from the three science categories respectively). We found that although the collaboration level differs from one science category to another, all of the distributions exhibit a hooked head, a fat tail, and a cross-over between them, which could be viewed as a common feature of coauthorship networks (Fig. 3). The head and tail can be fitted by a log-normal distribution and a power-law distribution respectively [Milojevic1].

Figure 3: Distributions of collaborators/papers per author. The panels show the distributions counted in PNAS 1999–2013 (red plots), and those counted in the papers of each science category (blue diamonds). Fitting distributions (purple curves) are mixtures of a generalized Poisson distribution and a power-law distribution. Fitting parameters are listed in Table 4. The regions “G-P”, “C-O”, “P-L” stand for generalized Poisson, cross-over and power-law respectively.

These distributions can also be fitted, as a whole, by a mixture of a generalized Poisson distribution and a power-law distribution. The fitting parameters are listed in Table 4. We performed a two-sample Kolmogorov-Smirnov (KS) test to compare the distributions of two data vectors: the empirical node indexes (i.e. degrees or hyperdegrees) and samples drawn from the corresponding fitting distribution. The null hypothesis is that the two data vectors are from the same distribution. The p-value of each fitting shows that the test cannot reject the null hypothesis at the 5% significance level. Note that a goodness-of-fit test is not suitable here, due to the small number of large degree authors.
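The statistic behind the two-sample KS test is the largest gap between the empirical cumulative distribution functions of the two data vectors. The sketch below computes only that statistic with the standard library; the function name and the toy samples are hypothetical, and the p-value computation is left out.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in set(a) | set(b):
        # Empirical CDF value = fraction of observations <= x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical toy samples.
toy_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
toy_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
toy_disjoint = ks_statistic([1, 1, 2, 2], [3, 3, 4, 4])
```

In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value; whether the study used that routine is not stated in the text.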

Regarding authors as samples, a mixture distribution means that those samples come from different populations, namely the collaboration patterns of the authors with few collaborators and papers differ from those of the authors with many collaborators and papers. In Reference Xie6, a possible explanation (which is free of disciplines) is given for the mixture type of empirical degree distributions. Following the same general ideas, a similar explanation can be adopted for hyperdegree distributions as follows.

Whether a researcher collaborates with another to publish a paper can be regarded as a “yes/no” decision. So the hyperdegree of a researcher is equal to the number of successes in a sequence of decisions made by the candidates who want to coauthor with that researcher. Suppose the number of those candidates is $n$ and the collaboration probability of each candidate is $p$. Then the hyperdegree follows a binomial distribution $B(n, p)$, and thus approximately a Poisson distribution with expected value $np$ (Poisson limit theorem). The value of $np$ varies from author to author, due to the diversity of authors’ ability to attract collaborators.
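The approximation step can be written out explicitly. The symbols here are assumptions for illustration: $X$ denotes the hyperdegree of a researcher, $n$ the number of candidates, $p$ the collaboration probability of each candidate, and $\lambda = np$ the expected value.

```latex
\Pr(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
\;\approx\; \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad \lambda = np, \quad n \text{ large},\ p \text{ small}.
```

The approximation is the Poisson limit theorem: as $n \to \infty$ and $p \to 0$ with $np \to \lambda$, the binomial probabilities converge to the Poisson probabilities on the right.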

Decisions of authors could be dependent. For example, collaborating with researchers who have publishing experience helps to publish a paper. Hence we could regard hyperdegree as a random variable following a generalized Poisson distribution (which allows the occurrence probability of an event to involve memory [Consul]). In empirical data, most hyperdegrees are around their mode. Hence we can assume that they follow generalized Poisson distributions with an expected value around their mode, and so form the generalized Poisson part of a hyperdegree distribution. A few authors experience a cumulative process of papers, which makes a hyperdegree distribution skew to the right and form a fat tail.

Degree distribution                                                           p-value
Biological sciences    4.843  0.464  74.27  2.889  1.049  15  26  20  50   0.203
Physical sciences      3.958  0.477  49.31  2.798  1.037   7  14  20  53   0.178
Social sciences        3.292  0.513  20.78  2.657  1.046  11  28  20  35   0.111
Hyperdegree distribution                                                      p-value
Biological sciences    0.028  0.269  1.968  3.099  35.57   2   6  10  13   0.979
Physical sciences      0.021  0.320  2.977  2.916  47.15   2  11  10  10   0.625
Social sciences        0.022  0.375  19.48  3.665  46.24   3  20  10  11   0.206

The ranges of generalized Poisson , cross-over, and power-law are , , and respectively. The fitting function is , where . The fitting processes are: observe proper and ; calculate parameters of (i.e. , , ) and (i.e. , ) through regressing the empirical distribution in and respectively; find and through exhaustion to make pass KS test (p-value). The sum of each over is near unity, which means that can be regarded as a probability density function.

Table 4: The parameters of fitting functions.

3.3 Transitivity of coauthorship

Transitivity in society means that “the friend of my friend is also my friend”, which is a typical feature of social affiliation networks. In academic society, collaborators of an author are likely to become acquainted and so coauthor with each other. For example, organizational and institutional contexts drive the formation of transitive coauthorship, and so contribute to the clustering structures emerging in coauthorship networks.

The transitivity of a network can be quantified by two indexes in graph theory, namely the global clustering coefficient (the fraction of connected triples of nodes which also form “triangles”) and the local clustering coefficient (the probability that two neighbors of a node are connected). High transitivity is a common feature of coauthorship networks [Newman1].
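The two clustering coefficients can be computed directly from an adjacency structure. The following sketch assumes the graph is stored as a dictionary mapping each node to its set of neighbors; the function names and the toy graph are hypothetical, and nodes with fewer than two neighbors are assigned a local coefficient of 0 by convention.

```python
from itertools import combinations


def connected_neighbor_pairs(neighbors, node):
    """Number of pairs of neighbors of `node` that are themselves connected."""
    return sum(1 for u, v in combinations(neighbors[node], 2) if v in neighbors[u])


def local_clustering(neighbors, node):
    """Fraction of pairs of a node's neighbors that are connected."""
    k = len(neighbors[node])
    if k < 2:
        return 0.0  # convention assumed here for nodes with < 2 neighbors
    return connected_neighbor_pairs(neighbors, node) / (k * (k - 1) / 2)


def global_clustering(neighbors):
    """Fraction of connected triples of nodes that are closed into triangles."""
    closed = sum(connected_neighbor_pairs(neighbors, n) for n in neighbors)
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in neighbors.values())
    return closed / triples if triples else 0.0


# Hypothetical toy graph: triangle A-B-C with a pendant node D attached to C.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
toy_local_a = local_clustering(toy_graph, "A")
toy_local_c = local_clustering(toy_graph, "C")
toy_global = global_clustering(toy_graph)
```

The denominator of the local coefficient is the number of neighbor pairs, k(k-1)/2, so it grows quadratically with degree, while the numerator depends on which neighbors are actually connected.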

To what extent is the transitivity due to the activity of authors in academic society? The activity can be partly reflected by the number of collaborators, namely degree. Hence, the extent can be sketched through the correlation coefficients between degree and local clustering coefficient. Note that correlation coefficients indicate the extent of a linear relationship between two variables or their ranks. The coefficients of variables $X$ and $Y$ generally do not completely characterize their correlation, unless the conditional expected value of $Y$ given $X$, denoted by $E(Y|X)$, is a linear or approximately linear function of $X$. The conditional expected value of local clustering coefficient given degree is the average local clustering coefficient of $k$-degree nodes, denoted by CC. The approximately linear trend of CC shown in Fig. 4 supports the validity of the correlation analysis in Table 5. The decreasing trend cannot be deduced from degree information alone: the denominator of the local clustering coefficient of a node grows quadratically with its degree, but the numerator cannot be calculated from degree information.

Discipline           Indicator  Mean   Std    SCC     PCC
Biological sciences  LCC        0.860  n/a    -0.398  -0.401
                     LTC        0.001  0.005  0.275   0.077
                     DN         21.09  n/a    0.543   0.400
                     HN         3.015  15.47  0.070   -0.046
Physical sciences    LCC        0.806  n/a    -0.336  -0.382
                     LTC        0.001  0.005  0.306   0.074
                     DN         15.48  n/a    0.625   0.346
                     HN         2.682  12.44  0.169   0.015
Social sciences      LCC        0.784  n/a    -0.177  -0.263
                     LTC        0.001  0.006  0.292   0.050
                     DN         12.87  n/a    0.723   0.482
                     HN         2.268  10.89  0.175   0.030

The indexes are local clustering coefficient (LCC), the local transitivity of collaboration (LTC), the average degree of node neighbors (DN), the average hyperdegree of node neighbors (HN). We calculated the mean of these indexes over authors, the Spearman rank correlation coefficient (SCC) and Pearson product-moment correlation coefficient (PCC) between each index and degree. For the two indexes with small PCC, we calculated their standard deviation (Std).

Table 5: Correlation coefficients between degree and transitivity/clustering indexes.

Does the decreasing trend of CC mean that activity depresses transitivity? A positive answer would be against common sense. In PNAS 1999-2013, 74.62% of authors publish only one paper in the data, and the team sizes of 99.9% of papers follow the generalized Poisson part, namely are around the average paper team size 6.028. The boundary of the generalized Poisson part is detected by the boundary point detection algorithm for probability density functions in Reference Xie6 (listed in Appendix). Hence the local clustering coefficients of most small degree authors are close to 1 (Fig. 4). A few authors experience a long period of collaborations, and their degree is accumulated over papers. For these authors, collaborators from different papers may never collaborate with each other, which decreases their local clustering coefficient. Hence the puzzling trend does not contradict common sense, but is due to the insufficiency of measuring transitivity, a dynamical property, by counting “triangles” on a static network.

To design a more reasonable index measuring transitivity, let us come back to the original meaning of transitivity in coauthorship: the probability that two collaborators of a researcher (who have not yet coauthored) coauthor in the future. The probability can be calculated for dynamic hypergraphs of collaborations through time information. Averaging the probability over authors measures the global transitivity, the value of which is quite low in each science category (Table 5). Note that the calculation is limited to PNAS 1999-2013, and transitivity may happen in other journals or in other time periods. So the values of transitivity here may be underestimated. The increasing trend of the transitivity probability of $k$-degree authors (TC in Fig. 4) means that activity contributes to transitivity. This fits common sense: a researcher with many collaborators is likely to introduce those collaborators to each other.

3.4 Homophily of coauthorship

Coauthorship is based on specific features that researchers have in common, including interest and geography. The homophily phenomenon appears in many social relations, and is called assortative mixing in network science [Newman4]. Do authors of each science category prefer to coauthor with others who are similar in social activity or productivity? The social activity and productivity of authors can be quantified by two indexes, namely degree and hyperdegree respectively. The preference on an index can then be sketched through the correlation coefficient between two variables, namely the index of an author and the average index of the author’s neighbors. Positive correlation means assortative mixing, negative correlation means disassortative mixing, and zero correlation means no preference.
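The correlation between an author's index and the average index of the author's neighbors can be sketched as follows. The helper names and the toy star graph are hypothetical, the graph is assumed to be a dictionary of neighbor sets, and only the Pearson coefficient is shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def neighbor_average(neighbors, index):
    """Average value of `index` over the neighbors of each non-isolated node."""
    return {
        node: sum(index[nb] for nb in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
        if nbrs
    }


def mixing_coefficient(neighbors, index):
    """Correlation between a node's index and its neighbors' average index."""
    avg = neighbor_average(neighbors, index)
    nodes = sorted(avg)
    return pearson([index[n] for n in nodes], [avg[n] for n in nodes])


# Hypothetical toy star graph: hub H linked to three leaves.
toy_star = {"H": {"L1", "L2", "L3"}, "L1": {"H"}, "L2": {"H"}, "L3": {"H"}}
toy_degree = {node: len(nbrs) for node, nbrs in toy_star.items()}
toy_r = mixing_coefficient(toy_star, toy_degree)
```

A star graph is the extreme disassortative case: the high degree hub has only low degree neighbors and vice versa, so the coefficient is -1. Passing hyperdegree as `index` gives the analogous measure for productivity.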

Degree assortativity is a feature of coauthorship networks [Newman4]. Does it mean that sociable researchers (with many collaborators) preferentially coauthor with other sociable researchers, and unsociable ones with unsociable ones? In a previous study [Xie7], we showed that 99.5% of the top 5.99% most sociable authors (measured by degree) have coauthored with another such author. The proportion may even be underestimated, because these authors may have coauthored before 1999 or in other venues. Note that the splitting and merging errors of the name disambiguation method affect the proportion to some extent. Even so, the proportion is still remarkable.

However, if sociable researchers only coauthored with sociable ones, there would exist many sociable researchers, which is against empirical degree distributions. Now let us analyze the influence of the social activity of authors on degree assortativity. For the authors with degree $k$, denote the average degree of their neighbors by DN. There exists a trend change in DN of each empirical dataset: the head part has a clear increasing trend, but the tail part does not (Fig. 4). It means that degree assortativity is mainly contributed by small degree authors.

Figure 4: Conditional expected values of specific indexes given degree. For each degree $k$, we average over the $k$-degree nodes the local clustering coefficient (CC), the local transitivity of collaborations (TC), the average degree of node neighbors (DN), and the average hyperdegree of node neighbors (HN). The data are binned on the abscissa axes to extract the trends hidden in noise.

The tipping point of the trend of DN is detected by the boundary point detection algorithm for general functions in Reference Xie6 (listed in Appendix). Inputs of the algorithm are DN, and (, , ). Using those inputs is based on the observation of DN. Degrees of most authors are around their mode, and only a few authors have a large degree. Hence the neighbors of an author are likely to be small degree authors. Therefore, the degree differences between authors and their neighbors are small for small degree authors and large for large degree authors, which leads to the trend change of DN.

The correlation coefficient between hyperdegree and the average hyperdegree of neighbors is around zero in each science category (Table 5), which means that the choice of collaborators is free of the factor of productivity. For the authors with a given hyperdegree, denote the average hyperdegree of their neighbors by HN. In reality, members of a research team may have various scientific ages (newcomers, incumbents), and so different hyperdegrees. Since collaborations mainly happen within a research team, the collaborators of an author can have various hyperdegrees, which appears as the stable trend of HN.

Because the average value of HN is well above 1 (Table 5), and 74.62% of authors have only one paper in the data, we can derive that a large fraction of authors publish their first paper in the data together with at least one author who has already published a paper in PNAS 1999–2013. The proportions of these authors are 79.22%, 71.17% and 65.12% in biological, physical and social sciences respectively. The proportions may be overestimated, because some of these authors may have published papers in PNAS before 1999.

3.5 Interdisciplinarity at discipline level

The co-category proportion measures the activity of interdisciplinary research. In social, physical and biological sciences, 49.2%, 46.0% and 7.3% of authors respectively published interdisciplinary papers. Common sense suggests that social scientists tend to work in solitude. The proportion for social sciences shows that this does not hold in PNAS. Reference Levitt3 also shows that there has been a move towards increased interdisciplinarity in social sciences in recent decades.

The above analysis could be applied to the second-class disciplines to obtain a higher-resolution result. However, some disciplines have only a few papers, e.g. only 17 papers of political science, so the analysis for those disciplines loses statistical meaning. Hence we took another perspective and analyzed the interactions among the second-class disciplines by visualizing them as the network in Fig. 2. The network is connected, i.e. no discipline is isolated. The top three nodes of this network in terms of degree and in terms of betweenness centrality are Applied mathematics, Chemistry and Anthropology (Table 1). It means that the theories, methods and problems of those disciplines are directly or indirectly used or studied by many disciplines. For each first-class discipline, we contracted its second-class disciplines into one node, and calculated the betweenness centrality of the contracted node. The resulting betweenness centralities (Biological sciences 47.51, Physical sciences 163.81, Social sciences 161.72) support the above analysis.

The co-category proportion only describes interdisciplinary activity. We now measure the discipline diversity of interdisciplinary research in each science category through the Rao-Stirling index [Stirling], $D=\sum_{i\neq j} p_i p_j d_{ij}$, where $p_i$ and $p_j$ are proportional representations of the papers/authors in science categories $i$ and $j$, and $d_{ij}$ is the level of difference attributed to categories $i$ and $j$. Discipline information is used to classify authors into science categories: an author is classified into a discipline if one of his or her papers belongs to it, and hence into the corresponding science category. Note that an author can be classified into several science categories if his or her papers belong to more than one discipline. Here we let $d_{ij}=1$ for all $i$ and $j$; hence the calculated Rao-Stirling index measures the balance-weighted variety of interdisciplinary research at the level of science categories. The index in author view and that in paper view show that the discipline diversity of interdisciplinary research in social sciences and that in physical sciences are much higher than that in biological sciences (Fig. 5).
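A minimal sketch of this diversity measure follows, assuming the standard Rao-Stirling form (a sum of $p_i p_j d_{ij}$ over ordered pairs of distinct categories) with all differences set to 1. The function name `rao_stirling`, the normalization of raw counts into proportions, and the example counts are illustrative assumptions, not values from the data.

```python
def rao_stirling(counts, difference=None):
    """Rao-Stirling diversity: sum over ordered pairs i != j of p_i * p_j * d_ij.

    `counts` maps a category to its number of papers (or authors).
    `difference(i, j)` returns d_ij; by default d_ij = 1 for all pairs,
    which reduces the index to balance-weighted variety, 1 - sum(p_i ** 2).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    p = {c: n / total for c, n in counts.items()}
    d = difference or (lambda i, j: 1.0)
    return sum(p[i] * p[j] * d(i, j) for i in p for j in p if i != j)


# Hypothetical counts, for illustration only.
print(rao_stirling({"biological": 3, "physical": 1}))
```

With unit differences the index is zero when everything falls into one category and grows as the counts become more evenly spread over more categories.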

Figure 5: The interdisciplinary research of PNAS 1999–2013 between and within biological, physical and social sciences. Panel (a) shows the proportions of papers and those of authors involved in dyadic interactions between the three science categories, and those proportions involved in interactions within each science category. Panel (b) shows the Rao-Stirling diversity in paper/author view of each science category, which measures the discipline diversity of interdisciplinary research.

3.6 Interdisciplinarity at author level

We analyzed the relationship between author degree/hyperdegree and the probability of doing interdisciplinary research, and the relationship between paper team size and the probability of being an interdisciplinary paper. Fig. 6 shows that in each science category, interdisciplinary research is not just carried out by authors with a large degree or those with a large hyperdegree.

Figure 6: The relationship between authors’ and papers’ specific indexes and their interdisciplinarity. Panels (a,b) show the relationship between author degree/hyperdegree and the probability of doing interdisciplinary research. Panel (c) shows the relationship between paper team size and the probability of being an interdisciplinary paper.

Fig. 6 also shows that authors with a large degree or hyperdegree are likely to engage in interdisciplinary research, and that a paper with a large team is likely to be interdisciplinary. These phenomena might seem to be expected by chance: take a set of elements (collaborators, papers) of several classes and select a subset at random; a larger subset is then more likely to contain elements from more than one class. This reasoning, though plausible, is incorrect, because scientists do not select topics and collaborators at random. Research costs (investments of time and effort) make scientists tend to work within their familiar fields. In addition, the reasoning assumes that the pool of potential collaborators is limited to the empirical data, which does not hold in reality.
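The random-selection argument can be made concrete with a small calculation. The sketch below gives the probability that a uniformly random subset of size k spans more than one class, which is the null model the paragraph describes and then rejects. The function name `prob_mixed` and the class sizes are hypothetical.

```python
from math import comb


def prob_mixed(class_sizes, k):
    """Probability that a uniformly random k-subset contains elements
    from at least two classes (sampling without replacement)."""
    total = sum(class_sizes)
    if not 0 < k <= total:
        raise ValueError("k must be between 1 and the number of elements")
    # A subset is single-class only if all k elements come from one class.
    same_class = sum(comb(n, k) for n in class_sizes)
    return 1 - same_class / comb(total, k)


# Hypothetical pool: 60, 30 and 10 elements in three classes.
for k in (1, 2, 5, 10):
    print(k, round(prob_mixed([60, 30, 10], k), 3))
```

Under this null model the probability rises monotonically with k, so an increasing trend alone does not distinguish random mixing from deliberate interdisciplinary collaboration; the argument in the text rests on the selection not being random.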

We analyzed the giant component of the coauthorship network of PNAS 1999–2013, which contains more than 86.8% of authors. The giant component contains 71.5%, 76.7% and 88.9% of the authors of social, physical and biological sciences, respectively (Fig. 7e). Note that the author misidentification caused by initial-based methods inflates the giant component relative to the ground truth [Kim1]. Hence we identified authors by the names they provided on papers (which is likely to split one author into two) to obtain a conservative result.

Figure 7: Interdisciplinary extents of specific disciplines. For each considered discipline , we denote its authors, its authors involved in interdisciplinary research, its papers, and its interdisciplinary papers by sets , , and respectively. Denote the giant component of coauthorship network PNAS 1999-2013 by . The indexes are in Panel (a), in Panel (b), in Panel (c), in Panel (d), in Panel (e), and in Panel (f).

Interdisciplinary and multidisciplinary research contribute to the giant component containing most authors of each science category. We analyzed the relationship between the proportion of authors in the giant component and author activity/productivity. We removed authors in descending order of degree, and separately in descending order of hyperdegree, and calculated the proportion of the giant component after each removal. The curve relating the proportion of removed authors to that of the giant component shows that a considerable number of authors, e.g. the top 10% of authors ranked by degree, contribute to the formation of the giant component (Fig. 8). We also considered this relationship in three time periods, viz. 1999–2003, 2004–2008 and 2009–2013. The curve shifts to the left over time, which means that author activity and productivity play increasingly important roles in the formation of the giant component.

Figure 8: The relationship between giant component size and degree/hyperdegree. Nodes are removed in descending order of degree and of hyperdegree, respectively. In both cases, the curves relating the proportion of removed nodes to that of the giant component show that a considerable number of authors contribute to the formation of the giant component. The left-shifting trend of the curves across three time periods (1999–2003, 2004–2008 and 2009–2013) shows the increasing contributions of author activity and productivity to the formation of the giant component.
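The removal experiment behind Fig. 8 can be sketched as follows. This is a minimal illustration under stated assumptions: the network is given as a list of unique undirected coauthor pairs, the giant component is measured as a fraction of all original authors, and ties in the ranking are broken by name. The function names and the optional `score` argument (for example, a mapping from author to hyperdegree) are hypothetical.

```python
from collections import Counter


def largest_component_size(edges, nodes):
    """Size of the largest connected component among `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in parent and v in parent:     # ignore edges to removed nodes
            parent[find(u)] = find(v)
    sizes = Counter(find(v) for v in nodes)
    return max(sizes.values(), default=0)


def removal_curve(edges, removal_counts, score=None):
    """Return (removed fraction, giant fraction) pairs when the top-ranked
    authors are removed. Ranking uses degree unless `score` is supplied."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rank_by = score or degree
    ranked = sorted(degree, key=lambda a: (-rank_by[a], str(a)))
    total = len(ranked)
    if total == 0:
        return []
    curve = []
    for r in removal_counts:
        kept = ranked[r:]                   # drop the r highest-ranked authors
        giant = largest_component_size(edges, kept)
        curve.append((r / total, giant / total))
    return curve


# Hypothetical star-shaped network: one hub with four collaborators.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
print(removal_curve(star, [0, 1]))
```

In the star example, removing the single hub collapses the giant component, which is the extreme case of the dependence on high-degree authors that the curves in Fig. 8 quantify.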

4 Discussion and conclusions

Our case study on PNAS 1999–2013 verifies that collaboration patterns in biological, physical and social sciences have similar transitivity and assortativity. The data demonstrate that the degree distributions of the three science categories are of an identical type, namely a mixture of a generalized Poisson distribution and a power law. This also holds for hyperdegree. We provided an explanation for the emergence of this distribution type through authors’ “yes/no” decisions and their different abilities to attract collaborations.

The data show that a considerable number of authors pursue interdisciplinary research, and that the giant component of the coauthorship network of PNAS 1999–2013 contains most authors of each science category. We took a network perspective to analyze the interactions among the second-class disciplines, and quantified their interdisciplinarity by network indexes such as degree and betweenness centrality. We found that specific second-class disciplines (such as Applied mathematics and Anthropology) play an important role in interdisciplinary research.

The case study contributes to understanding multidisciplinary and interdisciplinary collaboration patterns, owing to the importance of PNAS and to the accurate discipline information of its papers. The selection of data might affect the details of our findings about interdisciplinarity, and our results should not be interpreted as the patterns of researchers in general. For example, we cannot expect to observe a high extent of interdisciplinarity by analyzing a domain-specific journal. We finish the case study by asking a question: what are the grounds of interdisciplinary research? While a thorough discussion of this question is beyond the scope of this paper, the following provides a brief discussion.

There is a tendency towards fragmentation in the development of the sciences: disciplines split into sub-disciplines and specific topics. Although their research objects differ, they share common research paradigms, which can be grouped into four categories, namely theoretical research, experiment, simulation, and data-driven research [Hey]. Meanwhile, many scientific problems are too complex to be understood through the methodology of a single discipline. Integrating theoretical and methodological perspectives drawn from different disciplines creates a unified methodology for research problems, and even a shared vocabulary for presenting concepts in specific disciplines [Haythornthwaite], which drives the formation of transdisciplinary disciplines [Grauwin].

Systems science, as a typical transdisciplinary discipline, studies systems from simple to complex and from natural to social. The parts of a system and the relations between them can be abstracted as networks. The rapid development of research on networks (models, algorithms, …) has bred a new discipline, namely network science. Some researchers from biological, physical and social fields investigate their respective problems under a network framework [Brier], as our case study does.

Following the above, one may argue that common research paradigms and methodologies, especially those integrated as transdisciplinary disciplines, provide grounds for the interactions between science categories and for the formation of giant components in coauthorship networks. Analyzing paper content appears to be a promising way to validate the universality of those paradigms and methodologies. Over half of the papers of PNAS 1999–2013 contain the topic words “system” and “control” [Xie4]. A high proportion of papers containing a topic word reflects, to a certain extent, the typicality of the topic. However, it is not easy to establish the relation between a paper containing the word “system” and a paper applying research results of systems science. Hence validating the universality at the semantic level is a subject for further study.

Availability of data and materials

The data are freely available from their website. Feel free to contact the corresponding author if you need more information.

Competing interests

The authors declare that they have no competing interests.


Funding

ZX acknowledges support from National Science Foundation of China (NSFC) Grant No. 61773020.

Authors’ contributions

All authors conceived and designed the research. ZX and ML wrote the paper. ZX and JPL analyzed the data. OYZZ acquired the data. ZX and XJD wrote the discussion. All authors discussed the research and approved the final version of the manuscript.


Acknowledgements

We thank the anonymous reviewers for their valuable suggestions and great help.


Wikipedia shows that people with the 100 most common Chinese surnames account for 84.77% of the total Chinese population.


  • [1] Weingart P (2012) A short history of knowledge formations. In R. Frodeman, J. Thompson Klein, & C. Mitcham (Eds.), The Oxford Handbook of Interdisciplinarity (pp. 3-14). Oxford, England: Oxford University Press.
  • [2] National Academies (U.S.). Committee on Facilitating Interdisciplinary Research (2004). Facilitating interdisciplinary research. Washington: National Academy Press. Retrieved from
  • [3] Hurd JM (1992) Interdisciplinary research in the sciences: Implications for library organizations. Coll Res Liber 53(4), 283-297.
  • [4] Cooper G (2013) A disciplinary matter: Critical sociology, academic governance and interdisciplinarity. Sociology 47(1), 74-89.
  • [5] Hadorn GH, Pohl C, Bammer G (2012) Solving problems through transdisciplinary research. In R. Frodeman, J. Thompson Klein, & C. Mitcham (Eds.), The Oxford Handbook of Interdisciplinarity (pp. 431-452). Oxford, England: Oxford University Press.
  • [6] Siedlok F, Hibbert P (2014) The organization of interdisciplinary research: Modes, drivers and barriers. Int J Manage Rev 16(2), 194-210.
  • [7] Liu Y, Rafols I, Rousseau R (2012) A framework for knowledge integration and diffusion. J Doc 68(1), 31-44.
  • [8] Gooch D, Vasalou A, Benton L (2017) Impact in interdisciplinary and cross-sector research: Opportunities and challenges. J Assoc Inf Sci Technol 68(2), 378-391.
  • [9] Larivière V, Gingras Y, Archambault É (2006) Canadian collaboration networks: A comparative analysis of the natural sciences, social sciences and the humanities. Scientometrics 68(3): 519-533.
  • [10] Moody J (2004) The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. Am Sociol Rev 69(2), 213-238.
  • [11] Glänzel W, Schoepflin U (1999) A bibliometric study of reference literature in the sciences and social sciences. Inform Process Manag 35(1): 31-44.
  • [12] Hicks D (1999) The difficulty of achieving full coverage of international social science literature and the bibliometric consequences. Scientometrics 44(2): 193-215.
  • [13] Sarigöl E, Pfitzner R, Scholtes I, Garas A, Schweitzer F (2014) Predicting scientific success based on coauthorship networks. EPJ Data Science 2014:9.
  • [14] Barabási AL, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Physica A 311: 590-614.
  • [15] Newman M (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA 98: 404-409.
  • [16] Newman M (2004) Coauthorship networks and patterns of scientific collaboration. Proc Natl Acad Sci USA 101: 5200-5205.
  • [17] Xie Z, Ouyang ZZ, Li JP (2016) A geometric graph model for coauthorship networks. J Informetr 10: 299-311.
  • [18] Newman M (2002) Assortative mixing in networks. Phys Rev Lett 89: 208701.
  • [19] Tomasello MV, Vaccario G, Schweitzer F (2017) Data-driven modeling of collaboration networks: A cross-domain analysis. EPJ Data Science 6: 22.
  • [20] Braun T, Schubert A (2003) A quantitative view on the coming of age of interdisciplinarity in the sciences, 1980-1999. Scientometrics 58(1), 183-189.
  • [21] Levitt JM, Thelwall M, Oppenheim C (2011). Variations between subjects in the extent to which the social sciences have become more interdisciplinary. J Assoc Inf Sci Technol 62(6), 1118-1129.
  • [22] Porter AL, Roessner JD, Cohen AS, Perreault M (2006) Interdisciplinary research: Meaning, metrics and nurture. Res Eval 15(3), 187-195.
  • [23] Porter AL, Rafols I (2009) Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics 81(3), 719-745.
  • [24] Rafols I, Meyer M (2010) Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics 82(2), 263-287.
  • [25] Abramo G, D’Angelo CA, Costa F (2012) Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications. J Assoc Inf Sci Technol 63(11): 2206-2222.
  • [26] Chen S, Arsenault C, Gingras Y, Larivière V (2015) Exploring the interdisciplinary evolution of a discipline: The case of Biochemistry and Molecular Biology. Scientometrics 102(2), 1307-1323.
  • [27] Bordons M, Zulueta MA, Romero F, Barrigón S (1999) Measuring interdisciplinary collaboration within a university: The effects of the multidisciplinary research programme. Scientometrics 46(3), 383-398.
  • [28] Leydesdorff L, Goldstone RL (2014) Interdisciplinarity at the journal and specialty level: The changing knowledge bases of the journal Cognitive Science. J Assoc Inf Sci Technol 65(1), 164-177.
  • [29] Zhang L, Rousseau R, Glänzel W (2015) Diversity of references as an indicator for interdisciplinarity of journals: Taking similarity between subject fields into account. J Assoc Inf Sci Technol 67(5), 1257-1265.
  • [30] Lungeanu A, Huang Y, Contractor NS (2014) Understanding the assembly of interdisciplinary teams and its impact on performance. J Informetr 8(1), 59-70.
  • [31] Larivière V, Gingras Y (2010) On the relationship between interdisciplinarity and scientific impact. J Assoc Inf Sci Technol 61(1), 126-131.
  • [32] Larivière V, Haustein S, Börner K (2015) Long-distance interdisciplinarity leads to higher scientific impact. Plos One 10(3), e0122565.
  • [33] Rinia EJ, van Leeuwen TN, van Raan AFJ (2002) Impact measures of interdisciplinary research in physics. Scientometrics 53(2), 241-248.
  • [34] Wan J, Thijs B, Glänzel W (2015) Interdisciplinarity and impact: Distinct effects of variety, balance, and disparity. Plos One 10(5), e0127298.
  • [35] Levitt JM, Thelwall M (2009) The most highly cited library and information science articles: interdisciplinarity, first authors and citation patterns. Scientometrics 78(1), 45-67.
  • [36] Levitt JM, Thelwall M (2008) Is multidisciplinary research more highly cited?: A macrolevel study. J Assoc Inf Sci Technol 59(12), 1973-1984.
  • [37] Chen S, Arsenault C, Larivière V (2015) Are top-cited papers more interdisciplinary? J Informetr 9(4): 1034-1046.
  • [38] Stirling A (2007) A general framework for analyzing diversity in science, technology and society. J Roy Soc Interf 4(5), 707-719.
  • [39] Leydesdorff L (2007) Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. J Assoc Inf Sci Technol 58(9), 1303-1319.
  • [40] Van den Besselaar P, Heimeriks G (2001, July). Disciplinary, multidisciplinary, interdisciplinary: Concepts and indicators. In ISSI (pp. 705-716).
  • [41] Kagan J. The three cultures: Natural sciences, social sciences, and the humanities in the 21st century. Cambridge University Press, 2009.
  • [42] Xie Z, Duan XJ, Ouyang ZZ, Zhang PY (2015) Quantitative analysis of the interdisciplinarity of applied mathematics. Plos One 10(9): e0137424.
  • [43] Milojević S. (2013). Accuracy of simple, initials-based methods for author name disambiguation. J Informetr 7(4): 767-773.
  • [44] Kim J, Diesner J (2016) Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. J Assoc Inf Sci Technol 67(6):1446-1461.
  • [45] Milojević S (2010) Modes of Collaboration in Modern Science: Beyond Power Laws and Preferential Attachment. J Assoc Inf Sci Technol 61(7): 1410-1423.
  • [46] Xie Z, Ouyang ZZ, Li JP, Dong EM, Yi DY (2018) Modelling transition phenomena of scientific coauthorship networks. J Assoc Inf Sci Technol 69(2): 305-317.
  • [47] Consul PC, Jain GC (1973) A generalization of the Poisson distribution. Technometrics 15(4): 791-799.
  • [48] Xie Z, Xie ZL, Li M, Li JP, Yi DY (2017) Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 112, 483-507.
  • [49] Levitt JM, Thelwall M, Oppenheim C (2011) Variations between subjects in the extent to which the social sciences have become more interdisciplinary. J Assoc Inf Sci Technol 62(6), 1118-1129.
  • [50] Hey T, Tansley S, Tolle KM (2009) The fourth paradigm: data-intensive scientific discovery, Microsoft research, Redmond, Washington.
  • [51] Haythornthwaite C (2006). Learning and knowledge networks in interdisciplinary collaborations. J Assoc Inf Sci Technol 57(8), 1079-1092.
  • [52] Grauwin S, Beslon G, Fleury É, Franceschelli S, Robardet C, Rouquier JB, Jensen P (2012) Complex systems science: dreams of universality, interdisciplinarity reality. J Assoc Inf Sci Technol 63(7), 1327-1338.
  • [53] Brier S (2013) Cybersemiotics: a new foundation for transdisciplinary theory of information, cognition, meaningful communication and the interaction between nature and culture. Integr Rev 9: 222-263.

5 Appendix

The following boundary detection algorithms come from Reference [Xie6].

Input: Observations (), rescaling function , and fitting model .
For from to do:
    Fit to the PDF of by maximum-likelihood estimation;
    Do a KS test for the two data vectors and , with the null hypothesis that they come from the same distribution;
    Break if the test rejects the null hypothesis at significance level .
Output: The current as the boundary point.
Table 6: A boundary detection algorithm for probability density functions.

Input: Data vector (), rescaling function , and fitting model .
For from to do:
    Fit to , by regression;
    Do a KS test for the two data vectors and , with the null hypothesis that they come from the same distribution;
    Break if the test rejects the null hypothesis at significance level .
Output: The current as the boundary point.
Table 7: Boundary point detection algorithm for general functions.
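
The loop structure of Table 6 can be sketched as below. Because the symbols of the original tables are not available here, every concrete choice is an assumption: observations are taken to be positive integers, the candidate boundary runs from 1 to the largest observation, the fitting model is a caller-supplied callback `fit` (hypothetical) that returns a fitted probability function for the observations not exceeding the candidate, the two compared vectors are the rescaled empirical and fitted probabilities, and the KS p-value uses the asymptotic Kolmogorov series with a small-sample correction. It is an illustration of the procedure, not the reference implementation.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_two_sample(a, b):
    """Two-sample KS statistic and an approximate (asymptotic) p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m) for v in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d        # small-sample correction
    if lam < 0.3:                            # p-value is ~1; series converges slowly
        return d, 1.0
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
                for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


def detect_boundary(observations, fit, rescale=math.log, alpha=0.05):
    """Return the first k at which the fit to observations <= k is rejected,
    or None if the test never rejects."""
    for k in range(1, max(observations) + 1):
        head = [x for x in observations if x <= k]
        if not head:
            continue
        counts = Counter(head)
        model = fit(head)                    # assumed: callable t -> probability
        # Only values that actually occur, so that rescale (log) is defined.
        support = [t for t in range(1, k + 1) if counts[t] > 0]
        empirical = [rescale(counts[t] / len(head)) for t in support]
        fitted = [rescale(model(t)) for t in support]
        _, p_value = ks_two_sample(empirical, fitted)
        if p_value < alpha:
            return k
    return None


def empirical_fit(head):
    """Toy 'model' that reproduces the empirical PMF exactly (never rejected)."""
    counts = Counter(head)
    return lambda t: counts[t] / len(head)


print(detect_boundary([1, 1, 1, 2, 2, 3, 4], empirical_fit))
```

In practice `fit` would estimate the parameters of the chosen model, for example by maximum likelihood, and the returned boundary would separate the range where that model is adequate from the range where it is not. The variant in Table 7 has the same loop, with the fit obtained by regression on a general data vector.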