1 Introduction
Natural and social sciences provide methodical approaches to study, predict, and explain natural phenomena and sociality (human behaviors and psychological states), respectively [Weingart]. The specialization of knowledge in these sciences forms various disciplines. Meanwhile, to solve problems whose solutions are beyond the scope of a single discipline, researchers need to integrate data, techniques, concepts, and theories from several disciplines [Cooper, Hurd, National, Hirsch]. Interactions between disciplines incubate new interdisciplines, blur the boundary between natural and social sciences, and produce many important scientific breakthroughs [Liu, Siedlok, Gooch].
Studying collaboration patterns within and across disciplines or sciences helps in understanding the diversity of cooperative behaviors and the modes of knowledge fusion. Papers in multidisciplinary journals provide an informative and reliable platform for such a study, because natural and social sciences are communicated mainly through papers [Lariviere2, Moody, Glanzel, Hicks]. Here we investigated the patterns based on 52,803 papers published in Proceedings of the National Academy of Sciences (PNAS) over the years 1999–2013. The dataset spans three science categories: social sciences and the two principal subsciences of natural sciences, viz. biological and physical sciences.
Collaboration relationships can be expressed by graphs, termed coauthorship networks. Hence the patterns can be studied from a network perspective. Coauthorship networks from different scientific fields share specific similarities, such as partial transitivity of coauthorship, homophily in the number of collaborators, and the right-skewed distribution of collaborators per author [Newman1, Newman4, Barab, Newman0, Sarigol, Xie3, Tomasello]. These commonalities also appear in the collaboration networks of three author sets (which come from the three science categories of PNAS respectively). We looked further into the rules and reasons behind these commonalities. We found that the distribution of collaborators per author and that of papers per author follow the same distribution type: a mixture of a generalized Poisson distribution and a power-law. We provided a possible explanation for the distribution type and these commonalities through the diversity of authors’ abilities to attract collaborations.
A range of previous works discussed quantitative indexes of interdisciplinarity for sciences [Braun, Porter1, Levitt2], for disciplines [Porter, Chen, Rafols, Abramo], for universities [Bordons], for journals [Leydesdorff, Zhang], and for research teams [Lungeanu]. Some works addressed the correlation between interdisciplinarity and scientific impact [Lariviere, Lariviere1, Rinia, Wang] (e.g. the ability to attract citations [Levitt1, Levitt, Chen1]). Following the general ideas of these references, we studied the interdisciplinary activities of PNAS through the co-occurrence of disciplines in papers, and through indexes calculated from the co-occurrence, such as Rao-Stirling diversity [Stirling] and betweenness centrality [Leydesdorff2007].
We further studied the collaboration patterns across disciplines, and found that a considerable proportion of authors and papers in physical and social sciences are involved in interdisciplinary research. The multidisciplinary coauthorship network extracted from the data has a giant component, which contains more than 88%, 80% and 71% of the authors in biological, physical and social sciences respectively. A considerable number of authors contribute to the formation of the giant component. The contributions of author activity and productivity to this formation increase over time. The high extent of interdisciplinarity shown by the case study might not be representative of general collaboration patterns, because authors could submit more interdisciplinary work to multidisciplinary journals than to domain-specific ones.
This report is structured as follows: the data processing is described in Section 2; the similarities and interactions are analyzed in Section 3; and the discussion and conclusion are drawn in Section 4.
2 The Data
2.1 Reason for using the data
The case study involves two concepts, namely multidisciplinarity (researchers from different disciplines study within their disciplines) and interdisciplinarity (study beyond disciplinary boundaries) [Besselaar]. Multidisciplinarity could be viewed as a combination of disciplines, and interdisciplinarity as a merging of them. A multidisciplinary journal whose scope covers natural and social sciences can be used to analyze the interactions between science categories. Such a journal can also be used to compare the collaboration patterns of several disciplines and find their similarities. PNAS publishes high quality research papers, and provides reliable discipline information for those papers. The journal also provides a high quality data platform for analyzing worldwide collaboration patterns, because nearly half of its papers come from authors outside the United States.
Other multidisciplinary journals are less suitable. Science, Nature and Nature Communications do not provide discipline information for papers. Journal of the Royal Society Interface focuses on cross-disciplinary research at the interface between the physical and life sciences, but does not involve social sciences. Our analysis is restricted to PNAS, which brings limitations to our findings. For example, social sciences are communicated not only through papers, but also through books [Glanzel, Hicks]. Hence the results obtained must be carefully interpreted as the patterns of researchers who publish papers in the chosen journal. However, due to the influence and representativeness of PNAS, the case study could contribute to understanding aspects of multidisciplinary and interdisciplinary collaboration patterns.
2.2 Discipline information
Most papers of the dataset have been classified into three first-class disciplines (biological, physical, and social sciences) and the second-class disciplines listed in Table 1. Interdisciplinary papers are classified into several disciplines. The data contain 43,304 biological papers (including 3,957 papers of biophysics), which account for 82.01% of the total. The data also contain 5,987 physical papers and 1,310 social papers. There are 2,961 interdisciplinary papers belonging to more than one of the second-class disciplines, which account for 5.61% of the total (2,961 of 52,803). The large difference between the discipline proportions does not reflect a preference of PNAS. In reality, far more researchers are involved in natural sciences (especially biological sciences) than in social sciences [Kagan]. There are 1,842 papers that are classified only into the first-class disciplines. For these papers, the second-class discipline is regarded here as missing, whereas in our previous work [Xie4] it was regarded as the same as the first-class discipline. Hence the data in Table 1 differ from those in Reference [Xie4].
Based on the discipline information of papers, we constructed a network to express the relationship between the first-class and the second-class disciplines (Fig. 1), where two disciplines are connected if they are the first-class and the second-class disciplines of a paper. We can also construct a network to express the interactions between the second-class disciplines (Fig. 2), where each node is a discipline and two nodes are connected if a paper belongs to both of them. These networks could evolve with the discipline information of newly published papers, so the latest data may give a more comprehensive view.
Discipline  Interdisciplinary papers  Papers  Degree  Weighted degree  Betweenness centrality
Agricultural science  22  226  9  20  3.19 
Anthropology  114  556  24  110  40.02 
Applied biological science  135  767  9  134  1.79 
Applied mathematics  191  380  27  182  49.39 
Applied physical science  309  816  26  299  29.14 
Astronomy  3  50  3  3  0.13 
Biochemistry  333  6,303  19  327  16.96 
Biophysics  359  3,957  16  359  7.91 
Biophysics and computational biology  468  1,532  11  467  7.95 
Cell biology  135  3,717  18  130  12.71 
Chemistry  1,003  8,645  26  1,003  49.73 
Computer science  77  101  17  70  9.50 
Developmental biology  33  1,525  12  30  1.66 
Earth, atmospheric, and planetary sciences  78  243  9  77  1.58 
Ecology  162  1,084  15  162  10.00 
Economic science  94  171  21  94  20.88 
Engineering  217  392  19  217  13.85 
Environmental science  184  695  20  183  25.44 
Evolution  233  2,274  22  216  25.81 
Genetics  103  2,664  20  97  12.68 
Geology  137  285  10  136  2.79 
Geophysics  23  175  7  23  1.51 
Immunology  43  3,070  10  38  1.45 
Mathematics  18  561  11  17  3.36 
Medical science  181  4,784  20  170  14.01 
Microbiology  92  2,812  17  89  11.85 
Neurobiology  16  1,003  9  16  0.87 
Neuroscience  290  4,398  16  280  12.00 
Pharmacology  26  594  4  26  0.08 
Physics  229  4,818  22  227  18.24 
Physiology  33  1,116  12  32  5.82 
Plant biology  27  1,700  12  27  4.62 
Political science  7  17  5  7  0.54 
Population biology  27  166  11  26  4.04 
Psychological and cognitive science  160  487  16  159  5.09 
Psychology  83  449  12  83  3.62 
Statistics  90  146  20  85  19.34 
Sustainability science  123  399  11  120  7.66 
Systems biology  36  159  11  36  1.80 
The number of papers and that of interdisciplinary papers of a discipline are counted based on the discipline information provided by PNAS. The degree, weighted degree, and betweenness centrality of a discipline are calculated based on the weighted network in Fig. 2.
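The construction of the network between second-class disciplines can be sketched as follows. This is a minimal illustration rather than the procedure used for Fig. 2: the function names and the toy input are hypothetical, and the edge weight is assumed to be the number of papers classified into both disciplines, which the text does not state explicitly.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(paper_disciplines):
    """Return weighted edges {(a, b): number of papers tagged with both a and b}."""
    edges = Counter()
    for disciplines in paper_disciplines:
        # A paper adds one unit of weight to every pair of its disciplines.
        for a, b in combinations(sorted(set(disciplines)), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Return (degree, weighted degree) counters keyed by discipline."""
    degree, weighted = Counter(), Counter()
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] += 1          # number of distinct neighbouring disciplines
            weighted[node] += weight   # assumed: sum of co-classified paper counts
    return degree, weighted


# Hypothetical toy input: each inner list holds the disciplines of one paper.
papers = [
    ["Chemistry", "Biophysics"],
    ["Chemistry", "Biophysics"],
    ["Chemistry", "Applied mathematics"],
    ["Ecology"],
]
edges = cooccurrence_network(papers)
degree, weighted = degrees(edges)
print(degree["Chemistry"], weighted["Chemistry"])
```

A discipline whose papers are never co-classified with another discipline gets no edge, so it would appear as an isolated node in such a network.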
2.3 Coauthorship
Identifying ground-truth authors, termed disambiguating author names, is an important, time-consuming, but necessary procedure of coauthorship analysis. Several methods use only the names provided on papers (e.g. initial-based methods [Milojevic2]). The dominant misidentification of initial-based methods is caused by merging two or more different authors into one. Hence, it deflates the number of unique authors, and inflates the size of the ground-truth giant component. Requiring additional information (e.g. email address) helps to reduce merging errors, but makes the information harder to collect.
In PNAS 1999–2013, most authors provide their full first name, so the names provided on papers are used directly to identify authors. Some authors, however, provide only their surname and the initial of their first given name, and identifying authors by surname and initial generates many merging errors [Kim1]. The proportion of such authors in the data is 2.9%, and the proportion of such authors who also publish more than one paper is 0.3%. Meanwhile, even full names still produce merging errors if some authors provide exactly the same name. Chinese names were found to account for name repetition [Kim1]. We calculated the proportion of names with a given name of fewer than six characters and a surname among the 100 major Chinese surnames. The proportion of such authors in the data is 2.7%, and that of such authors who also publish more than one paper is 1.1%. The small values of these four proportions show that the impact of name repetition is limited. These proportions for specific subsets of the data are listed in Table 2.
Data  Initial only  Initial only, more than one paper  Chinese name  Chinese name, more than one paper

PNAS 1999–2013  2.9%  0.3%  2.7%  1.1% 
Biological sciences  2.7%  0.2%  2.7%  1.1% 
Physical sciences  4.8%  0.4%  4.4%  0.9% 
Social sciences  2.3%  0.1%  2.2%  0.3% 
Biophysics  4.1%  0.3%  4.0%  1.0% 
Interdiscipline  2.6%  0.1%  3.6%  0.6% 
The first two indexes are the proportion of the authors providing only the initial of their first given name and their surname, and the proportion of these authors who also publish more than one paper, respectively. The last two indexes are the proportion of the authors with a surname among the 100 major Chinese surnames and a given name of fewer than six characters, and the proportion of these authors who also publish more than one paper, respectively.
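The first two indexes can be sketched as follows. This is an illustration under stated assumptions, not the script used for Table 2: names are assumed to be written as "Given Surname", an initial is assumed to be a single letter with an optional period, and both proportions are taken over all distinct names. The function names and the toy bylines are hypothetical.

```python
import re
from collections import Counter


def is_initial_only(name):
    """True if the first given name is a single letter, optionally followed by a period."""
    given = name.split()[0]
    return re.fullmatch(r"[A-Za-z]\.?", given) is not None


def initial_only_proportions(bylines):
    """Return (share of initial-only names, share of initial-only names with more than one paper)."""
    # Count each name once per paper, then treat every distinct name as one author.
    papers_per_name = Counter(name for byline in bylines for name in set(byline))
    names = list(papers_per_name)
    flagged = [n for n in names if is_initial_only(n)]
    prolific = [n for n in flagged if papers_per_name[n] > 1]
    return len(flagged) / len(names), len(prolific) / len(names)


# Hypothetical toy bylines, one list of author names per paper.
bylines = [
    ["J. Smith", "Anna Lee"],
    ["J. Smith", "Bo Chen"],
    ["K Brown", "Anna Lee"],
]
share_initial, share_initial_prolific = initial_only_proportions(bylines)
print(share_initial, share_initial_prolific)
```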
The method adopted here will split one author into two or more if the author does not provide his or her name consistently. Splitting underestimates the giant component size and the indexes used as evidence for the universality of interdisciplinary research. Hence the results in Subsections 3.5 and 3.6 could be regarded as conservative ones. In addition, the inaccuracy caused by the adopted method does not change the ground-truth distribution type of collaborators per author and that of papers per author [Kim1].
3 Data analysis
3.1 Network properties
Coauthorship is an $n$-ary relation, where $n$ is the number of authors of a paper; hence it can be expressed by a hypergraph, a generalization of a graph in which an edge (termed a hyperedge) can join any number of nodes. Authors are represented as nodes, and the author group of each paper (the paper team) as a hyperedge. We can then extract a coauthorship network from the hypergraph as a simple graph, in which an edge is formed between every two nodes in each hyperedge and multiple edges are treated as one. The terms “degree” and “hyperdegree” of a node express the number of collaborators and the number of papers of an author respectively.
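The extraction of a simple coauthorship graph from the hypergraph of paper teams can be sketched as follows. The function name and the toy teams are hypothetical; authors are assumed to be already identified by a unique string.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship(paper_teams):
    """Return (neighbors, hyperdegree) for a list of paper teams (hyperedges)."""
    neighbors = {}                  # author -> set of collaborators
    hyperdegree = defaultdict(int)  # author -> number of papers
    for team in paper_teams:
        members = set(team)
        for author in members:
            hyperdegree[author] += 1
            neighbors.setdefault(author, set())  # solo authors are nodes too
        # Connect every two members; sets merge repeated collaborations into one edge.
        for a, b in combinations(members, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors, dict(hyperdegree)


# Hypothetical toy data: four papers, five authors.
teams = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["E"]]
neighbors, hyperdegree = build_coauthorship(teams)
degree = {author: len(nbrs) for author, nbrs in neighbors.items()}
print(degree["A"], hyperdegree["A"])
```

In the toy data, authors A and B write two papers together but share a single edge, so the degree of A counts distinct collaborators while its hyperdegree counts papers.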
The data show that the average paper team size of biological sciences (6.624) and that of physical sciences (5.254) are larger than that of social sciences (4.634). This size relation fits the reality that research teams are usually larger in natural sciences and smaller in social sciences [Kagan]. Now let us consider the coauthorship networks of the papers in specific disciplines or science categories. All of these networks are highly clustered and assortative, and their average shortest path lengths scale as the logarithms of their numbers of nodes (NN, AP in Table 3). These properties do not mean that all of the networks are small-world. The network of social sciences is an exception: it has no component containing more than 10% of its authors. However, this does not mean that research in social sciences is carried out in isolation. In fact, 71.5% of the authors in social sciences belong to the giant component of the coauthorship network generated by the whole data. Therefore, analyzing the collaborations of authors restricted to a single discipline has limitations, and we carried out the analysis in the environment of all disciplines.
Network  NN  NE  GCC  AC  AP  PG 

PNAS 1999–2013  202,664  1,225,176  0.881  0.230  6.422  0.868 
Biological sciences  184,872  1,150,362  0.881  0.232  6.364  0.880 
Physical sciences  24,766  101,166  0.933  0.452  10.89  0.455 
Social sciences  5,121  18,786  0.946  0.683  6.574  0.087 
Biophysics  13,480  48,012  0.905  0.177  7.665  0.636 
Interdiscipline  13,680  53,588  0.951  0.558  9.397  0.093 
The indexes are the number of nodes (NN), the number of edges (NE), the global clustering coefficient (GCC), the degree assortativity coefficient (AC), the average shortest path length (AP), and the node proportion of the giant component (PG). The AP of each of the first two networks is calculated approximately by sampling 400,000 pairs of nodes.
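Estimating the average shortest path length from sampled node pairs can be sketched as follows. This is an illustration only: the function names are hypothetical, and skipping pairs that lie in different components is an assumption, since the treatment of disconnected pairs is not described.

```python
import random
from collections import deque


def bfs_distance(neighbors, source, target):
    """Shortest path length between two nodes, or None if they are disconnected."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                if u == target:
                    return dist[u]
                queue.append(u)
    return None


def sampled_average_path_length(neighbors, n_pairs, seed=0):
    """Average distance over randomly sampled pairs of distinct nodes."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    total = count = 0
    for _ in range(n_pairs):
        a, b = rng.sample(nodes, 2)
        d = bfs_distance(neighbors, a, b)
        if d is not None:  # assumed: disconnected pairs are skipped
            total += d
            count += 1
    return total / count if count else float("nan")


# Hypothetical toy network: a path A - B - C.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(sampled_average_path_length(path, n_pairs=1000))
```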
3.2 Degree and hyperdegree
We aggregated degree and hyperdegree over the whole data (not restricted to a single science category), and observed the degree distributions and hyperdegree distributions of three author sets (which come from the three science categories respectively). We found that although the collaboration level differs from one science category to another, all of the distributions show a hook head, a fat tail, and a crossover between them, which could be viewed as a common feature of coauthorship networks (Fig. 3). The head and tail can be fitted by a lognormal distribution and a power-law distribution respectively [Milojevic1].
These distributions can also be fitted, as a whole, by a mixture of a generalized Poisson distribution and a power-law distribution. The fitting parameters are listed in Table 4. We performed a two-sample Kolmogorov-Smirnov (KS) test to compare the distributions of two data vectors: the node indexes (i.e. degrees, hyperdegrees) and the samples drawn from the corresponding fitting distribution. The null hypothesis is that the two data vectors are from the same distribution. The $p$-value of each fitting shows that the test cannot reject the null hypothesis at the 5% significance level. Note that a goodness-of-fit test is not suitable here, due to the small number of large degree authors.
Regarding authors as samples, a mixture distribution means that those samples come from different populations, namely the collaboration patterns of the authors with few collaborators and papers differ from those of the authors with many collaborators and papers. In Reference [Xie6], a possible explanation (which is free of disciplines) is given for the mixture type of empirical degree distributions. With the same general ideas, a similar explanation can be adopted for hyperdegree distributions as follows.
Whether a researcher collaborates with another to publish a paper can be regarded as a “yes/no” decision. So the hyperdegree of a researcher is equal to the number of successes in a sequence of decisions made by the candidates who want to coauthor with that researcher. Suppose the number of those candidates is $n$ and the collaboration probability of each candidate is $p$. Then the hyperdegree follows a binomial distribution $B(n, p)$, and so approximately a Poisson distribution with expected value $np$ (Poisson limit theorem). The value of $np$ varies from author to author, due to the diversity of authors’ ability to attract collaborators.
Decisions of authors could be dependent. For example, collaborating with researchers who have publishing experience helps to publish a paper. Hence we could regard hyperdegree as a random variable following a generalized Poisson distribution (which allows the occurrence probability of an event to involve memory [Consul]). In empirical data, most hyperdegrees are around their mode. Hence we could assume that they follow generalized Poisson distributions with an expected value around their mode, and so form the generalized Poisson part of a hyperdegree distribution. A few authors experience a cumulative process of papers, which skews a hyperdegree distribution to the right and forms a fat tail.
Degree distribution  $a$  $b$  $c$  $d$  $s$  $B$  $E$  $G$  $P$  $p$-value

Biological sciences  4.843  0.464  74.27  2.889  1.049  15  26  20  50  0.203 
Physical sciences  3.958  0.477  49.31  2.798  1.037  7  14  20  53  0.178 
Social sciences  3.292  0.513  20.78  2.657  1.046  11  28  20  35  0.111 
Hyperdegree distribution  $a$  $b$  $c$  $d$  $s$  $B$  $E$  $G$  $P$  $p$-value
Biological sciences  0.028  0.269  1.968  3.099  35.57  2  6  10  13  0.979 
Physical sciences  0.021  0.320  2.977  2.916  47.15  2  11  10  10  0.625 
Social sciences  0.022  0.375  19.48  3.665  46.24  3  20  10  11  0.206 
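The ranges of the generalized Poisson part $f_1(x)$, the crossover, and the power-law part $f_2(x)$ are $[1, B]$, $[B, E]$, and $[E, \max(x)]$ respectively. The fitting function is $f(x) = q(x)\, s f_1(x) + (1 - q(x))\, f_2(x)$, where the weight $q(x)$ falls from 1 to 0 across the crossover. The fitting processes are: observe proper $G$ and $P$; calculate the parameters of $s f_1(x)$ (i.e. $a$, $b$, $s$) and $f_2(x)$ (i.e. $c$, $d$) by regressing the empirical distribution in $[1, G]$ and $[P, \max(x)]$ respectively; find $B$ and $E$ through exhaustion to make $f(x)$ pass the KS test ($p$-value above 0.05). The sum of each $f(x)$ over $[1, \max(x)]$ is near unity, which means that $f(x)$ can be regarded as a probability density function.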
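The two-sample KS comparison between observed indexes and samples drawn from a fitted distribution can be sketched as follows. This is a minimal sketch, not the implementation behind Table 4: the function name is hypothetical, the $p$-value uses the standard asymptotic Kolmogorov series with a small-sample correction, and that approximation is only rough for small or heavily tied samples such as integer degrees. The two toy samples are random integers, not fitted data.

```python
import math
import random


def ks_two_sample(x, y):
    """Return the KS statistic D and an asymptotic p-value for two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every observation equal to v (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        return d, 1.0  # the series converges slowly here and p is essentially 1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Hypothetical toy samples standing in for observed and simulated degrees.
rng = random.Random(1)
observed = [rng.randint(1, 20) for _ in range(500)]
simulated = [rng.randint(1, 20) for _ in range(500)]
d, p = ks_two_sample(observed, simulated)
print(d, p)
```

Under this sketch, a fit would be accepted when the returned $p$-value is not below the chosen significance level, i.e. when the test cannot reject the hypothesis that both samples come from the same distribution.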
3.3 Transitivity of coauthorship
Transitivity in society means that “the friend of my friend is also my friend”, which is a typical feature of social affiliation networks. In academic society, the collaborators of an author are likely to become acquainted and so to coauthor with each other. For example, organizational and institutional contexts drive the formation of transitive coauthorship, and so contribute to the clustering structures emerging in coauthorship networks.
The transitivity of a network can be quantified by two indexes in graph theory, namely the global clustering coefficient (the fraction of connected triples of nodes which also form “triangles”) and the local clustering coefficient (the probability that two neighbors of a node are connected). High transitivity is a common feature of coauthorship networks [Newman1].
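Both coefficients can be computed directly from an adjacency structure, as in the following sketch. The function names and the toy network are hypothetical, and assigning a local coefficient of 0 to nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations


def local_clustering(neighbors, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return links / (k * (k - 1) / 2)


def global_clustering(neighbors):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for nbrs in neighbors.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2  # triples centred on this node
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return closed / triples if triples else 0.0


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to A.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(local_clustering(toy, "A"), global_clustering(toy))
```

In the toy network the local coefficient of A is 1/3, because only one of its three neighbor pairs is connected; the number of pairs in the denominator grows quadratically with degree.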
To what extent is the transitivity due to the activity of authors in academic society? The activity can be partly reflected through the number of collaborators, namely degree. Hence, the extent can be sketched through the correlation coefficients between degree and local clustering coefficient. Note that the correlation coefficients indicate the extent of a linear relationship between two variables or their ranks. The coefficients of variables $X$ and $Y$ generally do not completely characterize their correlation, unless the conditional expected value of $Y$ given $X$, denoted by $E(Y|X)$, is a linear or approximately linear function of $X$. The conditional expected value of the local clustering coefficient given degree $k$ is the average local clustering coefficient of the nodes with degree $k$, denoted by CC. The approximately linear trend of CC shown in Fig. 4 guarantees the effectiveness of the correlation analysis in Table 5. The decreasing trend cannot be deduced from degree information alone: the denominator of the local clustering coefficient of a node grows quadratically with its degree, but the numerator cannot be calculated from degree information.
Discipline  Indicator  Mean  Std  SCC  PCC
Biological sciences  LCC  0.860  n/a  0.398  0.401
Biological sciences  LTC  0.001  0.005  0.275  0.077
Biological sciences  DN  21.09  n/a  0.543  0.400
Biological sciences  HN  3.015  15.47  0.070  0.046
Physical sciences  LCC  0.806  n/a  0.336  0.382
Physical sciences  LTC  0.001  0.005  0.306  0.074
Physical sciences  DN  15.48  n/a  0.625  0.346
Physical sciences  HN  2.682  12.44  0.169  0.015
Social sciences  LCC  0.784  n/a  0.177  0.263
Social sciences  LTC  0.001  0.006  0.292  0.050
Social sciences  DN  12.87  n/a  0.723  0.482
Social sciences  HN  2.268  10.89  0.175  0.030
The indexes are the local clustering coefficient (LCC), the local transitivity of collaboration (LTC), the average degree of node neighbors (DN), and the average hyperdegree of node neighbors (HN). We calculated the mean of these indexes over authors, and the Spearman rank correlation coefficient (SCC) and Pearson product-moment correlation coefficient (PCC) between each index and degree. For the two indexes with small PCC, we also calculated their standard deviation (Std).
Does the decreasing trend of CC mean that activity depresses transitivity? A positive answer would be against common sense. In PNAS 1999–2013, 74.62% of authors publish only one paper in the data, and the team sizes of 99.9% of papers follow the generalized Poisson part, namely are around the average paper team size 6.028. The boundary of the generalized Poisson part is detected by the boundary point detection algorithm for probability density functions in Reference [Xie6] (listed in Appendix). Hence the local clustering coefficients of most small degree authors are close to 1 (Fig. 4). A few authors experience a long period of collaborations, and their degree accumulates over papers. For these authors, the collaborators on different papers may never have collaborated with each other, which decreases the local clustering coefficient. Hence the puzzling trend does not contradict common sense, but is due to the insufficiency of measuring transitivity, a dynamical property, by counting “triangles” on a static network.
To design a more reasonable index for measuring transitivity, let us come back to the original meaning of transitivity in coauthorship: the probability that two collaborators of a researcher, who have not yet coauthored, coauthor in the future. The probability can be calculated for dynamic hypergraphs of collaborations through time information. Averaging the probability over authors measures the global transitivity, the value of which is quite low in each science category (Table 5). Note that the calculation is limited to PNAS 1999–2013, and transitivity may happen in other journals or in other time periods. So the values of transitivity here may be underestimated. The increasing trend of the transitivity probability of authors with a given degree (TC in Fig. 4) means that activity contributes to transitivity. This fits common sense: a researcher with many collaborators is likely to introduce collaborators to each other.
3.4 Homophily of coauthorship
Coauthorship is based on specific features that researchers have in common, including interest and geography. The homophily phenomenon appears in many social relations, and is called assortative mixing in network science [Newman4]. Do authors of each science category prefer to coauthor with others that are similar in social activity or productivity? The social activity and productivity of authors can be quantified by two indexes, namely degree and hyperdegree respectively. The preference with respect to an index can then be sketched through the correlation coefficient between two variables, namely the index of an author and the average index of the author’s neighbors. A positive correlation means assortative mixing, a negative one disassortative mixing, and zero no preference.
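The correlation between an author's degree and the average degree of the author's neighbors can be sketched as follows. The function names and the toy network are hypothetical; the Pearson coefficient is used here, and authors without collaborators are left out because their neighbor average is undefined.

```python
import math


def average_neighbor_degree(neighbors):
    """Map each node with at least one neighbour to the mean degree of its neighbours."""
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}
    return {v: sum(degree[u] for u in nbrs) / len(nbrs)
            for v, nbrs in neighbors.items() if nbrs}


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (spread_x * spread_y)


# Hypothetical toy network: a star, which is maximally disassortative.
star = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
avg_nbr = average_neighbor_degree(star)
nodes = sorted(avg_nbr)
r = pearson([len(star[v]) for v in nodes], [avg_nbr[v] for v in nodes])
print(r)
```

The same two functions apply to hyperdegree if the degree of each node is replaced by its number of papers.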
Degree assortativity is a feature of coauthorship networks [Newman4]. Does it mean that sociable researchers (with many collaborators) preferentially coauthor with other sociable researchers, and unsociable ones with unsociable ones? In a previous study [Xie7], we showed that 99.5% of the top 5.99% most sociable authors (measured by degree) have coauthored with another such author. The proportion may even be underestimated, because these authors probably coauthored before 1999 or in other venues. Note that the splitting and merging errors of the name disambiguation method affect the proportion to some extent. Even so, the proportion is still remarkable.
However, if sociable researchers only coauthored with sociable ones, then there would exist many sociable researchers, which is against empirical degree distributions. Now let us analyze the influence of the social activity of authors on degree assortativity. For the authors with a given degree, denote the average degree of their neighbors by DN. There exists a trend change in the DN of each empirical dataset: the head part has a clear increasing trend, but the tail part does not (Fig. 4). This means that degree assortativity is mainly contributed by small degree authors.
The tipping point of the trend of DN is detected by the boundary point detection algorithm for general functions in Reference [Xie6] (listed in Appendix). The inputs of the algorithm, DN among them, are chosen based on the observation of DN. The degrees of most authors are around their mode, and only a few authors have a large degree. Hence the neighbors of an author are likely to be small degree authors. Therefore, the degree differences between authors and their neighbors are small for small degree authors and large for large degree authors, which leads to the trend change of DN.
For the authors with a given hyperdegree, denote the average hyperdegree of their neighbors by HN. The correlation coefficient between hyperdegree and the average hyperdegree of neighbors is around zero in each science category (Table 5). This means that the choice of collaborators is free of the factor of productivity. In reality, members of a research team may have various scientific ages (newcomers, incumbents), and so different hyperdegrees. Since collaborations mainly happen within a research team, the collaborators of an author could have various hyperdegrees, which appears as the stable trend of HN.
Since the average value of HN is larger than 2 in each science category (Table 5) and 74.62% of authors have only one paper in the data, we can infer that a large fraction of authors publish their first paper in the data together with at least one author who has already published a paper in PNAS 1999–2013. The proportions of these authors are 79.22%, 71.17% and 65.12% in biological, physical and social sciences respectively. The proportions may be overestimated, because some of these authors may have published papers in PNAS before 1999.
3.5 Interdisciplinarity at discipline level
The co-category proportion measures the activities of interdisciplinary research. Among the authors of social, physical and biological sciences, 49.2%, 46.0% and 7.3% respectively published interdisciplinary papers. Common belief suggests that social scientists tend to carry out research in solitude. The proportion for social sciences shows that this belief does not hold in PNAS. Reference [Levitt3] also shows that there has been a move towards increased interdisciplinarity in social sciences in recent decades.
The above analysis could be applied to the second-class disciplines to obtain a higher-resolution result. However, some disciplines have only a few papers, e.g. political science has only 17. The analysis for those disciplines would therefore lose statistical meaning. Hence we took another perspective and analyzed the interactions among the second-class disciplines by visualizing them as the network in Fig. 2. The network is connected, i.e. no discipline is isolated. The top three nodes of this network in terms of degree and those in terms of betweenness centrality are Applied mathematics, Chemistry and Anthropology (Table 1). This means that the theories, methods and problems of those disciplines are directly or indirectly used or studied by many disciplines. For each first-class discipline, we contracted its second-class disciplines into one node, and calculated the betweenness centrality of the contracted node. The resulting betweenness centralities (Biological sciences 47.51, Physical sciences 163.81, Social sciences 161.72) support the above analysis.
The co-category proportion only describes interdisciplinary activities. Now let us measure the discipline diversity of interdisciplinary research in each science category through the Rao-Stirling index [Stirling] $\sum_{i \neq j} p_i p_j d_{ij}$, where $p_i$ and $p_j$ are the proportional representations of the papers/authors in science categories $i$ and $j$, and $d_{ij}$ is the level of difference attributed to categories $i$ and $j$. Discipline information is used to classify authors into science categories: if one of an author’s papers belongs to a discipline, the author is classified into that discipline, and so into the corresponding science category. Note that an author can be classified into several science categories if his or her papers belong to more than one discipline. Here we let $d_{ij} = 1$ for all $i$ and $j$, hence the calculated Rao-Stirling index measures the balance-weighted variety of interdisciplinary research at the level of science categories. The index in the author view and that in the paper view show that the discipline diversity of interdisciplinary research in social sciences and that in physical sciences are much higher than that in biological sciences (Fig. 5).
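The Rao-Stirling index sums the product of two category proportions and their assigned difference over pairs of distinct categories. A minimal sketch follows, assuming the sum runs over ordered pairs (some authors sum over unordered pairs, which halves the value); the function name and the toy counts are hypothetical. With all differences set to 1, the index reduces to one minus the sum of squared proportions.

```python
def rao_stirling(counts, distance=None):
    """Rao-Stirling index: sum over ordered pairs i != j of p_i * p_j * d_ij.

    counts maps each category to its number of papers or authors.
    distance maps (i, j) to the difference d_ij; None means d_ij = 1 for all pairs.
    """
    total = sum(counts.values())
    p = {category: n / total for category, n in counts.items()}
    value = 0.0
    for i in p:
        for j in p:
            if i != j:
                d_ij = 1.0 if distance is None else distance[(i, j)]
                value += p[i] * p[j] * d_ij
    return value


# Hypothetical toy counts of interdisciplinary papers per science category.
toy_counts = {"biological": 2, "physical": 1, "social": 1}
index = rao_stirling(toy_counts)
print(index)
```

For the toy counts the proportions are 0.5, 0.25 and 0.25, so the index equals 1 - (0.25 + 0.0625 + 0.0625) = 0.625; it is largest when the categories are equally represented.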
3.6 Interdisciplinarity at author level
We analyzed the relationship between author degree/hyperdegree and the probability of doing interdisciplinary research, and the relationship between paper team size and the probability of being an interdisciplinary paper. Fig. 6 shows that in each science category, interdisciplinary research is not just carried out by authors with a large degree or those with a large hyperdegree.
Fig. 6 also shows that authors with a large degree or hyperdegree are likely to engage in interdisciplinary research, and that a paper with a large team size is likely to be an interdisciplinary one. It might seem that these phenomena can be expected at random. Take a set of elements (collaborators, papers) of several classes, and select a subset randomly. Then a larger subset is more likely to contain elements from more than one class. This reasoning, though plausible, is incorrect, because scientists do not select topics and collaborators randomly. Research costs (investments of time and effort) make scientists tend to work within their familiar fields. In addition, the reasoning assumes that the selection scope of collaborators is limited to the empirical data, which does not hold in reality.
We analyzed the giant component of the coauthorship network of PNAS 1999–2013, which contains more than 86.8% of the authors. It contains 71.5%, 76.7% and 88.9% of the authors of social, physical and biological sciences respectively (Fig. 7e). Note that the author misidentification caused by initial-based methods increases the size of the ground-truth giant component [Kim1]. Hence we identified authors by the names they provided on papers (which is likely to split one author into two) to obtain a conservative result.
Interdisciplinary research and multidisciplinary research contribute to the giant component containing most authors of each science category. We analyzed the relationship between the author proportion of the giant component and author activity/productivity. We removed authors in order from high to low degree, and separately from high to low hyperdegree, and calculated the proportion of the giant component after each removal. From the relation curve between the proportion of removed authors and that of the giant component, we find that the formation of the giant component is contributed by a considerable number of authors, e.g. the top 10% of authors ranked by degree (Fig. 8). Consider the relationship in three time periods, viz. 1999–2003, 2004–2008 and 2009–2013. The relation curve shifts to the left over time, which means that author activity and productivity are playing increasingly important roles in the formation of the giant component.
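The removal experiment can be sketched as follows. This is an illustration under assumptions that the text does not settle: authors are ranked once by their degree in the full network, and the giant component proportion is taken relative to the original number of authors. The function names and the toy network are hypothetical.

```python
def giant_component_proportion(neighbors, removed=frozenset()):
    """Share of the original nodes lying in the largest component after removal."""
    seen = set(removed)
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            size += 1
            for u in neighbors[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best / len(neighbors)


def removal_curve(neighbors, fractions):
    """Giant component proportion after removing the top fraction of nodes by degree."""
    ranked = sorted(neighbors, key=lambda v: len(neighbors[v]), reverse=True)
    curve = []
    for fraction in fractions:
        removed = set(ranked[:int(fraction * len(ranked))])
        curve.append((fraction, giant_component_proportion(neighbors, removed)))
    return curve


# Hypothetical toy network: a hub H with four collaborators, plus a separate pair e-f.
toy_net = {
    "H": {"a", "b", "c", "d"},
    "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"},
    "e": {"f"}, "f": {"e"},
}
curve = removal_curve(toy_net, [0.0, 0.2])
print(curve)
```

Replacing the sort key by the number of papers per author gives the corresponding curve for hyperdegree.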
4 Discussion and conclusions
Our case study on PNAS 1999–2013 confirms that collaboration patterns in biological, physical and social sciences have similar transitivity and assortativity. The data show that the degree distributions of the three science categories are of the same type, namely a mixture of a generalized Poisson distribution and a power law. This also holds for hyperdegree. We provided an explanation for the emergence of this distribution type through authors’ “yes/no” decisions and their different abilities to attract collaborations.
The data show that a considerable number of authors pursue interdisciplinary research, and that the giant component of the coauthorship network PNAS 1999–2013 contains most authors of each science category. We took a network perspective to analyze the interactions among the second-class disciplines, and quantified their interdisciplinarity by network indexes such as degree and betweenness centrality. We found that specific second-class disciplines (such as Applied mathematics and Anthropology) play an important role in interdisciplinary research.
The case study contributes to understanding multidisciplinary and interdisciplinary collaboration patterns, owing to the importance of PNAS and to the accurate discipline information of its papers. The selection of data might affect the details of our findings about interdisciplinarity, so our results should not be interpreted as patterns of researchers in general. For example, we cannot expect to observe a high extent of interdisciplinarity by analyzing a domain-specific journal. We finish the case study by asking a question: What are the grounds of interdisciplinary research? While a thorough discussion of this question is beyond the scope of this paper, we offer a brief discussion below.
As the sciences develop, disciplines tend to fragment into subdisciplines and specific topics. Although their research objects differ, they share research paradigms, which can be grouped into four categories: theoretical research, experiment, simulation, and data-driven research [Hey]. Meanwhile, many scientific problems are too complex to be understood through the methodology of a single discipline. Integrating theoretical and methodological perspectives drawn from different disciplines creates a unified methodology for research problems, and even a shared vocabulary for presenting concepts from specific disciplines [Haythornthwaite], which drives the formation of transdisciplinary disciplines [Grauwin].
Systems science, as a typical transdisciplinary discipline, studies systems ranging from simple to complex, across the natural and social sciences. The parts of a system and the relations between parts can be abstracted as networks. The rapid development of research on networks (models, algorithms, …) has bred a new discipline, namely network science. Some researchers from biological, physical and social fields investigate their respective problems under a network framework [Brier], as our case study does.
Following from the above, one may argue that common research paradigms and methodologies, especially those integrated into transdisciplinary disciplines, provide grounds for the interactions between science categories and for the formation of giant components in coauthorship networks. Analyzing paper content seems a promising way to validate the universality of those paradigms and methodologies. Over half the papers of PNAS 1999–2013 contain the topic words “system” and “control” [Xie4]. A high proportion of papers containing a topic word reflects, to some degree, how typical the topic is. However, it is not easy to determine the relation between a paper containing the word “system” and a paper applying results from systems science. Hence validating the universality at the semantic level is a subject for further study.
Availability of data and materials
The data are freely available from the PNAS website http://www.pnas.org. Please contact the corresponding author if you need more information.
Competing interests
The authors declare that they have no competing interests.
Funding
ZX acknowledges support from National Science Foundation of China (NSFC) Grant No. 61773020.
Authors’ contributions
All authors conceived and designed the research. ZX and ML wrote the paper. ZX and JPL analyzed the data. OYZZ acquired the data. ZX and XJD wrote the discussion. All authors discussed the research and approved the final version of the manuscript.
Acknowledgments
We thank the anonymous reviewers for their valuable suggestions and great help.
Endnotes
According to Wikipedia, people with the 100 most common Chinese surnames account for 84.77% of the total Chinese population.
References
 [1] Weingart P (2012) A short history of knowledge formations. In R. Frodeman, J. Thompson Klein, & C. Mitcham (Eds.), The Oxford Handbook of Interdisciplinarity (pp. 3–14). Oxford, England: Oxford University Press.
 [2] National Academies (U.S.). Committee on Facilitating Interdisciplinary Research (2004). Facilitating interdisciplinary research. Washington: National Academy Press. Retrieved from http://www.nap.edu/books/0309094356/html/
 [3] Hurd JM (1992) Interdisciplinary research in the sciences: Implications for library organizations. Coll Res Liber 53(4), 283–297.
 [4] Cooper G (2013) A disciplinary matter: Critical sociology, academic governance and interdisciplinarity. Sociology 47(1), 74–89.
 [5] Hadorn GH, Pohl C, Bammer G (2012) Solving problems through transdisciplinary research. In R. Frodeman, J. Thompson Klein, & C. Mitcham (Eds.), The Oxford Handbook of Interdisciplinarity (pp. 431–452). Oxford, England: Oxford University Press.
 [6] Siedlok F, Hibbert P (2014) The organization of interdisciplinary research: Modes, drivers and barriers. Int J Manage Rev 16(2), 194–210.
 [7] Liu Y, Rafols I, Rousseau R (2012) A framework for knowledge integration and diffusion. J Doc 68(1), 31–44.
 [8] Gooch D, Vasalou A, Benton L (2017) Impact in interdisciplinary and cross-sector research: Opportunities and challenges. J Assoc Inf Sci Technol 68(2), 378–391.
 [9] Larivière V, Gingras Y, Archambault É (2006) Canadian collaboration networks: A comparative analysis of the natural sciences, social sciences and the humanities. Scientometrics 68(3): 519–533.
 [10] Moody J (2004) The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. Am Sociol Rev 69(2), 213–238.
 [11] Glänzel W, Schoepflin U (1999) A bibliometric study of reference literature in the sciences and social sciences. Inform Process Manag 35(1): 31–44.
 [12] Hicks D (1999) The difficulty of achieving full coverage of international social science literature and the bibliometric consequences. Scientometrics 44(2): 193–215.
 [13] Sarigöl E, Pfitzner R, Scholtes I, Garas A, Schweitzer F (2014) Predicting scientific success based on coauthorship networks. EPJ Data Science 2014:9.
 [14] Barabási AL, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Physica A 311: 590–614.
 [15] Newman M (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci USA 98: 404–409.
 [16] Newman M (2004) Coauthorship networks and patterns of scientific collaboration. Proc Natl Acad Sci USA 101: 5200–5205.
 [17] Xie Z, Ouyang ZZ, Li JP (2016) A geometric graph model for coauthorship networks. J Informetr 10: 299–311.
 [18] Newman M (2002) Assortative mixing in networks. Phys Rev Lett 89: 208701.
 [19] Tomasello MV, Vaccario G, Schweitzer F (2017) Data-driven modeling of collaboration networks: A cross-domain analysis. EPJ Data Science 6: 22.
 [20] Braun T, Schubert A (2003) A quantitative view on the coming of age of interdisciplinarity in the sciences, 1980–1999. Scientometrics 58(1), 183–189.
 [21] Levitt JM, Thelwall M, Oppenheim C (2011) Variations between subjects in the extent to which the social sciences have become more interdisciplinary. J Assoc Inf Sci Technol 62(6), 1118–1129.
 [22] Porter AL, Roessner JD, Cohen AS, Perreault M (2006) Interdisciplinary research: Meaning, metrics and nurture. Res Eval 15(3), 187–195.
 [23] Porter AL, Rafols I (2009) Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics 81(3), 719–745.
 [24] Rafols I, Meyer M (2010) Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics 82(2), 263–287.
 [25] Abramo G, D’Angelo CA, Costa F (2012) Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications. J Assoc Inf Sci Technol 63(11): 2206–2222.
 [26] Chen S, Arsenault C, Gingras Y, Larivière V (2015) Exploring the interdisciplinary evolution of a discipline: The case of Biochemistry and Molecular Biology. Scientometrics 102(2), 1307–1323.
 [27] Bordons M, Zulueta MA, Romero F, Barrigón S (1999) Measuring interdisciplinary collaboration within a university: The effects of the multidisciplinary research programme. Scientometrics 46(3), 383–398.
 [28] Leydesdorff L, Goldstone RL (2014) Interdisciplinarity at the journal and specialty level: The changing knowledge bases of the journal Cognitive Science. J Assoc Inf Sci Technol 65(1), 164–177.
 [29] Zhang L, Rousseau R, Glänzel W (2015) Diversity of references as an indicator for interdisciplinarity of journals: Taking similarity between subject fields into account. J Assoc Inf Sci Technol 67(5), 1257–1265.
 [30] Lungeanu A, Huang Y, Contractor NS (2014) Understanding the assembly of interdisciplinary teams and its impact on performance. J Informetr 8(1), 59–70.
 [31] Larivière V, Gingras Y (2010) On the relationship between interdisciplinarity and scientific impact. J Assoc Inf Sci Technol 61(1), 126–131.
 [32] Larivière V, Haustein S, Börner K (2015) Long-distance interdisciplinarity leads to higher scientific impact. Plos One 10(3), e0122565.
 [33] Rinia EJ, van Leeuwen TN, van Raan AFJ (2002) Impact measures of interdisciplinary research in physics. Scientometrics 53(2), 241–248.
 [34] Wan J, Thijs B, Glänzel W (2015) Interdisciplinarity and impact: Distinct effects of variety, balance, and disparity. Plos One 10(5), e0127298.
 [35] Levitt JM, Thelwall M (2009) The most highly cited library and information science articles: interdisciplinarity, first authors and citation patterns. Scientometrics 78(1), 45–67.
 [36] Levitt JM, Thelwall M (2008) Is multidisciplinary research more highly cited?: A macro-level study. J Assoc Inf Sci Technol 59(12), 1973–1984.
 [37] Chen S, Arsenault C, Larivière V (2015) Are top-cited papers more interdisciplinary? J Informetr 9(4): 1034–1046.
 [38] Stirling A (2007) A general framework for analyzing diversity in science, technology and society. J Roy Soc Interf 4(5), 707–719.
 [39] Leydesdorff L (2007) Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. J Assoc Inf Sci Technol 58(9), 1303–1319.
 [40] Van den Besselaar P, Heimeriks G (2001, July) Disciplinary, multidisciplinary, interdisciplinary: Concepts and indicators. In ISSI (pp. 705–716).
 [41] Kagan J (2009) The three cultures: Natural sciences, social sciences, and the humanities in the 21st century. Cambridge University Press.
 [42] Xie Z, Duan XJ, Ouyang ZZ, Zhang PY (2015) Quantitative analysis of the interdisciplinarity of applied mathematics. Plos One 10(9): e0137424.
 [43] Milojević S (2013) Accuracy of simple, initials-based methods for author name disambiguation. J Informetr 7(4): 767–773.
 [44] Kim J, Diesner J (2016) Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. J Assoc Inf Sci Technol 67(6): 1446–1461.
 [45] Milojević S (2010) Modes of Collaboration in Modern Science: Beyond Power Laws and Preferential Attachment. J Assoc Inf Sci Technol 61(7): 1410–1423.
 [46] Xie Z, Ouyang ZZ, Li JP, Dong EM, Yi DY (2018) Modelling transition phenomena of scientific coauthorship networks. J Assoc Inf Sci Technol 69(2): 305–317.
 [47] Consul PC, Jain GC (1973) A generalization of the Poisson distribution. Technometrics 15(4): 791–799.
 [48] Xie Z, Xie ZL, Li M, Li JP, Yi DY (2017) Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 112, 483–507.
 [49] Levitt JM, Thelwall M, Oppenheim C (2011) Variations between subjects in the extent to which the social sciences have become more interdisciplinary. J Assoc Inf Sci Technol 62(6), 1118–1129.
 [50] Hey T, Tansley S, Tolle KM (2009) The fourth paradigm: data-intensive scientific discovery. Microsoft Research, Redmond, Washington.
 [51] Haythornthwaite C (2006) Learning and knowledge networks in interdisciplinary collaborations. J Assoc Inf Sci Technol 57(8), 1079–1092.
 [52] Grauwin S, Beslon G, Fleury É, Franceschelli S, Robardet C, Rouquier JB, Jensen P (2012) Complex systems science: dreams of universality, interdisciplinarity reality. J Assoc Inf Sci Technol 63(7), 1327–1338.
 [53] Brier S (2013) Cybersemiotics: a new foundation for transdisciplinary theory of information, cognition, meaningful communication and the interaction between nature and culture. Integr Rev 9: 222–263.
5 Appendix
The following boundary detection algorithms come from Reference [Xie6].

Input: Observations, a rescaling function, and a fitting model.
For each candidate boundary point in the search range do:
    Fit the model to the PDF of the observations under consideration by maximum-likelihood estimation;
    Do a KS test for the two data sets, with the null hypothesis that they come from the same distribution;
    Break if the test rejects the null hypothesis at the chosen significance level.
Output: The current candidate as the boundary point.

Input: Data vector, a rescaling function, and a fitting model.
For each candidate boundary point in the search range do:
    Fit the model to the data under consideration by regression;
    Do a KS test for the two data vectors, with the null hypothesis that they come from the same distribution;
    Break if the test rejects the null hypothesis at the chosen significance level.
Output: The current candidate as the boundary point.
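
The regression-based procedure can be sketched in code as follows. The sketch is illustrative only and is not the implementation of the cited reference: the straight-line fitting model, the identity rescaling, the choice to fit the first k points at step k, the starting point `k_min`, and the 5% large-sample critical value of the KS test are all assumptions made here, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def ks_rejects(a, b, c_alpha=1.358):
    """Two-sample KS test with the large-sample critical value.

    c_alpha = 1.358 corresponds to a 5% significance level (assumed here).
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The largest gap between two empirical CDFs occurs at a data point.
    for v in set(a) | set(b):
        d = max(d, abs(bisect_right(a, v) / n - bisect_right(b, v) / m))
    return d > c_alpha * math.sqrt((n + m) / (n * m))


def detect_boundary(ys, rescale=lambda v: v, k_min=3):
    """First k at which a line fitted to ys[:k] is rejected by the KS test.

    Returns len(ys) if the test never rejects.
    """
    xs = list(range(1, len(ys) + 1))
    for k in range(k_min, len(ys) + 1):
        slope, intercept = linear_fit(xs[:k], ys[:k])
        fitted = [rescale(slope * x + intercept) for x in xs[:k]]
        observed = [rescale(y) for y in ys[:k]]
        if ks_rejects(fitted, observed):
            return k
    return len(ys)


# Toy data: flat for 50 points, then an abrupt jump.
step = [0.0] * 50 + [1000.0] * 50
print(detect_boundary(step))
```

With small samples the KS test has little power, so a sketch like this only reacts to pronounced departures from the fitted model; the toy data are chosen with that in mind.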