China may need to support more small teams in scientific research

02/29/2020 ∙ by Linlin Liu, et al. ∙ Southwest University

Modern science is dominated by scientific productions from teams. Large teams have demonstrated a clear advantage over small teams in applying for research funding, performing complicated research tasks and producing research works with high impact. Recent research, however, shows that both large and small teams have their own merits. Small teams tend to expand the frontier of knowledge by creating disruptive research outcomes, while large teams are more apt to work in established fields and develop existing problems. Given the different roles of big and small teams in research, the extent to which a country's scientific work is carried out by big or small teams is of great importance. Here, we analyze over 26 million papers from Web of Science published from 2000 to 2017. We find that China's research output is more dominated by big teams than that of the rest of the world. It is indeed a global trend that more papers are produced by big teams. However, the drop in small team output is much steeper in China. More importantly, as research teams in China shift from small to large size, the team diversity that is essential for innovative works does not increase as much as that in other countries. Papers by big teams tend to receive more citations, but this alone is insufficient to explain the dominance of big teams in China, because the citation boost is roughly the same in every country. However, using the national average as the baseline, we find that the National Natural Science Foundation of China (NSFC) supports fewer small team works than the National Science Foundation of the United States (NSF) does, implying that big teams are more preferred by grant agencies in China. Our findings provide new insights into the concern about originality and innovation in China and point to a need to balance small and big teams.




1 Introduction

Modern science has witnessed the increasing dominance of teams. Single-author papers, though not yet extinct as Price predicted in 1963 (de1963little), have undergone a sharp drop and now account for only a small portion of all publications (wuchty2007increasing; lariviere2015team; barlow2018extinction). Teams have become the driving force of science not only because the problems to tackle are more complex, but also because knowledge has become broader, which inevitably makes scientists more specialized (jones2009burden; leahey2016sole). Improvements in communication technology, the convenience of transportation and globalization also facilitate scientific collaboration. All of these make teams not only flourish but also grow in size (newman2001scientific; gazni2012mapping; lariviere2015team; wu2019large). The average number of authors per publication increases every year, and large teams involving more than 1000 members have become common. In a recent paper studying the mass of the Higgs boson, the team size reached a record high of over 5,000 scientists (castelvecchi2015physics).

Large teams have clear advantages over small teams in solving complicated problems, securing research grants, receiving more citations on average, and publishing hit papers that sit at the top of the citation ranking (thelwall2019large; wuchty2007increasing; cummings2007coordination). Recent research shows, however, that bigger is not always better (wu2019large). Instead, small and large teams take distinct yet equally essential roles in science. Large teams tend to work in established fields and exploit existing problems. In contrast, small teams are better at exploring the frontier of science, generating new ideas, and opening up new problems that can disrupt science. To better promote science, a balance between small and large teams is needed (azoulay2019small), giving rise to an interesting question: to what extent is the research of a nation carried out by big or small teams?

The answer to this question may have important implications for the scientific performance of a nation, if we accept that the patterns observed in small and large teams are universal. While it is hard to argue whether a balance or an optimum is reached in a nation, it is still meaningful to compare the small/large team composition in different countries. This is of particular importance to China in the context of its long-term goal to be a global innovator (phillips2016china; zhou2016china). Indeed, while China has grown to be the world’s top producer of scientific papers and receiver of citations, there is frequent concern that China’s scientific works are weak in originality and innovation (xie2014china; huang2018quality; guo2019contributions). In this paper, we perform quantitative analyses on publication data of over 26 million papers published from 2000 to 2017. We find that China is indeed different from other countries. The percentage of China’s annual scientific output from small teams is now the lowest among the countries analyzed, after a sharp drop since 2000. As research teams in China shift from small to large size, the team diversity that is essential for innovative works does not increase as much as that in other countries. Most works by big teams are still carried out in one or two institutes. The dominance of big teams in China may not be explained by the citation boost from team size. While citations on average increase with team size, the rate of increase is roughly the same in every country. However, the preference of funding agencies may be related to the lack of small teams in China. In all, if small teams are more apt to perform disruptive research, the science community in China should be alerted, given the different statistics China demonstrates on small teams.

2 Data and Method

Data set. We use the data of the Web of Science (WOS), covering the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI) databases. In total, we analyze over 26 million papers published from 2000 to 2017.

The country a paper belongs to. We use straight counting by the first affiliation in our analyses (waltman2015field; zheng2014influences; huang2011counting). The country of a paper’s first affiliation determines the country this paper belongs to. We understand that other methods such as whole counting and fractional counting (sivertsen2019measuring; lewison2010understanding; lin2013influences; kao2009authorship; larsen2008state) are also widely applied to count publication numbers. But these methods may introduce multiple counting, which can be problematic in our analyses. Previous works suggest that straight counting might be better when studying scientific output at the country level (huang2011counting). Another strategy of straight counting is to use the corresponding affiliation, the so-called “reprint address” in the WOS database (kahn2016return; mazloumian2013global; gonzalez2017dominance). We find that for more than 95% of papers, the reprint address and the first affiliation point to the same country. For simplicity and ease of future reproduction of our analyses, we choose to use the first affiliation, as the corresponding affiliation may not be directly available in other databases. Finally, to eliminate possible bias caused by straight counting in dealing with papers by international collaborations, we also separately analyzed the publications by authors from the same country. We find that our conclusion is not affected (Supplementary Note 1).
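A minimal sketch of straight counting as described above, assuming each paper record carries an ordered list of affiliation countries and a reprint-address country. The field names `affiliation_countries` and `reprint_country` are hypothetical and do not come from the WOS schema; the sample records are made up.

```python
from collections import Counter


def straight_count(papers):
    """Tally papers per country using only the first affiliation."""
    counts = Counter()
    for paper in papers:
        countries = paper["affiliation_countries"]
        if countries:  # skip records with no affiliation data
            counts[countries[0]] += 1
    return counts


def first_vs_reprint_agreement(papers):
    """Share of papers whose first affiliation and reprint address agree on country."""
    usable = [
        p for p in papers
        if p["affiliation_countries"] and p.get("reprint_country")
    ]
    if not usable:
        return 0.0
    agree = sum(p["affiliation_countries"][0] == p["reprint_country"] for p in usable)
    return agree / len(usable)


# Made-up records for illustration only.
papers = [
    {"affiliation_countries": ["CN", "US"], "reprint_country": "CN"},
    {"affiliation_countries": ["US"], "reprint_country": "US"},
    {"affiliation_countries": ["DE", "CN"], "reprint_country": "CN"},
]
print(straight_count(papers))
print(first_vs_reprint_agreement(papers))
```

Because each paper contributes to exactly one country, the per-country totals sum to the number of papers, which is the property that avoids multiple counting.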

Countries considered. We include 15 countries in our analyses, which are roughly the top 15 countries by total scientific publications. They are the United States (US), China (CN), the United Kingdom (GB), Germany (DE), Japan (JP), Italy (IT), France (FR), Canada (CA), India (IN), Korea (KR), Spain (ES), Australia (AU), Brazil (BR), the Netherlands (NL) and Turkey (TR). Following typical practice, we use the scientific production from the mainland of China and Hong Kong, China. Given China’s huge annual production of scientific papers, it is less meaningful to compare it with countries of much smaller scientific output. We also remove China when taking the global average in order to compare the results from China with those of the rest of the world.

Big team and small team. The terms big team and small team are relatively new and there is no defined hard cutoff between them. Previous work (wu2019large) considers teams of no more than 3 or 4 members as small. In this work, we analyze several cutoffs on team size and find that our conclusion in general is not affected by the choice of this parameter. The only inconsistency is that, under one of the alternative cutoffs, the small-team fraction of China is slightly higher than that of Japan and Italy, making China not the lowest but the third lowest among the 15 countries. Even then, the value is still well below the global average. We present results based on a cutoff of 4 authors in the main text of the paper. The corresponding results for the other cutoffs are presented in the Supplementary Information.
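The robustness check over cutoffs can be sketched as follows. This is an illustration only: the function name `small_team_fraction`, the list of team sizes and the cutoff of 5 are assumptions (the text above confirms only 3 and 4 as cutoffs used in previous work).

```python
def small_team_fraction(team_sizes, cutoff=4):
    """Fraction of papers whose author count is no more than `cutoff`."""
    if not team_sizes:
        raise ValueError("team_sizes must not be empty")
    small = sum(1 for size in team_sizes if size <= cutoff)
    return small / len(team_sizes)


# Made-up author counts for eight papers.
team_sizes = [1, 2, 3, 4, 5, 6, 8, 12]

# Repeat the measurement under several cutoffs to see whether a ranking
# or comparison depends on the particular threshold chosen.
for cutoff in (3, 4, 5):
    print(cutoff, small_team_fraction(team_sizes, cutoff))
```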

Research field of a paper. WOS has approximately 250 subject areas characterizing different research directions. Each paper is assigned one or multiple subject areas. The large number of subject areas makes it impractical to draw conclusions for individual research directions. Therefore, we use the classification in wu2019large that merges WOS subject areas into 14 research fields: Physical sciences, Chemistry, Biology, Medicine, Agriculture, Environmental and earth sciences, Mathematics, Computer and information technology, Engineering, Social sciences, Business and management, Law, Humanities, and Multidisciplinary Sciences. This categorization is slightly different from what was recently proposed by (milojevic2020practical). However, because the findings of this work are mainly in the field of natural science, the difference is not significant. Since a paper is usually tagged with multiple subject areas, it may also be labeled with multiple research fields. Because it is difficult to assign a priority among multiple subject areas, and we cannot reliably judge which research field is closest to the content of the paper, we use whole counting when classifying papers into research fields. In general, depending on the publication year, 20% - 25% of papers are labeled with multiple research fields.
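Whole counting over research fields can be sketched like this. The `AREA_TO_FIELD` mapping below is a tiny hypothetical excerpt written for illustration; the actual mapping of roughly 250 subject areas to 14 fields follows wu2019large and is not reproduced here.

```python
from collections import Counter

# Hypothetical excerpt of a subject-area-to-field mapping.
AREA_TO_FIELD = {
    "Oncology": "Medicine",
    "Cell Biology": "Biology",
    "Biochemistry & Molecular Biology": "Biology",
    "Statistics & Probability": "Mathematics",
}


def fields_of(subject_areas):
    """Set of research fields covered by a paper's subject areas."""
    return {AREA_TO_FIELD[a] for a in subject_areas if a in AREA_TO_FIELD}


def whole_count_by_field(papers):
    """Count a paper once in every field it is labeled with."""
    counts = Counter()
    for subject_areas in papers:
        for field in fields_of(subject_areas):
            counts[field] += 1
    return counts


def multi_field_share(papers):
    """Share of papers labeled with more than one research field."""
    labels = [fields_of(areas) for areas in papers]
    return sum(len(fields) > 1 for fields in labels) / len(labels)


# Each paper is given as its list of subject areas (made-up examples).
papers = [
    ["Oncology", "Cell Biology"],
    ["Cell Biology", "Biochemistry & Molecular Biology"],
    ["Statistics & Probability"],
    ["Oncology"],
]
print(whole_count_by_field(papers))
print(multi_field_share(papers))
```

Note that two subject areas mapping to the same field count only once for that field, so the second paper adds one to Biology, not two.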

Institute diversity. WOS records the affiliations in each paper. Starting in 2008, it also records the affiliations of each author, i.e., which author is affiliated with which institutions. Therefore, there are two ways to analyze institute diversity. One is to use the affiliations of a paper directly; the other is to use the “main” affiliation of each author. Both approaches have pros and cons. A paper’s affiliations are easier to extract and are available for all papers in the data set. But given the trend that more authors are affiliated with multiple institutions (hottenrott2019rise), directly using such information may overestimate the institute diversity. In some cases the number of institutes may even exceed the number of authors. Using an author’s “main” affiliation seems to be a more reasonable choice, and is also directly applied in the data set of Microsoft Academic Graph (wang2020microsoft; dong2018collaboration), but determining the primary affiliation among several might be non-trivial. In this work, we use both methods to analyze institute diversity. If an author has multiple affiliations, we choose the one with the highest rank in the paper. We extract the institute information from the organization of the affiliation, which usually refers to a university or research lab. We report the results based on the author’s affiliation in the main text. The results based on the paper’s affiliations can be found in the Supplementary Information. The two approaches give results that differ slightly in numbers, but the conclusions drawn are the same. We are also aware of the name disambiguation issue in institute names (donner2019comparing). This should not affect our analyses because we compare institutes within the same paper. It is very unlikely that authors would write one institute in different ways in a single paper.
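The two ways of counting institutes can be contrasted in a short sketch. It assumes that "highest rank in the paper" means the affiliation listed earliest in the paper's affiliation list; that reading, and the field names `affiliations` and `author_affiliation_idx`, are assumptions made for illustration.

```python
def institutes_by_paper(paper):
    """Number of distinct institutes listed anywhere on the paper."""
    return len(set(paper["affiliations"]))


def institutes_by_author(paper):
    """Number of distinct institutes when each author keeps only one affiliation.

    Assumption: the "main" affiliation is the one that appears earliest in
    the paper's affiliation list (smallest index).
    """
    affiliations = paper["affiliations"]
    mains = {
        affiliations[min(indices)]
        for indices in paper["author_affiliation_idx"]
        if indices  # ignore authors with no recorded affiliation
    }
    return len(mains)


# Made-up paper: two authors, both primarily at "Univ A",
# each with one secondary affiliation.
paper = {
    "affiliations": ["Univ A", "Univ B", "Lab C"],
    "author_affiliation_idx": [[0, 1], [0, 2]],
}
print(institutes_by_paper(paper))   # paper-level count
print(institutes_by_author(paper))  # author-level count
```

In this example the paper-level count exceeds the number of authors, which is the overestimation the paragraph above warns about, while the author-level count reports a single institute.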

Funding information. Starting in 2008, WOS records the funding acknowledgment text of each paper, from which funding information including the grant agency and grant ID is parsed. Despite concerns about the completeness and accuracy of the data (alvarez2017funding; paul2016characterization; tang2017funding), it remains one of the largest data sets available. We use this information directly as the criterion for whether a paper is funded. Because our measure is controlled by the national average value, we believe flaws in the data recording should not affect the conclusion.

It is more complicated to identify which papers are supported by the National Natural Science Foundation of China (NSFC) or the National Science Foundation (NSF) of the United States, because scientists acknowledge these funding agencies in different ways. For NSFC, the most frequently used name is “National Natural Science Foundation of China”, but other forms such as “Natural Science Foundation of China”, “NSFC”, “National Science Foundation of China”, “National Nature Science Foundation of China” and “National Natural Science Foundation” are also widely used. The name variations of NSF include “National Science Foundation”, “NSF” and “National Science Foundation (NSF)”. WOS has performed its own grant name disambiguation (which is available online), but such information is not available in our data set. Therefore, we extract the name of the grant agency in each paper from China and the United States, filter out those appearing fewer than 1000 times in the data, and manually identify names associated with NSFC and NSF. These names are listed in Table S1 of the Supplementary Information. Other statistics given by our approach are listed in Table S2, and are in line with previous findings (huang2016does; wang2012science).
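The frequency filter and name matching described above might look like the sketch below. The variant sets contain only the spellings quoted in the text, not the full list in Table S1, and the whitespace and case normalization is an added assumption rather than a documented step of the original pipeline.

```python
from collections import Counter

# Variants quoted in the text, lower-cased. Not the complete Table S1.
NSFC_NAMES = {
    "national natural science foundation of china",
    "natural science foundation of china",
    "nsfc",
    "national science foundation of china",
    "national nature science foundation of china",
    "national natural science foundation",
}
NSF_NAMES = {
    "national science foundation",
    "nsf",
    "national science foundation (nsf)",
}


def normalize(name):
    """Collapse whitespace and case so trivial spelling differences match."""
    return " ".join(name.split()).casefold()


def frequent_names(papers, min_count=1000):
    """Agency names appearing at least `min_count` times across all papers."""
    counts = Counter(normalize(n) for agencies in papers for n in agencies)
    return {name for name, count in counts.items() if count >= min_count}


def label_agency(name):
    """Map an acknowledged agency name to 'NSFC', 'NSF', or None."""
    key = normalize(name)
    if key in NSFC_NAMES:
        return "NSFC"
    if key in NSF_NAMES:
        return "NSF"
    return None


print(label_agency("National  Nature Science Foundation of China"))
print(label_agency("National Science Foundation (NSF)"))
```

Exact matching after normalization keeps “National Science Foundation of China” (an NSFC variant) from being confused with “National Science Foundation”, which a substring match would not guarantee.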

It is noteworthy that the Ministry of Science and Technology (MOST) of China has its own research grants such as the National Basic Research Program of China (973 Program), the National High Technology Research and Development Program of China (863 Program), and the National Key Technology R&D Program of China. The aim of these grants is to support big research groups (Fig. S10). While they cover a relatively small fraction of scientific papers, their overlap with the NSFC is large. Around 17.5% of NSFC supported papers are also supported by MOST, and 73.3% of MOST supported papers are simultaneously supported by NSFC. To avoid potential bias, we remove papers that are supported by both NSFC and MOST and focus on those “mainly” supported by NSFC. Note that the National Institutes of Health of the United States (NIH) also tends to support big groups (Fig. S10). To make the comparison fair, we also consider papers “mainly” supported by NSF by ignoring roughly 10.5% of NSF supported papers that are also supported by NIH. More statistics can be found in Table S3 of the Supplementary Information.
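The overlap shares and the “mainly supported” filter can be illustrated with a small sketch. The record layout (a set of agency labels under the hypothetical key `agencies`) and the five sample papers are assumptions for illustration.

```python
def overlap_shares(papers, agency_a, agency_b):
    """Return (share of A's papers also backed by B, share of B's papers also backed by A)."""
    n_a = sum(agency_a in p["agencies"] for p in papers)
    n_b = sum(agency_b in p["agencies"] for p in papers)
    n_both = sum(
        agency_a in p["agencies"] and agency_b in p["agencies"] for p in papers
    )
    if n_a == 0 or n_b == 0:
        raise ValueError("both agencies must support at least one paper")
    return n_both / n_a, n_both / n_b


def mainly_supported(papers, main, exclude):
    """Papers acknowledging `main` but not `exclude`."""
    return [
        p for p in papers
        if main in p["agencies"] and exclude not in p["agencies"]
    ]


# Made-up records: four NSFC papers, one of them shared with MOST,
# plus one MOST-only paper.
papers = [
    {"agencies": {"NSFC"}},
    {"agencies": {"NSFC"}},
    {"agencies": {"NSFC"}},
    {"agencies": {"NSFC", "MOST"}},
    {"agencies": {"MOST"}},
]
print(overlap_shares(papers, "NSFC", "MOST"))
print(len(mainly_supported(papers, "NSFC", "MOST")))
```

Both shares describe the same set of jointly supported papers, so their ratio gives the relative size of the two portfolios. With the reported values, MOST supported papers number roughly 0.175 / 0.733 ≈ 0.24 times the NSFC supported ones.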

3 Results

Figure 1: (a) The fraction of papers in 2017 produced by teams with size no more than 4 in different countries. The dashed line corresponds to the global average (in which China is excluded). (b) The same fraction in different fields in the year 2017.

While collaboration plays an increasingly important role in scientific research, the big team has not yet taken over. In 2017, more than half of scientific papers were produced by teams of relatively small size (no more than 4 authors). The fraction of small team output differs from nation to nation, but China ranks last among the top 15 countries by scientific papers (Fig. 1a and S1). In 2017, only 37% of papers from China were produced by teams of no more than 4 authors, while this value is 58% for the United States and 55% for the global average (in which China is excluded).

We further analyze the small-team fraction in different research fields (Fig. 1b, S1 and S2). The statistics at the global level agree well with previous works and also with our intuitions. For example, small teams are more frequently observed in mathematics, computer science, social science, business, humanities and law, where the small-team fraction reaches 80% or even higher. In interdisciplinary fields where collective intelligence is more important, the small-team fraction drops to the lowest. Fields such as medicine, biology, chemistry, physics and agriculture are usually believed to be labor intensive, requiring more individuals to be involved. But on the global average, the small-team fraction in these fields is not very far below 50%, and in some fields it can even go above. Nevertheless, the small-team fraction of China is significantly less than the global average in all areas of natural science. The relative difference is most prominent in agriculture, chemistry, biology and medicine. On the contrary, the small-team fraction of the United States is greater than the global average in almost all areas of natural science. Japan, an Asian country like China, has a much greater small-team fraction than China in all areas of natural science except medicine. In fields related to humanities, social science and mathematics, the small-team fraction of China is not very different from that of other countries (li2015patterns), but papers in such fields account for only a very small fraction of China’s annual production.

Figure 2: (a) The time evolution of the small-team fraction in different countries. The dashed line corresponds to the global average (in which China is excluded). While it is a global trend that more papers are done by big teams, China’s drop is much steeper. (b) The drop of the small-team fraction from year 2000 to 2017 in different fields. The dashed line corresponds to the drop of the global value in (a). China’s drop is most prominent in fields of natural science and engineering, where it can sometimes be twice as much as the global value. Note that the small team output actually increases in Japan in the fields of Social Science and Law, giving rise to a negative value of the drop. Since we focus on the drop, we do not put them in the figure.

It is noteworthy that the shift toward big teams is a global trend. Indeed, we find in our analyses that the percentage of papers by small teams decreases over the years. Nevertheless, the drop in China is much steeper (Fig. 2a and S3). In 2000, the small-team fraction of China is, though slightly smaller, not very different from that of the United States and the global average. But it goes down from 69.9% in 2000 to 37.4% in 2017, a decrease of more than 32 percentage points. The drop, however, is only 17 percentage points for the United States (from 75.4% to 58.2%), 12 percentage points for Japan (from 53.0% to 41.0%), and 16 percentage points for the global average (from 70.8% to 55.2%). China’s drop of small team output in fields of natural science and engineering is much larger than that of the global average (Fig. 2b and S3), in line with our initial finding that small team output is low in these fields.
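The percentage-point drops quoted above follow directly from the 2000 and 2017 values given in the text. The short check below reproduces them; the dictionary layout and function name are just an illustrative arrangement of those reported numbers.

```python
# Small-team share (%) in 2000 and 2017, as reported in the text.
small_team_share = {
    "China": (69.9, 37.4),
    "United States": (75.4, 58.2),
    "Japan": (53.0, 41.0),
    "Global average (excl. China)": (70.8, 55.2),
}


def drop_in_points(start, end):
    """Drop in percentage points, rounded to one decimal place."""
    return round(start - end, 1)


drops = {
    name: drop_in_points(start, end)
    for name, (start, end) in small_team_share.items()
}
for name, drop in drops.items():
    print(f"{name}: {drop} percentage points")
```

China's drop comes out at 32.5 points, roughly twice the 15.6 points of the global average.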

Figure 3: (a) The distribution of the number of distinct institutes in papers published in 2017 with team size 4, 5 and 6 (from left to right). China’s team composition is not very different from other countries when the team is small. But as the team size increases, the distribution becomes more dominated by output from one institute. (b) The fraction of papers in 2017 done by one institute, given the team size. A higher percentage of paper output is from a single institute in China than in other countries. (c) Similar to (b). The fraction of papers in 2017 involving no more than two institutes.

The observation that big teams dominate China’s research output gives rise to another question: how does the team composition change when a team shifts from small to large size? Indeed, a team can increase its size by adding more similar members or by involving members with different backgrounds. While the team size grows either way, the resulting team diversity is different, and diversity has been shown to be an important factor in building a successful team (alshebli2018preeminence; powell2018these). There are different types of team diversity, such as ethnicity, discipline, gender, affiliation, and academic age (alshebli2018preeminence; huang2019historical; jia2017quantifying). Here we focus on the affiliation and analyze the diversity at the institute (organization) level. Indeed, a smaller team whose members are from diverse institutes is more likely to generate “hit” papers than a relatively larger team within one institution (dong2018collaboration; jones2008multi). Here, we find that China’s team composition is close to that of other countries when the team size is small, demonstrating a similar extent of institute diversity (Fig. 3a and S4). However, unlike in other countries, China’s institute diversity increases much more slowly as the team size increases. A significant fraction of big teams are still formed by members from the same institute (Fig. 3b, c and S4). For example, of all China’s papers by 6 authors in 2017, nearly 50% are done within the same institute, which is 14 percentage points higher than that of the United States. A similar conclusion also holds when we use the fraction of papers done by no more than 2 institutes. It is encouraging to notice that teams tend to become more diverse over time. One-institute papers make up a smaller fraction of total publications now than in the past (Fig. S5). However, the rate of change is low, suggesting that the institute diversity for China, an important factor for innovative works, will not improve very much in the near future.

Figure 4: (a) The total number of citations a paper receives within 5 years of its publication is positively correlated with the team size in every country. The statistics are based on papers published in 2011. (b) When re-scaling the number of citations by the average value of a country, different curves in (a) almost collapse to a single curve showing a similar increasing trend with team size. (c) The relationship between the fraction of papers acknowledging funding support and the team size. The percentage of funded papers is highest in China. (d) Curves in (c) re-scaled by the average value of a country. Large teams in China do not have a higher than global average tendency to have works supported by research grants. (e) The fraction of small team output (no more than 4 authors) among all papers, funded papers, and papers supported by NSF from the United States. While funding agencies in general prefer big teams, NSF supports more small team works than the average. (f) The fraction of small team output (no more than 4 authors) among all papers, funded papers, and papers supported by NSFC from China. Only in 2010 and 2011 does NSFC support slightly more small team works. Most of the time, NSFC supports fewer small team works than the average.

So far, we have demonstrated the aspects in which China differs from other countries in research teams. What remains unclear are the factors that give rise to the differences observed. Given the confounding factors in team assembly in different countries, identifying these factors is beyond the scope of this paper. Nevertheless, we perform some preliminary analyses by proposing and testing two hypotheses that seem capable of explaining the observations.
H1: Papers by big teams have a greater advantage in receiving citations in China than in other countries.
H2: Big teams are more preferred by funding agencies in China than in other countries.

Note that papers by larger teams on average receive more citations than those by smaller teams (klug2016understanding; wu2019large; wuchty2007increasing). The argument for H1 is that the citation boost is more considerable in China, which consequently provides incentives to build large teams. We count the total number of citations a paper receives within 5 years of its publication, and find that overall this number is positively correlated with the team size (Fig. 4a). Papers from different countries receive different levels of citations. However, after re-scaling these curves by the national average citation count, they almost collapse to a single curve (Fig. 4b and Fig. S6). The trend that papers by bigger teams receive more citations is not different, at least not more extreme, in China than in other countries. The same conclusion also holds when we use a shorter time window to count citations (Fig. S7). Hence we conclude that H1 is not supported.
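The re-scaling step can be sketched as dividing the mean citation count at each team size by the country's overall mean. The tuple layout `(country, team_size, citations)` and the toy numbers are assumptions for illustration and are not taken from the data set.

```python
from collections import defaultdict


def rescaled_citation_curves(papers):
    """Mean citations per team size, divided by the country's overall mean.

    `papers` is an iterable of (country, team_size, citations) tuples.
    Returns {country: {team_size: rescaled mean}}.
    """
    national = defaultdict(lambda: [0.0, 0])   # country -> [sum, count]
    by_size = defaultdict(lambda: [0.0, 0])    # (country, size) -> [sum, count]
    for country, size, citations in papers:
        national[country][0] += citations
        national[country][1] += 1
        by_size[(country, size)][0] += citations
        by_size[(country, size)][1] += 1

    curves = defaultdict(dict)
    for (country, size), (total, count) in by_size.items():
        national_mean = national[country][0] / national[country][1]
        if national_mean == 0:
            continue  # re-scaling is undefined without any citations
        curves[country][size] = (total / count) / national_mean
    return dict(curves)


# Toy data: country "B" receives twice the citations of "A" at every
# team size, so the raw curves differ but the re-scaled curves coincide.
papers = [
    ("A", 2, 2), ("A", 2, 4), ("A", 6, 6), ("A", 6, 12),
    ("B", 2, 4), ("B", 2, 8), ("B", 6, 12), ("B", 6, 24),
]
curves = rescaled_citation_curves(papers)
print(curves)
```

If one country had a genuinely stronger citation boost from team size, its re-scaled curve would rise faster than the others instead of overlapping them.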

We test H2 by extracting the grant information of each paper. Over 80% of papers from China contain grant information, a much higher share than in other countries (Fig. 4c). This implies that Chinese scientists are more obligated to acknowledge the funding agencies, or simply that only teams capable of securing research grants can efficiently conduct scientific research (wang2019early; yang2015matthew; wang2012science). Either of these explanations demonstrates the significant impact of funding agencies on scientific research in China. As intuitively expected, the percentage of papers with grants increases with team size in almost every country (Fig. 4c). But once again, the increase is not sharper in China than in other countries (Fig. 4d). Nor could we find any difference in the number of grants a paper is supported by (Fig. S8).

Indeed, given different sources of funding in different nations, different policies and aims of different funding agencies, and potential flaws in the records that may affect the observation (alvarez2017funding; paul2016characterization; tang2017funding; azoulay2011incentives), it would be less meaningful to test H2 by comparing all grants and papers from all countries. For this reason, we then consider only two funding agencies: the National Natural Science Foundation of China (NSFC) and the National Science Foundation of the United States (NSF). It is believed that China learned from NSF when initiating and organizing NSFC. The two have very similar budgets (especially after taking purchasing power into consideration), scopes and aims. In addition, each is one of the major national funding sources for fundamental research in its country (huang2016does; wang2012science). All these features make NSFC and NSF two comparable examples. For each of China and the United States, we collect 3 sets of papers: all papers published in a given year, papers supported by grants in that year, and papers mainly supported by NSFC or NSF in that year (see Data and Method for details). Compared with the national average, the small team output is lower in papers with grants (Fig. 4e, f, and S9), in line with our previous finding that works by larger teams have a higher probability of being sponsored. However, within papers supported by NSF, the fraction from small teams is higher than average (Fig. 4e and S9). On the contrary, the fraction of small team output is usually lower than average in papers supported by NSFC (Fig. 4f and S9). In other words, using the national average as the baseline, NSF supports more small team works than NSFC does. Given the similarities between the two funding agencies, this observation supports H2.
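A sketch of the baseline comparison: the small-team share among papers supported by an agency is compared with the share among all papers of that country. Expressing the comparison as a difference between the two shares is an assumption made here for illustration, as are the field names and the toy records.

```python
def small_share(papers, cutoff=4):
    """Fraction of papers with no more than `cutoff` authors."""
    if not papers:
        raise ValueError("papers must not be empty")
    return sum(p["team_size"] <= cutoff for p in papers) / len(papers)


def excess_over_baseline(papers, agency, cutoff=4):
    """Small-team share among `agency` papers minus the national share."""
    supported = [p for p in papers if agency in p["agencies"]]
    return small_share(supported, cutoff) - small_share(papers, cutoff)


# Toy records for two countries (not real data).
us_papers = [
    {"team_size": 2, "agencies": {"NSF"}},
    {"team_size": 3, "agencies": {"NSF"}},
    {"team_size": 6, "agencies": {"NSF"}},
    {"team_size": 2, "agencies": set()},
    {"team_size": 8, "agencies": {"NIH"}},
    {"team_size": 7, "agencies": {"NIH"}},
]
cn_papers = [
    {"team_size": 2, "agencies": {"NSFC"}},
    {"team_size": 6, "agencies": {"NSFC"}},
    {"team_size": 7, "agencies": {"NSFC"}},
    {"team_size": 3, "agencies": set()},
    {"team_size": 2, "agencies": set()},
    {"team_size": 9, "agencies": {"MOST"}},
]
print(excess_over_baseline(us_papers, "NSF"))
print(excess_over_baseline(cn_papers, "NSFC"))
```

A positive value means the agency's portfolio contains more small team works than the national average, a negative value means fewer. Measuring each agency against its own country's baseline is what allows the two to be compared despite different national levels.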

It may be argued that NSFC and NSF are not comparable because China does not have an independent funding agency like the National Institutes of Health (NIH) that mainly focuses on biomedical research. Therefore, NSFC supports more works in medicine, which rely mainly on cooperation within big teams. Consequently, the percentage of small team output is dragged down. Statistically, the argument stands. The fraction of supported works in biology is roughly the same for NSFC and NSF: 12% of NSFC supported works and 13.5% of NSF supported works are in biology. But there is a non-negligible difference in the field of medicine: 9.6% of NSFC supported works are in medicine, while this value is only 2.4% for NSF. Such a difference is itself related to intriguing questions in research management and policy, as it is unclear whether combining application-oriented research like medicine with basic research, as NSFC does, would enhance efficiency. Nevertheless, in terms of data analysis, we can control for this by excluding NSFC and NSF supported papers in the field of medicine. After this modification, the fraction of NSFC supported papers carried out by small teams is slightly higher than the national average (Fig. S11). The extent to which NSFC exceeds the national average, however, is still smaller than that of NSF. Hence, even after excluding papers in medicine, NSF supports more small team works than NSFC does, supporting our conclusion above.

4 Conclusion

To summarize, we analyze over 26 million papers on Web of Science published from 2000 to 2017, which makes this one of the most extensive analyses in terms of papers covered. We find that China’s research output is more dominated by big teams than that of the rest of the world. The fraction of papers by small teams in China is not only much lower than the global average, ranking last among the top 15 countries by scientific publications in 2017, but has also undergone a much steeper decrease since 2000. More importantly, as teams in China shift from small to large size, the team diversity that is essential for innovative works does not follow the same increase as that in other countries. A high fraction of works are carried out within one or two institutes. All of these observations indicate that China is very different from other countries in the composition of big and small teams in scientific research. Taking the global average or a country like the United States as the reference, China is far from the balance point.

Given the importance of the problem, we also make some preliminary attempts to understand what factors can explain the different small/big team composition in China. The first hypothesis we test is that China’s big teams have a more considerable advantage in gaining citations than those of other countries, so that there are more incentives to build a large team. Indeed, works by larger teams on average receive more citations than those by small teams. However, the citation boost is roughly the same in every country after taking the national average citation into consideration, implying that citation alone cannot explain the difference. We then check whether large teams are more preferred by funding agencies. More than 80% of papers from China acknowledge research grants, which is the highest among the 15 countries analyzed. It clearly indicates the significant influence of funding agencies on China’s scientific research. While works by large teams are more apt to be supported by grants, China does not demonstrate a different pattern on this matter, following the same trend as other countries. Yet, when we separately compare the works supported by NSFC and NSF, we find that NSFC supports fewer small team works than NSF does. This gives some clues supporting the hypothesis that preferences of the funding agencies may be associated with the imbalance of small and big teams in China.

The concern about the balance between small and big teams is a relatively new topic, which was rarely studied in the past. Nevertheless, if we accept that small and large teams play different yet equally essential roles in scientific research, we need to consider this issue seriously. Our analyses based on a large volume of publication data provide strong evidence that China needs more small team output. The factor we identify as associated with this imbalance further sheds light on this issue. Given the multiple confounding factors that may influence the organization of teams, we admit that our finding is preliminary, and other factors can play a role. For example, some observations in this paper could be explained if big teams in China were more productive than those in other countries. However, given the fluidity in team assembly (wang2015scientific; milojevic2014principles; abramo2017relationship), testing this hypothesis is challenging, as it requires better author name disambiguation algorithms and other techniques to extract the core of the team in the collaboration network (wang2020measuring; yu2019academic). The lack of team diversity and the bigger team size in China can also be associated with honorary authorship (biagioli2018academic), although there is no evidence that such misconduct is more severe in China (tang2019five). The collectivist culture in Asia may also encourage the formation of big teams. Both Japan and Korea have a relatively small percentage of small team output. Exploring these factors may not only provide useful insights to the research community in China, but also advance our quantitative understanding of science (fortunato2018science; azoulay2018toward).


5 Supporting Information

See the file of supplementary information for additional statistics and figures.

We thank Prof. Barabasi at CCNR for giving access to the WOS data. T.J., J.H., and F.X. designed the research, J.H. and T.J. did the data parsing and cleaning, T.J., L.L. and F.Y analyzed the data, collected the statistics and reviewed related literature. T.J. and L.L. prepared the initial draft of the manuscript. All authors contributed comments on the results and revisions to the final version.