Exploratory Analysis of Academic Collaborations between French and US

01/04/2022
by   George Panagopoulos, et al.
Ecole Polytechnique

International academic collaborations cultivate diversity in the research landscape and facilitate multiperspective methods, as the scope of each country's science depends on its needs, history, wealth etc. Moreover, the quality of science differs significantly among nations, which makes international collaborations a useful lens for understanding the dynamics between countries and their advancements. Analyzing these collaborations can reveal the expertise shared between two countries in different fields, the most well-known institutions of a nation, the overall success of collaborative efforts compared to local ones etc. Such analyses were initially performed using statistical metrics, but network analysis has later proven much more expressive. In this exploratory analysis, we aim to examine the collaboration patterns between French and US institutions. To this end, we capitalize on the Microsoft Academic Graph (MAG), the largest open bibliographic dataset that contains detailed information for authors, publications and institutions. We use the geographic coordinates of each affiliation to assign it to France or the USA. In cases where the coordinates of an affiliation were absent, we used its Wikipedia URL and named entity recognition to identify the country of its address in the Wikipedia page. We need to stress that institute names have been volatile in France in the last decade (due to the creation of university federations), so this is a best-effort attempt. The results indicate an intensive and increasing scientific production, with certain institutions such as Harvard, MIT and CNRS standing out.

1 Introduction

International academic collaborations cultivate diversity in the research landscape and facilitate multiperspective methods, as the scope of each country’s science depends on its needs, history, wealth etc. Moreover, the quality of science differs significantly among nations [5], which makes international collaborations a useful lens for understanding the dynamics between countries and their advancements. Analyzing these collaborations can reveal the expertise shared between two countries in different fields, the most well-known institutions of a nation, the overall success of collaborative efforts compared to local ones etc. Such analyses were initially performed using statistical metrics [7], but network analysis has later proven much more expressive [14, 4]. In this exploratory analysis, we aim to examine the collaboration patterns between French and US institutions. To this end, we capitalize on the Microsoft Academic Graph (MAG) [13], the largest open bibliographic dataset that contains detailed information for authors, publications and institutions. We use the geographic coordinates of each affiliation to assign it to France or the USA. In cases where the coordinates of an affiliation were absent, we used its Wikipedia URL and named entity recognition to identify the country of its address in the Wikipedia page. We need to stress that institute names have been volatile in France in the last decade (due to the creation of university federations), so this is a best-effort attempt. The results indicate an intensive and increasing scientific production, with certain institutions such as Harvard, MIT and CNRS standing out.

2 Analysis

2.1 Coauthorships of the top French Institutes

We define a collaboration between a French and a US scientific institute to exist if there is at least one paper coauthored by scientists from both institutes. Among the French academic institutes that collaborate with the USA, we report in Table 1 the 10 most productive (in terms of number of papers) and their most frequent collaborators in the USA. Figure 1 visualizes the same information with a chord plot, where the edges are colored based on the US institutes. This shows which US universities collaborate most with the aforementioned most productive French institutes. Harvard and MIT stand out, each appearing among the closest collaborators of more than 3 of the top French institutes, while the strongest collaboration takes place between CNRS and CalTech.
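The definition above can be sketched in a few lines. This is only an illustration under assumptions: the record layout (a dict with hypothetical keys "fr" and "us" holding the institutes on each paper) and the function names are invented here and are not taken from MAG or from the analysis itself.

```python
from collections import Counter
from itertools import product


def collaboration_counts(papers):
    """Count joint papers per (French institute, US institute) pair.

    `papers` is an iterable of dicts with hypothetical keys "fr" and "us",
    each holding the set of French / US institutes affiliated with the paper.
    """
    counts = Counter()
    for paper in papers:
        # One paper adds one joint paper to every FR-US pair that appears on it.
        for pair in product(set(paper["fr"]), set(paper["us"])):
            counts[pair] += 1
    return counts


def top_partners(counts, french_institute, k=3):
    """Return the k US institutes sharing the most papers with one French institute."""
    partners = Counter(
        {us: n for (fr, us), n in counts.items() if fr == french_institute}
    )
    return partners.most_common(k)


# Toy records, not real data.
papers = [
    {"fr": {"CNRS"}, "us": {"CalTech", "MIT"}},
    {"fr": {"CNRS", "ENS"}, "us": {"CalTech"}},
    {"fr": {"ENS"}, "us": set()},  # no US coauthor, so no collaboration
]
counts = collaboration_counts(papers)
print(top_partners(counts, "CNRS"))  # [('CalTech', 2), ('MIT', 1)]
```

Counting pairs per paper means a paper with several institutes on each side contributes to several collaborations at once, which is one reasonable reading of the definition.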

Figure 1: Table 1 visualized in a chord plot

| France | 1st US collaborator | 2nd US collaborator | 3rd US collaborator |
|---|---|---|---|
| Centre national de la recherche scientifique (CNRS) | California Institute of Technology (CalTech) [2678] | Massachusetts Institute of Technology (MIT) [2226] | Harvard University [2106] |
| French Institute of Health and Medical Research (FIHM) | Harvard University [1765] | National Institutes of Health (NIH) [1484] | Boston Children’s Hospital (BCH) [1053] |
| University of Paris (UParis) | Harvard University [2061] | Stanford University [1486] | Massachusetts Institute of Technology (MIT) [1280] |
| École Normale Supérieure (ENS) | Harvard University [276] | Massachusetts Institute of Technology (MIT) [252] | University of California, Berkeley [247] |
| Institut National de la Recherche Agronomique (INRA) | University of California, Davis [255] | Agricultural Research Service (AGS) [255] | United States Department of Agriculture (USDA) [244] |
| Pierre-and-Marie-Curie University (UPMC) | Massachusetts Institute of Technology (MIT) [769] | Harvard University [761] | University of Michigan [670] |
| University of Bordeaux | Harvard University [335] | University of Washington [277] | National Institutes of Health (NIH) [212] |
| University of Paris-Sud | Massachusetts Institute of Technology (MIT) [1327] | Harvard University [1305] | Iowa State University [1093] |
| University of Montpellier | University of Washington [230] | Harvard University [219] | University of Maryland, College Park (UMD) [211] |
| Pasteur Institute | National Institutes of Health (NIH) [561] | Harvard University [421] | Centers for Disease Control and Prevention (CDCP) [313] |

Table 1: Top 10 French institutions (in terms of number of papers), with their 3 closest collaborators in the US. The number of joint papers is given in brackets after the name of each US institute. The names of the institutes are abbreviated for visualization purposes in Fig. 1.

2.2 The timeline of French-US collaborations

The oldest collaboration between a French and a US institute dates back to 1930, with a joint paper between École Normale Supérieure and Cornell University entitled “LA GRAISSE DU SANG ET LA GRAISSE DU LAIT PENDANT LA LACTATION” [12]. Since then, the number of collaborations, as well as their impact (in terms of citations), has increased significantly, as one can see in Figures 2 and 3. There is an almost exponential increase in the number of papers produced by joint collaborations. The same applies to the impact of these works, which increases especially until 2010. Naturally, the citation count in recent years is lower, as recent papers need time to accumulate citations. There are also some landmark years, such as 2012, where the inclusion of the publications on the discovery of the Higgs boson, e.g. [1, 2], has produced a massive number of citations.
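The two temporal distributions described above amount to a per-year aggregation. A minimal sketch, assuming each joint paper is available as a hypothetical (year, citations) pair; the toy numbers are invented for illustration:

```python
from collections import defaultdict


def yearly_totals(papers):
    """Return {year: (number of joint papers, total citations)}, ordered by year."""
    totals = defaultdict(lambda: [0, 0])
    for year, citations in papers:
        totals[year][0] += 1          # one more joint paper in this year
        totals[year][1] += citations  # accumulate the citations it received
    return {year: tuple(pair) for year, pair in sorted(totals.items())}


# Toy input: (publication year, citation count) per joint paper.
toy = [(2011, 40), (2012, 5000), (2012, 30), (2020, 2)]
print(yearly_totals(toy))
# {2011: (1, 40), 2012: (2, 5030), 2020: (1, 2)}
```

A single highly cited paper dominates its year's total, which is the kind of effect the Higgs boson publications have on 2012.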

Figure 2: FR-US joint papers temporal distribution.
Figure 3: FR-US joint papers’ citations temporal distribution.

Distinguishing between different fields, we can see in Fig. 4 how the number of collaborations has evolved through the years in different disciplines, as well as their success in Fig. 5. Bear in mind that a paper might belong to more than one field.

  • We see that the majority of works are medical studies, which is quite common in academia.

  • The second field is computer science and the third biology. Especially in computer science, there is a steep increase around 2010.

  • The citations do not follow the same pattern, as the top cited domain is medicine, followed by biology and computer science. This is due to the known differences in citation patterns between computer science and biology [11].

  • Mathematics is the least active field in this context since publications in this area are relatively rare. Still there are some spikes of citations through the years because of some important papers.
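Because a paper may belong to more than one field, per-field counts overlap. The sketch below illustrates this, assuming a hypothetical input of (year, fields) pairs; the field labels are placeholders rather than the MAG taxonomy.

```python
from collections import Counter


def field_year_counts(papers):
    """Count joint papers per (field, year); a paper counts once in each of its fields."""
    counts = Counter()
    for year, fields in papers:
        for field in set(fields):  # set() guards against repeated labels
            counts[(field, year)] += 1
    return counts


# Toy input: (publication year, fields of the paper).
toy = [
    (2010, ["medicine", "biology"]),
    (2010, ["computer science"]),
    (2011, ["medicine"]),
]
counts = field_year_counts(toy)
print(counts[("medicine", 2010)], counts[("biology", 2010)])  # 1 1
print(sum(counts.values()), "field assignments for", len(toy), "papers")  # 4 ... 3
```

The per-field curves therefore do not sum to the overall yearly totals.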

Figure 4: FR-US joint papers temporal distribution per prominent scientific area.
Figure 5: FR-US joint papers citations temporal distribution per prominent scientific area.

2.2.1 Top Collaborations

Let the strength of a collaboration $c$ between a French and a US institute be denoted as $s_c$, referring to the number of papers coauthored by the two institutes. We rank all collaborations based on strength and define as the most prominent the ones whose strength exceeds the average number of papers over all collaborations by more than six standard deviations of the distribution, i.e.

$$ s_c > \mu_C + 6\,\sigma_C, $$

where $C$ is the set of all collaborations, and $\mu_C$ and $\sigma_C$ are the mean and the standard deviation of the strengths over $C$. The threshold and the density are visualized in Figure 6. The number of collaborations that are above this threshold is 492.
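The threshold rule can be sketched as follows. Two assumptions are made here that the text does not settle: the standard deviation is taken as the population standard deviation (`statistics.pstdev`), and collaborations are keyed by a hypothetical (French institute, US institute) tuple. The toy strengths are invented.

```python
from statistics import mean, pstdev


def prominent_collaborations(strengths, n_sigma=6):
    """Keep collaborations whose strength exceeds mean + n_sigma * std.

    `strengths` maps a collaboration key to its number of joint papers.
    Returns the threshold and the dict of collaborations above it.
    """
    values = list(strengths.values())
    # Assumption: population standard deviation over all collaborations.
    threshold = mean(values) + n_sigma * pstdev(values)
    kept = {c: s for c, s in strengths.items() if s > threshold}
    return threshold, kept


# Toy distribution: many weak collaborations and a single very strong one.
strengths = {("FR-%d" % i, "US-%d" % i): 1 for i in range(99)}
strengths[("CNRS", "CalTech")] = 1000
threshold, kept = prominent_collaborations(strengths)
print(round(threshold, 1), list(kept))
```

With a heavy-tailed distribution such as this one, the mean is pulled far below the strongest collaborations, so a cut of six standard deviations still leaves a non-empty set.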

Figure 6: Density of the number of collaborations with the number of papers, and the threshold to define the top collaborations.

These are visualized on the map in Figure 7. Although we can get an idea of which cities collaborate most with each other, the image is still too complex to make sense of. Thus we reduce it even further, by taking the top 100 collaborations and making a weighted bipartite plot in Figure 8.
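The reduction to the strongest collaborations can be sketched as below. It only prepares the quantities a weighted bipartite plot would need and does not draw anything. As an assumption made for this sketch, node size is approximated by the sum of the retained edge weights, which may differ from the institute-level paper counts used for the actual figure; the example strengths are three values from Table 2.

```python
from collections import defaultdict


def top_k_bipartite(strengths, k=100):
    """Keep the k strongest collaborations and derive edge and node weights.

    `strengths` maps (French institute, US institute) to a number of joint papers.
    """
    edges = sorted(strengths.items(), key=lambda item: item[1], reverse=True)[:k]
    fr_size, us_size = defaultdict(int), defaultdict(int)
    for (fr, us), papers in edges:
        # Assumption: node size is the sum of the weights of its retained edges.
        fr_size[fr] += papers
        us_size[us] += papers
    return edges, dict(fr_size), dict(us_size)


strengths = {
    ("CNRS", "CalTech"): 2678,
    ("CNRS", "MIT"): 2226,
    ("University of Paris", "Harvard"): 2061,
}
edges, fr_size, us_size = top_k_bipartite(strengths, k=2)
print(edges)    # [(('CNRS', 'CalTech'), 2678), (('CNRS', 'MIT'), 2226)]
print(fr_size)  # {'CNRS': 4904}
```

Edge width would then be drawn proportional to the second element of each entry in `edges`.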

Figure 7: USA-France top collaborations.
Figure 8: USA-France top 100 collaborations in terms of number of joint papers. Left is USA, right is France. The size of the nodes is proportional to their number of papers, the edge width is proportional to the collaborations’ strength and their color is different for each French institution.

A few observations from a first glance at this network:

  • CNRS is the French institute with the most collaborations, followed by Ecole Polytechnique and University of Strasbourg.

  • Some strong collaborations that stand out are CNRS with CalTech and UMD and University of Paris with Harvard.

  • The French Institute of Health and Medical Research has few collaborations, but with two US institutes well known for their achievements in medicine, Harvard and NIH.

  • Institutes performing research on fields like physics, biology or chemistry tend to have more connections than the ones focusing on language studies or the ones performing solely medical research. This might indicate that STEM projects applied or related to medicine exhibit international collaborations, while purely medical studies have a local collaboration pattern.

  • The most productive French Institute is clearly CNRS, followed by University of Paris, while for the USA it is Harvard and University of Michigan.

2.2.2 Top 10 collaborations

The top 10 USA-France collaborations in terms of absolute number of joint papers can be seen in Table 2.

| USA | France | Joint papers |
|---|---|---|
| California Institute of Technology | Centre national de la recherche scientifique (CNRS) | 2678 |
| Massachusetts Institute of Technology | Centre national de la recherche scientifique (CNRS) | 2226 |
| Harvard University | Centre national de la recherche scientifique (CNRS) | 2106 |
| University of Maryland, College Park | Centre national de la recherche scientifique (CNRS) | 2092 |
| Harvard University | University of Paris | 2061 |
| Ohio State University | Centre national de la recherche scientifique (CNRS) | 1840 |
| Harvard University | French Institute of Health and Medical Research | 1765 |
| University of Wisconsin-Madison | Centre national de la recherche scientifique (CNRS) | 1742 |
| Princeton University | Centre national de la recherche scientifique (CNRS) | 1638 |
| University of Michigan | Centre national de la recherche scientifique (CNRS) | 1596 |

Table 2: Top USA-France collaborations in terms of number of papers.

For each of the 10 collaborations we report the topical distribution in the figures below. Some observations:

  • The collaboration between CalTech and CNRS focuses mainly on mathematics, chemistry and physics, since both are widely known for their excellence in these fields.

  • The joint works of CNRS with MIT and Harvard are predominantly related to medicine. The secondary fields differ, however: for MIT they are computer science, chemistry, physics and mathematics, while for Harvard biology comes second and the rest follow. This makes sense because MIT focuses more on science and technology while Harvard has a broader scope.

  • University of Maryland (UMD) and CNRS seem to collaborate in a variety of disciplines, with physics being the dominant one.

  • It is clear that the joint papers between Harvard and University of Paris are mainly medical studies. The same applies to the French Institute of Health and Medical Research, which overall might point to joint works among these three institutes.

  • CNRS has also collaborated extensively with Ohio State, Wisconsin-Madison and Michigan, especially in medicine.

3 Future Work

These initial results indicate there is an intensive and increasing scientific production in terms of joint papers and resulting citations. Overall, several new hypotheses can be tested:

  • The use of influence and success metrics to identify the most crucial authors [3, 10, 8], cliques or laboratories that guide the course of the majority of the collaborations in a direct or indirect manner.

  • Prediction of new collaborations based on the fields the institutes belong to or the venues they publish at.

  • Prediction of institutes that will increase their long-term impact based on their current position and activity in the network [9].

  • Use of complementary datasets such as OpenAIRE [6] to include information about funded projects.

4 Acknowledgements

This analysis was performed at the request of Dr. Yves Frenot, Dr. Jean-Baptiste Bordes and Maxime Benallaoua from the Office for Science and Technology of the Embassy of France in the United States, with whom we collaborated to set the hypotheses examined.

References

  • [1] G. Aad, T. Abajyan, B. Abbott, J. Abdallah, S. A. Khalek, A. A. Abdelalim, O. Abdinov, R. Aben, B. Abi, M. Abolins, et al. (2012) Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Physics Letters B 716 (1), pp. 1–29.
  • [2] F. Bezrukov, M. Y. Kalmykov, B. A. Kniehl, and M. Shaposhnikov (2012) Higgs boson mass and new physics. Journal of High Energy Physics 2012 (10), pp. 140.
  • [3] C. Giatsidis, G. Nikolentzos, C. Zhang, J. Tang, and M. Vazirgiannis (2019) Rooted citation graphs density metrics for research papers influence evaluation. Journal of Informetrics 13 (2), pp. 757–768.
  • [4] G. González-Alcaide, R. Aleixandre-Benavent, C. Navarro-Molina, and J. C. Valderrama-Zurián (2008) Coauthorship networks and institutional collaboration patterns in reproductive biology. Fertility and Sterility 90 (4), pp. 941–956.
  • [5] D. A. King (2004) The scientific impact of nations. Nature 430 (6997), pp. 311.
  • [6] P. Manghi, N. Manola, W. Horstmann, and D. Peters (2010) An infrastructure for managing EC funded research output: the OpenAIRE project. The Grey Journal (TGJ): An International Journal on Grey Literature 6 (1).
  • [7] G. Melin and O. Persson (1996) Studying research collaboration using co-authorships. Scientometrics 36 (3), pp. 363–377.
  • [8] G. Panagopoulos, F. Malliaros, and M. Vazirgiannis (2020) Multi-task learning for influence estimation and maximization. IEEE Transactions on Knowledge and Data Engineering.
  • [9] G. Panagopoulos, G. Tsatsaronis, and I. Varlamis (2017) Detecting rising stars in dynamic collaborative networks. Journal of Informetrics 11 (1), pp. 198–222.
  • [10] G. Panagopoulos, C. Xypolopoulos, K. Skianis, C. Giatsidis, J. Tang, and M. Vazirgiannis (2019) Scientometrics for success and influence in the Microsoft Academic Graph. In International Conference on Complex Networks and Their Applications, pp. 1007–1017.
  • [11] G. S. Patience, C. A. Patience, B. Blais, and F. Bertrand (2017) Citation analysis of scientific categories. Heliyon 3 (5), pp. e00300.
  • [12] C. Porcher and L. Maynard (1930) La graisse du sang et la graisse du lait pendant la lactation. Le Lait 10 (96), pp. 601–613.
  • [13] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B. P. Hsu, and K. Wang (2015) An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web, pp. 243–246.
  • [14] C. S. Wagner and L. Leydesdorff (2005) Mapping the network of global science: comparing international co-authorships from 1990 to 2000. International Journal of Technology and Globalisation 1 (2), pp. 185–208.