International academic collaborations cultivate diversity in the research landscape and facilitate multiperspective methods, as the scope of each country’s science depends on its needs, history, wealth, etc. Moreover, the quality of science differs significantly among nations, which makes international collaborations a potential source for understanding the dynamics between countries and their advancements. Analyzing these collaborations can reveal shared expertise between two countries in different fields, the most well-known institutions of a nation, the overall success of collaborative efforts compared to local ones, etc. Such analyses were initially performed using statistical metrics, but network analysis has later proven much more expressive [14, 4]. In this exploratory analysis, we examine the collaboration patterns between French and US institutions. To this end, we capitalize on the Microsoft Academic Graph (MAG) [13], the largest open bibliographic dataset, which contains detailed information on authors, publications and institutions. We use geographic coordinates to assign affiliations to France or the USA. In cases where the coordinates of an affiliation were absent, we used its Wikipedia URL and named entity recognition to identify the country of its address on the Wikipedia page. We stress that institute names have been volatile in France over the last decade (due to the creation of university federations), so this is a best-effort attempt. The results indicate an intensive and increasing scientific production in terms of joint papers and resulting citations, with certain institutions such as Harvard, MIT and CNRS standing out.
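The country assignment described above (coordinates first, Wikipedia as a fallback) could be sketched roughly as follows. This is only an illustration under strong assumptions, not the procedure actually used: the bounding boxes are crude approximations that cover only metropolitan France and the contiguous US and also overlap neighboring countries, and `wiki_lookup` is a hypothetical caller-supplied function standing in for the Wikipedia and named entity recognition step.

```python
from typing import Callable, Optional, Tuple

# Crude, illustrative bounding boxes: (lat_min, lat_max, lon_min, lon_max).
# Assumption: metropolitan France and the contiguous US only. The boxes also
# cover parts of neighboring countries, so a real pipeline would need proper
# country polygons instead.
BOXES = {
    "FR": (41.3, 51.1, -5.2, 9.6),
    "US": (24.5, 49.4, -125.0, -66.9),
}


def country_from_coords(lat: float, lon: float) -> Optional[str]:
    """Return 'FR' or 'US' if the point falls in a box, else None."""
    for code, (lat_min, lat_max, lon_min, lon_max) in BOXES.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return code
    return None


def tally_country(
    coords: Optional[Tuple[float, float]],
    wiki_url: Optional[str],
    wiki_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Use coordinates when present; otherwise fall back to the Wikipedia page."""
    if coords is not None:
        return country_from_coords(*coords)
    if wiki_url is not None:
        # Hypothetical hook: fetch the page and extract the country via NER.
        return wiki_lookup(wiki_url)
    return None
```

For instance, an affiliation located at (48.85, 2.35) would be assigned to France, while an affiliation without coordinates is resolved only through whatever `wiki_lookup` returns.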
2.1 Coauthorships of the top French Institutes
We define a collaboration between a French and a US scientific institute as the existence of at least one paper coauthored by scientists from both institutes. Among the French academic institutes that collaborate with the USA, we report in Table 1 the 10 most productive (in terms of number of papers) and their most frequent collaborators in the USA. Figure 1 visualizes the same information with a chord plot, where the edges are colored according to the US institutes. This shows which US universities collaborate most with the aforementioned most productive French institutes. Harvard and MIT stand out, each having more than three collaborations with the top French institutes, while the strongest collaboration takes place between CNRS and Caltech.
|Centre national de la recherche scientifique (CNRS)||
|French Institute of Health and Medical Research (FIHM)||
|University of Paris (UParis)||
|École Normale Supérieure (ENS)||
|Institut National de la Recherche Agronomique (INRA)||
|Pierre-and-Marie-Curie University (UPMC)||
|University of Bordeaux||
|University of Paris-Sud||
|University of Montpellier||
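The definition of a collaboration given in this section (at least one paper coauthored by scientists from both institutes) can be illustrated with a minimal sketch. The input layout is an assumption made for illustration: each paper is a list of affiliation names, and `country_of` is a hypothetical mapping from institute to country code. The toy papers below are invented and do not come from MAG.

```python
from collections import Counter
from itertools import product


def collaboration_strengths(papers, country_of):
    """Count, for every (French, US) institute pair, the papers they share."""
    strengths = Counter()
    for affiliations in papers:
        institutes = set(affiliations)  # an institute counts once per paper
        french = sorted(i for i in institutes if country_of.get(i) == "FR")
        american = sorted(i for i in institutes if country_of.get(i) == "US")
        for pair in product(french, american):
            strengths[pair] += 1
    return strengths


# Toy data (invented for illustration only).
country_of = {
    "CNRS": "FR", "INRA": "FR", "University of Paris": "FR",
    "Caltech": "US", "MIT": "US", "Harvard": "US",
}
papers = [
    ["CNRS", "Caltech"],
    ["CNRS", "Caltech", "MIT"],
    ["University of Paris", "Harvard"],
    ["CNRS", "CNRS", "INRA"],  # purely French paper: no collaboration
]
strengths = collaboration_strengths(papers, country_of)
```

A pair is a collaboration as soon as its count is at least one; the count itself is the number of joint papers used to rank collaborations later in the text.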
2.2 The timeline of French-US collaborations
The oldest collaboration between a French and a US institute dates back to 1930, with a joint paper between École Normale Supérieure and Cornell University entitled "La graisse du sang et la graisse du lait pendant la lactation" [12]. Since then, the number of collaborations, as well as their impact (in terms of citations), has increased significantly, as one can see in Figures 2 and 3. There is an almost exponential increase in the number of papers produced by joint collaborations. The same applies to the impact of these works, which increases especially until 2010. Naturally, the citation count in recent years is lower, as recent papers need time to accumulate citations. There are also some exceptional years, such as 2012, where the publications on the discovery of the Higgs boson, e.g. [1, 2], have produced a massive number of citations.
Distinguishing between different fields, we can see in Fig. 4 how the number of collaborations has evolved through the years in different disciplines, as well as their success in Fig. 5. Bear in mind that a paper might belong to more than one field.
We see that the majority of works are medical studies, which is quite common in academia.
The second field is computer science and the third biology. Especially in computer science, there is a steep increase around 2010.
The citations do not follow the same pattern, as the top cited domain is medicine, followed by biology and computer science. This is due to the known differences in citation patterns between computer science and biology.
Mathematics is the least active field in this context, since publications in this area are relatively rare. Still, there are some spikes of citations through the years because of some important papers.
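The per-field timelines discussed above can be sketched as a simple aggregation. The record layout (`year`, `fields`, `citations`) is a hypothetical one chosen for illustration, and the toy numbers are invented. Since a paper may belong to more than one field, it is counted once in each of its fields, so the per-field totals can exceed the number of distinct papers.

```python
from collections import defaultdict


def yearly_field_stats(papers):
    """Aggregate paper and citation counts per (year, field)."""
    stats = defaultdict(lambda: {"papers": 0, "citations": 0})
    for paper in papers:
        # A multi-field paper contributes to every field it belongs to.
        for field in set(paper["fields"]):
            entry = stats[(paper["year"], field)]
            entry["papers"] += 1
            entry["citations"] += paper["citations"]
    return dict(stats)


# Toy records (invented for illustration only).
papers = [
    {"year": 2010, "fields": ["medicine", "biology"], "citations": 12},
    {"year": 2010, "fields": ["medicine"], "citations": 3},
    {"year": 2012, "fields": ["physics"], "citations": 500},
]
stats = yearly_field_stats(papers)
total_field_counts = sum(entry["papers"] for entry in stats.values())
```

Here three papers yield four field-level counts, which is why the curves of different fields should not be summed to obtain the overall number of joint papers.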
2.2.1 Top Collaborations
Let the strength of the collaboration between a French institute $f$ and a US institute $u$ be denoted as $s_{fu}$, referring to the number of papers coauthored by both institutes. We rank all collaborations by strength and define as the most prominent those whose strength exceeds the average number of papers over all collaborations by more than six standard deviations of the distribution, i.e., $s_{fu} > \mu_C + 6\sigma_C$, where $C$ is the set of all collaborations and $\mu_C$, $\sigma_C$ are the mean and standard deviation of the strengths in $C$. The threshold and the density are visualized in Figure 6. The number of collaborations above this threshold is 492.
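The thresholding rule described above (mean plus six standard deviations of the strength distribution) can be sketched as follows. The text does not say whether the population or the sample standard deviation was used; the population version (`statistics.pstdev`) is assumed here. The pair names and strengths are invented toy values.

```python
from statistics import mean, pstdev


def prominent_collaborations(strengths, k=6.0):
    """Return the threshold mean + k * std and the pairs strictly above it."""
    values = list(strengths.values())
    # Assumption: population standard deviation over all collaborations.
    threshold = mean(values) + k * pstdev(values)
    selected = {pair: s for pair, s in strengths.items() if s > threshold}
    return threshold, selected


# Toy distribution: 99 weak collaborations and one very strong one.
strengths = {(f"F{i}", f"U{i}"): 1 for i in range(99)}
strengths[("F_big", "U_big")] = 1000
threshold, top = prominent_collaborations(strengths)
```

Because collaboration strengths are heavily skewed, a cutoff this strict keeps only the extreme tail; on the toy data only the single strong pair survives.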
These are visualized on the map in Figure 7. Although we can get an overall idea of the cities that collaborate most with each other, the image is still too complex to make sense of. Thus we reduce it further by taking the top 100 collaborations and drawing a weighted bipartite plot in Figure 8.
A few observations from a first glance at this network:
- CNRS is the French institute with the most collaborations, followed by École Polytechnique and University of Strasbourg.
- Some strong collaborations that stand out are CNRS with Caltech and UMD, and University of Paris with Harvard.
- The French Institute of Health and Medical Research has few collaborations, but with two US institutes well known for their achievements in medicine, Harvard and NIH.
- Institutes performing research in fields like physics, biology or chemistry tend to have more connections than those focusing on language studies or those performing solely medical research. This might indicate that STEM projects applied or related to medicine exhibit international collaborations, while purely medical studies have a local collaboration pattern.
- The most productive French institute is clearly CNRS, followed by University of Paris, while for the USA the most productive are Harvard and University of Michigan.
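The reduction to the strongest collaborations and the weighted bipartite view described above can be sketched in a few lines. This is an illustrative sketch, not the plotting code behind the figure: it keeps the top `k` pairs by number of joint papers and sums the edge weights on each side of the bipartite graph. The four strengths in the example are taken from Table 2; the function name and data layout are assumptions.

```python
from collections import Counter


def top_k_bipartite(strengths, k=100):
    """Keep the k strongest (French, US) pairs and sum edge weights per side."""
    top = sorted(strengths.items(), key=lambda item: item[1], reverse=True)[:k]
    french_degree, us_degree = Counter(), Counter()
    for (french, american), weight in top:
        french_degree[french] += weight   # weighted degree, French side
        us_degree[american] += weight     # weighted degree, US side
    return top, french_degree, us_degree


# Four pairs with the joint-paper counts reported in Table 2.
strengths = {
    ("CNRS", "California Institute of Technology"): 2678,
    ("CNRS", "Massachusetts Institute of Technology"): 2226,
    ("CNRS", "Harvard University"): 2106,
    ("University of Paris", "Harvard University"): 2061,
}
top, french_degree, us_degree = top_k_bipartite(strengths, k=3)
```

With `k=3` only the three CNRS edges remain, so University of Paris disappears from the graph; this shows how the choice of cutoff shapes which institutes are visible in the reduced plot.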
2.2.2 Top 10 collaborations
The top 10 USA-France collaborations in terms of absolute number of joint papers are shown in Table 2.
|US institute|French institute|Joint papers|
|---|---|---|
|California Institute of Technology|Centre national de la recherche scientifique (CNRS)|2678|
|Massachusetts Institute of Technology|Centre national de la recherche scientifique (CNRS)|2226|
|Harvard University|Centre national de la recherche scientifique (CNRS)|2106|
|University of Maryland, College Park|Centre national de la recherche scientifique (CNRS)|2092|
|Harvard University|University of Paris|2061|
|Ohio State University|Centre national de la recherche scientifique (CNRS)|1840|
|Harvard University|French Institute of Health and Medical Research|1765|
|University of Wisconsin-Madison|Centre national de la recherche scientifique (CNRS)|1742|
|Princeton University|Centre national de la recherche scientifique (CNRS)|1638|
|University of Michigan|Centre national de la recherche scientifique (CNRS)|1596|
For each of the 10 collaborations we report the topical distribution in the figures below. Some observations:
- The collaboration between Caltech and CNRS focuses mainly on mathematics, chemistry and physics, since both are widely known for their excellence in these fields.
- The joint works of CNRS with MIT and Harvard are predominantly related to medicine. They differ in their secondary fields: for MIT these are computer science, chemistry, physics and mathematics, while for Harvard biology is second and the rest follow. This makes sense because MIT focuses more on science and technology, while Harvard has a broader scope.
- University of Maryland (UMD) and CNRS seem to collaborate in a variety of disciplines, with physics being the dominant one.
- The joint papers between Harvard and University of Paris are clearly dominated by medical studies. The same applies to the French Institute of Health and Medical Research, which might indicate joint works among these three institutes.
- CNRS has also collaborated extensively with Ohio State, Wisconsin-Madison and Michigan, especially in medicine.
3 Future Work
These initial results indicate there is an intensive and increasing scientific production in terms of joint papers and resulting citations. Overall, several new hypotheses can be tested:
- Prediction of new collaborations based on the fields the institutes belong to or the venues they publish at.
- Prediction of institutes that will increase their long-term impact based on their current position and activity in the network.
- Use of complementary datasets such as OpenAIRE [6] to include information about funded projects.
This analysis was performed at the request of Dr. Yves Frenot, Dr. Jean-Baptiste Bordes and Maxime Benallaoua from the Office for Science and Technology of the Embassy of France in the United States, with whom we collaborated to set the hypotheses examined.
- [1] (2012) Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Physics Letters B 716 (1), pp. 1–29.
- [2] (2012) Higgs boson mass and new physics. Journal of High Energy Physics 2012 (10), pp. 140.
- [3] (2019) Rooted citation graphs density metrics for research papers influence evaluation. Journal of Informetrics 13 (2), pp. 757–768.
- [4] (2008) Coauthorship networks and institutional collaboration patterns in reproductive biology. Fertility and Sterility 90 (4), pp. 941–956.
- [5] (2004) The scientific impact of nations. Nature 430 (6997), pp. 311.
- [6] (2010) An infrastructure for managing EC funded research output: the OpenAIRE project. The Grey Journal (TGJ): An International Journal on Grey Literature 6 (1).
- [7] (1996) Studying research collaboration using co-authorships. Scientometrics 36 (3), pp. 363–377.
- [8] Multi-task learning for influence estimation and maximization. IEEE Transactions on Knowledge and Data Engineering.
- [9] (2017) Detecting rising stars in dynamic collaborative networks. Journal of Informetrics 11 (1), pp. 198–222.
- [10] (2019) Scientometrics for success and influence in the Microsoft Academic Graph. In International Conference on Complex Networks and Their Applications, pp. 1007–1017.
- [11] (2017) Citation analysis of scientific categories. Heliyon 3 (5), pp. e00300.
- [12] (1930) La graisse du sang et la graisse du lait pendant la lactation. Le Lait 10 (96), pp. 601–613.
- [13] (2015) An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web, pp. 243–246.
- [14] (2005) Mapping the network of global science: comparing international co-authorships from 1990 to 2000. International Journal of Technology and Globalisation 1 (2), pp. 185–208.