Exploring the role and nature of interactions between Institutes in a Local Affiliation Network

10/04/2018 ∙ by Chakresh Kumar Singh, et al. ∙ IIT Gandhinagar

In this work, we study the collaboration and citation networks between Indian Institutes, built from publications in American Physical Society (APS) journals between 1970 and 2013. We investigate the role of geographic proximity in the network structure and find that the characteristics of an Institution, rather than geographic distance, play the dominant role in collaboration networks. We find that Institutions with better federal funding dominate the network topology and play a crucial role in overall research output. We find that the citation flow across different categories of Institutions is strongly linked to the collaborations between them. We have estimated the knowledge flow in and out of Institutions and identified the top knowledge sources and sinks.




I Introduction

Academic institutes in a country are the biggest stakeholders in knowledge production, diffusion, and innovation. Institutions nurture the manpower and provide resources to conduct research. The cumulative effort of academic institutions, industry, and government agencies is essential for building an efficient knowledge economy chen2017research ; laursen2011exploring . Studies suggest that developed countries dominate total research output as measured via publications and citations SCImago . However, in recent years developing countries like India, Brazil, and China have significantly increased their global share of research output. Exploring and understanding the major factors and policies leading to this accelerated growth is of interest to both academicians and policy makers garfield1983mapping ; gupta2009status ; SCImago ; arunachalam1998science .

The flow of scientific knowledge across people, institutions, and countries through collaborations and citations determines the evolution of scientific discoveries and technological growth. Quantitative analysis of different forms of networks constructed from bibliometric data provides insight into the underlying structural and dynamic properties of scientific collaboration herrera2010mapping ; singh2018structure . In the last two decades, the rapid growth of network science and the availability of large-scale data on scientific publications have led to large-scale studies of patterns of scientific collaboration and citations newman2001structure ; barabasi2002evolution . Analysis of the evolution of co-authorship and citation networks has largely focused on the interactions between individuals and Institutions at the global level to explain the functioning of the ecosystem of scientific collaboration mazloumian2013global ; dong2017century . These studies have shown broad features such as power-law behavior of the collaboration networks newman2001structure , preferential attachment newman2001clustering , knowledge flow maps mazloumian2013global , aging in collaboration strength and citations borner2004simultaneous ; hajra2005aging ; wang2013quantifying , and geographic proximity laursen2011exploring ; katz1994geographical ; pan2012world ; ma2014effect .

In this work, we focus on the collaboration and citation networks in American Physical Society (APS) journals with at least one author with an Indian affiliation. The motivation for restricting the study to one country at the mesoscopic Institution level is threefold. First, studies on large-scale datasets of scientific collaboration networks at the global level often mask the small-scale dynamics that are specific to Institutions, cities, and countries. While large-scale studies highlight the global average trend in network measures, small-scale studies give deeper insight into the nature of the interactions between institutes that drive the collaborations mazloumian2013global ; pan2012world ; ma2014effect ; gasko2016new ; hou2007structure . Second, investigating the behavior of these networks at the country level helps reveal the multitude of factors that influence collaborations, such as the type, characteristics, and location of Institutions. Third, identifying the factors that influence collaborations is useful for framing higher education and research policy and for allocating and prioritizing resources at the Institutional level.

In this work, we have constructed different types of networks representing collaboration between Institutions, citation flow between Institutions, and both of these aggregated across categories of Institutions. Using different network measures, we have analyzed the strength of collaboration between Institutions and the importance of Institutions, constructed a spatial network of collaborations, and analyzed the role of geographical proximity in collaboration.

II Data

We use journal papers published by the American Physical Society (APS) between 1970 and 2013 in the journals Physical Review A-E, Physical Review Letters, and Reviews of Modern Physics. Since our study is restricted to India, we have chosen all articles with at least one author with an Indian affiliation. The total number of such papers was 14,704. From each of these articles we extract the affiliations of all the authors. We mark all non-Indian affiliations in our subset as 'Foreign' and extract only their respective countries. For the Indian affiliations, we extract the Institute name, type of Institution, city, and pin-code.

We disambiguate the Institute affiliation names and assign a unique ID to each Institute. The disambiguation is done using string matching and edit distance measures to compare Institute names, followed by a manual check for repetitions. This reduced the total number of distinct institutes from 7180 to 677. Out of the reduced set, we could map 628 institutes to their pin-code locations. After cleaning the data, we classified each institute based on the categories described below, and constructed the following networks for our analysis.
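The disambiguation step can be illustrated with a minimal sketch. It is not the authors' pipeline: the `normalize`, `similarity`, and `assign_ids` helpers, the 0.9 threshold, and the sample strings are all assumptions made for illustration, and the similarity score comes from Python's `difflib.SequenceMatcher`, which is a matching-block ratio rather than a true edit distance.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized affiliation strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def assign_ids(names, threshold=0.9):
    """Greedy grouping: a name joins the first group whose representative
    is at least `threshold` similar; otherwise it starts a new group.
    The threshold is a hypothetical value, not one taken from the paper."""
    representatives = []
    ids = {}
    for name in names:
        for idx, rep in enumerate(representatives):
            if similarity(name, rep) >= threshold:
                ids[name] = idx
                break
        else:
            ids[name] = len(representatives)
            representatives.append(name)
    return ids


raw = [
    "Tata Institute of Fundamental Research, Mumbai",
    "Tata Institute of Fundamental Research Mumbai.",
    "Indian Institute of Science, Bangalore",
]
ids = assign_ids(raw)
```

A greedy pass like this only proposes candidate merges; the manual check mentioned above would still be needed to catch abbreviations and renamed institutes.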

We use the classification of Indian higher education Institutions by the University Grants Commission (UGC) of India ugc , which is based on degree-awarding category, managing body (state, central, or private), and source of funding (see Table 1). We also included special categories which are certified by the UGC but not given a standard category (such as Private Institutes and State Research Institutes).

Type of Institutes | Acronym | Function
National Research Institutes | NRI | Research Institutions funded by the central government
Institutes of National Importance | INI | Teaching (both UG and PG) and research Institutions, declared as INI by the Government of India
Central Universities | CU | Public Universities formed by a Central Act
State Universities | SU | Public Universities formed by a State Act
State Colleges | SC | Colleges affiliated to State Universities
Central Colleges | CC | Colleges affiliated to Central Universities
Deemed Universities | DU | Public or Private Universities which can award degrees on their own, and are declared as deemed by the UGC
Private Universities | PU | Universities established through a state or central act by a sponsoring body, which can be a registered Society, Trust, or Non-profit Company
Private Institutes | PI | Stand-alone private Institutions recognized by the government
State Research Institutes | SRI | Research Institutions funded by the state government
Table 1: Categories of Institutions

Figure 1: Map of India’s global collaboration based on publications in APS journals between 1970 and 2013. Each red dot within India is an institute, while outside India the dots represent the capital cities of the respective countries.

III Methods

We have explored collaborations by constructing networks at the level of the Institution, its geographic location, and its category. This allows us to explore the network properties at multiple scales by constructing super nodes from individual nodes. We also explore citations between these Institutions to assess the knowledge flow between Institutions and between their categories.
Construction of Networks
Institute Collaboration Network: We construct a weighted undirected network with institutes as nodes, where the edge weight between two nodes $i$ and $j$ represents the number of co-authored pairs between these Institutions. In Fig. 1, we show the map of collaborations between Indian Institutions and different countries of the world.
Institute Citation Network: Here, a weighted directed network is constructed with institutes as nodes, and for two nodes $i$ and $j$, the edge weight from $i$ to $j$ denotes the number of times authors from $i$ have cited authors from $j$.
Network based on Institution Type: Institutions of the same type are clubbed into a single super node, and networks based on collaboration or citation between super nodes are constructed as in Fig. 4 and Fig. 7.
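The constructions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the institute IDs `A`, `B`, `C` and their categories are hypothetical, the helper names are invented here, and each paper is assumed to add one unit of weight to every pair of distinct institutes appearing on it (the text's "co-authored pairs" could also be read as counting author pairs).

```python
from collections import Counter
from itertools import combinations


def collaboration_weights(papers):
    """Count, for every unordered pair of institutes, the papers they share.

    `papers` is a list of lists of institute IDs, one inner list per paper.
    """
    weights = Counter()
    for institutes in papers:
        for a, b in combinations(sorted(set(institutes)), 2):
            weights[(a, b)] += 1
    return weights


def collapse_to_categories(weights, category):
    """Merge institute-level edges into edges between category super nodes.

    Edges between two institutes of the same category become self edges.
    """
    super_weights = Counter()
    for (a, b), w in weights.items():
        key = tuple(sorted((category[a], category[b])))
        super_weights[key] += w
    return super_weights


# Hypothetical toy data: three papers, three institutes, two categories.
papers = [["A", "B"], ["A", "B", "C"], ["C", "A"]]
category = {"A": "NRI", "B": "NRI", "C": "SU"}

weights = collaboration_weights(papers)
super_weights = collapse_to_categories(weights, category)
```

The resulting dictionaries can be loaded into a graph library as weighted edge lists; keeping them as plain counters makes the aggregation into super nodes easy to check by hand.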

To track the evolution of these networks, we construct cumulative graphs at one-year intervals from 1970 to 2013. At a given time $t$, the network contains all the collaborations or citations between the nodes from 1970 to $t$.
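A cumulative yearly snapshot can be sketched as a running sum over per-year edge counts. The function name, the `(year, a, b)` input format, and the toy edges below are assumptions for illustration only.

```python
from collections import Counter


def cumulative_snapshots(dated_edges, first_year, last_year):
    """Return {year: Counter of edge weights accumulated up to that year}.

    `dated_edges` is an iterable of (year, a, b) collaboration events.
    """
    per_year = {}
    for year, a, b in dated_edges:
        per_year.setdefault(year, Counter())[tuple(sorted((a, b)))] += 1

    running = Counter()
    snapshots = {}
    for year in range(first_year, last_year + 1):
        running.update(per_year.get(year, {}))  # add this year's events
        snapshots[year] = Counter(running)      # copy, so later years do not leak back
    return snapshots


# Hypothetical events between institutes A, B, and C.
edges = [(1970, "A", "B"), (1972, "B", "A"), (1972, "A", "C")]
snaps = cumulative_snapshots(edges, 1970, 1973)
```

Years without new events simply repeat the previous snapshot, which matches the cumulative definition.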
Network Measures
We measure the normalized strength of collaboration between two institutes following pan2012world , based on the number of common papers between the two nodes and the number of papers published individually by each of them. To characterize the structural significance of nodes in the network we use four measures: betweenness, average degree, clustering, and PageRank centrality newman2010networks . The knowledge flow in and out of a node is measured in the Institute citation network through two quantities, (a) and (b), built from the in-degree and out-degree of the node and its total incoming and outgoing weights.
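The ingredients of the knowledge flow measures (in-degree, out-degree, and total incoming and outgoing weights) can be computed as in the sketch below. It deliberately stops at these raw quantities and does not reproduce the paper's flow formulas. The function name, the toy citation counts, and the convention that an edge `(source, target)` means "source cites target" are assumptions.

```python
from collections import defaultdict


def degree_and_strength(citations):
    """Compute in/out degree and in/out strength for each node.

    `citations` maps (source, target) to the number of citations from
    source to target (assumed direction: source cites target).
    """
    stats = defaultdict(lambda: {"k_in": 0, "k_out": 0, "s_in": 0, "s_out": 0})
    for (src, dst), w in citations.items():
        stats[src]["k_out"] += 1   # one more distinct institute cited
        stats[src]["s_out"] += w   # total citations given
        stats[dst]["k_in"] += 1    # one more distinct institute citing
        stats[dst]["s_in"] += w    # total citations received
    return dict(stats)


# Hypothetical citation counts between institutes A, B, and C.
citations = {("A", "B"): 3, ("C", "B"): 1, ("B", "A"): 2}
stats = degree_and_strength(citations)
```

Here the degrees count distinct partner institutes while the strengths count citations, so a node can have a small in-degree and still a large incoming weight.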

For our analysis, we performed measurements on the cumulative collaboration and citation networks between institutes up to 2013. The centrality value of each super node in every case is the average of the values of its constituents. We measure the distance between two Institutions as the Vincenty (great arc) distance between the pin-codes representing them. We group the distances into 50 km bins. The Gephi software ICWSM09154 and the NetworkX package hagberg2005networkx in Python were used for calculations and visualizations.
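The distance binning can be sketched as follows. Note the substitution: the text uses the Vincenty distance, whereas this sketch uses the simpler haversine formula on a sphere, which is an approximation chosen to keep the example dependency-free. The mean Earth radius of 6371 km, the helper names, and the approximate city coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the spherical Earth model (assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_bin(distance_km, width_km=50.0):
    """Index of the 50 km bin containing the distance (bin 0 covers 0 to 50 km)."""
    return int(distance_km // width_km)


# Approximate coordinates, used only for illustration.
delhi = (28.6139, 77.2090)
mumbai = (19.0760, 72.8777)
d = great_circle_km(*delhi, *mumbai)
b = distance_bin(d)
```

For pairs of Indian cities the spherical approximation typically differs from an ellipsoidal geodesic by a fraction of a percent, which is small compared with the 50 km bin width, although pairs near a bin boundary can still land in a neighbouring bin.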

IV Results

In our analysis, we have addressed four questions related to collaboration, affiliation, distance between Institutions, and type of Institution, based on the different types of networks discussed in the Methods section.
Does collaboration depend on Geographic proximity?
With the advancement in telecommunication and transportation technology it seems natural that communication has overcome the distance barrier freidman2005world ; graham1998end . However, studies have shown that geographic proximity still plays a role in establishing connections laursen2011exploring ; ma2014effect ; pan2012world . In our study we address this question by measuring change in frequency of collaboration and strength of collaboration vs. distance between Institutions.

Figure 2: Dependence of the strength of collaboration on geographic distance between Institutes. (a, top) Box plot of the collaboration strength versus distance (in multiples of 50 km). (b, bottom left) Frequency of collaboration with distance. (c, bottom right) Mean strength of collaboration versus distance. Panel (c): average cumulative strength of collaboration (1970-2013) with distance in multiples of 50 km (note the change in y-axis scale).

The top panel of Fig. 2 shows the box plots of the strength of collaboration, as defined in Section III, for different distance bins. Each bin is 50 km wide. There is a broad declining trend in the median of the normalized collaboration strength with distance. However, after the 31st bin (1500-1550 km), there is a surge in collaborations and then the trend is uneven. The bottom left panel shows the average strength of collaboration versus distance for different time periods; panel (c) shows the cumulative strength of collaborations up to 2013 in log-linear scale. Beyond the shortest distances there is a big drop in collaboration strength: people collaborate mostly within their own Institutions and with people in their city. Afterwards, the collaborations broadly decrease, but there are many spikes in between, which are likely due to peaks in the pair correlation function between the populations of cities. There is no indication of a power-law decay in collaboration strength with distance.

Figure 3: Comparing the frequency of collaboration of SCs, SUs, NRIs, and INIs with institutes of other categories, denoted by -X. SCs and SUs have more local collaborations, while NRIs and INIs have collaborations spread over wider distances.

To explain the variance in collaboration versus distance, we split the collaborations into Institutional groups (categories) as in Table 1 and study the frequency of collaborations for four different pairs of groups, SC-X, SU-X, NRI-X, and INI-X, as in Fig. 3. Here X denotes all categories of Institutions combined. State Colleges (SC) and State Universities (SU) collaborate more strongly with Institutions in close proximity than with those in farther cities (top panels). On the other hand, National Research Institutes (NRI) and Institutes of National Importance (INI) do not show a strong dependence on distance.

In all graphs we notice an increase in the frequency of collaborations at distances between 750-1650 km (bins 15-35). This is largely due to collaborations between Institutions located in highly populous metropolitan areas such as Delhi, Kolkata, Mumbai, Bangalore, and Chennai. The aerial distance between these cities lies in this range. We argue that the strength of collaboration between NRIs and INIs in major cities can be the reason for the fluctuations in Fig. 2(c).
Does collaboration between Institutions depend on their productivity?
The number of publications by authors affiliated to an institute is a strong indicator of its research output. We hypothesize that collaboration strength depends on the category of an Institution and its productivity. We build a network of Institutional categories by creating super nodes from the individual nodes, as described in the Methods section.

In Table 2, we tabulate the number of papers, the number of Institutions, and the papers per Institute in each category. Of all the publications in the dataset, NRIs contribute the largest share of papers, followed by SUs and INIs. The total research productivity is highest for NRIs (9292), followed by SUs (3438) and INIs (2635). The average productivity (papers per Institute) is highest for NRIs (122.3), followed by CUs (65.1) and INIs (57.3).

Category | NRI | INI | CU | SU | SC | CC | DU | PU | PI | SRI
Papers | 9292 | 2635 | 2083 | 3438 | 1482 | 1 | 9 | 85 | 57 | 25
Institutions | 76 | 46 | 32 | 109 | 301 | 1 | 4 | 18 | 19 | 6
Papers per Institute | 122.3 | 57.3 | 65.1 | 31.5 | 4.9 | 1 | 2.25 | 4.7 | 2.68 | 4.17
Table 2: Number of papers from different types of Institutes in the dataset studied till 2013
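As a quick check of the average productivity figures, the papers-per-Institute values for the five most productive categories can be recomputed from the first two rows of Table 2. The assignment of columns to categories in this sketch is an assumption: it takes the columns in the order of Table 1, which agrees with the totals quoted in the text for NRI, SU, and INI.

```python
# Paper and Institution counts from Table 2 (category labels assumed to follow Table 1 order).
papers = {"NRI": 9292, "INI": 2635, "CU": 2083, "SU": 3438, "SC": 1482}
institutions = {"NRI": 76, "INI": 46, "CU": 32, "SU": 109, "SC": 301}

# Average productivity: papers divided by number of Institutions, to one decimal.
papers_per_institute = {
    cat: round(papers[cat] / institutions[cat], 1) for cat in papers
}

# Categories sorted from most to least productive per Institute.
ranking = sorted(papers_per_institute, key=papers_per_institute.get, reverse=True)
```

The ranking by average productivity (NRI, CU, INI, SU, SC) differs from the ranking by total papers, where SU comes second because of its larger number of Institutions.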

In Fig. 4, we show the collaboration network between Institution categories (panel (a)) and the corresponding weighted adjacency matrix (panel (b)). In panel (a), the size of a node represents the total number of publications. Edge width shows the number of collaborations between authors of the Institutions. Groups are arranged in decreasing order of their productivity, measured in papers per Institution in the category. We see that the highly productive groups in the top left corner collaborate most among themselves. The NRI, CU, and INI categories lead in relative contribution. Some premier institutions that fall in these categories are the Indian Institute of Science (IISc), Saha Institute of Nuclear Physics (SINP), Punjab University, Banaras Hindu University (BHU), Institute of Mathematical Sciences (IMSc), Tata Institute of Fundamental Research (TIFR), and the different Indian Institutes of Technology (IITs). These institutes are mostly autonomous and are the most favored centers for pursuing higher education in India.

Figure 4: Collaboration between different types of Institutions. (a) Network representation. The size of a node is proportional to the total number of papers published by institutes falling in the category, as in Table 1. Edge width is proportional to the number of co-authorship events. Self edges represent collaboration amongst institutes of the same kind. (b) Matrix representation of the collaborations of panel (a), where the types of Institutions are sorted according to their productivity (as defined in the text).

Network structural differences across different Institutions and their types
In Fig. 5, we show the cumulative Institute collaboration network from APS publications in India as of 2013. The nodes are colored according to their category as in Fig. 4 (a) and spatially located based on their pin-codes.

Figure 5: Collaboration between Indian Institutes, marked by their pin-codes, in 2013. The nodes are colored based on their type as in Fig. 4.
Figure 6: Comparison of centrality measures for Institutes grouped into different categories.
Figure 7: Institutes clubbed as super nodes representing citations exchanged between different types of institutes. The size of a node is proportional to the total number of papers published by institutes falling in the category. Edge width is proportional to the number of citations exchanged.
Figure 8: Dominant institutes in the knowledge network constructed from the dataset. Size is proportional to weighted in-degree. All these institutes are located in major cities of India and act as knowledge hubs.
Figure 9: Effective incoming and outgoing citations shared by each node. This represents the knowledge transferring (positive y-axis) and receiving (negative y-axis) capacity of every node in the network. Each category of institute is color coded.

In Fig. 6, we compare four different measures (average degree, clustering coefficient, betweenness, and PageRank) for the top five productive categories of Institutions. These measures help us assess the strength and dominant role of each category of Institutions within the network. Average degree gives the average number of connections of the nodes, betweenness gives the centrality of a node in connecting different parts of the network, PageRank measures the importance of a node, and clustering describes the average connectivity of a node's neighborhood newman2010networks ; newman2001clustering .

NRIs have the highest average degree, betweenness, and PageRank, indicating their dominant position in the collaboration network. Central Universities have the highest average clustering coefficient, highlighting their role in bringing different types of Institutions into collaborations. State Colleges, though they fare low in average degree, betweenness, and PageRank, tend to form highly clustered groups in the network.
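The averaging of node-level measures over the constituents of a category can be sketched for two of the four measures, degree and local clustering coefficient, without any external library. The toy graph, the category labels, and the helper names are hypothetical; betweenness and PageRank are omitted here and would normally come from a graph library.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def category_averages(adj, category):
    """Average degree and average clustering over the nodes of each category."""
    sums = {}
    for node in adj:
        entry = sums.setdefault(category[node], [0, 0.0, 0])
        entry[0] += len(adj[node])               # degree
        entry[1] += local_clustering(adj, node)  # clustering coefficient
        entry[2] += 1                            # number of nodes in category
    return {cat: (deg / n, clu / n) for cat, (deg, clu, n) in sums.items()}


# Hypothetical undirected graph: triangle A-B-C with a pendant node D on A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
category = {"A": "NRI", "B": "NRI", "C": "SC", "D": "SC"}
averages = category_averages(adj, category)
```

In this toy graph the hub A has the highest degree but a low clustering coefficient, while B and C sit in a fully connected neighbourhood, which mirrors how a category can rank low on degree and still be highly clustered.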

Does knowledge flow across Institutions depend on the category of Institutions?
Citations are an indirect measure of the flow of ideas between authors. At an aggregate level, citations between Institutions are an indicator of the knowledge flow across them mazloumian2013global . The knowledge flow network based on the citations exchanged between Institutions (see Methods for details) is shown in Fig. 7 (a). The corresponding directed and weighted adjacency matrix between types of Institutions is shown in panel (b) of Fig. 7. Node size represents the total number of papers published. The NRI category is the largest in the group and also shows the most in-citations within the group. The matrix shows that the maximum citations flow between highly productive Institutions like NRI, CU, INI, SU, and SC. The pattern is similar to what we observe in Fig. 4.

In Fig. 8, we show the Giant Connected Component (GCC) of the knowledge network at the Institutional level, and highlight the Institutes which receive high in-citations. These can be considered knowledge hubs in the Institutional network and are located in the major cities of India. Of all the nodes in the GCC, NRIs, INIs, CUs, and SUs have nodes that act as knowledge centers. Based on the given dataset, the biggest center for knowledge sharing is the Tata Institute of Fundamental Research (NRI).

To compare the inward and outward flow of knowledge, we compute the effective inflow and outflow measures (see Methods for details) for different Institutions in the GCC. The results, split according to categories, are shown in Fig. 9. We find that the top knowledge sources also act as knowledge sinks.

V Conclusion

To the best of our knowledge, this is the first study to map the collaboration and knowledge flow between institutions in India and their categories. We have examined whether the geographic scaling law (inverse distance) observed in scientific collaborations at the global level is valid at the local level. We do not find any strong evidence for an inverse power-law dependence of collaboration strength on distance.

We have identified the types of Institutions which dominate the research output in India, measured through number of papers, collaborations, and knowledge flow. We find that National Research Institutes (NRI), Central Universities (CU), and Institutes of National Importance (INI) dominate the research output in Physics based on the APS dataset. Major cities like Delhi, Mumbai, Kolkata, Bangalore, and Chennai are the largest knowledge hubs in India, followed by Kanpur, Allahabad, Ahmedabad, and Bhubaneshwar. These cities are also known to host premier educational and research Institutions in the country. State Universities and State Colleges collaborate closely with Institutions closer to them, while national Institutions like NRIs and INIs have broad collaborations in all major cities across India. Highly productive Institutions collaborate more amongst each other and cite each other's work more frequently. We identified leading Institutions which act as knowledge sources.

Our study was limited to Physics papers published in American Physical Society (APS) journals from 1970 to 2013 with at least one Indian affiliation. This does not cover the full spectrum of publications in India across different disciplines. Hence, broad generalizations about scientific output and knowledge flow cannot be made. However, the results of our analysis are in agreement with reports that study India’s research output on a larger scale, and they give a reasonable idea of the existing knowledge network in India. We believe this study could be helpful for framing policies to promote research collaborations between institutes and the sharing of resources. In future, we plan to scale this study to larger datasets covering more indexed publications, and to implement network modeling to understand the dynamics behind the observed evolution.


  • [1] Kaihua Chen, Yi Zhang, Guilong Zhu, and Rongping Mu. Do research institutes benefit from their network positions in research collaboration networks with industries or/and universities? Technovation, 2017.
  • [2] Keld Laursen, Toke Reichstein, and Ammon Salter. Exploring the effect of geographical proximity and university quality on university–industry collaboration in the United Kingdom. Regional Studies, 45(4):507–523, 2011.
  • [3] SJR SCImago Journal & Country Rank [Portal]. Retrieved on September 1st, 2018 from http://www.scimagojr.com.
  • [4] Eugene Garfield. Mapping science in the third world. Science and Public Policy, 10(3):112–127, 1983.
  • [5] B Gupta and S Dhawan. Status of India in science and technology as reflected in its publication output in the Scopus international database, 1996–2006. Scientometrics, 80(2):473–490, 2009.
  • [6] Subbiah Arunachalam, R Srinivasan, and Vidyalakshmi Raman. Science in India–a profile based on India’s publications as covered by Science Citation Index 1989–1992. Current Science, 74(5):433–441, 1998.
  • [7] Mark Herrera, David C Roberts, and Natali Gulbahce. Mapping the evolution of scientific fields. PLoS ONE, 5(5):e10355, 2010.
  • [8] Chakresh Kumar Singh and Shivakumar Jolad. Structure and evolution of Indian physics co-authorship networks. arXiv preprint arXiv:1801.05400, 2018.
  • [9] Mark EJ Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.
  • [10] Albert-László Barabási, Hawoong Jeong, Zoltan Néda, Erzsebet Ravasz, Andras Schubert, and Tamas Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4):590–614, 2002.
  • [11] Amin Mazloumian, Dirk Helbing, Sergi Lozano, Robert P Light, and Katy Börner. Global multi-level analysis of the ‘scientific food web’. Scientific Reports, 3:1167, 2013.
  • [12] Yuxiao Dong, Hao Ma, Zhihong Shen, and Kuansan Wang. A century of science: Globalization of scientific collaborations, citations, and innovations. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1437–1446. ACM, 2017.
  • [13] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical Review E, 64(2):025102, 2001.
  • [14] Katy Börner, Jeegar T Maru, and Robert L Goldstone. The simultaneous evolution of author and paper networks. Proceedings of the National Academy of Sciences, 101(suppl 1):5266–5273, 2004.
  • [15] Kamalika Basu Hajra and Parongama Sen. Aging in citation networks. Physica A: Statistical Mechanics and its Applications, 346(1-2):44–48, 2005.
  • [16] Dashun Wang, Chaoming Song, and Albert-László Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013.
  • [17] J Sylvan Katz. Geographical proximity and scientific collaboration. Scientometrics, 31(1):31–43, 1994.
  • [18] Raj Kumar Pan, Kimmo Kaski, and Santo Fortunato. World citation and collaboration networks: uncovering the role of geography in science. Scientific Reports, 2:902, 2012.
  • [19] Haitao Ma, Chuanglin Fang, Bo Pang, and Guangdong Li. The effect of geographical proximity on scientific cooperation among Chinese cities from 1990 to 2010. PLoS ONE, 9(11):e111705, 2014.
  • [20] Noémi Gaskó, Rodica Ioana Lung, and Mihai Alexandru Suciu. A new network model for the study of scientific collaborations: Romanian computer science and mathematics co-authorship networks. Scientometrics, 108(2):613–632, 2016.
  • [21] Haiyan Hou, Hildrun Kretschmer, and Zeyuan Liu. The structure of scientific collaboration networks in scientometrics. Scientometrics, 75(2):189–202, 2007.
  • [22] University Grants Commission (UGC) website. https://www.ugc.ac.in/.
  • [23] M. Newman. Networks: An Introduction. OUP Oxford, 2010.
  • [24] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks, 2009.
  • [25] A Hagberg, D Schult, and P Swart. NetworkX: Python software for the analysis of networks. Technical report, Mathematical Modeling and Analysis, Los Alamos National Laboratory, 2005. http://networkx.lanl.gov.
  • [26] Thomas Friedman. The world is flat. New York: Farrar, Straus and Giroux, 488, 2005.
  • [27] Stephen Graham. The end of geography or the explosion of place? Conceptualizing space, place and information technology. Progress in Human Geography, 22(2):165–185, 1998.