Link prediction for interdisciplinary collaboration via co-authorship network

03/16/2018 ∙ by Haeran Cho, et al. ∙ University of Bristol

We analyse the Publication and Research (PURE) data set of University of Bristol collected between 2008 and 2013. Using the existing co-authorship network and academic information thereof, we propose a new link prediction methodology, with the specific aim of identifying potential interdisciplinary collaboration in a university-wide collaboration network.

1 Introduction

Interdisciplinarity has come to be celebrated in recent years with many arguments made in support of interdisciplinary research. Rylance (2015) noted that

  • complex modern problems, such as climate change and resource security, require many types of expertise across multiple disciplines;

  • scientific discoveries are more likely to be made on the boundaries between fields, with the influence of big data science on many disciplines as an example; and

  • encounters with other fields benefit single disciplines and broaden their horizons.

In 2015, the UK higher education funding bodies and the Medical Research Council commissioned a quantitative review of interdisciplinary research (Elsevier, 2015), as part of the effort to assess the quality of research produced by UK higher education institutions and design the UK’s future research policy and funding allocations. Around the same time, Nature published a special issue (Nature, 2015), reflecting the increasing trend of interdisciplinarity. One example of this trend is observed in publication data, where more than one-third of the references in scientific papers point to other disciplines; also, an increasing number of research centres and institutes are being established globally, bringing together members of different fields in order to tackle scientific and societal questions that go beyond the boundary of a single discipline (Ledford, 2015).

As a way of promoting interdisciplinary research, Brown et al. (2015) suggested ‘the institutions to identify research strengths that show potential for interdisciplinary collaboration and incentivise it through seed grants’. Faced with the problem of utilising limited resources, decision makers in academic organisations may focus on promoting existing collaborations between different disciplines. However, it could also be of interest to identify the disciplines that have not collaborated to date but have the potential to develop and benefit from collaborative research given a nurturing environment.

Thus motivated, the current paper has a twofold goal: from the perspective of methodological development, we introduce new methods for predicting edges in a network; from the policy making perspective, we provide decision makers with a systematic way of introducing or evaluating calls for interdisciplinary research, based on the potential for interdisciplinary collaboration detected from the existing co-authorship network. In doing so, we analyse the University of Bristol’s research output data set, which contains the co-authorship network among the academic staff and information on their academic membership, including the (main) disciplines in which their research lies.

Link prediction is a fundamental problem in network statistics. Besides the applications to co-authorship networks, link prediction problems are of increasing interest for friendship recommendation in social networks (e.g., Liben-Nowell and Kleinberg, 2007), exploring collaboration in academic contexts (e.g., Kuzmin et al., 2016; Wang and Sukthankar, 2013), discovering unobserved relationships in food webs (e.g., Wang et al., 2014), understanding protein-protein interactions (e.g., Martínez et al., 2014) and gene regulatory networks (e.g., Turki and Wang, 2015), to name but a few. Due to the popularity of link prediction in a wide range of applications, many efforts have been made in developing statistical methods for link prediction problems. Liben-Nowell and Kleinberg (2007), Lü and Zhou (2011) and Martínez et al. (2016), among others, are some recent survey papers on this topic. The methods developed can be roughly categorised into model-free and model-based methods.

Among the model-free methods, some are based on information from neighbours (e.g., Liben-Nowell and Kleinberg, 2007; Adamic and Adar, 2003; Zhou et al., 2009) to form similarity measures and predict linkage; some are based on geodesic path information (e.g., Katz, 1953; Leicht et al., 2006); some use the spectral properties of adjacency matrices (e.g., Fouss et al., 2007). Among the model-based methods, some exploit random walks on the graphs to predict future linkage (e.g., Page et al., 1999; Jeh and Widom, 2002; Liu and Lü, 2010); some predict links based on probabilistic models (e.g., Geyer, 1992); some estimate the network structure via maximum likelihood estimation (e.g., Guimerá and Sales-Pardo, 2009); others utilise the community detection methods (e.g., Clauset et al., 2008).

The link prediction problem in this paper shares similarity with the above-mentioned ones. However, we also note a fundamental difference: we collect the data at the level of individual researchers for the large-size network thereof, but the conclusion we seek is for the small-size network with nodes representing the individuals’ academic disciplines, which are given in the data set. Nodes of the small-size network are different from communities: memberships to the communities are typically unknown and the detection of community structure is often itself of separate interest, whereas academic affiliations, which we use as a proxy for academic disciplines, are easily accessible and treated as known in our study.

The rest of the paper is organised as follows. Section 2 provides a detailed description of the publication and research data set collected at the University of Bristol, as well as the networks arising from the data. In Section 3, we propose a link prediction algorithm, compare its performance in combination with varying similarity measures for predicting the potential interdisciplinary research links via a thorough study of the co-authorship network, and demonstrate the good performance of our proposed method. Section 4 concludes the paper. The Appendix provides additional information about the data set.

2 Data description and experiment setup

2.1 Data set

Publication and Research (PURE) is an online system provided by the Danish company Atira. It collects, organises and integrates data about research activity and performance. Adopting the PURE data set of research outputs collected between 2008 and 2013 from the University of Bristol (simply referred to as the ‘University’), we focus on journal outputs made by academic staff. Each research output and each member of academic staff has a unique ID. The data set also includes the following information:

  • Outputs’ titles and publication dates;

  • Authors’ publication names, job titles, affiliations within the University;

  • University organisation structures: there are 6 Faculties and each Faculty has a few Schools and/or Centres (see Tables 1 and 3 in Appendix). We will refer to the Schools and Centres as the School-level organisations, or simply Schools, in the rest of the paper.

Journal information is not provided in the data set, but we obtained this information using rcrossref (Chamberlain et al., 2014).

In summary, we have

  • 2926 staff, 20 of whom have multiple Faculty affiliations, and 36 of whom have multiple School-level affiliations;

  • 20740 outputs, including 3002 outputs in Year 2008, 3084 in 2009, 3371 in 2010, 3619 in 2011, 3797 in 2012, and 3867 in 2013.

See Figure 1 for the breakdown of the academic staff and their publications with respect to the Schools.

UNIV
  FSCI: GELY, GEOG, PSYC, MATH, PHYS, CHEM, BISC, NSQI, SCIF
  FSSL: EDUC, LAWD, SPAI, EFIM, SPOL, SSLF
  FMVS: PHPH, BIOC, PANM, MSAD, MVSF
  FOAT: MODL, HUMS, SART, LANG, ARTF
  FMDY: VESC, SOCS, ORDS, SSCM, MEED, CHSE, MDYF
  FENG: QUEN, MVEN, EENG, GSEN, ENGF

Table 1: Organisation hierarchy structure within the University, full names of which can be found in Table 3 in Appendix.
Figure 1: Barplot of the number of staff (magnitudes given in the left y-axis) and publications (right y-axis) from the academic organisations listed in Table 1.

Note that this data set includes only authors within the University, i.e., if a paper has authors outside the University, (the disciplines of) these authors are reflected neither in the data set nor in the analysis conducted in this paper. Also, we omit from our analysis any contribution to books and anthologies, conference proceedings and software. In Summer 2017, the University renamed the Schools in the Faculty of Engineering and the Faculty of Health Sciences, and merged SOCS and SSCM as Bristol Medical School (see Table 3). In this paper, we keep the structure and names used for the data period.

2.2 Experiment setup and notation

In order to investigate the prediction performance of the proposed methods, we split the whole data set into training and test sets, which contain the research outputs published in Years 2008–2010 and Years 2011–2013, respectively.

Denote by $\mathcal{R}$ and $\mathcal{S}$ the collections of all the staff (researchers) and all the School-level organisations appearing in Years 2008–2013, respectively. Also, let $\mathcal{J}$ denote the collection of all the journals in which the researchers in $\mathcal{R}$ have published during the same period. Three types of networks arise from the PURE data set.

  • Co-authorship network: the nodes are individual researchers ($\mathcal{R}$), and the edges connecting pairs of researchers indicate that they have joint publications.

  • Researcher-journal network: in this bipartite network, the nodes are researchers ($\mathcal{R}$) and journals ($\mathcal{J}$), and there is an edge connecting a researcher and a journal if the researcher has published in the journal.

  • School network: the nodes are School-level organisations ($\mathcal{S}$), and the edges connecting pairs of organisations indicate that they have collaboration in ways which are to be specified; we wish to predict links in this network.

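The co-authorship adjacency matrices for the training and test sets are denoted by $A^{\text{train}}$ and $A^{\text{test}}$, both of which are based on the same cohort of researchers. To be specific, for $i, j \in \mathcal{R}$ with $i \neq j$,

$$A^{\text{train}}_{ij} = \text{number of joint publications of } i \text{ and } j \text{ in Years 2008--2010}, \qquad A^{\text{test}}_{ij} = \text{number of joint publications of } i \text{ and } j \text{ in Years 2011--2013}.$$

Similarly, we define the incidence matrices corresponding to the researcher-journal bipartite networks for the training and test sets, $B^{\text{train}}$ and $B^{\text{test}}$, as

$$B^{\text{train}}_{ik} = \text{number of publications of } i \text{ in journal } k \text{ in Years 2008--2010}, \qquad B^{\text{test}}_{ik} = \text{number of publications of } i \text{ in journal } k \text{ in Years 2011--2013},$$

for $i \in \mathcal{R}$ and $k \in \mathcal{J}$.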

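For a researcher $i \in \mathcal{R}$, let $s(i) \in \mathcal{S}$ be the School-level affiliation of researcher $i$. At the School level, we create collections of edges (collaboration) $E^{\text{train}}$ and $E^{\text{test}}$ for the training and test sets, respectively, with

$$E^{\text{train}} = \{(s, s'):\ s, s' \in \mathcal{S},\ s \neq s',\ A^{\text{train}}_{ij} > 0 \text{ for some } i, j \in \mathcal{R} \text{ with } s(i) = s \text{ and } s(j) = s'\},$$

and $E^{\text{test}}$ defined analogously from $A^{\text{test}}$, i.e., we suppose that there is an edge connecting a pair of organisations if they have joint publications in the corresponding data sets. Note that since $A^{\text{train}}$ and $A^{\text{test}}$ are symmetric, the edges in $E^{\text{train}}$ and $E^{\text{test}}$ are undirected ones.

Then, $E^{\text{new}} = E^{\text{test}} \setminus E^{\text{train}}$ denotes the collection of new School-level collaborative links appearing in the test set only. In this data set, there are 260 pairs of Schools which have no collaborations in the training set, and $|E^{\text{new}}| = 37$ new pairs of Schools which have developed collaborations in the test set. Our aim is to predict as many edges in $E^{\text{new}}$ as possible using the training set, without incurring too many false positives. We would like to point out that false positives can also be interpreted as potential collaboration which has not been materialised in the whole data set.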
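The construction of the School-level edge sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (each output given as a list of internal author IDs), the toy data, and the function names `coauthor_counts` and `school_edges` are hypothetical, and each researcher is assumed to have a single School-level affiliation.

```python
from itertools import combinations


def coauthor_counts(outputs):
    """Count joint outputs for every unordered pair of internal authors."""
    counts = {}
    for authors in outputs:
        for i, j in combinations(sorted(set(authors)), 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts


def school_edges(counts, school):
    """Undirected School pairs whose members share at least one output."""
    edges = set()
    for (i, j), n in counts.items():
        if n > 0 and school[i] != school[j]:
            edges.add(frozenset((school[i], school[j])))
    return edges


# Hypothetical toy records: each output is the list of its internal authors.
school = {"r1": "MATH", "r2": "MATH", "r3": "SSCM", "r4": "GEOG"}
train_outputs = [["r1", "r2"], ["r2", "r3"]]               # Years 2008-2010
test_outputs = [["r1", "r3"], ["r3", "r4"], ["r2", "r3"]]  # Years 2011-2013

E_train = school_edges(coauthor_counts(train_outputs), school)
E_test = school_edges(coauthor_counts(test_outputs), school)
E_new = E_test - E_train  # School links that appear in the test period only
```

In this toy example the only new School-level link is the one between SSCM and GEOG, since MATH and SSCM already collaborate in the training period.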

3 Link prediction

3.1 Methodology

We formulate the problem of predicting potential interdisciplinary collaboration in the University as a School network link prediction problem, by regarding the academic affiliations as a proxy for disciplines. We may approach the problem

  (i) by observing the potential for future collaboration among the individuals and then aggregating the scores according to their affiliations for link prediction in the School network, or

  (ii) by forming the School network based on the existing co-authorship network (namely, $A^{\text{train}}$) and predicting the links thereof.

Noting that interdisciplinary research is often led by individuals of strong collaborative potential, we adopt the approach in (i) and propose the following algorithm.

Link prediction algorithm
Step 1

Obtain the similarity scores for the pairs of individuals as $\{w_{ij}:\ i, j \in \mathcal{R}\}$ using the training data.

Step 2

Assign weights $W_{ss'}$ to the edges in the School network by aggregating $w_{ij}$ for $i$ with $s(i) = s$ and $j$ with $s(j) = s'$.

Step 3

Select the set of predicted edges as

$$\hat{E} = \{(s, s'):\ (s, s') \notin E^{\text{train}},\ W_{ss'} \ge \tau\}$$

for a given threshold $\tau$.

Note that, although we can compute the edge weights for the pairs of individuals (and hence for the pairs of Schools) with existing collaborative links in Steps 1–2, they are excluded in the prediction performed in Step 3.
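Steps 2 and 3 of the algorithm can be sketched as follows, taking the individual-level similarity scores from Step 1 as given. The sketch assumes that aggregation is done by summation and that a School pair is selected when its weight is at least the threshold; both are illustrative choices, and the names `aggregate_to_schools` and `predict_edges` as well as the toy scores are hypothetical.

```python
def aggregate_to_schools(scores, school):
    """Step 2: sum individual scores over all researcher pairs of a School pair."""
    W = {}
    for (i, j), w in scores.items():
        if school[i] != school[j]:
            key = frozenset((school[i], school[j]))
            W[key] = W.get(key, 0.0) + w
    return W


def predict_edges(W, existing, tau):
    """Step 3: keep School pairs with no existing link and weight at least tau."""
    return {pair for pair, w in W.items() if pair not in existing and w >= tau}


# Hypothetical Step 1 output: similarity scores for pairs of individuals.
school = {"r1": "MATH", "r2": "MATH", "r3": "SSCM", "r4": "GEOG"}
scores = {("r1", "r3"): 2.0, ("r2", "r3"): 3.0, ("r1", "r4"): 0.5, ("r2", "r4"): 1.0}
existing = {frozenset(("MATH", "SSCM"))}  # Schools already linked in training

W = aggregate_to_schools(scores, school)
predicted = predict_edges(W, existing, tau=1.0)
```

Here the MATH and SSCM pair receives the largest weight but is excluded from the prediction because the two Schools already collaborate in the training set, which mirrors the note above.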

We propose two different methods for assigning the similarity scores to the pairs of individual researchers in Step 1, and aggregating them into the School network edge weights in Step 2. We first compute $w_{ij}$ using the co-authorship network only (Section 3.1.1), and explore ways of further integrating the additional layer of information by adopting the bipartite network between the individuals and journals (Section 3.1.2).

3.1.1 Similarity scores based on the co-authorship network

As noted in Clauset et al. (2008), neighbour- or path-based methods have been known to work well in link prediction for strongly assortative networks such as collaboration and citation networks. If researchers A and B have both collaborated with researcher C in the past, it is reasonable to expect the collaboration between A and B if they have not done so yet. In the same spirit, one can also predict linkage based on other functions of neighbourhood.

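Motivated by this observation, we propose different methods for calculating the similarity scores $w_{ij}$ in Step 1. In all cases, $w_{ij} = 0$ if and only if $(i, j)$ does not have a length-2 geodesic path based on $A^{\text{train}}$.

  (a) Length-2 geodesic path. Set $w_{ij} = 1$ if there is a length-2 geodesic path connecting $i$ and $j$ based on $A^{\text{train}}$.

  (b) Number of common direct neighbours. Let $w_{ij}$ be the number of distinct length-2 geodesic paths linking $i$ and $j$ based on $A^{\text{train}}$, i.e.,

    $$w_{ij} = \sum_{k \in \mathcal{R}} \bar{A}_{ik} \bar{A}_{kj},$$

    where $\bar{A}_{ik} = \mathbb{I}(A^{\text{train}}_{ik} > 0)$.

  (c) Number of common order-2 neighbourhood. Let $w_{ij}$ be the number of common order-2 neighbours of $i$ and $j$; in other words, the number of researchers lying within geodesic distance 2 of both $i$ and $j$ in the co-authorship network.

  (d) Sum of weights of path edges. Let $w_{ij}$ be the sum of the weights of all the length-2 geodesic paths linking $i$ and $j$, i.e., listing all length-2 geodesic paths connecting $i$ and $j$ as $i - k_l - j$, $l = 1, \ldots, m$, we set

    $$w_{ij} = \sum_{l=1}^{m} \left(A^{\text{train}}_{i k_l} + A^{\text{train}}_{k_l j}\right).$$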

All (a)–(d) assign positive weights to the pairs of individuals who do not have direct collaboration in the training data set, but have at least one common co-author. Compared to (a), the other three scores integrate more information and take into consideration the number of common publications or the number of common co-authors; however, all (a)–(d) assign non-zero weights to the same set of edges. Then, with the thus-chosen edge weights $w_{ij}$ between the researchers, we obtain the edge weights for the School network in Step 2, as

$$W_{ss'} = \sum_{i:\, s(i) = s}\ \sum_{j:\, s(j) = s'} w_{ij},$$

which in turn is used for link prediction in Step 3. In combination with (a)–(d), we propose to select the threshold $\tau$ in Step 3 as the $100(1 - q)$th percentile of $\{W_{ss'}:\ W_{ss'} > 0,\ (s, s') \notin E^{\text{train}}\}$ for a given $q \in (0, 1]$.
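The neighbour-based scores can be illustrated on a small weighted co-authorship graph. The sketch below computes three of the four scores (the length-2 indicator, the number of common co-authors, and the sum of path edge weights) and a percentile-type threshold; the order-2 neighbourhood score is omitted for brevity. It assumes that edge weights are joint publication counts stored symmetrically, and it implements the percentile as the ceil(q n)-th largest positive weight, which is one of several possible conventions. All names and data are hypothetical.

```python
import math


def length2_scores(A):
    """Scores for non-adjacent researcher pairs sharing at least one co-author.

    A maps each researcher to {co-author: number of joint publications} and
    is assumed to be symmetric.
    """
    scores = {}
    nodes = sorted(A)
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            if j in A[i]:
                continue  # direct co-authors have no length-2 geodesic path
            common = set(A[i]) & set(A[j])
            if not common:
                continue  # no common co-author, so the score stays zero
            scores[(i, j)] = {
                "a": 1,                                        # path exists
                "b": len(common),                              # common co-authors
                "d": sum(A[i][k] + A[k][j] for k in common),   # path weights
            }
    return scores


def threshold_for_proportion(weights, q):
    """Smallest weight among the top ceil(q * n) positive weights."""
    ws = sorted((w for w in weights if w > 0), reverse=True)
    keep = max(1, math.ceil(q * len(ws)))
    return ws[keep - 1]


# Hypothetical toy graph with edges r1-r2 (2), r1-r3 (1), r2-r4 (3), r3-r4 (1).
A = {
    "r1": {"r2": 2, "r3": 1},
    "r2": {"r1": 2, "r4": 3},
    "r3": {"r1": 1, "r4": 1},
    "r4": {"r2": 3, "r3": 1},
}
scores = length2_scores(A)
```

With tied weights, selecting every pair whose weight is at least the threshold can return more than a proportion q of the candidates, which is one way in which a coarse score such as the length-2 indicator may end up predicting more edges than the finer scores.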

3.1.2 Similarity scores based on the bipartite network

In the research output dataset, we have additional information, namely the journals in which the research outputs have been published, which can augment the co-authorship network for School network link prediction. Our motivation comes from the observation that when researchers from different organisations publish their research outputs in the same (or similar) journals but have not collaborated yet to this date, it indicates that they have the potential to form interdisciplinary collaboration with each other. A similar idea has been adopted in e.g., Kuzmin et al. (2016) for identifying the potential for scientific collaboration among molecular researchers, by adding the layer of the paths of molecular interactions to the co-authorship network.

Recall the incidence matrix for the researcher-journal bipartite network in the training set, $B^{\text{train}}$. In the bipartite network, we define the neighbours of the researcher $i$ as the journals in which $i$ has published, and denote the set of neighbours by $\mathcal{N}^{\text{train}}(i)$. Analogously, for journal $k$, its neighbours are those researchers who have published in the journal, and its set of neighbours is denoted by $\mathcal{N}^{\text{train}}(k)$.

Then, we propose the following scores to be used in Step 1 for measuring the similarity between two researchers $i$ and $j$. Where there is no confusion, we omit ‘train’ from the superscripts of $B^{\text{train}}$, $\mathcal{N}^{\text{train}}(i)$ and $\mathcal{N}^{\text{train}}(k)$.

Jaccard’s coefficient:

The Jaccard coefficient, which measures the similarity between finite sets, is extended to compare the neighbours of two individual researchers as

$$w_{ij} = \frac{|\mathcal{N}(i) \cap \mathcal{N}(j)|}{|\mathcal{N}(i) \cup \mathcal{N}(j)|}.$$

This definition simply counts the number of journals shared by $i$ and $j$, and hence gives more weight to a pair of researchers who, e.g., each published one paper in two common journals, than to those who published multiple papers in a single common journal, given that $|\mathcal{N}(i) \cup \mathcal{N}(j)|$ remains the same. Therefore, we propose a slightly modified definition which takes into account the number of publications:

Adamic and Adar (2003):

The rarer a journal is (in terms of total publications made in the journal), the more similar two researchers that share the journal may be deemed. Hence we adopt the similarity measure originally proposed in Adamic and Adar (2003) for measuring the similarity between two personal home pages based on the common features, which refines the simple counting of common features by weighting rarer features more heavily:

Co-occurrence:

We note the resemblance between the problem of edge prediction in a co-authorship network and that of stochastic language modelling for unseen bigrams (pairs of words that co-occur in a test corpus but not in the training corpus), and adapt the ‘smoothing’ approach of Essen and Steinbiss (1992). We first compute the similarity between journals using and augment the similarity score between a pair of researchers by taking into account not only those journals directly shared by the two, but also those which are close to those journals:

The use of the above similarity measures and others has been investigated by Liben-Nowell and Kleinberg (2007) for link prediction problems in social networks. Here, we accommodate the availability of additional information besides the direct co-authorship network, and re-define the similarity measures accordingly.
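As an illustration of the journal-based scores, the sketch below computes the plain Jaccard coefficient and a standard Adamic and Adar type score on a toy researcher-journal network. These are the textbook forms of the two measures and may differ from the modified definitions used in this paper; in particular, the rarity of a journal is measured here by the number of researchers who published in it, which is an assumption. The function names and data are hypothetical.

```python
import math


def jaccard(N_i, N_j):
    """Shared journals divided by all journals used by either researcher."""
    union = N_i | N_j
    return len(N_i & N_j) / len(union) if union else 0.0


def adamic_adar(N_i, N_j, journal_size):
    """Sum of 1 / log(size) over shared journals, so rarer journals count more."""
    return sum(
        1.0 / math.log(journal_size[k])
        for k in N_i & N_j
        if journal_size[k] > 1  # guard against log(1) = 0
    )


# Hypothetical toy data: journals in which each researcher has published.
journals = {"r1": {"J1", "J2"}, "r2": {"J2", "J3"}, "r3": {"J2"}}

# Number of researchers per journal (assumed rarity measure).
journal_size = {}
for published_in in journals.values():
    for k in published_in:
        journal_size[k] = journal_size.get(k, 0) + 1

jac_12 = jaccard(journals["r1"], journals["r2"])
aa_12 = adamic_adar(journals["r1"], journals["r2"], journal_size)
```

In this example researchers r1 and r2 share one of three journals, and that shared journal is used by all three researchers, so it contributes relatively little to the Adamic and Adar type score.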

Since the above similarity measures do not account for the path-based information in the co-authorship network, we propose to aggregate the similarity scores and produce the School network edge weights (Step 2) as

$$W_{ss'} = \sum_{i:\, s(i) = s}\ \sum_{j:\, s(j) = s'} w_{ij}\, \mathbb{I}(d_{ij} < d) \qquad (1)$$

for a given $d$, where $d_{ij}$ denotes the geodesic distance between researchers $i$ and $j$ in $A^{\text{train}}$. As an extra parameter $d$ is introduced in computing $W_{ss'}$, we propose to select the threshold $\tau$ in Step 3 such that only those $(s, s') \notin E^{\text{train}}$, whose edge weights exceed the median of the weights for the collaborative links that already exist in the training set, are selected in $\hat{E}$.
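The distance-restricted aggregation and the median-based threshold can be sketched as follows. The sketch assumes summation over researcher pairs, a strict bound on the geodesic distance, and that unreachable pairs are dropped whenever a bound is given; passing `d=None` corresponds to no thresholding on the distance. Running a breadth-first search per pair is wasteful but keeps the example short. All names and data are hypothetical.

```python
from collections import deque
from statistics import median


def geodesic_distances(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def school_weights(sim, adj, school, d=None):
    """Sum similarities over cross-School pairs with geodesic distance below d."""
    W = {}
    for (i, j), w in sim.items():
        if school[i] == school[j]:
            continue
        if d is not None:
            dij = geodesic_distances(adj, i).get(j)
            if dij is None or dij >= d:
                continue  # unreachable or too far apart in the network
        key = frozenset((school[i], school[j]))
        W[key] = W.get(key, 0.0) + w
    return W


def predict_above_median(W, existing):
    """Select unlinked School pairs whose weight exceeds the median weight
    of the School pairs that are already linked in the training set."""
    tau = median(W[p] for p in existing if p in W)
    return {p for p, w in W.items() if p not in existing and w > tau}


# Hypothetical toy co-authorship path r1 - r2 - r3 - r4, with r5 isolated.
adj = {"r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2", "r4"}, "r4": {"r3"}, "r5": set()}
school = {"r1": "MATH", "r2": "MATH", "r3": "SSCM", "r4": "GEOG", "r5": "GEOG"}
sim = {("r1", "r3"): 0.5, ("r2", "r3"): 0.4, ("r1", "r4"): 0.2, ("r1", "r5"): 0.9}

W_d3 = school_weights(sim, adj, school, d=3)
W_all = school_weights(sim, adj, school)
predicted = predict_above_median(W_all, {frozenset(("MATH", "SSCM"))})
```

With the bound set to 3, the pairs (r1, r4) and (r1, r5) no longer contribute, so the MATH and GEOG pair receives no weight at all; without the bound it exceeds the median weight of the existing links and is predicted.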

3.2 Results

In Table 2, we perform link prediction following Steps 1–3 of the link prediction algorithm on the PURE data set, using different combinations of the weights (a)–(d) and the threshold $\tau$ chosen with $q \in \{1, 0.4, 0.3, 0.2\}$ as described in Section 3.1.1, and the similarity scores introduced in Section 3.1.2 together with varying $d$ for (1), where NA refers to the omission of thresholding on the geodesic distance $d_{ij}$. For evaluating the quality of the predicted links, we report the total number of predicted edges, their prediction accuracy and recall, which are defined as

$$\text{prediction accuracy} = \frac{|\hat{E} \cap E^{\text{new}}|}{|\hat{E}|}, \qquad \text{recall} = \frac{|\hat{E} \cap E^{\text{new}}|}{|E^{\text{new}}|},$$

following the practice in the link prediction literature (see Liben-Nowell and Kleinberg (2007)). Each method is compared to random guessing, the prediction accuracy of which is defined as the expectation of prediction accuracy of randomly picking pairs from all non-collaborated pairs in the training data.
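The two evaluation measures and the random-guessing baseline can be written down directly. The sketch below uses hypothetical toy sets of School pairs rather than the PURE data; the baseline relies on the fact that the expected accuracy of picking candidate pairs uniformly at random equals the proportion of candidate pairs that turn out to be new links.

```python
def accuracy_and_recall(predicted, new_edges):
    """Prediction accuracy and recall of predicted School pairs."""
    hits = len(predicted & new_edges)
    accuracy = hits / len(predicted) if predicted else 0.0
    recall = hits / len(new_edges) if new_edges else 0.0
    return accuracy, recall


def random_guess_accuracy(n_new, n_candidates):
    """Expected accuracy when candidate pairs are picked uniformly at random."""
    return n_new / n_candidates


# Hypothetical toy example with four Schools A, B, C and D.
predicted = {frozenset(("A", "B")), frozenset(("A", "C")), frozenset(("B", "D"))}
new_edges = {frozenset(("A", "B")), frozenset(("C", "D"))}

accuracy, recall = accuracy_and_recall(predicted, new_edges)
baseline = random_guess_accuracy(n_new=2, n_candidates=8)
```

One of the three predicted pairs is a genuinely new link and one of the two new links is recovered, so the accuracy is 1/3 and the recall is 1/2, against a random-guessing accuracy of 0.25 in this toy setting.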

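Section 3.1.1 (weights (a)–(d), threshold chosen with $q$)

| $q$ | measure | (a) | (b) | (c) | (d) |
| 1 | # of edges | 80 | 80 | 80 | 80 |
| 1 | accuracy | .338 | .338 | .338 | .338 |
| 1 | recall | .365 | .365 | .365 | .365 |
| 0.4 | # of edges | 49 | 32 | 32 | 33 |
| 0.4 | accuracy | .388 | .500 | .469 | .424 |
| 0.4 | recall | .257 | .216 | .203 | .189 |
| 0.3 | # of edges | 24 | 24 | 24 | 25 |
| 0.3 | accuracy | .541 | .583 | .500 | .480 |
| 0.3 | recall | .176 | .189 | .162 | .162 |
| 0.2 | # of edges | 24 | 16 | 16 | 21 |
| 0.2 | accuracy | .541 | .625 | .586 | .523 |
| 0.2 | recall | .176 | .135 | .122 | .149 |

Section 3.1.2 (one column per similarity score, with varying $d$)

| $d$ | measure | | | | | |
| NA | # of edges | 43 | 45 | 44 | 33 | 28 |
| NA | accuracy | .488 | .489 | .432 | .606 | .679 |
| NA | recall | .284 | .298 | .257 | .270 | .257 |
| | # of edges | 20 | 18 | 26 | 26 | 17 |
| | accuracy | .650 | .667 | .615 | .769 | .824 |
| | recall | .176 | .162 | .217 | .270 | .189 |
| 10 | # of edges | 18 | 18 | 23 | 27 | 17 |
| 10 | accuracy | .667 | .722 | .652 | .704 | .824 |
| 10 | recall | .162 | .176 | .203 | .257 | .189 |
| 4 | # of edges | 4 | 4 | 5 | 16 | 5 |
| 4 | accuracy | .500 | .750 | .800 | .688 | .800 |
| 4 | recall | .027 | .041 | .054 | .149 | .054 |

comm. detect. (with $k$ communities)

| $k$ | # of edges | accuracy | recall |
| 5 | 31 | 0.129 | 0.054 |
| 6 | 25 | 0.160 | 0.054 |
| 7 | 24 | 0.166 | 0.050 |
| 8 | 21 | 0.095 | 0.027 |

random guess accuracy:
Table 2: Summary of the links predicted with the similarity measures and the thresholds chosen with $q$ as described in Section 3.1.1, and those described in Section 3.1.2 with $d$, in comparison with the links predicted by a modularity-maximising community detection method (comm. detect.) with varying number of communities $k$. There are 37 pairs of Schools which have developed new collaborations in the test set, out of 260 pairs that have no collaborations in the training set.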

In Figure 2, we present the edges predicted with the similarity scores based on the co-authorship network with , and in Figure 3 those predicted with the similarity scores based on the bipartite network and , in addition to the one returned with and . Different node colours represent different Faculties to which Schools belong, and edge width is proportional to the edge weights obtained in Step 2 of the proposed algorithm.

Figure 2: Edges predicted indicating possible collaboration among School-level organisations using various weights (a)–(d) described in Section 3.1.1 and threshold . Each node represents a School and each Faculty has a unique colour. Each plot reports the prediction accuracy and the number of total edges returned. The edge width is proportional to the edge weights in Step 2.
Figure 3: Edges predicted indicating possible collaboration among School-level organisations using various similarity scores and (in parentheses) described in Section 3.1.2. See Figure 2 for details about each graph.

Table 2 shows that the performance of the link prediction algorithm, combined with the similarity scores based on the co-authorship network, is not sensitive to the choice of the weights (a)–(d) or the threshold ($q$): all 16 combinations outperform the random choice, and do not differ too much among themselves. Only counting the length-2 geodesic path pairs, the score (a) predicts the most edges among them, and when no thresholding is applied ($q = 1$), all (a)–(d) select the same cohort of edges. From Figure 2, it is observable that the four similarity scores still differ by preferring different edges. For instance, with (b) and (c), the edge between SSCM and GEOG is assigned a relatively larger weight than when (a) is used.

It is evident that taking into account the additional layer of information on journals enhances the prediction accuracy considerably, returning a larger proportion of true positives among fewer predicted edges in general (thus fewer false positives). In particular, combining the similarity measure , which takes into account the similarity among the journals as well, with the choice returns a set of predicted edges that is comparable to the set of edges predicted with the scores from Section 3.1.1 in terms of its size, while achieving higher prediction accuracy and recall. Among possible values for $d$, most scores perform the best with $d = 10$, which aggregates the similarities between two individuals in forming School network edge weights, provided that their geodesic distance in the co-authorship network is less than 10; an exception is , where slight improvement is observed with .

For comparison, Table 2 also reports the results from applying a modularity-maximising hierarchical community detection method to the School network constructed from $A^{\text{train}}$. Here, we assign an edge between Schools $s$ and $s'$, with the number of publications between the researchers from the two Schools as its weight, and the prediction is made by linking all the members (Schools) in the same communities. Modularity optimisation algorithms are known to suffer from the resolution limit, and strong connections among a small number of nodes in large networks are not well detected by such methods (Fortunato and Barthelemy, 2007; Alzahrani and Horadam, 2016). Noting the nature of interdisciplinary research collaboration, which is often driven by a small number of individuals, we choose to apply the community detection method to the School network of smaller size rather than to the co-authorship network, following the approach described in (ii) at the beginning of Section 3.1.

The optimal cut results in 21 different communities at the School level, which leads to too few predicted edges. We therefore trace back in the dendrogram and show the results corresponding to the cases in which there are 5–8 communities. It is clearly seen from the outcome that our proposed method outperforms the community detection method regardless of the choice of similarity scores or other parameters. In fact, community detection often performs worse than random guessing in link prediction. This may be attributed to modularity maximisation assuming all communities in a network to be statistically similar (Newman, 2016) whereas the PURE data set is highly unbalanced with regards to both the numbers of academic staff and publications at different Schools, see Figure 1. On the other hand, our proposed method observes the potential for collaborative research at the individual level and then aggregates the resulting scores to infer the interdisciplinary collaboration potential, and hence can predict the links between e.g., a relatively small organisation (BIOC) and a large one (SSCM) as well as that between BIOC and another organisation of similar size (PSYC), see the bottom right panel of Figure 3.

Our proposed method predicts edges which do not appear in the test data set. On one hand, this can be interpreted as false positive prediction but on the other, it may be due to the time scale limitation, i.e., these edges may appear after Year 2013, or the Schools connected still have the potential to form collaborative links which are yet to be realised.

Figure 4 shows both the predicted edges (solid) and those which are in $E^{\text{new}}$ but not among the predicted ones (false negatives, dashed). Edge width is proportional to the corresponding weight $W_{ss'}$ for $(s, s') \in \hat{E}$. For the false negative edges, we assign a very small value (0.2) as their edge weights and add 0.2 to all other edge weights to make the visualisation possible. In addition, we use weights computed in the same manner but with the test data to colour the edges: the bluer an edge is, the greater the association is between the pairs of Schools connected in the test set, while the red edges indicate weaker association; grey ones are falsely predicted ones ($\hat{E} \setminus E^{\text{new}}$). In the figure, many of the predicted edges are more towards blue on the colour spectrum, while the majority of missing edges are in red, implying that the methodology is able to identify the pairs of Schools that develop significant collaboration in the test period.

Figure 4: Edges in $\hat{E} \cup E^{\text{new}}$. Blue and red edges are in $E^{\text{new}}$: the bluer an edge is, the larger the corresponding weight computed using the test set, and the redder an edge is, the smaller its test set weight. The edges in $\hat{E} \setminus E^{\text{new}}$ are in grey. The edges in $\hat{E}$ are solid lines and their widths are proportional to $W_{ss'}$, and the ones in $E^{\text{new}} \setminus \hat{E}$ are dashed lines. The left panel is based on the similarity score (c) with described in Section 3.1.1, and the right panel is based on with as described in Section 3.1.2.

4 Discussion

In this paper, we tackle the problem of predicting potential interdisciplinary research by transforming it to a membership network link prediction problem. Two types of similarity scores have been proposed in this paper, one employing only the co-authorship network and the other integrating additional information which is naturally available for the research output data. As expected, when we have more information in hand, the prediction accuracy improves. Within each type of scores, different choices of scores or parameters do not differ by much in their performance when applied to the PURE data set. However, this does not guarantee that the same robustness can be expected when different data sets are used.

We would like to suggest that practitioners make their own choice according to the aim of the analysis, as the different behaviours of the metrics used may reflect the underlying properties of the specific data set. For example, when using the co-author relationship only, if we also care about the amount of joint publications, then the similarity score (b) is more suitable. When additional information is available, the co-occurrence score returns the best prediction accuracy by taking into account not only those journals directly shared by two individuals, but also the journals which are similar to them. Also, the scores proposed in Section 3.1.2 tend to return fewer edges and, consequently, fewer false positives which, for some applications, may be a more important criterion than the measure of prediction accuracy used in this paper.

We would also like to point out one main limitation of this paper. The problem here is to predict linkage between disciplines within a university. However, due to the lack of information, it is not possible to map all individuals to disciplines and therefore we equate disciplines with academic organisations within the university. In most situations, this remedy works well, especially in traditional disciplines such as civil engineering, pure mathematics and languages, among others, which are all categorised well within the School framework. Relatively newer disciplines, however, do not have clear School boundaries, e.g., there are statisticians working in the School of Mathematics, School of Social and Community Medicine and School of Engineering. This situation, on the other hand, also means that mathematics, public health and engineering have shared interests in the modern world.

Finally, the paper focuses on predicting academic collaboration links from the co-authorship network but we would like to point out that the proposed method and similarity scores per se are not limited to a single organisation or, indeed, an application area. For example, we may suggest interaction between different communities based on their members’ Facebook networks, using both Facebook friend lists and additional information such as their taste in music or films.

Acknowledgements

We thank the PURE team and the Jean Golding Institute at the University of Bristol for providing the data set. We thank Professor Jonathan C. Rougier for all the constructive discussions, comments and his input in the data analysis. We also thank the Editor and the two referees for their constructive suggestions.

Appendix

We provide in Table 3 the full names of the academic organisations at the University of Bristol, supplementing Table 1.

UNIV University of Bristol
FENG Faculty of Engineering
MVEN Merchant Venturers’ School of Engineering
(changed to School of Computer Science, Electrical and
Electronic Engineering, and Engineering Mathematics)
QUEN Queen’s School of Engineering
(changed to School of Civil, Aerospace and Mechanical Engineering)
EENG Department of Electrical & Electronic Engineering
GSEN Graduate School of Engineering
ENGF Engineering Faculty Office
FMDY Faculty of Health Sciences
ORDS Oral & Dental Sciences
SOCS Clinical Sciences (changed to Population Health Sciences)
SSCM Social and Community Medicine (changed to Translational Health Sciences)
VESC Veterinary Sciences
MDYF Health Sciences Faculty Office
MEED Centre for Medical Education
CHSE Centre for Health Sciences Education
FMVS Faculty of Biomedical Sciences
BIOC Biochemistry
PANM Cellular and Molecular Medicine
PHPH Physiology, Pharmacology & Neuroscience
MVSF Biomedical Sciences Faculty Office
MSAD Biomedical Sciences Building
FOAT Faculty of Arts
HUMS Humanities
MODL Modern Languages
SART Arts
ARTF Arts Faculty Office
LANG Centre for English Language and Foundation Studies
FSCI Faculty of Science
BISC Biological Sciences
CHEM Chemistry
GELY Earth Sciences
GEOG Geographical Sciences
MATH Mathematics
PHYS Physics
PSYC Experimental Psychology
NSQI Centre for Nanoscience and Quantum Information
SCIF Science Faculty Office
FSSL Faculty of Social Sciences and Law
EDUC Graduate School of Education
EFIM Economics, Finance and Management
LAWD University of Bristol Law School
SPAI Sociology, Politics and International Studies
SPOL Policy Studies
SSLF Social Sciences and Law Faculty Office
Table 3: Abbreviations and full names of the academic organisations

References

  • Adamic and Adar (2003) Adamic, L. A. and Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25, 211–230.
  • Alzahrani and Horadam (2016) Alzahrani, T. and Horadam, K. J. (2016). Community detection in bipartite networks: Algorithms and case studies. In Complex Systems and Networks, 25–50. Springer, Berlin, Heidelberg.
  • Brown et al. (2015) Brown, R. R., Deletic, A. and Wong, T. H. F. (2015). How to catalyse collaboration. Nature, 525, 315–317.
  • Chamberlain et al. (2014) Chamberlain, S., Boettiger, C., Hart, T. and Ram, K. (2014). rcrossref: R Client for Various CrossRef APIs. R package version 0.3.0 https://github.com/ropensci/rcrossref.
  • Clauset et al. (2008) Clauset, A., Moore, C. and Newman, M. E. (2008). Hierarchical structure and the prediction of missing links in networks. Nature, 453, 98.
  • Elsevier (2015) Elsevier (2015). A review of the UK’s interdisciplinary research using a citation-based approach.
  • Essen and Steinbiss (1992) Essen, U. and Steinbiss, V. (1992). Cooccurrence smoothing for stochastic language modeling. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, 161–164.
  • Fortunato and Barthelemy (2007) Fortunato, S. and Barthelemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 104, 36–41.
  • Fouss et al. (2007) Fouss, F., Pirotte, A., Renders, J.-M. and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19, 355–369.
  • Geyer (1992) Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7, 473–483.
  • Guimerá and Sales-Pardo (2009) Guimerá, R. and Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106, 22073–22078.
  • Katz (1953) Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.
  • Kuzmin et al. (2016) Kuzmin, K., Lu, X., Mukherjee, P. S., Zhuang, J., Gaiteri, C. and Szymanski, B. K. (2016). Supporting novel biomedical research via multilayer collaboration networks. Applied Network Science, 1, 11.
  • Jeh and Widom (2002) Jeh, G. and Widom, J. (2002). SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02), ACM, 538–543.
  • Ledford (2015) Ledford, H. (2015). Team science. Nature, 525, 308–311.
  • Leicht et al. (2006) Leicht, E. A., Holme, P. and Newman, M. E. J. (2006). Vertex similarity in networks. Physical Review E, 73, 026120.
  • Liben-Nowell and Kleinberg (2007) Liben-Nowell, D. and Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58, 1019–1031.
  • Liu and Lü (2010) Liu, W. and Lü, L. (2010). Link prediction based on local random walk. EPL (Europhysics Letters), 89, 58007.
  • Lü and Zhou (2011) Lü, L. and Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390, 1150–1170.
  • Lyall et al. (2013) Lyall, C., Bruce, A., Marsden, W. and Meagher, L. (2013). The role of funding agencies in creating interdisciplinary knowledge. Science & Public Policy (SPP), 40, 62–71.
  • Martínez et al. (2014) Martínez, V., Cano, C. and Blanco, A. (2014). ProphNet: A generic prioritization method through propagation of information. BMC Bioinformatics, 15, S5.
  • Martínez et al. (2016) Martínez, V., Berzal, F. and Cubero, J. C. (2016). A survey of link prediction in complex networks. ACM Computing Surveys (CSUR), 49, 69.
  • Mutz et al. (2015) Mutz, R., Bornmann, L. and Daniel, H.-D. (2015). Cross-disciplinary research: What configurations of fields of science are found in grant proposals today? Research Evaluation, 4, 30–36.
  • National Academy (2005) National Academy of Sciences, National Academy of Engineering and Institute of Medicine (2005). Facilitating Interdisciplinary Research. Washington, DC: The National Academies Press. doi:10.17226/11153.
  • Nature (2015) A special issue on interdisciplinary research (2015). Nature, 525, nature.com/inter
  • Newman (2016) Newman, M. E. J. (2016). Community detection in networks: Modularity optimization and maximum likelihood are equivalent. arXiv preprint arXiv:1606.02319.
  • Page et al. (1999) Page, L., Brin, S., Motwani, R. and Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical Report 1999–66. Stanford InfoLab.
  • Rylance (2015) Rylance, R. (2015). Global funders to focus on interdisciplinarity. Nature, 525, 313–315.
  • Turki and Wang (2015) Turki, T. and Wang, J. T. L. (2015). A new approach to link prediction in gene regulatory networks. International Conference on Intelligent Data Engineering and Automated Learning. Springer International Publishing, 404–415.
  • Wang et al. (2014) Wang, L., Hu, K. and Tang, Y. (2014). Robustness of link-prediction algorithm based on similarity and application to biological networks. Current Bioinformatics, 9, 246–252.
  • Wang and Sukthankar (2013) Wang, X. and Sukthankar, G. (2013). Link prediction in multi-relational collaboration networks. In Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, ACM, 1445–1447.
  • Weingart (2000) Weingart, P. (2000). Interdisciplinarity: The paradoxical discourse. Practising Interdisciplinarity, 25–41.
  • Woelert and Millar (2013) Woelert, P. and Millar, V. (2013). The ‘paradox of interdisciplinarity’ in Australian research governance. Higher Education, 66, 755–767.
  • Zhou et al. (2009) Zhou, T., Lü, L. and Zhang, Y. C. (2009). Predicting missing links via local information. European Physical Journal B, 71, 623–630.