1 Introduction
New knowledge builds on previous knowledge: this is a central tenet of science. A publication relies on previous publications and cites them to acknowledge this debt [12]. Although citations acknowledge direct influences, the influence of a publication can extend beyond these first-order relations. The study of the influence of previous publications on new ones lies at the core of scientometrics. The visualization and quantification of such dependence has been termed “algorithmic historiography” by Eugene Garfield [6, 5]. A variety of tools have been developed to facilitate such exploration [2, 22, 11, 20, 23]. Furthermore, previous literature has investigated methods to trace the historical development of science using citations [10, 21, 19] and text [7, 8, 18]. Our related goal here is to quantify citation influence, and thus give credit, beyond direct citations. In particular, we aim to understand the interplay of first-order and higher-order influence across academic disciplines.
In this contribution we define higher-order citations as citation chains of arbitrary length among pairs of publications, and show how the higher-order citation matrix among disciplines can be computed in an iterative and efficient way. Our proposed method is related to the well-known PageRank algorithm [1, 4, 24], but it is specifically focused on quantifying higher-order citation influence. We apply this novel definition to the Web of Science dataset from 2000 to 2016 inclusive (17,932,523 publications and 190,550,206 citations among them). We show that first-order (length 1) citations account for 58% of the whole higher-order citation flow; hence first-order citations alone miss a conspicuous part (42%) of the citation information. Indeed, higher-order citations give a clearer picture of the relationships among disciplines [9]. Furthermore, we observe this added value when clustering disciplines into larger communities, finding disciplines that act as brokers among communities, and distinguishing between interdisciplinary and autarchic disciplines.
2 Methodology
Let $G = (V, E)$ be a citation network with $n = |V|$ nodes and directed edges $E \subseteq V \times V$. We assume the nodes represent publications. If publication $i$ cites publication $j$, then $(i, j) \in E$. Normally, $G$ is a Directed Acyclic Graph (DAG), because citations only go from more recent publications to older publications (there are some exceptions, but these can be removed so as to ensure that $G$ is a DAG). A simple example is depicted in Figure 1.
Let $A$ be the adjacency matrix, so that $A_{ij} = 1$ whenever $i$ cites $j$, that is, $(i, j) \in E$, and $A_{ij} = 0$ otherwise. Let $k_i = \sum_j A_{ij}$ be the outdegree of node $i$, i.e., the number of publications referenced by publication $i$ within the citation network $G$.
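We then recursively define the dependence $d_{ij}$ of publication $i$ on publication $j$ as the mean dependence of publications referenced by $i$ on publication $j$:
$$d_{ij} = \begin{cases} 1 & \text{if } i = j, \\ \frac{1}{k_i} \sum_{k : (i, k) \in E} d_{kj} & \text{if } i \neq j \text{ and } k_i > 0, \\ 0 & \text{otherwise.} \end{cases}$$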
We say that $d_{ij}$ is the dependence of $i$ on $j$, but, equivalently, it is the influence of $j$ on $i$. Notice that the recursive equation always has a solution, since the recursion proceeds from each publication to the publications it cites and, the graph being acyclic, it always terminates.
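To make the recursion concrete, here is a minimal Python sketch on a hypothetical five-publication DAG (the dictionary `refs`, mapping each publication to the publications it cites, is an illustrative assumption, not data from this study). Memoization ensures each pair $(i, j)$ is computed once; Python's recursion limit makes this practical only for small graphs.

```python
from functools import lru_cache

# Hypothetical toy citation DAG: refs[i] lists the publications cited by i.
# Publications are numbered from oldest (0) to newest (4).
refs = {0: [], 1: [0], 2: [0, 1], 3: [1, 2], 4: [2, 3]}


@lru_cache(maxsize=None)
def dependence(i: int, j: int) -> float:
    """Dependence d_ij of publication i on publication j."""
    if i == j:
        return 1.0  # a publication fully depends on itself
    cited = refs[i]
    if not cited:
        return 0.0  # i cites nothing, so it cannot depend on j
    # Mean dependence on j of the publications referenced by i.
    return sum(dependence(k, j) for k in cited) / len(cited)


print(dependence(4, 1))  # 0.625
```

Every publication in this toy graph eventually leads to publication 0, the only one citing nothing, so `dependence(i, 0)` is 1 for all `i`.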
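Let us label each edge $(i, j) \in E$ of the graph with the probability $p_{ij} = 1/k_i$ of going from $i$ to $j$ in a random walk on the graph. Given a path $\pi = (v_0, v_1, \ldots, v_l)$ on the graph, we define the likelihood of the path as
$$\ell(\pi) = \prod_{t=1}^{l} p_{v_{t-1} v_t}.$$
The dependence $d_{ij}$, when $i \neq j$, is then the sum of likelihoods of all paths from $i$ to $j$ in the graph. In general:
$$d_{ij} = \sum_{\pi \in \Pi_{ij}} \ell(\pi),$$
where $\Pi_{ij}$ is the set of paths from $i$ to $j$, which for $i = j$ contains only the trivial path of length 0 and likelihood 1.
The dependence $d_{ij}$ is large if there are numerous likely paths starting at $i$ and ending in $j$.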
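The path characterization can be checked on the same kind of toy graph. This sketch, using a hypothetical `refs` dictionary, enumerates all citation paths from one publication to another and sums their likelihoods; enumeration is exponential in general and is shown only to illustrate the definition.

```python
from math import prod
from typing import Dict, Iterator, List

# Hypothetical toy citation DAG: refs[i] lists the publications cited by i.
refs: Dict[int, List[int]] = {0: [], 1: [0], 2: [0, 1], 3: [1, 2], 4: [2, 3]}


def all_paths(i: int, j: int) -> Iterator[List[int]]:
    """Yield every directed path from i to j that follows citations."""
    if i == j:
        yield [j]  # the trivial path of length 0
        return
    for k in refs[i]:
        for tail in all_paths(k, j):
            yield [i] + tail


def likelihood(path: List[int]) -> float:
    """Product of the step probabilities p_uv = 1/k_u along the path."""
    return prod(1.0 / len(refs[u]) for u in path[:-1])


paths = list(all_paths(4, 1))
print(paths)  # [[4, 2, 1], [4, 3, 1], [4, 3, 2, 1]]
print(sum(likelihood(path) for path in paths))  # 0.625
```

The sum matches the value obtained from the recursive definition for the same pair on the same toy graph.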
For instance, with reference to the graph in Figure 1, we have:
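We can write this more compactly using matrix notation. Let $K$ be a diagonal matrix such that $K_{ii} = k_i$ if $k_i > 0$ and $K_{ii} = 1$ otherwise, and let $P = K^{-1}A$, so that $P_{ij} = 1/k_i$ whenever $i$ cites $j$. We can then write
$$D = PD + I \qquad (1)$$
where $I$ is the identity matrix. We can solve for $D$ and obtain
$$D = (I - P)^{-1}.$$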
Notice that, if we topologically sort the nodes in $G$ (as done in Figure 1), which is possible since $G$ is a DAG, then both $P$ and $I - P$ are triangular matrices. In particular, the diagonal elements of $I - P$ are equal to 1. Hence $\det(I - P) = 1$, the matrix $I - P$ is invertible, and Equation (1) has a solution, as noticed above. The inverse $D = (I - P)^{-1}$ is also triangular.
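A minimal sketch of the triangular solve, assuming the nodes of a hypothetical toy DAG are already numbered so that each publication cites only publications with smaller indices (oldest first). Under that ordering $P$ is strictly lower triangular and $(I - P)D = I$ can be solved row by row by forward substitution, which amounts to evaluating the recursive definition in topological order. Dense Python lists are used for readability only.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(refs: List[List[int]]) -> Matrix:
    """P = K^{-1} A: row i spreads probability 1/k_i over the works i cites."""
    n = len(refs)
    P = [[0.0] * n for _ in range(n)]
    for i, cited in enumerate(refs):
        for j in cited:
            P[i][j] = 1.0 / len(cited)
    return P


def dependence_matrix(P: Matrix) -> Matrix:
    """Solve (I - P) D = I by forward substitution."""
    n = len(P)
    if any(P[i][k] != 0.0 for i in range(n) for k in range(i, n)):
        raise ValueError("nodes must be topologically sorted (P strictly lower triangular)")
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # I - P has unit diagonal, so D_ij = delta_ij + sum_{k < i} P_ik D_kj.
            D[i][j] = (1.0 if i == j else 0.0) + sum(P[i][k] * D[k][j] for k in range(i))
    return D


# Hypothetical toy DAG, nodes ordered from oldest (0) to newest (4).
refs = [[], [0], [0, 1], [1, 2], [2, 3]]
D = dependence_matrix(transition_matrix(refs))
print(D[4])  # [1.0, 0.625, 0.75, 0.5, 1.0]
```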
One can also iteratively compute $D$ using the fact that:
$$D = (I - P)^{-1} = \sum_{l=0}^{\infty} P^l = \sum_{l=0}^{L} P^l \qquad (2)$$
where $L \leq n - 1$ is the length of the longest path in the graph and $n$ is the number of nodes of $G$. The last equality holds because $G$ is acyclic and thus $P^l = 0$ for all $l > L$. We expect $L \ll n$. In particular, since every citation points strictly back in time, $L$ is bounded by the number of distinct time instants at the granularity of the dataset. For instance, if the dataset covers 10 years and publication dates are given with a month granularity, then $L$ is lower than 120.
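The truncated series in Equation (2) can be sketched as follows: accumulate powers of $P$ until the first zero power, which also yields the longest path length $L$. The toy matrix is hypothetical, and dense lists stand in for the sparse matrices a corpus of this size would require.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def dependence_by_series(P: Matrix) -> Tuple[Matrix, int]:
    """Return D = sum_{l=0}^{L} P^l and the longest path length L."""
    n = len(P)
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in D]  # P^0 = I
    longest = 0
    for length in range(1, n):  # a DAG with n nodes has no path longer than n - 1
        term = matmul(term, P)  # term = P^length
        if all(v == 0.0 for row in term for v in row):
            break  # no path of this length exists, so all higher powers vanish
        longest = length
        D = [[d + t for d, t in zip(d_row, t_row)] for d_row, t_row in zip(D, term)]
    return D, longest


# Hypothetical toy transition matrix P = K^{-1} A (rows: citing, columns: cited).
P = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
D, L = dependence_by_series(P)
print(L)  # 4
print(D[4])  # [1.0, 0.625, 0.75, 0.5, 1.0]
```

Here the longest path is 4 → 3 → 2 → 1 → 0, so four multiplications suffice, far fewer than the $n - 1$ bound would allow in a long chain-free graph.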
Matrix $P^l$ computes the dependence contribution of paths of length $l$ in graph $G$. In particular, for $l = 1$, the matrix $P$ represents first-order citations, that is, direct citations among publications. On the other hand, matrix $P^l$, for $l > 1$, encodes higher-order citations, that is, chains of citations of length $l$ among publications.
Notice that, for $i \neq j$, $d_{ij} > 0$ if and only if there exists at least one path from $i$ to $j$ in graph $G$. Hence, matrix $D$ has the same nonzero pattern as the adjacency matrix of the transitive closure of $G$, plus the diagonal. We thus expect $D$ to be denser than $A$.
2.1 Discipline dependence
Instead of looking at the individual dependence $d_{ij}$ of publication $i$ on publication $j$, we are interested in disciplinary dependencies. In particular, we are interested in the dependence of a publication (or of a discipline) on a discipline.
Let us denote by $C_{jc}$ the extent to which publication $j$ belongs to discipline $c$; hence $C$ is an $n \times m$ matrix, where $n$ is the number of publications and $m$ is the number of disciplines. For the non-overlapping case, $C_{jc} = 1$ if publication $j$ belongs to discipline $c$, and $C_{jc} = 0$ otherwise. A publication can also belong to multiple disciplines, thus $C_{jc} > 0$ for possibly more than a single discipline $c$. In either case, we have $0 \leq C_{jc} \leq 1$ and $\sum_{c} C_{jc} = 1$.
The dependence of publication $i$ on discipline $c$ can then be defined as the sum of the dependencies of publication $i$ on articles in $c$:
$$x_{ic} = \sum_j d_{ij}\, C_{jc},$$
or, in matrix notation
$$X = DC.$$
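Note that
$$X = DC = (PD + I)C = PX + C.$$
We can hence iteratively compute matrix $X$ without materializing matrix $D$:
$$X^{(0)} = C, \qquad X^{(l)} = PX^{(l-1)} + C \quad (l \geq 1).$$
Notice that $X^{(l)} = \sum_{k=0}^{l} P^k C$ is the dependence contribution of citation paths of length up to $l$. Hence
$$X = X^{(L)},$$
where $L$ is the length of the longest path in the graph, and the iterative computation of $X$ can stop after $L$ steps. Although $X$ can be as dense as $D$, it has size $n \times m$, which is more manageable than the size of $D$, which is $n \times n$, since we expect $m \ll n$.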
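The iteration $X^{(l)} = PX^{(l-1)} + C$ only needs $P$ and the $n \times m$ membership matrix $C$, never the $n \times n$ matrix $D$. In this sketch the toy $P$ and the discipline assignment $C$ are hypothetical, and the number of steps is set to $n - 1$, a safe upper bound on $L$ for any DAG with $n$ nodes.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def discipline_dependence(P: Matrix, C: Matrix, steps: int) -> Matrix:
    """Iterate X <- P X + C from X = C; after steps >= L, X equals D C."""
    X = [row[:] for row in C]
    for _ in range(steps):
        PX = matmul(P, X)
        X = [[p + c for p, c in zip(p_row, c_row)] for p_row, c_row in zip(PX, C)]
    return X


# Hypothetical toy transition matrix P = K^{-1} A (rows: citing, columns: cited).
P = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
# Hypothetical membership: publications 0-1 in discipline A, 2-4 in discipline B.
C = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
X = discipline_dependence(P, C, steps=len(P) - 1)
print(X)  # [[1.0, 0.0], [2.0, 0.0], [1.5, 1.0], [1.75, 1.5], [1.625, 2.25]]
```

Because each publication belongs to exactly one discipline here, the row sums of `X` equal the row sums of $D$.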
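As a particular case, the dependence of publication $i$ on the whole network is $d_i = \sum_j d_{ij}$, that is, $d = D\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector. We thus have that:
$$d = Pd + \mathbf{1}.$$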
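As a quick check of $d = Pd + \mathbf{1}$, the sketch below iterates this recursion on a hypothetical toy matrix $P$. Because $P$ is nilpotent in a DAG, the iteration reaches its fixed point after at most $n - 1$ steps, and the result equals the row sums of $D$.

```python
from typing import List


def total_dependence(P: List[List[float]], steps: int) -> List[float]:
    """Iterate d <- P d + 1 starting from the all-ones vector."""
    d = [1.0] * len(P)
    for _ in range(steps):
        # The comprehension reads the previous d and only then rebinds it.
        d = [sum(p * x for p, x in zip(row, d)) + 1.0 for row in P]
    return d


# Hypothetical toy transition matrix P = K^{-1} A (rows: citing, columns: cited).
P = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
print(total_dependence(P, steps=len(P) - 1))  # [1.0, 2.0, 2.5, 3.25, 3.875]
```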
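Recall that PageRank, with damping factor $\alpha$ and exogenous vector $\beta$, can be written as the solution $x$ of a linear system of the form $x = \alpha M x + \beta$, where $M$ is the degree-normalized adjacency matrix of the graph [14]. Hence, interestingly, the dependence vector $d$ satisfies an equation of the same form, with $M = P$, damping factor $\alpha = 1$ and exogenous vector $\beta = \mathbf{1}$; the solution exists and is unique even with $\alpha = 1$ because $P$ is nilpotent ($P^{L+1} = 0$).
One can also define the dependence of discipline $c$ on publication $j$ as the sum of the dependence of publications in $c$ on article $j$:
$$y_{cj} = \sum_i C_{ic}\, d_{ij},$$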
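or, in matrix notation
$$Y = C^{\top} D.$$
Notice that since $D = (I - P)^{-1}$, then $D(I - P) = I$ and hence $D = DP + I$. It follows that $Y = C^{\top}DP + C^{\top} = YP + C^{\top}$, and also $Y$ can be computed iteratively.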
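The same idea applies to $Y = C^{\top}D$, now iterating on the right: $Y \leftarrow YP + C^{\top}$, starting from $C^{\top}$. The toy $P$ and $C$ below are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def discipline_on_publication(P: Matrix, C: Matrix, steps: int) -> Matrix:
    """Iterate Y <- Y P + C^T from Y = C^T; after steps >= L, Y equals C^T D."""
    Ct = [list(col) for col in zip(*C)]  # m x n transpose of C
    Y = [row[:] for row in Ct]
    for _ in range(steps):
        YP = matmul(Y, P)
        Y = [[a + b for a, b in zip(y_row, c_row)] for y_row, c_row in zip(YP, Ct)]
    return Y


# Hypothetical toy transition matrix and discipline membership.
P = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
C = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
Y = discipline_on_publication(P, C, steps=len(P) - 1)
print(Y)  # [[2.0, 1.0, 0.0, 0.0, 0.0], [3.0, 1.875, 2.25, 1.5, 1.0]]
```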
The dependence of discipline $c$ on discipline $c'$ is the sum of the dependence of papers in $c$ on papers in $c'$, that is:
$$F_{cc'} = \sum_i \sum_j C_{ic}\, d_{ij}\, C_{jc'},$$
or, in matrix notation
$$F = C^{\top} D C.$$
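We also define $F^{(l)} = C^{\top} \left( \sum_{k=0}^{l} P^k \right) C$, for $0 \leq l \leq L$, as the citation flow matrix for paths of length up to $l$, so that $F^{(L)} = F$. Notice that, for $l \geq 1$, $F^{(l)} - F^{(l-1)} = C^{\top} P^{l} C$ is the citation flow matrix for paths of length equal to $l$.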
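The partial flow matrices can be accumulated without forming $D$ by keeping the current term $P^l C$ and a running sum. This sketch (hypothetical toy $P$ and $C$; dense lists for readability) returns $F^{(0)}, \ldots, F^{(L)}$ and stops at the first zero term.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def partial_flows(P: Matrix, C: Matrix) -> List[Matrix]:
    """Return [F^(0), ..., F^(L)] with F^(l) = C^T (sum_{k<=l} P^k) C."""
    Ct = [list(col) for col in zip(*C)]
    T = [row[:] for row in C]  # current term P^l C, starting at l = 0
    S = [row[:] for row in C]  # running sum over k <= l of P^k C
    flows = [matmul(Ct, S)]
    for _ in range(len(P)):
        T = matmul(P, T)
        if all(v == 0.0 for row in T for v in row):
            break  # P^l C = 0: no longer paths contribute
        S = [[s + t for s, t in zip(s_row, t_row)] for s_row, t_row in zip(S, T)]
        flows.append(matmul(Ct, S))
    return flows


# Hypothetical toy transition matrix and discipline membership (A: 0-1, B: 2-4).
P = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
C = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
flows = partial_flows(P, C)
print(flows[-1])  # [[3.0, 0.0], [4.875, 4.75]]
```

In this toy case discipline A never cites B, so the A-to-B entry stays 0 at every length, the same pattern the text calls autarchic.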
Consider again the simple citation network depicted in Figure 2, where nodes are partitioned into 3 disjoint disciplines. The light blue and green communities are closed worlds (autarchies), since they reference only within their own groups (indeed, their off-diagonal flows in matrix $F$ are 0). On the other hand, the red community is more interdisciplinary, since it references the other two groups outside its territory (its off-diagonal flow in matrix $F$ is 2.25).
3 Case study
We applied our method to all publications from the CWTS in-house version of the Web of Science, considering the years from 2000 to 2016 inclusive. We consider a total of 17,932,523 publications and 190,550,206 citations among them – excluding 444,436 synchronous citations, which we discarded to guarantee that $G$ is a DAG (a citation between two publications is discarded if the publication time, year and month, of the citing publication is the same as, or older than, the publication time of the cited publication). The longest citation path in the dataset has length 29, which equals the maximum number of iterations needed for convergence. In what follows, we rely on the high-level aggregation of the journal-based classification of Web of Science, which comprises 30 broad disciplines (see Table 2).
3.1 The contribution of higher-order citations
We start by assessing the contribution of first-order and higher-order citations to the citation flow among disciplines. Recall that the partial flow matrix $F^{(l)}$ is the flow matrix for paths of length up to $l$, with total flow matrix $F = F^{(L)}$, where $L$ is the length of the longest path in the citation graph. Let $\Delta^{(l)} = F^{(l)} - F^{(l-1)}$ be the flow matrix for paths of length precisely $l$. The entrywise matrix norm defined as $\lVert X \rVert = \sum_{i,j} |X_{ij}|$ is a measure of the total citation flow contained in matrix $X$. We also tested the Frobenius norm with similar outcomes.
We computed the norm of partial flow matrices relative to the norm of the total flow matrix $F$, for $l = 1, \ldots, L$. Results are shown in Figure 3. First-order (direct) citations contribute 58% of the overall flow; hence higher-order citations contribute 42%, a significant share. In particular, the share of second-order (length 2) citations is 20%, that of third-order (length 3) citations is 12%, and that of fourth-order (length 4) citations is 6%. Longer citation paths account for about 4% of the flow. When we consider the top disciplines by flow contribution (Figure 4), we find that six of them account for 38% (out of 58%) of first-order flow, 13% (out of 20%) of second-order flow, 8% (out of 12%) of third-order flow, and 4% (out of 6%) of fourth-order flow, following a pattern similar to the global contributions. These six disciplines are, in order: Clinical medicine, Physics and materials science, Chemistry and chemical engineering, Basic life sciences, Biomedical sciences, and Biological sciences. We conclude that an important part of the dependence flow goes beyond direct citations and is worth investigating.
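The length shares above can be computed from the partial flow matrices by differencing consecutive ones and taking entrywise norms. In this sketch the input matrices are a hypothetical toy example, and we assume, consistently with the reported shares summing to 100% over lengths 1 and more, that the length-0 term $F^{(0)}$ is left out of the total.

```python
from typing import List

Matrix = List[List[float]]


def entrywise_norm(X: Matrix) -> float:
    """||X|| = sum of the absolute values of all entries."""
    return sum(abs(v) for row in X for v in row)


def length_shares(partial: List[Matrix]) -> List[float]:
    """Share of flow carried by paths of length exactly l, for l = 1..L.

    Assumption: the length-0 term F^(0) is excluded from the total.
    """
    deltas = [
        [[a - b for a, b in zip(cur_row, prev_row)] for cur_row, prev_row in zip(cur, prev)]
        for prev, cur in zip(partial, partial[1:])
    ]
    norms = [entrywise_norm(delta) for delta in deltas]
    total = sum(norms)
    return [x / total for x in norms]


# Hypothetical partial flow matrices F^(0), ..., F^(4) of a toy network.
partial = [
    [[2.0, 0.0], [0.0, 3.0]],
    [[3.0, 0.0], [1.5, 4.5]],
    [[3.0, 0.0], [3.75, 4.75]],
    [[3.0, 0.0], [4.75, 4.75]],
    [[3.0, 0.0], [4.875, 4.75]],
]
shares = length_shares(partial)
print([round(s, 3) for s in shares])  # [0.525, 0.328, 0.131, 0.016]
```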
3.2 The citation flow network
The citation flow matrix $F$ is a full matrix, and hence the corresponding flow network is a complete graph. However, one can investigate the pairs of disciplines that have a higher than expected citation flow, and those that have a lower than expected citation flow.
Table 2 contains, for each discipline, the internal citation flow (self flow), the outgoing and incoming citation flows, and the size of the discipline in number of articles. As expected, citation flows are strongly correlated with the size of the discipline (Pearson correlation above 0.9).
To overcome the size-dependence issue, we normalize the flow matrix using the signed contribution to Pearson's $\chi^2$ test. The normalized flow between disciplines $c$ and $c'$ is computed as:
$$R_{cc'} = \operatorname{sign}(F_{cc'} - E_{cc'}) \, \frac{(F_{cc'} - E_{cc'})^2}{E_{cc'}},$$
where
$$E_{cc'} = \frac{\left(\sum_{b} F_{cb}\right) \left(\sum_{a} F_{ac'}\right)}{\sum_{a} \sum_{b} F_{ab}}$$
is the expected flow between $c$ and $c'$. The pairs of disciplines whose citation flow is significantly higher than expected (above the 90th percentile) or significantly lower than expected (below the 10th percentile) are shown in Figure 5. As for within-discipline citation flows (normalized by expected citations), Astronomy and Astrophysics, Mathematics, and Language and Linguistics lead the ranking, while Instruments and Instrumentation, Basic Medical Sciences, and General and Industrial Engineering are at the bottom.
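A minimal sketch of the normalization, assuming the expected flow is the usual contingency-table expectation (row total times column total over grand total) and that all row and column totals are positive; the $2 \times 2$ matrix is hypothetical. The signed contribution equals $r|r|$ for the Pearson residual $r = (F - E)/\sqrt{E}$, a monotone transform, so percentile-based selections are the same under either choice.

```python
from typing import List

Matrix = List[List[float]]


def expected_flow(F: Matrix) -> Matrix:
    """E[c][c'] = (row total of c) * (column total of c') / grand total."""
    rows = [sum(r) for r in F]
    cols = [sum(c) for c in zip(*F)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def normalized_flow(F: Matrix) -> Matrix:
    """Signed contribution of each cell to Pearson's chi-squared statistic."""
    E = expected_flow(F)
    return [
        [(1.0 if f >= e else -1.0) * (f - e) ** 2 / e for f, e in zip(f_row, e_row)]
        for f_row, e_row in zip(F, E)
    ]


# Hypothetical 2 x 2 flow matrix: each discipline mostly cites itself.
F = [[4.0, 1.0], [1.0, 4.0]]
R = normalized_flow(F)
print(R)  # [[0.9, -0.9], [-0.9, 0.9]]
```

Summing the absolute values of `R` recovers the usual $\chi^2$ statistic (3.6 here).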
Furthermore, we consider the same network restricted to positively weighted edges, that is, to pairs of disciplines with a higher than expected citation flow. We then apply the fast greedy modularity clustering method to this network, as depicted in Figure 6. Four macro-areas emerge from this analysis, namely the life and medical sciences, science and engineering applied to the Earth and the environment, the mathematical sciences, and the social and human sciences. If we do the same using only first-order citations (Figure 7), the partition of disciplines into communities is less clear.
Our analyses suggest that some disciplines are more interdisciplinary (connecting different communities) and others more autarchic (mostly self-referencing), a topic we explore in the following section.
3.3 Interdisciplinarity and autarchy
In this section we match higher-order citation flows with measures of interdisciplinarity. We claim that:
A discipline is interdisciplinary when it is evenly cited by dissimilar disciplines.
This thesis immediately recalls the Rao quadratic entropy [17], which has been previously used to measure interdisciplinarity [15, 16, 26, 25]. The Rao quadratic entropy is one measure among others which have been studied in the literature [13]. Let us consider a set of $k$ objects and a probability distribution $p = (p_1, \ldots, p_k)$ such that $p_i$ is the probability of object $i$. Suppose we also have information about the pairwise distance (dissimilarity) $\delta_{ij}$ between any two objects $i$ and $j$. Then a measure of heterogeneity among objects is the Rao quadratic entropy:
$$Q = \sum_{i} \sum_{j} \delta_{ij}\, p_i\, p_j.$$
There are two components in this definition of heterogeneity: (1) the evenness of the distribution $p$, and (2) the distances among objects. It holds that, in general:
$Q$ is large when $p$ evenly distributes its probability among dissimilar objects;
on the contrary, $Q$ is small when $p$ concentrates its probability on similar objects.
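The following sketch evaluates the Rao quadratic entropy for a few distributions over three hypothetical objects, illustrating both properties: spreading probability evenly over dissimilar objects gives a large value, while concentrating it on one object, or on similar objects, gives a small one.

```python
from typing import Sequence


def rao_quadratic_entropy(p: Sequence[float], dist: Sequence[Sequence[float]]) -> float:
    """Rao quadratic entropy Q = sum_i sum_j dist[i][j] * p[i] * p[j]."""
    k = len(p)
    return sum(dist[i][j] * p[i] * p[j] for i in range(k) for j in range(k))


# Hypothetical distances: objects 0 and 1 are similar, object 2 is far from both.
dist = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
print(rao_quadratic_entropy([1.0, 0.0, 0.0], dist))  # 0.0 (all mass on one object)
print(rao_quadratic_entropy([0.5, 0.0, 0.5], dist))  # 0.5 (even, dissimilar objects)
print(rao_quadratic_entropy([0.5, 0.5, 0.0], dist))  # about 0.05 (even, similar objects)
```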
To apply Rao's measure to the higher-order citation flow matrix $F$, we proceed as follows. For each discipline pair $c$ and $c'$, let
$$p_{c'c} = \frac{F_{c'c}}{\sum_{a} F_{ac}}.$$
Notice that $p_{c'c}$ is the relative share of citation flow from discipline $c'$ to discipline $c$ compared to the total flow received by $c$. Notice, moreover, that $p_{\cdot c} = (p_{1c}, \ldots, p_{mc})$ is a probability distribution.
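The similarity between two disciplines $c$ and $c'$ is computed as the cosine of the angle between the $c$-th and $c'$-th columns, $F_{\cdot c}$ and $F_{\cdot c'}$, of the flow matrix $F$:
$$s_{cc'} = \frac{F_{\cdot c} \cdot F_{\cdot c'}}{\lVert F_{\cdot c} \rVert \, \lVert F_{\cdot c'} \rVert}.$$
Since flows are non-negative, the cosine runs from 0 (no similarity) to 1 (maximum similarity). Hence, two disciplines are similar if they have a similar pattern of incoming citation flows. The distance between two disciplines $c$ and $c'$ is then
$$\delta_{cc'} = 1 - s_{cc'},$$
so that two disciplines are distant if they are not similar.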
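A minimal sketch of the two ingredients, the column-normalized shares $p_{c'c}$ and the cosine distances $\delta_{cc'}$, on a hypothetical $3 \times 3$ flow matrix where `F[a][c]` is the flow from discipline `a` to discipline `c`. Because of floating-point rounding, the self-distance comes out as a tiny number rather than exactly 0.

```python
import math
from typing import List

Matrix = List[List[float]]


def incoming_shares(F: Matrix) -> Matrix:
    """p[a][c] = F[a][c] / sum_b F[b][c]: share of c's incoming flow from a."""
    col_totals = [sum(col) for col in zip(*F)]
    return [[f / t for f, t in zip(row, col_totals)] for row in F]


def cosine_distances(F: Matrix) -> Matrix:
    """delta[c][k] = 1 - cosine similarity of columns c and k of F."""
    cols = [list(col) for col in zip(*F)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    m = len(cols)
    return [
        [1.0 - sum(a * b for a, b in zip(cols[c], cols[k])) / (norms[c] * norms[k])
         for k in range(m)]
        for c in range(m)
    ]


# Hypothetical 3 x 3 flow matrix; F[a][c] is the flow from discipline a to c.
F = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
p = incoming_shares(F)
delta = cosine_distances(F)
print([row[2] for row in p])  # [0.25, 0.25, 0.5]
```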
Table 1: Top and bottom 5 disciplines by interdisciplinarity (Rao quadratic entropy).
Discipline | Rao
Statistical Sciences | 0.678
Management and Planning | 0.645
General and Industrial Engineering | 0.641
Social and Behavioral Sciences, Interdisciplinary | 0.622
Civil Engineering and Construction | 0.601
… | …
Chemistry and Chemical Engineering | 0.360
Mathematics | 0.341
Astronomy and Astrophysics | 0.316
Physics and Materials Science | 0.302
Clinical Medicine | 0.294
Finally, for each discipline $c$, we apply the Rao quadratic entropy to the flow distribution $p_{\cdot c}$ and the distance measure $\delta$ among disciplines:
$$Q_c = \sum_{a} \sum_{b} \delta_{ab}\, p_{ac}\, p_{bc}.$$
This gives us a measure of interdisciplinarity for each discipline. The top and bottom 5 interdisciplinary disciplines are given in Table 1.
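Putting the pieces together, this sketch computes $Q_c$ for every discipline of a hypothetical flow matrix. It is a self-contained illustration of the procedure described above, not the code used for Table 1, and it assumes every discipline receives some flow (non-zero columns).

```python
import math
from typing import List

Matrix = List[List[float]]


def rao_interdisciplinarity(F: Matrix) -> List[float]:
    """Q_c = sum_{a,b} delta_ab * p_ac * p_bc for every discipline (column) c.

    p[.][c] is column c of F normalized to sum 1 (incoming flow shares) and
    delta_ab = 1 - cosine similarity of columns a and b of F.
    """
    cols = [list(col) for col in zip(*F)]
    m = len(cols)
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    delta = [
        [1.0 - sum(x * y for x, y in zip(cols[a], cols[b])) / (norms[a] * norms[b])
         for b in range(m)]
        for a in range(m)
    ]
    scores = []
    for c in range(m):
        total = sum(cols[c])
        p = [v / total for v in cols[c]]  # share of c's incoming flow from each a
        scores.append(sum(delta[a][b] * p[a] * p[b] for a in range(m) for b in range(m)))
    return scores


# Hypothetical flow matrix; discipline 2 is cited by all three disciplines.
F = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
scores = rao_interdisciplinarity(F)
print([round(q, 3) for q in scores])
```

A discipline cited only by itself scores 0, which is the autarchic end of the scale; in the toy matrix above, discipline 2 scores highest because its incoming flow is spread over all three disciplines.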
Notice how two interrelated disciplines like Statistical Sciences and Mathematics end up at quite different ranks: while Statistics is interdisciplinary, Mathematics is rather autarchic. Indeed, Mathematics receives 78% of its higher-order citation flow from itself, and the rest from a small number of other fields, mainly Physics and Materials Science, and Computer Sciences. On the other hand, the internal flow for Statistics is limited to 43%. Statistics instead receives a significant citation flow from many other disciplines, including Mathematics, Computer Sciences, Economics and Business, General and Industrial Engineering, Electrical Engineering and Telecommunication, and Clinical Medicine. This suggests that higher-order citations should be considered when assessing the degree of interdisciplinarity or autarchy of a discipline.
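The two percentages can be recovered from Table 2, under the assumption (ours, not stated in the table) that the total flow a discipline receives is its self flow plus its incoming flow from other disciplines:

```python
# Self flow and incoming flow from other disciplines, copied from Table 2.
# Assumption: total received flow = self flow + incoming flow.
table2 = {
    "mathematics": (1003179.06, 281315.60),
    "statistical sciences": (194457.47, 252692.38),
}

self_share = {
    name: own / (own + incoming) for name, (own, incoming) in table2.items()
}
for name, share in self_share.items():
    print(f"{name}: {share:.0%} of received higher-order flow is self flow")
# mathematics: 78% ..., statistical sciences: 43% ...
```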
4 Conclusion
A considerable amount of effort goes into quantifying and assessing citation influence and impact via direct citations. We proposed here, instead, to quantify citation influence beyond direct citations by also using higher-order citations, that is, citation chains of arbitrary length among pairs of publications. We have presented a method, informed by PageRank, to quantify the higher-order citation influence of publications. The proposed method accounts for both direct, or first-order, and indirect, or higher-order, citations. In particular, we assessed the method on the whole Web of Science corpus between 2000 and 2016 at the level of entire disciplines.
Our results show that first-order (length 1) citations account for 58% of the whole higher-order citation flow, while higher-order citations (length 2 and above) account for 42%: a significant share. The proposed method is size-dependent, yet easily normalized, and it can be used for a variety of applications. We investigated two here. By using higher-order citation flows, we were able to provide a high-level map of science clearly distinguishing among four macro-areas: life and medical sciences, Earth and environment sciences, mathematical sciences, and social and human sciences. The same picture using only first-order information was found to be less clear-cut. Furthermore, we used the proposed method to rank disciplines according to their degree of interdisciplinarity using the Rao quadratic entropy. We are thus able to distinguish between autarchic disciplines, e.g., mathematics, and interdisciplinary ones, e.g., statistics. We suggest that accounting for higher-order citations is relevant and important, and might help with a variety of open scientometrics questions: performing clustering, measuring interdisciplinarity, and assessing the impact of fundamental research, among others.
Acknowledgements
This work stems from prior efforts in collaboration with Ludo Waltman and Vincent A. Traag [3], whom we thank for their contribution. We are grateful to the Centre for Science and Technology Studies (CWTS), Leiden University, for providing us access to their databases.
References
[1] (1998) The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30 (1–7), pp. 107–117.
[2] (2006) CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology 57 (3), pp. 359–377.
[3] (2019) Quantifying the long-term influence of scientific publications. In Proceedings of the 17th International Conference on Scientometrics & Informetrics.
[4] (2011) PageRank: Standing on the shoulders of giants. Communications of the ACM 54 (6), pp. 92–101.
[5] (2003) Why do we need algorithmic historiography? Journal of the American Society for Information Science and Technology 54 (5), pp. 400–412.
[6] (1964) The use of citation data in writing the history of science. The Institute for Scientific Information, Technical Report AF 49(638)-1256.
[7] (2018) Measuring discursive influence across scholarship. Proceedings of the National Academy of Sciences, pp. 201719792.
[8] (2018) Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics 6, pp. 391–406.
[9] (2009) Toward a consensus map of science. Journal of the American Society for Information Science and Technology 60 (3), pp. 455–476.
[10] (2008) Main-path analysis and path-dependent transitions in HistCite™-based historiograms. Journal of the American Society for Information Science and Technology 59 (12), pp. 1948–1962.
[11] (2014) Detecting the historical roots of research fields by reference publication year spectroscopy (RPYS). Journal of the Association for Information Science and Technology 65 (4), pp. 751–764.
[12] (1957) Priorities in Scientific Discovery: A Chapter in the Sociology of Science. American Sociological Review 22 (6), pp. 635–659.
[13] (2016) Bibliometric indicators of interdisciplinarity: the potential of the Leinster–Cobbold diversity indices to study disciplinary diversity. Scientometrics 107 (2), pp. 593–607.
[14] (2018) Networks: an introduction. 2nd edition, Oxford University Press.
[15] (2009) Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics 81 (3), pp. 719–745.
[16] (2010) Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience. Scientometrics 82 (2), pp. 263–287.
[17] (1982) Diversity and dissimilarity coefficients: a unified approach. Theoretical Population Biology 21, pp. 24–43.
[18] (2019) Follow the Leader: Documents on the Leading Edge of Semantic Change Get More Citations. arXiv:1909.04189 [physics].
[19] (2020) Intermediacy of publications. Royal Society Open Science 7 (1), pp. 190207.
[20] (2016) Introducing CitedReferencesExplorer (CRExplorer): A program for reference publication year spectroscopy with cited references standardization. Journal of Informetrics 10 (2), pp. 503–515.
[21] (2016) Constructing conceptual trajectory maps to trace the development of research fields. Journal of the Association for Information Science and Technology 67 (8), pp. 2016–2031.
[22] (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84 (2), pp. 523–538.
[23] (2014) CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics 8 (4), pp. 802–823.
[24] (2014) PageRank-related methods for analyzing citation networks. In Measuring scholarly impact, pp. 83–100.
[25] (2019) Consistency and validity of interdisciplinarity measures. Quantitative Science Studies, pp. 1–28.
[26] (2015) Does Interdisciplinary Research Lead to Higher Citation Impact? The Different Effect of Proximal and Distal Interdisciplinarity. PLOS ONE 10 (8), pp. e0135095.
Appendix
Table 2: Size (number of articles), self flow, incoming flow and outgoing flow of each discipline.
id | discipline | size | self flow | incoming flow | outgoing flow
1 | agriculture and food science | 875440.50 | 780500.12 | 529167.52 | 743893.92
2 | astronomy and astrophysics | 381254.75 | 686101.56 | 219418.90 | 171588.10
3 | basic life sciences | 2579591.25 | 3456087.00 | 3474212.42 | 2007738.04
4 | basic medical sciences | 268307.25 | 199883.83 | 335008.55 | 483618.99
5 | biological sciences | 1402123.00 | 1259296.75 | 910008.50 | 1164499.91
6 | biomedical sciences | 2507916.50 | 2356196.00 | 2470855.73 | 2487821.15
7 | chemistry and chemical engineering | 3510294.25 | 4352712.50 | 1959569.08 | 2466840.06
8 | civil engineering and construction | 160902.86 | 127872.16 | 132699.23 | 155468.25
9 | clinical medicine | 6024741.50 | 8482322.00 | 3270959.40 | 3051526.10
10 | computer sciences | 647474.88 | 668669.81 | 482215.10 | 506644.46
11 | earth sciences and technology | 934568.50 | 1395625.38 | 549727.39 | 443447.22
12 | economics and business | 429852.88 | 526190.56 | 277452.46 | 185736.94
13 | educational sciences | 238509.97 | 212864.89 | 116494.86 | 163714.45
14 | electrical engineering and telecommunication | 842418.88 | 902059.25 | 629718.60 | 612375.60
15 | energy science and technology | 343416.62 | 196160.98 | 263133.26 | 337039.66
16 | environmental sciences and technology | 983358.88 | 1125205.62 | 886273.78 | 1027153.47
17 | general and industrial engineering | 198930.95 | 101423.06 | 163249.88 | 222303.95
18 | health sciences | 496159.94 | 479285.53 | 429532.91 | 612249.92
19 | information and communication sciences | 104181.30 | 79418.53 | 56385.11 | 76125.22
20 | instruments and instrumentation | 154830.81 | 59613.47 | 153544.22 | 185356.41
21 | language and linguistics | 98703.09 | 80662.09 | 24108.05 | 42272.65
22 | management and planning | 156367.38 | 115467.25 | 143213.02 | 145707.05
23 | mathematics | 831350.88 | 1003179.06 | 281315.60 | 334351.78
24 | mechanical engineering and aerospace | 595979.12 | 489624.09 | 386884.15 | 441418.25
25 | physics and materials science | 4089318.25 | 6163358.50 | 2098397.77 | 2250967.82
26 | political science and public administration | 193848.67 | 170155.39 | 83619.01 | 76208.32
27 | psychology | 581770.75 | 617750.44 | 458871.44 | 412155.69
28 | social and behavioral sciences, interdisciplinary | 132240.47 | 74401.98 | 108292.33 | 128959.83
29 | sociology and anthropology | 218277.44 | 172026.30 | 148015.28 | 173080.76
30 | statistical sciences | 222210.95 | 194457.47 | 252692.38 | 184771.96