# Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications

The von Neumann graph entropy (VNGE) facilitates the measure of information divergence and distance between graphs in a graph sequence and has successfully been applied to various network learning tasks. Despite its effectiveness, it is computationally demanding, as it requires the full eigenspectrum of the graph Laplacian matrix. In this paper, we propose a Fast Incremental von Neumann Graph EntRopy (FINGER) framework, which approximates VNGE with a performance guarantee. FINGER reduces the cubic complexity of VNGE to linear complexity in the number of nodes and edges, and thus enables online computation based on incremental graph changes. We also show asymptotic consistency of FINGER to the exact VNGE, and derive its approximation error bounds. Based on FINGER, we propose ultra-efficient algorithms for computing Jensen-Shannon distance between graphs. Our experimental results on different random graph models demonstrate the computational efficiency and the asymptotic consistency of FINGER. In addition, we also apply FINGER to two real-world applications and one synthesized dataset, and corroborate its superior performance over seven baseline graph similarity methods.


## 1 Introduction

In recent years, graph-based learning has become an active research field Shuman13 ; kalofolias2016learn ; luo2012forging ; shivanna2014learning ; wang2016graph ; kipf2016variational . Its success is rooted in the advanced capability of summarizing and representing phenomenal structural features embedded in graphs. In particular, evaluating similarity between graphs is crucial to network analysis and graph-based anomaly detection papadimitriou2010web ; Akoglu15Graphanomalydetection ; ranshous2015anomaly . For example, Yanardag and Vishwanathan used graph similarity for learning novel graph kernels yanardag2015structural , and Sharpnack et al. proposed the Lovasz extended scan statistic for anomaly detection in connected graphs sharpnack2013near . Koutra et al. proposed DeltaCon, a state-of-the-art similarity algorithm in terms of its scalability and capability of handling weighted graphs using fast belief propagation koutra2016d . However, these methods are sensitive to heuristic metrics and presumed models, and thus provide limited understanding of the general notion of variation between graphs. On the other hand, model-agnostic approaches such as graph entropy have been used to quantify the structural complexity of a single graph; graph entropy relates to the Shannon entropy of a probability distribution over a function of enumerated subgraphs in a graph simonyi1995graph ; shetty2005discovering ; li2016structural . However, graph entropy can be computationally demanding due to its use of exhaustive subgraph search.

Different from the aforementioned approaches and inspired by quantum information theory, the von Neumann graph entropy (VNGE) braunstein2006laplacian ; passerini2008neumann ; passerini2009quantifying facilitates the measure of (quantum) Jensen-Shannon divergence and distance endres2003new ; briet2009properties between graphs. It is associated with a model-agnostic information measure for quantifying the variation between two quantum density matrices. In addition, the VNGE has been shown to be linearly correlated with classical graph entropy measures anand2009entropy ; anand2011shannon . The VNGE and the Jensen-Shannon distance have been successfully applied to structural reduction in multiplex networks de2015structural , depth analysis in image processing han2012graph ; bai2014depth , structure-function analysis in genetic networks seaman2017nucleome , and network-ensemble comparison li2017network . However, despite its effectiveness, the computation of VNGE requires (at most) cubic complexity in the number of nodes, thereby impeding its applicability to machine learning and data mining tasks involving a sequence of large graphs.

Related Work. The VNGE was first defined based on the combinatorial graph Laplacian matrix braunstein2006laplacian ; passerini2008neumann ; passerini2009quantifying ; de2015structural ; li2017network . Variants of VNGE and their approximations have been proposed in the literature, including one based on the normalized graph Laplacian matrix Shi00 proposed in han2012graph and one based on the generalized graph Laplacian matrix of directed graphs chung2005laplacians proposed in ye2014approximate . However, these alternatives lack approximation justification and are shown to be suboptimal in Section 4. To the best of our knowledge, this paper is the first work that provides fast VNGE computation with approximation analysis.

Contributions. To overcome the computational inefficiency of VNGE, we propose a Fast Incremental von Neumann Graph EntRopy (FINGER) framework to approximate VNGE with a performance guarantee, reducing its cubic complexity to linear complexity in the number of nodes and edges. FINGER is a generic tool that applies to both batch and online graph sequences. It enables fast entropy computation when every single graph in a graph sequence is presented (e.g., a snapshot of a dynamic network, or a single-layer connectivity pattern of a multiplex network). For applications where changes in a graph (e.g., addition and deletion of nodes or edges over time) are continuously reported (e.g., streaming graphs), FINGER also allows online computation based on incremental graph changes. We prove that FINGER maintains an approximation guarantee and is asymptotically consistent with the exact VNGE, which is further validated on different synthetic random graphs. We then apply FINGER to developing ultra-efficient algorithms for the computation of Jensen-Shannon distance between graphs. Compared to the state-of-the-art graph similarity methods and two alternative approximations of VNGE, FINGER yields superior and robust performance for anomaly detection in evolving Wikipedia networks and router communication networks, as well as bifurcation analysis in dynamic genomic networks. These applications show the effectiveness and potential of the Jensen-Shannon distance for network learning in a wide range of domains, which had not been rigorously explored owing to its high computation complexity in the absence of FINGER.

The contributions of this paper and the proposed framework (FINGER) are summarized as follows.
• Two types of approximate VNGE that reduce its cubic complexity to linear complexity are proposed to support fast and incremental computation of VNGE. We derive their approximation error bounds and show asymptotic consistency relative to the exact VNGE under mild conditions.
• FINGER achieves nearly 100% reduction in computation time for the VNGE of different random graph models and enables scalable Jensen-Shannon graph distance computation.
• On two real-world applications (anomaly detection and cellular bifurcation analysis) and one synthesized dataset, FINGER exhibits outstanding and robust performance over seven baseline methods.

## 2 FINGER: Theory and Algorithms

### 2.1 Background and Preliminaries

Using terminology from quantum statistical mechanics, a density matrix ρ describing a quantum system in a mixed state can be cast as a statistical ensemble of several quantum states. The matrix ρ is symmetric, positive semidefinite, and satisfies tr(ρ) = 1. The von Neumann entropy of a quantum system is defined as H = −tr(ρ ln ρ) von1955mathematical , where ln denotes the matrix logarithm. Let {λ_i}_{i=1}^n be the sorted eigenvalues of ρ such that λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0. The definition of von Neumann entropy is equivalent to H = −Σ_{i=1}^n λ_i ln λ_i, where the convention 0 ln 0 = 0 is used due to lim_{x→0+} x ln x = 0. Moreover, since Σ_{i=1}^n λ_i = 1 and λ_i ≥ 0 for all i, the von Neumann entropy can be viewed as the Shannon entropy associated with the eigenspectrum {λ_i}_{i=1}^n.

We consider the class of undirected weighted simple non-empty graphs with nonnegative edge weights, denoted by 𝒢. Let G = (V, E, W) denote a single graph, where V and E denote its node and edge set with cardinality |V| = n and |E| = m, respectively, and W is an n × n matrix with entry [W]_ij = w_ij denoting the weight of an edge (i, j) ∈ E. A graph sequence refers to a set of graphs {G_t} indexed by t with known node-to-node correspondence, where G_t ∈ 𝒢 for all t. The combinatorial graph Laplacian matrix of G is defined as L = S − W Luxburg07 , where S = diag(s_1, …, s_n) is a diagonal matrix and its diagonal entry s_i = Σ_j w_ij is the nodal strength (weighted degree) of a node i ∈ V. Connecting the von Neumann entropy to graphs, the VNGE, denoted by H(G), is defined by replacing ρ with ρ_G = c·L braunstein2006laplacian ; passerini2008neumann ; passerini2009quantifying , where c = 1/Σ_{i∈V} s_i is a trace normalization factor such that tr(ρ_G) = 1. It has been proved in passerini2008neumann that for any G ∈ 𝒢, H(G) ≤ ln(n − 1), where the equality holds when G is a complete graph. Note that since computing VNGE requires the entire eigenspectrum of ρ_G, it incurs full eigenvalue decomposition on ρ_G and has cubic complexity O(n^3) (here f(n) = O(g(n)) means f grows at most at the rate of g, and f(n) = Ω(g(n)) means f grows at least at the rate of g; for computing all eigenvalues of large matrices, a viable solution is direct methods, possibly with parallel eigensolvers for acceleration, and the complexity for computing the eigenspectrum of L is O(n^3) bai2000templates ; HornMatrixAnalysis ), making it computationally infeasible for large graphs.
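For reference, the exact VNGE of the definitions above can be computed in a few lines (a NumPy sketch, not code from the paper):

```python
import numpy as np

def exact_vnge(W):
    """Exact VNGE: H(G) = -sum_i lam_i * ln(lam_i), where {lam_i} are the
    eigenvalues of rho_G = c * L with L = S - W and c = 1 / tr(L)."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W        # combinatorial Laplacian L = S - W
    rho = L / np.trace(L)                 # trace normalization, tr(rho) = 1
    lam = np.linalg.eigvalsh(rho)         # full eigenspectrum: O(n^3)
    lam = lam[lam > 1e-12]                # convention: 0 ln 0 = 0
    return float(-(lam * np.log(lam)).sum())

# Complete graph K_5 with unit weights
K5 = np.ones((5, 5)) - np.eye(5)
```

For K_n with identical unit weights, `exact_vnge` returns ln(n − 1), matching the equality case noted above.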

In what follows, we propose two types of approximate VNGE (Ĥ and H̃) for the exact VNGE H, where Ĥ and H̃ possess linear computation complexity and satisfy H̃ ≤ Ĥ ≤ H. Depending on the data format and problem setup, Ĥ is designed for fast computation of VNGE for a single graph, and H̃ is designed for online computation of VNGE based on incremental graph changes. Furthermore, we derive approximation error bounds and prove asymptotic consistency relative to H under mild conditions on the eigenspectrum of ρ_G. Our proofs are presented in the supplementary material.

### 2.2 Approximation Analysis of von Neumann Graph Entropy

Recall that computing H requires O(n^3) computation complexity. To accelerate its computation, we first reduce its computation complexity by using the quadratic approximation of the term ρ_G ln ρ_G in H via Taylor series expansion, leading to the following lemma.

###### Lemma 1 (Quadratic approximation Q of H).

For any G ∈ 𝒢, the quadratic approximation Q of the von Neumann graph entropy H via Taylor series expansion is equivalent to Q = 1 − c^2·(Σ_{i∈V} s_i^2 + 2·Σ_{(i,j)∈E} w_ij^2), where c = 1/Σ_{i∈V} s_i and s_i is the nodal strength of node i.

It is clear from Lemma 1 that Q only depends on the edge weights in G, resulting in linear computation complexity O(n + m) (the complexity becomes O(n^2) when m = O(n^2), i.e., dense graphs; in sparse graphs m could be O(n)), where n = |V| and m = |E|. We note that higher-order (beyond quadratic) approximation of H is plausible at the price of less computational efficiency and possibly excessive subgraph pattern searching. For example, the cubic approximation of H involves the computation of tr(ρ_G^3), which relates to the sum of edge weights of every triangle in G. To identify the approximation accuracy and consistency of Q with respect to H, the following theorem shows the approximation bounds on H in terms of Q and the eigenspectrum of ρ_G.
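The single pass over the edge list can be sketched as follows (assuming the reconstructed formula Q = 1 − c^2·(Σ_i s_i^2 + 2·Σ_{(i,j)∈E} w_ij^2); the triple-based edge interface is our own):

```python
import numpy as np

def quadratic_q(n, edges):
    """Quadratic approximation of the VNGE from an edge list, in O(n + m):
    Q = 1 - c^2 * (sum_i s_i^2 + 2 * sum_{(i,j) in E} w_ij^2), c = 1/sum_i s_i.
    `edges` is an iterable of (i, j, w_ij) triples (a hypothetical interface)."""
    s = np.zeros(n)                       # nodal strengths s_i = sum_j w_ij
    sum_w2 = 0.0
    for i, j, w in edges:                 # single O(m) pass
        s[i] += w
        s[j] += w
        sum_w2 += w * w
    c = 1.0 / s.sum()                     # trace normalization factor
    return float(1.0 - c * c * ((s * s).sum() + 2.0 * sum_w2))
```

For the unit-weight complete graph K_5, this gives Q = 1 − (1/400)·(80 + 20) = 0.75.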

###### Theorem 1 (Approximation bounds on H).

For any G ∈ 𝒢, let λ_max and λ_min be the largest and smallest positive eigenvalues of ρ_G, respectively. If Q > 0, then −Q·(ln λ_max)/(1 − λ_max) ≤ H ≤ −Q·(ln λ_min)/(1 − λ_min). The bounds become exact and equal to H = ln(n − 1) when G is a complete graph with identical edge weight.

Note that Theorem 1 excludes the extreme case when Q = 0, as the resulting VNGE is trivial (H = 0). The condition Q > 0 holds for any graph having a connected subgraph with at least 3 nodes. In addition to the approximation bounds presented in Theorem 1, the corollary below further shows asymptotic consistency between Q and H under mild conditions on λ_max and λ_min.
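The bracketing −Q·(ln λ_max)/(1 − λ_max) ≤ H ≤ −Q·(ln λ_min)/(1 − λ_min) (our reconstruction of the bounds) can be sanity-checked numerically on a random weighted graph (an illustrative sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = np.triu(rng.random((n, n)), 1)
W = A + A.T                                   # random dense weighted graph
L = np.diag(W.sum(1)) - W
rho = L / np.trace(L)
lam = np.linalg.eigvalsh(rho)
pos = lam[lam > 1e-12]                        # positive eigenvalues
H = float(-(pos * np.log(pos)).sum())         # exact VNGE
Q = 1.0 - float((lam ** 2).sum())             # quadratic approximation
lmax, lmin = pos.max(), pos.min()
lower = -Q * np.log(lmax) / (1.0 - lmax)      # reconstructed lower bound
upper = -Q * np.log(lmin) / (1.0 - lmin)      # reconstructed upper bound
assert lower <= H <= upper                    # the bracketing holds
assert -Q * np.log(lmax) <= lower             # -Q ln(lmax) relaxes the lower bound
```

The second assertion previews why −Q ln λ_max (used below) is a looser lower bound than the one in Theorem 1.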

###### Corollary 1 (Asymptotic consistency of Q).

For any G ∈ 𝒢, let n_+ denote the number of positive eigenvalues of ρ_G. If λ_max/λ_min = Θ(1) and n_+ = Ω(n), then H/(Q·ln n) → 1 as n → ∞.

Corollary 1 suggests that the VNGE of large graphs with balanced eigenspectrum (i.e., λ_max/λ_min = Θ(1), where f(n) = Θ(g(n)) means f(n) = O(g(n)) and f(n) = Ω(g(n))) can be well approximated by Q and a factor ln n. The condition of balanced eigenspectrum holds in regular and homogeneous random graphs passerini2008neumann ; du2010note . Furthermore, since n_+ equals n − ω(G), where ω(G) is the number of connected components in G Merris94 , the condition n_+ = Ω(n) holds when ω(G) = o(n).

### 2.3 FINGER-Ĥ: Approximate von Neumann Graph Entropy Ĥ Using Q and λ_max

Based on the derived lower bound on H as stated in Theorem 1, we propose the first type of approximate VNGE using Q and λ_max for any G ∈ 𝒢, which is defined as

 Ĥ(G) = −Q·ln λ_max. (1)

Compared to the lower bound −Q·(ln λ_max)/(1 − λ_max) in Theorem 1, Ĥ is a looser lower bound on H since 0 < 1 − λ_max < 1. Here we use Ĥ when approximating H, since λ_max is typically small and hence the discarded factor 1/(1 − λ_max) is negligible, especially for large graphs.

More importantly, since λ_max is the largest eigenvalue of ρ_G and ρ_G by definition has O(n + m) nonzero entries, the computation of λ_max only requires O(n + m) operations per iteration via power iteration methods HornMatrixAnalysis ; wu2016primme_svds , leading to the same complexity as Q. Consequently, by only acquiring λ_max instead of the entire eigenspectrum {λ_i}, the computation of Ĥ has linear complexity O(n + m), resulting in significant computation reduction when compared with the exact VNGE H, which requires cubic complexity. In addition to computational efficiency, the following corollary shows that the approximation error of Ĥ, defined as H − Ĥ, decays at the rate of o(ln n) under the same conditions as in Corollary 1. We note that the o(ln n) approximation error rate is nontrivial since H ≤ ln(n − 1) for any G ∈ 𝒢 passerini2008neumann ; du2010note .
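FINGER-Ĥ can be sketched with a hand-rolled power iteration (an illustrative implementation; the iteration count and random starting vector are ad hoc, not the paper's implementation):

```python
import numpy as np

def finger_h_hat(W, iters=200, seed=0):
    """FINGER-Hhat from Eq. (1): H_hat = -Q * ln(lambda_max), with lambda_max
    found by power iteration on rho_G."""
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1)                                   # nodal strengths
    c = 1.0 / s.sum()                                   # trace normalization
    Q = 1.0 - c * c * ((s * s).sum() + (W * W).sum())   # (W*W).sum() = 2*sum_E w_ij^2
    rho = c * (np.diag(s) - W)
    x = np.random.default_rng(seed).standard_normal(len(s))
    for _ in range(iters):                              # power iteration; O(n + m) per step for sparse rho
        x = rho @ x
        x /= np.linalg.norm(x)
    lam_max = float(x @ rho @ x)                        # Rayleigh quotient estimate
    return float(-Q * np.log(lam_max))
```

A random (rather than all-ones) starting vector is used because the all-ones vector lies in the null space of L and would stall the iteration.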

###### Corollary 2 (o(ln n) approximation error of Ĥ).

For any G ∈ 𝒢, if λ_max/λ_min = Θ(1) and n_+ = Ω(n), then the scaled approximation error (SAE) (H − Ĥ)/ln n → 0 as n → ∞, implying H − Ĥ = o(ln n).

### 2.4 FINGER-H̃: Approximate von Neumann Graph Entropy H̃ Using Q and s_max

The proxy Ĥ in Section 2.3 enables fast computation of VNGE for a single graph. As the exact online update of the eigenvalue λ_max in Ĥ based on incremental graph changes is challenging, we propose the second type of approximate VNGE using Q and the largest nodal strength s_max in a graph, which allows simple incremental computation of VNGE based on graph changes but at the price of larger approximation error than that of Ĥ. The approximate VNGE H̃ is defined as

 H̃(G) = −Q·ln(2c·s_max), (2)

where c is the trace normalization constant. Using the definition of ρ_G and the upper bound λ_max(L) ≤ 2·s_max on the largest eigenvalue of L in anderson1985eigenvalues , we obtain H̃ ≤ Ĥ since λ_max ≤ 2c·s_max, implying H̃ is a looser lower bound on H when compared with Ĥ. Nonetheless, the following corollary shows the approximation error of H̃ also decays at the same o(ln n) rate as that of Ĥ.

###### Corollary 3 (o(ln n) approximation error of H̃).

For any G ∈ 𝒢, if λ_max/λ_min = Θ(1) and n_+ = Ω(n), then the scaled approximation error (SAE) (H − H̃)/ln n → 0 as n → ∞, implying H − H̃ = o(ln n).

To enable incremental computation of VNGE using H̃, let G and G̃ be any two graphs from a graph sequence. Without loss of generality we assume G and G̃ have a common node set V with n nodes (if they have different node sets, the common set can be constructed by the set union of their node sets). In particular, the graph ΔG with edge set ΔE and weight matrix ΔW is introduced to represent the changes made from converting G to G̃, denoted by G̃ = G ⊕ ΔG (the notation ⊕ denotes the set addition E_G̃ = E_G ∪ ΔE and the matrix addition W_G̃ = W_G + ΔW). The terms Δs_i and Δw_ij denote the nodal strength and edge weight changes of ΔG, respectively, and c + Δc is the trace normalization factor of G ⊕ ΔG. Let Q′ be the quadratic approximation of H(G ⊕ ΔG). The theorem below shows that Q′ can be efficiently updated based on Q of G, the values of c, {s_i} and {w_ij} from G, and ΔG, yielding complexity linear in the size of ΔG.

###### Theorem 2 (Incremental update of Q′).

For any G and ΔG such that G ⊕ ΔG ∈ 𝒢, given Q, c, and Δc, the term Q′ can be updated by Q′ = 1 − (c + Δc)^2·[(1 − Q)/c^2 + Δ_S + Δ_W], where Δ_S = Σ_{i∈V} (2·s_i·Δs_i + Δs_i^2) and Δ_W = 2·Σ_{(i,j)∈ΔE} (2·w_ij·Δw_ij + Δw_ij^2).

Furthermore, by the definition of H̃ in (2), H̃(G ⊕ ΔG) can be efficiently updated by

 H̃(G ⊕ ΔG) = −Q′·ln[2(c + Δc)(s_max + Δs_max)] (3)

given Q, c, and s_max from G as well as the graph changes ΔG, where Q′ is defined in Theorem 2 and s_max + Δs_max is the maximum nodal strength of G ⊕ ΔG, i.e., the maximum value of {s_i + Δs_i}. The computation complexity of H̃(G ⊕ ΔG) is linear in the size of the graph changes, since the incremental update formula of Q′ in Theorem 2 and the updates of c + Δc and s_max + Δs_max only require operations on the changed nodes and edges.
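The incremental update can be sketched as follows (the update formula is derived here from the quadratic term in Lemma 1 and may differ in presentation from the paper's Theorem 2; the dictionary-based interface is our own):

```python
import numpy as np

def update_q(Q, c, nodes, edges):
    """Incremental update of Q after graph changes, touching only modified entries:
      nodes: i -> (s_i, ds_i)        old strength and its change,
      edges: (i, j) -> (w_ij, dw_ij) old weight and its change."""
    d_trace = sum(ds for _s, ds in nodes.values())       # change of sum_i s_i
    c_new = 1.0 / (1.0 / c + d_trace)                    # updated normalization c + dc
    dR = sum(2.0 * s * ds + ds * ds for s, ds in nodes.values()) \
        + 2.0 * sum(2.0 * w * dw + dw * dw for w, dw in edges.values())
    R_new = (1.0 - Q) / (c * c) + dR                     # R = sum_i s_i^2 + 2 sum_E w_ij^2
    return 1.0 - c_new * c_new * R_new, c_new

def h_tilde(Q, c, s_max):
    """FINGER-Htilde from Eq. (2): H_tilde = -Q * ln(2 * c * s_max)."""
    return -Q * np.log(2.0 * c * s_max)
```

For example, the path graph 0-1-2 with unit weights has Q = 0.375 and c = 1/4; adding the edge (0, 2) with unit weight updates these to the triangle's Q = 0.5 and c = 1/6.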

### 2.5 Fast and Incremental Algorithms for Jensen-Shannon Distance between Graphs

One major utility of VNGE is the computation of the Jensen-Shannon distance (JSdist) between any two graphs from a graph sequence. Consider two graphs G_1 and G_2 on a common node set V, and let Ḡ denote their averaged graph such that the weight matrix of Ḡ is (W_1 + W_2)/2. Then the Jensen-Shannon divergence between G_1 and G_2 can be computed by JSdiv(G_1, G_2) = H(Ḡ) − [H(G_1) + H(G_2)]/2 de2015structural . Furthermore, the Jensen-Shannon distance between G_1 and G_2 is defined as JSdist(G_1, G_2) = [JSdiv(G_1, G_2)]^{1/2}, which has been proved to be a valid distance metric in endres2003new ; briet2009properties . The exact computation of JSdist requires O(n^3) computation complexity by the definition of H, where n = |V|, which is computationally cumbersome for large graphs. To overcome its computational inefficiency, we apply the developed FINGER-Ĥ and FINGER-H̃ to the computation of JSdist, as summarized in Algorithms 1 and 2.

Algorithm 1 FINGER-JSdist (Fast). Input: graphs G_1 and G_2 from a graph sequence. Output: JSdist(G_1, G_2). 1. Obtain the averaged graph Ḡ and compute Ĥ(G_1), Ĥ(G_2), and Ĥ(Ḡ) via FINGER-Ĥ from (1). 2. JSdist(G_1, G_2) = [Ĥ(Ḡ) − (Ĥ(G_1) + Ĥ(G_2))/2]^{1/2}. Algorithm 2 FINGER-JSdist (Incremental). Input: graph G, graph changes ΔG, and H̃(G). Output: JSdist(G, G ⊕ ΔG). 1. Compute H̃(G ⊕ ΔG) and H̃(G ⊕ ΔG/2) via FINGER-H̃ from (3) and Theorem 2. 2. JSdist(G, G ⊕ ΔG) = [H̃(G ⊕ ΔG/2) − (H̃(G) + H̃(G ⊕ ΔG))/2]^{1/2}.

If each graph in a graph sequence is given, then FINGER-JSdist (Fast) allows fast computation of JSdist and features linear computation complexity inherited from Ĥ. If a graph sequence is presented by sequential graph changes such that G_{t+1} = G_t ⊕ ΔG_t, then FINGER-JSdist (Incremental) allows online computation of JSdist relative to the incremental graph changes. As will be demonstrated in Section 4, these two algorithms yield outstanding and robust performance in two real-world applications in terms of both effectiveness and efficiency.
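The fast variant can be sketched as follows (a NumPy illustration for dense weight matrices on a common node set; `h_hat` uses a dense eigensolver for brevity where power iteration would be used in practice):

```python
import numpy as np

def h_hat(W):
    """FINGER-Hhat for a dense weight matrix (sketch)."""
    s = W.sum(axis=1)
    c = 1.0 / s.sum()
    Q = 1.0 - c * c * ((s * s).sum() + (W * W).sum())
    lam_max = np.linalg.eigvalsh(c * (np.diag(s) - W))[-1]
    return -Q * np.log(lam_max)

def finger_jsdist(W1, W2):
    """Algorithm 1 (FINGER-JSdist, Fast) for two graphs on a common node set:
    JSdist = sqrt(H(G_avg) - (H(G1) + H(G2)) / 2), with H_hat in place of H."""
    W_avg = 0.5 * (W1 + W2)                       # averaged graph
    div = h_hat(W_avg) - 0.5 * (h_hat(W1) + h_hat(W2))
    return float(np.sqrt(max(div, 0.0)))          # clip tiny negatives from the approximation
```

The clipping guards against small negative divergence values that the approximation can introduce; the exact JS divergence is always nonnegative.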

## 3 Experiments

In this section we conduct intensive experiments on the VNGE of three kinds of synthetic random graphs to study the effects of graph size, average degree, and graph regularity on the approximation error of FINGER and its computational efficiency. The three random graph models are: (i) Erdos-Renyi (ER) model erdos1959random , where every node pair is connected independently with probability p; (ii) Barabasi-Albert (BA) model Barabasi99 , where the degree distribution follows a power-law distribution; and (iii) Watts-Strogatz (WS) model Watts98_simple , an initially regular ring network with independent edge rewiring probability p_WS for simulating small-world networks. The parameter p_WS controls the regularity of graph connectivity, and smaller p_WS simulates more regular graphs. Since H̃ ≤ Ĥ ≤ H, the approximation error (AE) of Ĥ and H̃ is defined as H − Ĥ and H − H̃, respectively. The scaled approximation error (SAE) is defined as AE/ln n. The computation time reduction ratio (CTRR) is defined as (T_H − T_FINGER)/T_H, where T_H denotes the computation time for H and T_FINGER denotes the computation time for Ĥ or H̃. All experiments (including Section 4) were conducted by Matlab R2016 on a 16-core machine with 128 GB RAM. The results in this section are averaged over 10 random trials. We also report more results in the supplementary material due to space limitation.
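The AE and SAE quantities above can be reproduced on a small ER graph (an illustrative NumPy sketch; the graph size and density here are our own choices, not the paper's settings, and CTRR timing is omitted):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 0.05
A = np.triu(rng.random((n, n)) < p, 1).astype(float)
W = A + A.T                                       # ER graph with unit weights

L = np.diag(W.sum(1)) - W
lam = np.linalg.eigvalsh(L / np.trace(L))
pos = lam[lam > 1e-12]
H = -(pos * np.log(pos)).sum()                    # exact VNGE

s = W.sum(1)
c = 1.0 / s.sum()
Q = 1.0 - c * c * ((s * s).sum() + (W * W).sum())
H_hat = -Q * np.log(pos.max())                    # FINGER-Hhat, Eq. (1)
H_tilde = -Q * np.log(2.0 * c * s.max())          # FINGER-Htilde, Eq. (2)

ae_hat, ae_tilde = H - H_hat, H - H_tilde         # approximation errors (AE)
sae_hat = ae_hat / np.log(n)                      # scaled approximation error (SAE)
```

Since H̃ ≤ Ĥ ≤ H, the AE of Ĥ never exceeds that of H̃ on any input.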

The effect of average degree and graph regularity parameter p_WS. Figures 1 (a) and 1 (b) display the exact and the two approximate VNGE of ER and BA graphs and the corresponding CTRR under varying average degree. When fixing the number of nodes n, both Ĥ and H̃ better match H as the average degree increases, suggesting their AE decays with the average degree. Comparing their CTRR, the computation of Ĥ and H̃ enjoys drastic speed-up relative to that of H. The drastic reduction in computation time can be explained by the efficient linear complexity of FINGER, as opposed to the high complexity in computing the entire eigenspectrum for calculating H. The CTRR of Ĥ slightly decays with the average degree due to the growing number of nonzero entries (edges) in ρ_G, resulting in increasing operations for computing λ_max. Although the AE of Ĥ is always smaller than that of H̃ due to the fact that H̃ ≤ Ĥ ≤ H, the CTRR of H̃ has nearly 100% speed-up relative to H by simply requiring the information of s_max instead of λ_max from a graph. Figure 1 (c) displays the AE and CTRR of Ĥ and H̃ under varying edge rewiring probability p_WS and different average degree of the WS model. Similar to ER and BA graphs, when fixing n and p_WS, the AE of Ĥ and H̃ decays as the average degree increases. When n and the average degree are fixed, smaller p_WS yields less AE for both Ĥ and H̃, suggesting that FINGER attains better approximation when graphs are more regular. Since the curves of CTRR for different p_WS in the WS model have similar behavior, here we only report the results of one representative setting. Consistent with the observations in ER and BA graphs, in WS graphs the CTRR of Ĥ and H̃ achieves nearly 100% improvement relative to H, and H̃ attains slightly better CTRR than Ĥ at the price of larger AE.

The effect of graph size n. Figure 2 displays the SAE of FINGER under the three random graph models when varying the number of nodes n. Since the results of Ĥ and H̃ are similar, we show the SAE of Ĥ in Figure 2 and report the SAE of H̃ in the supplementary material. By the fact that ER and WS graphs have balanced eigenspectrum Mieghem10 , for ER and WS models the SAE of both Ĥ and H̃ decays as n increases, which verifies the o(ln n) approximation error as stated in Corollaries 2 and 3. On the other hand, the SAE of BA graphs is observed to grow logarithmically in n due to the existence of extreme eigenvalues (imbalanced eigenspectrum) Mieghem10 ; goh2001spectra . Similar to the observations from fixed-size graphs, for a fixed n the SAE decays with the average degree and graph regularity in all cases. In addition, the CTRR attains nearly 100% speed-up relative to H even for moderate-size graphs.

## 4 Applications

Here we apply FINGER to the computation of Jensen-Shannon (JS) distance between graphs (Section 2.5) in two applications and demonstrate its outstanding performance over seven baseline methods.

Anomaly detection in evolving Wikipedia hyperlink networks. Wikipedia is an online encyclopedia that allows editing and referencing between articles. By viewing an article as a node and a hyperlink as an edge, the evolution of Wikipedia forms a graph sequence over time. Table 1 summarizes four evolving Wikipedia networks of different language settings collected in mislove2009online ; preusse2013structural , where each graph corresponds to a monthly snapshot of a hyperlink network. These datasets are presented in terms of addition and deletion of nodes or edges with timestamps (i.e., continuous graph changes ΔG_t), which directly applies to incremental JS distance computation via FINGER (Algorithm 2). Fast JS distance computation via FINGER (Algorithm 1) can also be applied by accumulating the reported changes to obtain each monthly snapshot G_t. The task of anomaly detection is to identify noticeable changes in the consecutive monthly snapshots of Wikipedia hyperlink networks.

Bifurcation detection in dynamic genomic networks. The genome-wide chromosome conformation capture (Hi-C) contact maps beloqui2009reactome for studying cell reprogramming from human fibroblasts to skeletal muscle can be viewed as a graph sequence consisting of 12 sampled spatial measurements, in which the cell reprogramming undergoes a space-time bifurcation at the 6th measurement as verified in sijiaCell . The task is to identify this bifurcation instance based on the dynamic Hi-C contact maps.

Evaluation. The major differences between the two applications are (i) unweighted vs. weighted graphs and (ii) availability vs. absence of ground truth.
In the Wikipedia case (unweighted graphs), there is no other information for verifying the detected anomalies. Therefore, we use the vertex/edge overlapping (VEO) dissimilarity score papadimitriou2010web as the approximate ground truth. VEO is an intuitive and properly normalized metric of anomaly for unweighted graphs, defined as VEO(G_1, G_2) = 1 − 2·(|V_1 ∩ V_2| + |E_1 ∩ E_2|)/(|V_1| + |V_2| + |E_1| + |E_2|), which lies between 0 and 1 and relates to the Sorensen–Dice coefficient dice1945measures ; sorensen1948method for comparing the similarity of two samples. Here a high VEO score directly pinpoints the month when articles are edited by a relatively significant amount.
In the genome case (weighted graphs), the ground-truth bifurcation instance was verified. Moreover, unlike the Wikipedia case, the genome dataset contains nonnegative edge weights. Therefore, in this case VEO is not an appropriate metric because by definition it is insensitive to edge weight changes.
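The VEO dissimilarity above can be computed directly from node and edge sets (a small sketch; the set-based interface is our own):

```python
def veo_dissimilarity(V1, E1, V2, E2):
    """VEO dissimilarity for unweighted graphs:
    1 - 2 * (|V1 & V2| + |E1 & E2|) / (|V1| + |V2| + |E1| + |E2|)."""
    E1 = {tuple(sorted(e)) for e in E1}           # undirected: canonical edge order
    E2 = {tuple(sorted(e)) for e in E2}
    overlap = len(set(V1) & set(V2)) + len(E1 & E2)
    return 1.0 - 2.0 * overlap / (len(V1) + len(V2) + len(E1) + len(E2))
```

Identical graphs score 0, while graphs sharing no nodes or edges score 1, matching the normalization described above.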

Comparative methods. We compare the proposed method with the following baseline methods:

DeltaCon koutra2016d : DeltaCon uses the idea of fast belief propagation to compute graph similarity and outputs a similarity score sim ∈ [0, 1]. We use 1 − sim as the anomaly score.
RMD koutra2016d : RMD is the Matusita distance deduced from DeltaCon, which is defined as 1/sim − 1.
λ-distance bunke2007graph ; wilson2008study : The Euclidean distance between two sets of top-k eigenvalues of a graph matrix. Here we use the weight matrix (Adj.) and the graph Laplacian matrix (Lap.).
GED bunke2007graph : The graph edit distance (GED) for undirected unweighted graphs is the number of operations (node/edge additions and removals) required to convert one graph to another.
VNGE-NL han2012graph / VNGE-GL ye2014approximate : Two VNGE heuristics using the normalized/generalized graph Laplacian matrix. Unlike FINGER, they lack a certified approximation error analysis.

Wikipedia results. We compute the dissimilarity metrics of each method and compare them with the approximate ground truth in terms of the Pearson correlation coefficient (PCC). A higher PCC suggests a better match with the ground truth for detecting anomalies in monthly edit changes. The PCC and computation time of each method are reported in Table 2. For illustration, the dissimilarity metrics of Wikipedia-EN are shown in Figure 3 (a). The anomaly statistics of the ground truth and FINGER meet the intuition that in the earlier stage the monthly evolution of Wikipedia is more drastic, whereas in the later stage it becomes stable since the changes are subtle relative to the entire network. In Table 2, FINGER-JSdist (Fast) attains the best PCC (0.9029) and competitive computation time. This suggests that the computation of JS distance can be made efficient by FINGER, and in this task it indeed learns a similar notion of anomaly indicated by the ground truth. For example, in Figure 3 (a) their top 10 flagged anomalies have 9 months in common. On the other hand, the other anomaly metrics are implicitly defined, unnormalized, or lacking approximation guarantees, making the detected anomalies less explainable. FINGER-JSdist (Incremental) has the least computation time by leveraging online computation, and it achieves the second best PCC due to the looser approximation error of H̃ relative to Ĥ. Nonetheless, FINGER-JSdist (Incremental) is roughly 3 times faster than GED, 20 times faster than VNGE-GL, 50 times faster than FINGER-JSdist (Fast), 100 times faster than DeltaCon, RMD, and VNGE-NL, and 200-300 times faster than λ-distance. In addition to PCC, we also report the rank correlation coefficients in the supplementary material to further validate the consistency between FINGER and the approximate ground truth.

Bifurcation detection results. Based on the ground-truth statistic provided by sijiaCell , we compare the performance of detecting the critical bifurcation point by each method. Let d(G_t, G_{t′}) denote a dissimilarity metric between two graphs G_t and G_{t′} from the sequence. For each method, the temporal difference score (TDS) proposed in sijiaCell , which aggregates the dissimilarity d between temporally adjacent measurements, is used for bifurcation detection. The measurement(s) corresponding to a local minimum in TDS is detected as a bifurcation instance. The ground-truth statistic and TDS of each method are shown in Figure 3 (b). Among all the compared methods, FINGER-JSdist (Algorithm 1) is the only method that correctly detects the bifurcation point (index 6), and its TDS based on JS distance also resembles the shape of the ground-truth statistic.

In the supplement, we also report synthesized anomaly detection results using another communication network dataset to corroborate the stability and effectiveness of the proposed FINGER method.

## 5 Conclusion

In this paper, we proposed FINGER, a novel framework for efficiently computing von Neumann graph entropy (VNGE). FINGER reduces the computation of VNGE from cubic complexity to linear complexity for a given graph, and allows online computation based on incremental graph changes. In addition to bounded approximation error, our theory shows that FINGER is guaranteed to have asymptotic consistency to the exact VNGE under mild conditions, which has been validated by extensive experiments on three different random graph models. The high efficiency of FINGER also leads to scalable network learning algorithms for computing Jensen-Shannon distance between graphs. Furthermore, we use two domain-specific applications to corroborate the efficiency and effectiveness of FINGER when compared to 7 baseline graph similarity methods. The results demonstrate the power of FINGER in tackling large network analysis and learning problems in different domains.

## References

• (1) D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013.
• (2) V. Kalofolias, “How to learn a graph from smooth signals,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, pp. 920–929.
• (3) D. Luo, H. Huang, F. Nie, and C. H. Ding, “Forging the graphs: A low rank and positive semidefinite graph learning approach,” in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 2960–2968.
• (4) R. Shivanna and C. Bhattacharyya, “Learning on graphs using orthonormal representation is statistically consistent,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3635–3643.
• (5) Y. Wang, Y.-X. Wang, and A. Singh, “Graph connectivity in noisy sparse subspace clustering,” in Artificial Intelligence and Statistics, 2016, pp. 538–546.
• (6) T. N. Kipf and M. Welling, “Variational graph auto-encoders,” in Advances in Neural Information Processing Systems (NIPS) Bayesian Deep Learning Workshop, 2016.
• (7) P. Papadimitriou, A. Dasdan, and H. Garcia-Molina, “Web graph similarity for anomaly detection,” Journal of Internet Services and Applications, vol. 1, no. 1, pp. 19–30, 2010.
• (8) L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.
• (9) S. Ranshous, S. Shen, D. Koutra, S. Harenberg, C. Faloutsos, and N. F. Samatova, “Anomaly detection in dynamic networks: a survey,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 7, no. 3, pp. 223–247, 2015.
• (10) P. Yanardag and S. Vishwanathan, “A structural smoothing framework for robust graph comparison,” in Advances in Neural Information Processing Systems (NIPS), 2015, pp. 2134–2142.
• (11) J. L. Sharpnack, A. Krishnamurthy, and A. Singh, “Near-optimal anomaly detection in graphs using lovasz extended scan statistic,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 1959–1967.
• (12) D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos, “DeltaCon: Principled massive-graph similarity function with attribution,” ACM Transactions on Knowledge Discovery from Data, vol. 10, no. 3, p. 28, 2016.
• (13) G. Simonyi, “Graph entropy: A survey,” Combinatorial Optimization, vol. 20, pp. 399–441, 1995.
• (14) J. Shetty and J. Adibi, “Discovering important nodes through graph entropy: the case of Enron email database,” in Proceedings of the 3rd International Workshop on Link Discovery.   ACM, 2005, pp. 74–81.
• (15) A. Li and Y. Pan, “Structural information and dynamical complexity of networks,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3290–3339, 2016.
• (16) S. L. Braunstein, S. Ghosh, and S. Severini, “The Laplacian of a graph as a density matrix: a basic combinatorial approach to separability of mixed states,” Annals of Combinatorics, vol. 10, no. 3, pp. 291–317, 2006.
• (17) F. Passerini and S. Severini, “The von Neumann entropy of networks,” arXiv preprint arXiv:0812.2597, 2008.
• (18) ——, “Quantifying complexity in networks: the von Neumann entropy,” International Journal of Agent Technologies and Systems (IJATS), vol. 1, no. 4, pp. 58–67, 2009.
• (19) D. M. Endres and J. E. Schindelin, “A new metric for probability distributions,” IEEE Transactions on Information theory, vol. 49, no. 7, pp. 1858–1860, 2003.
• (20) J. Briët and P. Harremoës, “Properties of classical and quantum Jensen-Shannon divergence,” Physical Review A, vol. 79, no. 5, p. 052311, 2009.
• (21) K. Anand and G. Bianconi, “Entropy measures for networks: Toward an information theory of complex topologies,” Physical Review E, vol. 80, no. 4, p. 045102, 2009.
• (22) K. Anand, G. Bianconi, and S. Severini, “Shannon and von Neumann entropy of random networks with heterogeneous expected degree,” Physical Review E, vol. 83, no. 3, p. 036109, 2011.
• (23) M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, “Structural reducibility of multilayer networks,” Nature Communications, vol. 6, 2015.
• (24) L. Han, F. Escolano, E. R. Hancock, and R. C. Wilson, “Graph characterizations from von Neumann entropy,” Pattern Recognition Letters, vol. 33, no. 15, pp. 1958–1967, 2012.
• (25) L. Bai and E. R. Hancock, “Depth-based complexity traces of graphs,” Pattern Recognition, vol. 47, no. 3, pp. 1172–1186, 2014.
• (26) L. Seaman, H. Chen, M. Brown, D. Wangsa, G. Patterson, J. Camps, G. S. Omenn, T. Ried, and I. Rajapakse, “Nucleome analysis reveals structure-function relationships for colon cancer,” Molecular Cancer Research, pp. molcanres–0374, 2017.
• (27) Z. Li, P. J. Mucha, and D. Taylor, “Network-ensemble comparisons with stochastic rewiring and von Neumann entropy,” arXiv preprint arXiv:1704.01053, 2017.
• (28) J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, 2000.
• (29) F. Chung, “Laplacians and the Cheeger inequality for directed graphs,” Annals of Combinatorics, vol. 9, no. 1, pp. 1–19, 2005.
• (30) C. Ye, R. C. Wilson, C. H. Comin, L. d. F. Costa, and E. R. Hancock, “Approximate von Neumann entropy for directed graphs,” Physical Review E, vol. 89, no. 5, p. 052804, 2014.
• (31) J. Von Neumann, Mathematical foundations of quantum mechanics.   Princeton university press, 1955, no. 2.
• (32) U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, Dec. 2007.
• (33) Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the solution of algebraic eigenvalue problems: a practical guide.   SIAM, 2000.
• (34) R. A. Horn and C. R. Johnson, Matrix Analysis.   Cambridge University Press, 1990.
• (35) W. Du, X. Li, Y. Li, and S. Severini, “A note on the von Neumann entropy of random graphs,” Linear Algebra and its Applications, vol. 433, no. 11-12, pp. 1722–1725, 2010.
• (36) R. Merris, “Laplacian matrices of graphs: a survey,” Linear Algebra and its Applications, vol. 197-198, pp. 143–176, 1994.
• (37) L. Wu, E. Romero, and A. Stathopoulos, “PRIMME_SVDS: A high-performance preconditioned SVD solver for accurate large-scale computations,” arXiv preprint arXiv:1607.01404, 2016.
• (38) W. N. Anderson Jr and T. D. Morley, “Eigenvalues of the Laplacian of a graph,” Linear and Multilinear Algebra, vol. 18, no. 2, pp. 141–145, 1985.
• (39) P. Erdös and A. Rényi, “On random graphs, I,” Publicationes Mathematicae (Debrecen), vol. 6, pp. 290–297, 1959.
• (40) A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.
• (41) D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.
• (42) P. Van Mieghem, Graph Spectra for Complex Networks.   Cambridge University Press, 2010.
• (43) K.-I. Goh, B. Kahng, and D. Kim, “Spectra and eigenvectors of scale-free networks,” Physical Review E, vol. 64, no. 5, p. 051903, 2001.
• (44) A. E. Mislove, “Online social networks: measurement, analysis, and applications to distributed information systems,” Ph.D. dissertation, Rice University, 2009.
• (45) J. Preusse, J. Kunegis, M. Thimm, S. Staab, and T. Gottron, “Structural dynamics of knowledge networks,” in International AAAI Conference on Weblogs and Social Media, 2013.
• (46) A. Beloqui, M.-E. Guazzaroni, F. Pazos, J. M. Vieites, M. Godoy, O. V. Golyshina, T. N. Chernikova, A. Waliczek, R. Silva-Rocha, Y. Al-ramahi et al., “Reactome array: forging a link between metabolome and genome,” Science, vol. 326, no. 5950, pp. 252–257, 2009.
• (47) S. Liu, H. Chen, S. Ronquist, L. Seaman, N. Ceglia, W. Meixner, L. A. Muir, P.-Y. Chen, G. Higgins, P. Baldi, S. Smale, A. Hero, and I. Rajapakse, “Genome architecture leads a bifurcation in cell identity,” bioRxiv, 2017.
• (48) L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
• (49) T. Sørensen, “A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons,” Kongelige Danske Videnskabernes Selskab, vol. 5, pp. 1–34, 1948.
• (50) H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis, A graph-theoretic approach to enterprise network dynamics.   Springer Science & Business Media, 2007, vol. 24.
• (51) R. C. Wilson and P. Zhu, “A study of graph spectra for comparing graphs and trees,” Pattern Recognition, vol. 41, no. 9, pp. 2833–2841, 2008.
• (52) P.-Y. Chen and A. O. Hero, “Node removal vulnerability of the largest component of a network,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013, pp. 587–590.
• (53) M. Fiedler, “Algebraic connectivity of graphs,” Czechoslovak Mathematical Journal, vol. 23, no. 98, pp. 298–305, 1973.
• (54) H. Weintraub, S. J. Tapscott, R. L. Davis, M. J. Thayer, M. A. Adam, A. B. Lassar, and A. D. Miller, “Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD,” Proceedings of the National Academy of Sciences, vol. 86, no. 14, pp. 5434–5438, 1989.
• (55) H. Weintraub, “The MyoD family and myogenesis: redundancy, networks, and thresholds,” Cell, vol. 75, no. 7, pp. 1241–1244, 1993.
• (56) D. Del Vecchio, H. Abdallah, Y. Qian, and J. J. Collins, “A blueprint for a synthetic genetic feedback controller to reprogram cell fate,” Cell Systems, 2017.
• (57) P.-Y. Chen, S. Choudhury, and A. O. Hero, “Multi-centrality graph spectral decompositions and their application to cyber intrusion detection,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4553–4557.
• (58) J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2005, pp. 177–187.

## Appendix A Proof of Lemma 1

For any real $x$ such that $0 < x \le 2$, it is easy to show that the Taylor series expansion of $\ln x$ at $x = 1$ is $\ln x = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}(x-1)^k}{k}$. Applying this result to the term $\ln \lambda_i$ in $H = -\sum_{i=1}^n \lambda_i \ln \lambda_i$ and taking the quadratic approximation of the series expansion gives

$$Q = \sum_{i=1}^n \lambda_i (1 - \lambda_i) = 1 - \sum_{i=1}^n \lambda_i^2 \tag{S1}$$

since by definition $\sum_{i=1}^n \lambda_i = 1$. The term $\sum_{i=1}^n \lambda_i^2$ in (S1) can be expressed as

$$\begin{aligned}
\sum_{i=1}^n \lambda_i^2 &= \operatorname{trace}(\mathbf{L}_N^2) && \text{(S2)}\\
&= \sum_{i=1}^n \sum_{j=1}^n [\mathbf{L}_N]_{ij} [\mathbf{L}_N]_{ji} && \text{(S3)}\\
&\overset{(a)}{=} \sum_{i=1}^n \sum_{j=1}^n [\mathbf{L}_N]_{ij}^2 \overset{(b)}{=} c^2 \Big( \sum_{i=1}^n [\mathbf{L}]_{ii}^2 + \sum_{i=1}^n \sum_{j=1, j \ne i}^n [\mathbf{L}]_{ij}^2 \Big) && \text{(S4)}\\
&\overset{(c)}{=} c^2 \Big( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \Big), && \text{(S5)}
\end{aligned}$$

where $(a)$ is due to the matrix symmetry of $\mathbf{L}_N$, $(b)$ is due to the definition $\mathbf{L}_N = c\mathbf{L}$, and $(c)$ is due to the definition of $\mathbf{L}$ such that $[\mathbf{L}]_{ii} = s_i$, $[\mathbf{L}]_{ij} = -w_{ij}$ when $(i,j) \in \mathcal{E}$, and $[\mathbf{L}]_{ij} = 0$ otherwise. Furthermore, define

$$S = \operatorname{trace}(\mathbf{L}) = \sum_{i=1}^n [\mathbf{L}]_{ii} = \sum_{i \in \mathcal{V}} s_i = 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}. \tag{S6}$$

Using the relation $c = 1/S$, we obtain the expression $Q = 1 - c^2(a + 2b)$, where $a = \sum_{i \in \mathcal{V}} s_i^2$ and $b = \sum_{(i,j) \in \mathcal{E}} w_{ij}^2$.
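As a sanity check, the closed form above can be verified numerically against the spectral definition of $Q$. The following sketch uses a small toy weighted graph (not from the paper) and assumes NumPy is available; the closed form needs only node strengths and edge weights, with no eigendecomposition.

```python
import numpy as np

# Toy weighted undirected graph as an edge list {(i, j): w_ij}.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5, (1, 3): 1.0}
n = 4

# Build the graph Laplacian L = D - W.
W = np.zeros((n, n))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w
L = np.diag(W.sum(axis=1)) - W

# Spectral route: Q = 1 - sum_i lambda_i^2, with lambda_i the eigenvalues
# of the trace-normalized Laplacian L_N = c * L, c = 1 / trace(L).
c = 1.0 / np.trace(L)
lam = np.linalg.eigvalsh(c * L)
Q_spectral = 1.0 - np.sum(lam ** 2)

# Closed form from Lemma 1: Q = 1 - c^2 (sum_i s_i^2 + 2 sum_{(i,j)} w_ij^2),
# computable in O(n + m) time from strengths and edge weights alone.
s = W.sum(axis=1)
Q_closed = 1.0 - c ** 2 * (np.sum(s ** 2) + 2.0 * sum(w ** 2 for w in edges.values()))

assert np.isclose(Q_spectral, Q_closed)
```

The agreement of the two routes is exactly the chain (S2)–(S5): the trace of $\mathbf{L}_N^2$ collapses to a sum over diagonal entries and edges.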

## Appendix B Proof of Theorem 1

The assumption $\lambda_{\max} < 1$ implies $\lambda_i \in (0, 1)$ for all nonzero eigenvalues $\lambda_i$ of $\mathbf{L}_N$. Following the definition of $H$, we can rewrite $H$ as

$$\begin{aligned}
H &= -\sum_{i=1}^n \lambda_i \ln \lambda_i && \text{(S7)}\\
&= -\sum_{i:\lambda_i > 0} \lambda_i \ln \lambda_i && \text{(S8)}\\
&= -\sum_{i:\lambda_i > 0} \lambda_i (1 - \lambda_i) \cdot \frac{\ln \lambda_i}{1 - \lambda_i}. && \text{(S9)}
\end{aligned}$$

Since $\lambda_{\min} \le \lambda_i \le \lambda_{\max}$ for all $i$ such that $\lambda_i > 0$, and $0 < \lambda_{\min} \le \lambda_{\max} < 1$, we obtain the relation

$$-\frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le -\frac{\ln \lambda_i}{1 - \lambda_i} \le -\frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}. \tag{S10}$$

Using $Q = \sum_{i=1}^n \lambda_i (1 - \lambda_i)$ in (S1) and applying (S10) to (S9) yields

$$-Q \cdot \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le H \le -Q \cdot \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}. \tag{S11}$$

When $G$ is a complete graph with identical edge weight $w > 0$, it can be shown that the eigenvalues of $\mathbf{L}$ consist of one eigenvalue at $0$ and $n-1$ identical eigenvalues at $nw$ [Merris94]. Since the trace normalization constant $c = \frac{1}{n(n-1)w}$, the eigenvalues of $\mathbf{L}_N$ are $\lambda_n = 0$ and $\lambda_i = \frac{1}{n-1}$ for all $i \le n-1$, which implies $\lambda_{\max} = \lambda_{\min} = \frac{1}{n-1}$. It is easy to see that in this case $Q = 1 - \frac{1}{n-1}$ and $H = \ln(n-1)$. Consequently, the bounds in (S11) become exact and $H = -Q\frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} = -Q\frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}$ when $G$ is a complete graph with identical edge weight.
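The complete-graph case can be confirmed numerically. The sketch below (illustrative, with arbitrary choices of $n$ and $w$) builds the weighted Laplacian of $K_n$ and checks that $H = \ln(n-1)$ and that both bounds in (S11) coincide with $H$.

```python
import numpy as np

# Weighted complete graph K_n: L = w * (n*I - J), eigenvalues 0 and nw (n-1 times),
# so after trace normalization all positive eigenvalues equal 1/(n-1).
n, w = 8, 2.0
L = w * (n * np.eye(n) - np.ones((n, n)))
lam = np.linalg.eigvalsh(L / np.trace(L))
lam_pos = lam[lam > 1e-12]

H = -np.sum(lam_pos * np.log(lam_pos))      # exact VNGE
Q = 1.0 - np.sum(lam ** 2)                  # quadratic approximation (S1)
lam_min, lam_max = lam_pos.min(), lam_pos.max()

# The two sides of (S11).
lower = -Q * np.log(lam_max) / (1.0 - lam_min)
upper = -Q * np.log(lam_min) / (1.0 - lam_max)

assert np.isclose(H, np.log(n - 1))
assert np.isclose(lower, H) and np.isclose(upper, H)   # bounds are exact for K_n
```

Since $\lambda_{\max} = \lambda_{\min}$ for $K_n$, the sandwich in (S11) necessarily collapses to equality.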

## Appendix C On the condition λmax<1 in Theorem 1

Here we show that the condition $\lambda_{\max} < 1$ is always satisfied for any graph having a connected subgraph with at least 3 nodes. By definition, $\lambda_{\max} \le 1$ since it is the largest eigenvalue of the scaled matrix $\mathbf{L}_N = \mathbf{L}/\operatorname{trace}(\mathbf{L})$, whose nonnegative eigenvalues sum to 1. Since any connected subgraph with at least 3 nodes contributes at least 2 positive eigenvalues to $\mathbf{L}_N$ [Mieghem10, CPY13GlobalSIP], and all eigenvalues of $\mathbf{L}_N$ sum to 1, we have $\lambda_{\max} < 1$.
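A minimal numeric illustration of this condition (assuming NumPy): a 3-node path satisfies $\lambda_{\max} < 1$, while a graph whose only component is a single edge attains $\lambda_{\max} = 1$, matching the "at least 3 nodes" requirement.

```python
import numpy as np

def lam_max_normalized(L):
    """Largest eigenvalue of the trace-normalized Laplacian L_N = L / trace(L)."""
    return np.linalg.eigvalsh(L / np.trace(L)).max()

# Path on 3 nodes: connected subgraph with >= 3 nodes, so lambda_max < 1.
L_path3 = np.array([[1.0, -1.0, 0.0],
                    [-1.0, 2.0, -1.0],
                    [0.0, -1.0, 1.0]])
assert lam_max_normalized(L_path3) < 1.0

# Single edge (2-node component): the only positive eigenvalue is 1,
# so lambda_max = 1 and the condition of Theorem 1 fails.
L_edge = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])
assert np.isclose(lam_max_normalized(L_edge), 1.0)
```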

## Appendix D Proof of Corollary 1

Since $\sum_{i:\lambda_i > 0} \lambda_i = 1$, the condition $\lambda_{\max} = \Theta(\lambda_{\min})$ implies $\lambda_{\max}$ and $\lambda_{\min}$ are of the same order $\Theta(1/n_+)$, where $n_+$ is the number of positive eigenvalues of $\mathbf{L}_N$. When the condition $n_+ = \Theta(n)$ also holds, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$, and we obtain

$$\lim_{n \to \infty} -\frac{1}{\ln n} \cdot \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} = \lim_{n \to \infty} \frac{1}{\ln n} \cdot \frac{\ln n - \ln a}{1 - \frac{b}{n}} = 1. \tag{S12}$$

Similarly,

$$\lim_{n \to \infty} -\frac{1}{\ln n} \cdot \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}} = 1. \tag{S13}$$

Taking the limit of $\frac{H}{\ln n}$ and applying (S12) and (S13) to the bounds in (S11), we obtain

$$\lim_{n \to \infty} \Big( \frac{H}{\ln n} - Q \Big) = 0, \tag{S14}$$

which completes the proof.
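For intuition, the convergence in (S14) can be observed on complete graphs, where $\lambda_{\max} = \lambda_{\min} = \frac{1}{n-1} = \Theta(1/n)$ and both $H = \ln(n-1)$ and $Q = 1 - \frac{1}{n-1}$ are available in closed form. This is an illustrative check, not part of the proof.

```python
import numpy as np

def gap(n):
    """|H / ln n - Q| for the complete graph K_n, using closed forms."""
    H = np.log(n - 1)
    Q = 1.0 - 1.0 / (n - 1)
    return abs(H / np.log(n) - Q)

# The gap shrinks monotonically toward zero as n grows.
gaps = [gap(n) for n in (10, 100, 1000, 10000)]
assert all(g1 > g2 for g1, g2 in zip(gaps, gaps[1:]))
assert gaps[-1] < 0.02
```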

## Appendix E Proof of Corollary 2

Following the proof of Corollary 1, if $\lambda_{\max} = \Theta(\lambda_{\min})$ and $n_+ = \Theta(n)$, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$. We have

$$\begin{aligned}
\lim_{n \to \infty} \frac{H - \hat{H}}{\ln n} &= \lim_{n \to \infty} \Big( \frac{H}{\ln n} - Q + Q - \frac{\hat{H}}{\ln n} \Big) && \text{(S15)}\\
&\overset{(a)}{=} \lim_{n \to \infty} Q - \frac{\hat{H}}{\ln n} && \text{(S16)}\\
&\overset{(b)}{=} \lim_{n \to \infty} Q - Q \cdot \frac{\ln n - \ln a}{\ln n} && \text{(S17)}\\
&= 0, && \text{(S18)}
\end{aligned}$$

where $(a)$ uses (S14) and $(b)$ uses the definition of $\hat{H}$ in (1) and $\lambda_{\max} = \frac{a}{n}$. This implies the approximation error $H - \hat{H}$ decays with $\ln n$. That is, $H - \hat{H} = o(\ln n)$.
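As an illustrative check on complete graphs, the normalized error $(H - \hat{H})/\ln n$ can be evaluated in closed form. Equation (1) lies outside this excerpt, so the form $\hat{H} = -Q \ln \lambda_{\max}$ is an assumption here.

```python
import numpy as np

def error_rate(n):
    """(H - H_hat) / ln n on K_n, assuming H_hat = -Q * ln(lambda_max)."""
    Q = 1.0 - 1.0 / (n - 1)
    lam_max = 1.0 / (n - 1)   # every positive eigenvalue of K_n's L_N
    H = np.log(n - 1)         # exact VNGE of K_n
    H_hat = -Q * np.log(lam_max)
    return (H - H_hat) / np.log(n)

# The normalized error is positive and shrinks as n grows.
rates = [error_rate(n) for n in (10, 100, 1000)]
assert all(r1 > r2 > 0 for r1, r2 in zip(rates, rates[1:]))
assert rates[-1] < 0.01
```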

## Appendix F Proof of Corollary 3

Let $\lambda_{\max}(\mathbf{L})$ denote the largest eigenvalue of the graph Laplacian matrix $\mathbf{L}$ of a graph $G$. Then it is known that $\frac{n}{n-1} s_{\max} \le \lambda_{\max}(\mathbf{L}) \le \max_{(i,j) \in \mathcal{E}} (s_i + s_j)$, where $s_{\max} = \max_{i \in \mathcal{V}} s_i$; the lower bound is proved in [Fiedler73] and the upper bound is proved in [anderson1985eigenvalues]. These bounds suggest that $\lambda_{\max}(\mathbf{L})$ has the same order as $s_{\max}$, i.e., $\lambda_{\max}(\mathbf{L}) = \Theta(s_{\max})$. Since by definition $\lambda_{\max} = c\,\lambda_{\max}(\mathbf{L})$, it holds that $\lambda_{\max} = \Theta(c\, s_{\max})$ and hence $c\, s_{\max} = \Theta(\lambda_{\max})$. Following the proof of Corollary 1, if $\lambda_{\max} = \Theta(\lambda_{\min})$ and $n_+ = \Theta(n)$, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$, and $c\, s_{\max} = \frac{\gamma}{n}$ for some constant $\gamma > 0$ since $c\, s_{\max} = \Theta(\lambda_{\max})$. Similar to the proof of Corollary 2, we have

$$\begin{aligned}
\lim_{n \to \infty} \frac{H - \tilde{H}}{\ln n} &= \lim_{n \to \infty} \Big( \frac{H}{\ln n} - Q + Q - \frac{\tilde{H}}{\ln n} \Big) && \text{(S19)}\\
&\overset{(a)}{=} \lim_{n \to \infty} Q - \frac{\tilde{H}}{\ln n} && \text{(S20)}\\
&\overset{(b)}{=} \lim_{n \to \infty} Q - Q \cdot \frac{\ln n - \ln \gamma}{\ln n} && \text{(S21)}\\
&= 0, && \text{(S22)}
\end{aligned}$$

where $(a)$ uses (S14) and $(b)$ uses the definition of $\tilde{H}$ in (2) and $c\, s_{\max} = \frac{\gamma}{n}$. This implies the approximation error $H - \tilde{H}$ decays with $\ln n$. That is, $H - \tilde{H} = o(\ln n)$.
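The two Laplacian spectral bounds invoked above can be spot-checked numerically. The sketch below uses random unweighted graphs (where both bounds are classical; the proof above applies them with node strengths $s_i$ in place of degrees) and is an illustration, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Check on random graphs that
#   (n / (n-1)) * max_i s_i  <=  lambda_max(L)  <=  max_{(i,j) in E} (s_i + s_j).
for _ in range(20):
    n = int(rng.integers(4, 12))
    W = (rng.random((n, n)) < 0.5).astype(float)
    W = np.triu(W, 1)
    W = W + W.T                      # symmetric 0/1 adjacency, zero diagonal
    if W.sum() == 0:
        continue                     # skip the (unlikely) empty graph
    s = W.sum(axis=1)
    L = np.diag(s) - W
    lam_max_L = np.linalg.eigvalsh(L).max()
    lower = n / (n - 1) * s.max()                                    # Fiedler73
    upper = max(s[i] + s[j]                                          # anderson1985eigenvalues
                for i in range(n) for j in range(n) if W[i, j] > 0)
    assert lower - 1e-9 <= lam_max_L <= upper + 1e-9
```

For the triangle $K_3$, both checks are sharp on the lower side: $\lambda_{\max}(\mathbf{L}) = 3 = \frac{3}{2} \cdot 2$.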

## Appendix G Proof of Theorem 2

Let $\mathbf{L}$ and $\mathbf{L}'$ denote the graph Laplacian matrices of $G$ and $G'$, respectively, and let $\mathbf{L}_N = c\mathbf{L}$ and $\mathbf{L}_N' = c'\mathbf{L}'$ be the corresponding trace-normalized matrices. Since $\operatorname{trace}(\mathbf{L}) = S$ and $\operatorname{trace}(\mathbf{L}') = S + \Delta S$, it is easy to show that $c = \frac{1}{S}$ and $c' = \frac{1}{S + \Delta S}$. We have

$$c' - c = \frac{1}{S + \Delta S} - \frac{1}{S} = \frac{-\Delta S}{(S + \Delta S)S} = -c\,c'\,\Delta S \tag{S23}$$

since $c = \frac{1}{S}$ and $c' = \frac{1}{S + \Delta S}$. This then implies $c' = \frac{c}{1 + c\,\Delta S}$ and

$$\Delta c = c' - c = \frac{-c^2\,\Delta S}{1 + c\,\Delta S}. \tag{S24}$$

Using the expression of the quadratic approximation for VNGE in Lemma 1 and the relation $c' = c + \Delta c$, we have

$$Q - Q' = (c + \Delta c)^2 \Big( \sum_{i \in \mathcal{V}} (s_i + \Delta s_i)^2 + 2 \sum_{(i,j) \in \mathcal{E}'} (w_{ij} + \Delta w_{ij})^2 \Big) - c^2 \Big( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \Big)$$
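A quick numeric check of (S23)–(S24) with illustrative values of $S$ and $\Delta S$ (not tied to any particular graph): the incremental update of the normalization constant needs only $c$ and $\Delta S$, never the new trace from scratch.

```python
import math

# Verify that with c = 1/S and c' = 1/(S + dS), the increment satisfies
#   c' - c = -c * c' * dS              (S23)
#   Delta c = -c^2 * dS / (1 + c*dS)   (S24)
for S, dS in [(10.0, 3.0), (7.5, -2.0), (100.0, 0.1)]:
    c = 1.0 / S
    c_new = 1.0 / (S + dS)
    assert math.isclose(c_new - c, -c * c_new * dS)              # (S23)
    assert math.isclose(c_new - c, -c ** 2 * dS / (1 + c * dS))  # (S24)
```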