# Fast Computing von Neumann Entropy for Large-scale Graphs via Quadratic Approximations

The von Neumann graph entropy (VNGE) can be used as a measure of graph complexity, and it underlies measures of information divergence and distance between graphs. Since computing VNGE requires finding all eigenvalues, it is computationally demanding for large-scale graphs. We propose novel quadratic approximations for computing the von Neumann graph entropy: the Modified Taylor and Radial Projection approximations. Our methods reduce the cubic complexity of VNGE to linear complexity. Computational simulations on random graph models and various real network datasets demonstrate the superior performance.


## I Introduction

Immense amounts of data are involved in human life today. As these data grow increasingly intricate, a decent representation of their anomalous and complicated structure is needed. Graphs have the sophisticated capability of representing and summarizing irregular structural features. Hence, graph-based learning has become an emerging and promising method with numerous applications in social networks, biology, physics, and chemistry 8347162; 7974879; 6638850; 2018arXiv180107351D.

With the growing quantity of graph data, it is crucial to find a way to manage them effectively. Graph similarity, a typical way of expressing the relationship between graphs, has therefore been widely applied 8445612; NIPS2005_2932. For example, Sadreazami et al. proposed an intrusion detection methodology based on learning graph similarity with a graph Laplacian matrix 8027126; Lin and Kung presented the MatchEdge function to score the similarity between two directed acyclic graphs 650096; also, Yanardag and Vishwanathan gave a general framework to smooth graph kernels based on graph similarity Yanardag. However, all the above approaches rely on presumed models, which limits their ability to capture the general concept of divergences and distances between graphs.

Meanwhile, graph entropy PhysRevE.80.045102, which is a model-free approach, has been actively used as a way to quantify the structural complexity of a graph 7456290. By regarding the eigenvalues of the normalized combinatorial Laplacian of a graph as a probability distribution, we can obtain its Shannon entropy.

Given the density matrix of a graph, which represents the graph as a quantum mechanical state, we can calculate its von Neumann entropy PhysRevE.83.036109. Bai et al. proposed an algorithm for depth-based complexity characterization of graphs using the von Neumann entropy 6460767; Liu et al. gave a method for detecting bifurcation network events based on the von Neumann entropy 8461400. Other applications include graph clustering 6460766, network analysis 8434737, and structural reduction of multiplex networks articleehnail. However, the time complexity of calculating the exact value is high.

The von Neumann entropy, which was introduced by John von Neumann, is the extension of classical entropy concepts to the field of quantum mechanics hall2013quantum. He introduced the notion of the density matrix, which facilitated the extension of the tools of classical statistical mechanics to the quantum domain in order to develop a theory of quantum measurements.

Denote the trace of a square matrix $A$ as $\operatorname{tr}(A)$. A density matrix is a Hermitian positive semidefinite matrix $\rho \in \mathbb{C}^{n \times n}$ with unit trace. The density matrix describes the statistical state of a system in quantum mechanics. It is especially useful for dealing with mixed states, which consist of a statistical ensemble of several different quantum systems.

The von Neumann entropy of a density matrix $\rho$, denoted by $H(\rho)$, is defined as

$$H(\rho) = -\operatorname{tr}(\rho \ln \rho) = -\sum_{i=1}^{n} \lambda_i \ln \lambda_i,$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $\rho$. It is conventional to define $0 \ln 0 = 0$. This definition is a proper extension of both the Gibbs entropy and the Shannon entropy to the quantum case.
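As a minimal sketch of the definition above, the entropy can be evaluated directly from a spectrum once the eigenvalues are known. The function name and the two example spectra are illustrative assumptions, not part of the original work.

```python
import math


def von_neumann_entropy(eigenvalues):
    """Return H = -sum(lam * ln(lam)) over the spectrum of a density matrix.

    Zero eigenvalues are skipped, which implements the convention 0 ln 0 = 0.
    """
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 0.0)


# Illustrative spectra (non-negative, summing to 1); not data from the paper.
h_uniform = von_neumann_entropy([0.25, 0.25, 0.25, 0.25])  # maximally mixed state
h_pure = von_neumann_entropy([1.0, 0.0, 0.0, 0.0])          # pure state
```

The maximally mixed spectrum attains the largest possible value $\ln n$, while a pure state has zero entropy.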

### I.1 Von Neumann Graph Entropy

In this article we consider only undirected simple graphs with non-negative edge weights. Let $G = (V, E, W)$ denote a graph with the set of vertices $V$, the set of edges $E$, and the weight matrix $W$. The combinatorial graph Laplacian matrix of $G$ is defined as $L = S - W$, where $S$ is a diagonal matrix with diagonal entries $S_{ii} = \sum_{j} W_{ij}$ DBLP:journals/sac/Luxburg07. The density matrix of a graph $G$ is defined as

$$\rho_G = \frac{1}{\operatorname{tr}(L)} L,$$

where $\frac{1}{\operatorname{tr}(L)}$ is a trace normalization factor. Note that $\rho_G$ is a positive semidefinite matrix with unit trace. The von Neumann entropy for a graph $G$, denoted by $H(G)$, is defined as

$$H(G) := H(\rho_G),$$

where $\rho_G$ is the density matrix of $G$ Braunstein2006; Quantifying2009.

It can be proven that for any graph with , it holds that , and the equality holds when is a complete graph Passerini2012The. Obviously, the definition of von Neumann graph entropy is based on the combinatorial graph Laplacian matrix. There are variants of von Neumann graph entropy proposed in the prior literature, including the normalized graph Laplacian matrix Shi2000; Han2012Graph and the generalized graph Laplacian matrix of directed graphs Fan2005Laplacians; Ye2014Approximate.

However, these alternatives fail to consider approximation justification and are also shown to be suboptimal. Although various definitions have been proposed, computing the exact value under any of them is still expensive, especially for a large-scale graph. Computing the von Neumann graph entropy requires the entire eigenspectrum of $\rho_G$. This calculation can be done with time complexity $O(n^3)$ DBLP:books/daglib/0001349; DBLP:books/daglib/0086372, making it computationally impractical for large-scale graphs. Thus an efficient method to compute the von Neumann entropy of large-scale graphs is required. The von Neumann graph entropy has been shown to be a feasible approach for computing the Jensen-Shannon distance between any two graphs from a graph sequence articleehnail. However, machine learning and data mining tasks involve sequences of large graphs. Therefore, it is of great significance to develop numerical algorithms that approximate the von Neumann entropy of large-scale graphs faster than the previous approach. More details about the application of the Jensen-Shannon distance are given in Section IV. To tackle this computational inefficiency, Chen et al. 2018arXiv180511769C proposed a fast algorithm for computing the von Neumann graph entropy, which uses a quadratic polynomial to approximate the term $-x \ln x$ rather than extracting the eigenspectrum of $\rho_G$. It was shown that the proposed approximation is more efficient than the exact algorithm based on the singular value decomposition. Our work was inspired by their approach. However, our proposed methods have superior performance on random graphs as well as real datasets, with linear time complexity.

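For a Hermitian matrix $A$ it is true that $\operatorname{tr} f(A) = \sum_{i} f(\lambda_i)$, where $\lambda_i$ are all eigenvalues of $A$. Since $H(\rho) = -\operatorname{tr}(\rho \ln \rho)$, one natural approach to approximate the von Neumann entropy of a density matrix is to use a Taylor series expansion to approximate the logarithm of a matrix. It is required to calculate $\operatorname{tr}(\rho^j)$, $j \le N$, for some $N$. Indeed, $\ln(I_n - A) = -\sum_{j=1}^{\infty} \frac{A^j}{j}$ for a symmetric matrix $A$ whose eigenvalues are all in the interval $(-1, 1)$. Assuming that all eigenvalues of a density matrix $\rho$ are nonzero, we have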

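$$H(\rho) = -\ln \lambda_{\max} + \sum_{j=1}^{\infty} \frac{1}{j} \operatorname{tr}\left(\rho \left(I_n - (\lambda_{\max})^{-1} \rho\right)^{j}\right),$$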

where $\lambda_{\max}$ is the maximum eigenvalue of $\rho$. We refer to 2018arXiv180101072K for more details. However, the computational complexity is too high for this approach to be practical as $n$ grows.

In this article, we propose quadratic approximations of the von Neumann entropy for large-scale graphs. It is noted that only $\operatorname{tr}(\rho^2)$ is needed to compute them. We consider various quadratic polynomials $q$ to approximate $f(x) = -x \ln x$ on $(0, 1]$ ($f(0) = 0$). Then $H(\rho) = \sum_{i=1}^{n} f(\lambda_i) \approx \sum_{i=1}^{n} q(\lambda_i)$. Such approximations need to be accurate only at values in $[0, 1]$ whose sum is 1, namely the eigenvalues of a given density matrix.

Recall that $\operatorname{tr}(\rho^2)$ is called the purity of $\rho$ in quantum information theory. The purity gives information on how much a state is mixed. We denote the purity of $\rho$ as $\gamma(\rho)$ or simply $\gamma$. For a given graph $G$, the purity of $\rho_G$ can be computed efficiently due to the sparsity of $\rho_G$, as follows.

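###### Lemma 1.

For a graph $G = (V, E, W)$,

$$\operatorname{tr}(\rho_G^2) = \frac{1}{(\operatorname{tr} L)^2} \Big( \sum_{i \in V} S_{ii}^2 + 2 \sum_{(i,j) \in E} L_{ij}^2 \Big).$$

###### Proof.

Since $\rho_G$ and $L$ are symmetric, it follows that

$$\begin{aligned}
\operatorname{tr}(\rho_G^2) &= \|\rho_G\|_F^2 = \frac{1}{(\operatorname{tr} L)^2} \|L\|_F^2 = \frac{1}{(\operatorname{tr} L)^2} \sum_{1 \le i, j \le n} L_{ij}^2 \\
&= \frac{1}{(\operatorname{tr} L)^2} \Big( \sum_{1 \le i \le n} L_{ii}^2 + 2 \sum_{1 \le i < j \le n} L_{ij}^2 \Big) = \frac{1}{(\operatorname{tr} L)^2} \Big( \sum_{i \in V} S_{ii}^2 + 2 \sum_{(i,j) \in E} L_{ij}^2 \Big).
\end{aligned}$$

It is trivial that $\operatorname{tr}(\rho_G^2)$ only depends on the edge weights in $E$, resulting in linear computation complexity $O(n + m)$, where $n = |V|$ and $m = |E|$.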
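The linear-time computation behind Lemma 1 can be sketched as a single pass over an edge list. This is an illustration under stated assumptions: the graph is undirected and simple, each edge appears once as a hypothetical `(i, j, w)` triple with `w > 0`, and the function name is not from the original work.

```python
from collections import defaultdict


def purity_from_edges(edges):
    """Compute tr(rho_G^2) in one pass over a weighted edge list.

    The diagonal of L holds the vertex strengths, and every edge contributes
    two off-diagonal entries -w, so ||L||_F^2 = sum(s_i^2) + 2 * sum(w^2).
    """
    strength = defaultdict(float)
    off_diagonal_sq = 0.0
    for i, j, w in edges:
        strength[i] += w
        strength[j] += w
        off_diagonal_sq += w * w
    trace_l = sum(strength.values())
    diagonal_sq = sum(s * s for s in strength.values())
    return (diagonal_sq + 2.0 * off_diagonal_sq) / (trace_l ** 2)


# Unweighted triangle: density-matrix spectrum (1/2, 1/2, 0), purity 1/2.
purity_triangle = purity_from_edges([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
# Unweighted path on three vertices: spectrum (3/4, 1/4, 0), purity 5/8.
purity_path = purity_from_edges([(0, 1, 1.0), (1, 2, 1.0)])
```

Isolated vertices have zero strength and contribute nothing, so only the vertices that appear in the edge list need to be stored.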

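We denote the maximum eigenvalue of a given density matrix as $\lambda_{\max}$. When the eigenvalues $\lambda_1, \dots, \lambda_n$ of $\rho$ are given, we denote $H(\rho)$ and $\hat{H}(\rho)$ as $H(\lambda_1, \dots, \lambda_n)$ and $\hat{H}(\lambda_1, \dots, \lambda_n)$, respectively. We will use them interchangeably.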

###### Theorem 1.

The following are true.

• .

• .

• .

###### Proof.

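Let $\lambda_1 = \lambda_{\max}, \lambda_2, \dots, \lambda_n$ be the eigenvalues of a given density matrix $\rho$.

• Although this is one of the known properties, we provide a proof. Let $f(x) = -x \ln x$. It is true that $f(p_1 + p_2) \le f(p_1) + f(p_2)$ for all $p_1, p_2 > 0$, since

$$\begin{aligned}
f(p_1) + f(p_2) - f(p_1 + p_2) &= -p_1 \ln p_1 - p_2 \ln p_2 + (p_1 + p_2) \ln(p_1 + p_2) \\
&= p_1 \big(\ln(p_1 + p_2) - \ln p_1\big) + p_2 \big(\ln(p_1 + p_2) - \ln p_2\big) \ge 0.
\end{aligned}$$

Thus, by induction we have

$$f(\lambda_2 + \cdots + \lambda_n) \le f(\lambda_2) + \cdots + f(\lambda_n).$$

Therefore, $H(\rho) \ge f(\lambda_{\max}) + f(1 - \lambda_{\max})$.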

The first inequality holds from the fact that . Note that for all . Thus,

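$$-\lambda_{\max} \ln \lambda_{\max} - (1 - \lambda_{\max}) \ln(1 - \lambda_{\max}) \ge -\lambda_{\max}(\lambda_{\max} - 1) - (1 - \lambda_{\max})(-\lambda_{\max}) = 2 \lambda_{\max}(1 - \lambda_{\max}).$$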
• Since , clearly .

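• It is known that the purity $\gamma$ satisfies $\frac{1}{n} \le \gamma \le 1$. Since $f$ is concave on $[0, 1]$, by Jensen's inequality, it follows that

$$f(\gamma) = f\Big(\sum_{i=1}^{n} \lambda_i^2\Big) \ge \sum_{i=1}^{n} \lambda_i f(\lambda_i) = \sum_{i=1}^{n} -\lambda_i^2 \ln \lambda_i,$$

where $\lambda_1, \dots, \lambda_n$ are the spectrum of $\rho$. Note that the last term can be expressed as $-\operatorname{tr}(\rho^2 \ln \rho)$.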

In this article we consider the following quadratic approximations for the von Neumann entropy: (A) FINGER-$\hat{H}$; (B) Taylor-$T$; (C) Modified Taylor-$\hat{T}$; (D) Radial Projection-$R$. Note that they all can be computed from the purity of the density matrix of a graph. Additionally, (A) and (C) need the maximum eigenvalue as well.

### II.1 FINGER-$\hat{H}$

Chen et al. 2018arXiv180511769C proposed the fast incremental von Neumann graph entropy (FINGER) to reduce the cubic complexity of the von Neumann graph entropy to linear complexity. They considered the following quadratic polynomial to approximate $f(x) = -x \ln x$ (see Fig. 1).

$$q(x) = -(\ln \lambda_{\max})\, x (1 - x) \quad \text{on } [0, 1]. \tag{1}$$

Then FINGER, denoted by $\hat{H}$, is defined as

$$\hat{H}(G) := \sum_{i=1}^{n} q(\lambda_i) = -(\ln \lambda_{\max}) \big(1 - \operatorname{tr}(\rho_G^2)\big),$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $\rho_G$.
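The closed form of FINGER needs only the purity and the largest eigenvalue. The following sketch compares it with the exact entropy on a small spectrum; the function names are illustrative, and the spectrum is that of the density matrix of the unweighted path on three vertices, chosen here only as an example.

```python
import math


def exact_entropy(spectrum):
    """Exact von Neumann entropy with the convention 0 ln 0 = 0."""
    return -sum(lam * math.log(lam) for lam in spectrum if lam > 0.0)


def finger_entropy(purity, lam_max):
    """FINGER: sum of q(lam_i) with q(x) = -(ln lam_max) * x * (1 - x)."""
    return -math.log(lam_max) * (1.0 - purity)


spectrum = [0.75, 0.25, 0.0]                 # example spectrum, sums to 1
purity = sum(lam * lam for lam in spectrum)  # tr(rho^2) = 0.625
h_hat = finger_entropy(purity, max(spectrum))
h_exact = exact_entropy(spectrum)
```

On this example `h_hat` is roughly 0.108 while `h_exact` is roughly 0.562, so the approximation sits below the exact value.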

Note that since all eigenvalues are smaller than or equal to the maximum eigenvalue, we mainly consider the interval $[0, \lambda_{\max}]$ instead of $[0, 1]$. Now let us show that the approximation FINGER is always smaller than the exact von Neumann entropy.

###### Lemma 2.

The following are true.

• is concave on .

• on .

###### Proof.

It is trivial for . Suppose that .

• Note that on all . So, . Then , implying, for all . Thus,

$$(f - q)''(x) = \frac{-1 - 2x \ln \lambda_{\max}}{x} < 0.$$
• When or , it is trivial. Since , it suffices to show for all . By observing the tangent line at for , it is easy to check that . Then

$$\frac{\lambda_{\max} - 1}{\ln \lambda_{\max}} > \lambda_{\max}.$$

Thus, for all . Since and for all , it follows that for all .

###### Theorem 2.

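For any density matrix $\rho$, it holds that

• $\hat{H}(\rho) \ge q(\lambda_{\max}) + q(1 - \lambda_{\max})$.

• $\hat{H}(\rho) \le H(\rho)$. The equality holds if and only if $\lambda_{\max} = 1$.

###### Proof.
• It is easy to check that $q(t_1 + t_2) - q(t_1) - q(t_2) = 2 t_1 t_2 \ln \lambda_{\max} \le 0$ for all $t_1, t_2 \ge 0$. Thus, $q(t_1 + t_2) \le q(t_1) + q(t_2)$. By induction, it holds that

$$q(t_1 + \cdots + t_n) \le q(t_1) + \cdots + q(t_n),$$

for all $t_i \ge 0$, $i = 1, \dots, n$. Then it follows that

$$\hat{H}(\lambda_1, \dots, \lambda_n) = \sum_{i=1}^{n} q(\lambda_i) = q(\lambda_1) + \sum_{i=2}^{n} q(\lambda_i) \ge q(\lambda_1) + q\Big(\sum_{i=2}^{n} \lambda_i\Big) = q(\lambda_{\max}) + q(1 - \lambda_{\max}),$$

where $\lambda_1 = \lambda_{\max}, \lambda_2, \dots, \lambda_n$ are the spectrum of $\rho$.

• Let $\lambda$ be an eigenvalue of $\rho$. Since $\lambda \le \lambda_{\max}$, it follows that

$$f(\lambda) - q(\lambda) = -\lambda \ln \lambda + (\ln \lambda_{\max}) \lambda (1 - \lambda) \ge -\lambda \ln \lambda + (\ln \lambda) \lambda (1 - \lambda) = -\lambda^2 \ln \lambda \ge 0.$$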

Note that and if and only if for all .

• By Lemma 2 (2) we can find the following error bound for FINGER.

###### Remark.

By Theorem 2 (1) and Theorem 1 (1) it holds that and . However, it is questionable if it holds that . If this is true, then it is possible to show more interesting result as follows:

 H(ρ)−ˆH(ρ)=n∑i=1f(λi)−q(λi) ≥f(λmax)−q(λmax)+f(1−λmax)−q(1−λmax) =−λmaxlnλmax+λmax(1−λmax)lnλmax −(1−λmax)ln(1−λmax)+λmax(1−λmax)ln(1−λmax) =−(1−λmax)2ln(1−λmax)−(λmax)2lnλmax ≥ln22.

That is, always has error greater then . For the dataset in Fig. 5 it shows that the smallest error for the real dataset is .

The following results were established in 2018arXiv180511769C.

###### Theorem 3.

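For any graph $G$, let $\lambda_{\max}$ and $\lambda_{\min}$ be the largest and smallest positive eigenvalue of $\rho_G$, respectively. If $\lambda_{\max} < 1$, then

$$-\big(1 - \operatorname{tr}(\rho_G^2)\big) \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le H(G) \le -\big(1 - \operatorname{tr}(\rho_G^2)\big) \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}.$$

The bounds become exact and $H(G) = \ln(n - 1)$ when $G$ is a complete graph with identical edge weight.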

###### Corollary 1.

For any , let denote the number of positive eigenvalues of . If and , then

$$\frac{H(G)}{\ln n} - \big(1 - \operatorname{tr}(\rho_G^2)\big) \to 0 \quad \text{as } n \to \infty.$$

### II.2 Taylor-$T$

Since the sum of all eigenvalues of a density matrix is $1$, their average is $\frac{1}{n}$. As $n$ gets bigger, the average gets closer to $0$. Thus, for a large-scale density matrix, many of the eigenvalues must be close to $0$. So, it is reasonable to use the Taylor series of $f$ at $\frac{1}{n}$ instead of $0$. In fact, since $f'(0)$ does not exist, there does not exist a Taylor series for $f$ at $0$.

###### Lemma 3.

Let $\lambda_{\max}$ be the maximum eigenvalue of a density matrix $\rho$. Then, $\frac{1}{n} \le \lambda_{\max} \le 1$. Especially, if and only if . Also it is true that if and only if .

###### Proof.

Since , if then all eigenvalues are 0 except . It is known that and the equality holds when for all .∎

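We can propose the quadratic Taylor approximation for $f$ at $x = \frac{1}{n}$ as follows.

$$q(x) = -\frac{n}{2} x^2 + (\ln n) x + \frac{1}{2n}.$$

Using such approximation, Taylor, denoted by $T$, is defined as the sum of $q$ over the eigenvalues of $\rho_G$. Since the eigenvalues sum to $1$, this gives

$$T(G) = \sum_{i=1}^{n} q(\lambda_i) = -\frac{n}{2} \operatorname{tr}(\rho_G^2) + \ln n + \frac{1}{2}.$$

As Fig. 2 shows, the function $q$ is very similar to the function $f$ near $x = \frac{1}{n}$. However, as the maximum eigenvalue gets closer to 1, the error becomes very large. Note that $T(G) \to -\infty$ as $n \to \infty$ for $\lambda_{\max} = 1$. Therefore, this approximation needs to be modified. We use the information about $\lambda_{\max}$ in order to reduce the error.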
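To make the behaviour of the Taylor approximation concrete, the sketch below sums the polynomial $q$ over a spectrum and evaluates it at the two extremes: the uniform spectrum, where the expansion point $\frac{1}{n}$ is exact, and a pure state, where $\lambda_{\max} = 1$. The function names and the choice $n = 4$ are illustrative assumptions.

```python
import math


def q_taylor(x, n):
    """Second-order Taylor polynomial of f(x) = -x ln x about x = 1/n."""
    return -0.5 * n * x * x + math.log(n) * x + 1.0 / (2.0 * n)


def taylor_entropy(spectrum):
    """Sum q over the spectrum; the result depends only on the purity and n."""
    n = len(spectrum)
    return sum(q_taylor(lam, n) for lam in spectrum)


n = 4
t_uniform = taylor_entropy([1.0 / n] * n)           # exact entropy is ln 4
t_pure = taylor_entropy([1.0] + [0.0] * (n - 1))    # exact entropy is 0
```

At the uniform spectrum the approximation reproduces $\ln n$, whereas at the pure state it returns a negative number even though the exact entropy is $0$, which illustrates why a correction using $\lambda_{\max}$ is introduced next.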

### II.3 Modified Taylor-$\hat{T}$

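Consider the quadratic approximation, $q$, to approximate $f$ such that it holds

$$q\Big(\frac{1}{n}\Big) = f\Big(\frac{1}{n}\Big), \qquad q'\Big(\frac{1}{n}\Big) = f'\Big(\frac{1}{n}\Big), \qquad q(\lambda_{\max}) = f(\lambda_{\max}).$$

Assuming that $\lambda_{\max} \ne \frac{1}{n}$, we have

$$q(x) = \sigma x^2 + \Big(\ln n - 1 - \frac{2\sigma}{n}\Big) x + \frac{\sigma + n}{n^2}, \tag{2}$$

where

$$\sigma = \frac{-n \lambda_{\max} \ln(n \lambda_{\max}) + n \lambda_{\max} - 1}{n \big(\lambda_{\max} - \frac{1}{n}\big)^2}.$$

Using such approximation, the Modified Taylor, denoted by $\hat{T}$, is defined as

$$\hat{T}(G) = \sigma \Big(\operatorname{tr}(\rho_G^2) - \frac{1}{n}\Big) + \ln n.$$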
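A minimal sketch of the Modified Taylor approximation follows, taking the purity, the dimension $n$, and $\lambda_{\max}$ as inputs. The function names, the tolerance used to detect the uniform case, and the example spectrum are assumptions made for illustration.

```python
import math


def exact_entropy(spectrum):
    """Exact von Neumann entropy with the convention 0 ln 0 = 0."""
    return -sum(lam * math.log(lam) for lam in spectrum if lam > 0.0)


def sigma_coefficient(n, lam_max):
    """Leading coefficient of the quadratic q in equation (2)."""
    t = n * lam_max
    gap = lam_max - 1.0 / n
    return (-t * math.log(t) + t - 1.0) / (n * gap * gap)


def modified_taylor_entropy(purity, n, lam_max):
    """Modified Taylor: sigma * (purity - 1/n) + ln n."""
    if abs(lam_max - 1.0 / n) < 1e-12:
        # Uniform spectrum: sigma is undefined, and the entropy is exactly ln n.
        return math.log(n)
    return sigma_coefficient(n, lam_max) * (purity - 1.0 / n) + math.log(n)


spectrum = [0.75, 0.25, 0.0]                 # example spectrum, sums to 1
purity = sum(lam * lam for lam in spectrum)
t_hat = modified_taylor_entropy(purity, len(spectrum), max(spectrum))
h_exact = exact_entropy(spectrum)
```

On this example `t_hat` is roughly 0.777 against an exact value of roughly 0.562, so the approximation lies above the exact entropy.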
###### Lemma 4.

The following are true.

• is concave on .

• .

###### Proof.
• It suffices to show that . Let . Since and for all , it is true that for all . By Lemma 3, , so .

• Let . By Lemma 3, . Since

$$-\frac{1}{2\sigma} - \frac{1}{n} = \frac{1}{n} \Big( \frac{(t - 1)^2}{2 (t \ln t - t + 1)} - 1 \Big),$$

It is enough to show that . Recall that for all . Let . Since for all , for all . From the fact , clearly, for all . So, . Taking , we have

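$$\lambda_{\max} - \Big(-\frac{1}{2\sigma}\Big) = \frac{t}{n} - \frac{(t - 1)^2}{2n (t \ln t - t + 1)} = \frac{2t (t \ln t - t + 1) - (t - 1)^2}{2n (t \ln t - t + 1)}.$$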

###### Theorem 4.

For any density matrix , it holds that

$$\hat{T}(\rho) \ge H(\rho).$$
###### Proof.

Let . Clearly, and . We show that for three different intervals: (i) , (ii) , and (iii) . (i) Since , Lemma 4 (2) implies that is convex on . Since , on . (ii) In the similar way, it holds that on . (iii) Since is concave on , by the definition of concavity, it follows that

$$h\Big(-\frac{1}{2\sigma} t + \lambda_{\max}(1 - t)\Big) \ge h\Big(-\frac{1}{2\sigma}\Big) t + h(\lambda_{\max})(1 - t) \ge 0$$

for all . The last inequality holds from the fact that and . Thus, on . Therefore, by (i), (ii), (iii) it holds that on . ∎

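### II.4 Radial Projection-$R$

We denote the simplex of positive probability as $\Delta_n$, i.e., $\Delta_n = \{(\lambda_1, \dots, \lambda_n) : \lambda_i \ge 0, \ \sum_{i=1}^{n} \lambda_i = 1\}$. Also we denote $E = (\frac{1}{n}, \dots, \frac{1}{n})$. Clearly, $E \in \Delta_n$. The Shannon entropy of $\Lambda = (\lambda_1, \dots, \lambda_n) \in \Delta_n$ is defined as $S(\Lambda) = -\sum_{i=1}^{n} \lambda_i \ln \lambda_i$. It is well-known that it holds $S(\Lambda) \le \ln n$, with equality if $\Lambda = E$. That is, $E$ is the only point where the entropy is maximum.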

As Fig. 3 shows, the simplex of positive probability can be viewed geometrically as a part of a plane in $\mathbb{R}^3$. The color stands for the value of the Shannon entropy at each point. One can see that as $\Lambda$ gets closer to $E$, the entropy gets bigger and bigger. Our main observation is that if two points on the simplex have the same (Euclidean) distance from $E$, then their purities are the same. In general this holds for any $n$.

###### Lemma 5.

Let . Then the following are true.

• ( is the usual inner product.)

###### Proof.

It is easy to check that

$$\|E - \Lambda\|_2^2 = \sum_{i=1}^{n} \Big(\lambda_i - \frac{1}{n}\Big)^2 = \sum_{i=1}^{n} \lambda_i^2 - \frac{1}{n}.$$

###### Theorem 5.

Let and be density matrices. Then if and only if their purity are identical.

###### Proof.

Note that if and only if , where and are eigenvalues of and , respectively. It is true from Lemma 5. ∎

Lemma 5 states that two points on the simplex have the same distance from $E$ if and only if their distances from the origin are identical. Then we can find $\tilde{\Lambda}$ whose entropy can be computed much more easily such that

$$\|E - \Lambda\|_2 = \|E - \tilde{\Lambda}\|_2.$$

There are infinitely many directions from to find . Among them we pick with .

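We consider the line segment $\ell(t) = (1 - t)E + t w$, $0 \le t \le 1$, i.e.,

$$\ell(t) = \Big( \frac{1 - t}{n} + tc, \ \frac{1 - t}{n} + \frac{t(1 - c)}{n - 1}, \ \dots, \ \frac{1 - t}{n} + \frac{t(1 - c)}{n - 1} \Big).$$

Since the simplex is convex, $\ell(t)$ lies on the simplex for all $0 \le t \le 1$. For each $t$, we have

$$S(\ell(t)) = -\Big(\frac{1 - t}{n} + tc\Big) \ln\Big(\frac{1 - t}{n} + tc\Big) - (n - 1) \Big(\frac{1 - t}{n} + \frac{t(1 - c)}{n - 1}\Big) \ln\Big(\frac{1 - t}{n} + \frac{t(1 - c)}{n - 1}\Big).$$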

Lemma 5 implies that

$$\|E - \ell(t)\|_2^2 = t^2 \|w - E\|_2^2 = \frac{(cn - 1)^2}{n(n - 1)} t^2.$$

We solve the following equation for $t$:

$$\|E - \Lambda\|_2 = \|E - \ell(t)\|_2.$$

Then the solution, say $t_0$, is

$$t_0 = \frac{\sqrt{n(n - 1)}}{cn - 1} \sqrt{\sum_{i=1}^{n} \lambda_i^2 - \frac{1}{n}}.$$

Since , we have that

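$$S(\Lambda) \approx S(\ell(t_0)) = -\Big(\frac{1 - t_0}{n} + t_0 c\Big) \ln\Big(\frac{1 - t_0}{n} + t_0 c\Big) - (n - 1) \Big(\frac{1 - t_0}{n} + \frac{t_0 (1 - c)}{n - 1}\Big) \ln\Big(\frac{1 - t_0}{n} + \frac{t_0 (1 - c)}{n - 1}\Big).$$

In fact, putting $t_0$ into the right side, the constant $c$ cancels. Thus, this approximation does not need the maximum eigenvalue.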

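In the similar way we propose the quadratic approximation for the von Neumann graph entropy, called Radial Projection. Radial Projection, denoted by $R$, is defined as

$$R(G) = -\Big(\sqrt{\frac{n - 1}{n}}\, \kappa_G + \frac{1}{n}\Big) \ln\Big(\sqrt{\frac{n - 1}{n}}\, \kappa_G + \frac{1}{n}\Big) - (n - 1) \Big(-\frac{1}{\sqrt{n(n - 1)}}\, \kappa_G + \frac{1}{n}\Big) \ln\Big(-\frac{1}{\sqrt{n(n - 1)}}\, \kappa_G + \frac{1}{n}\Big),$$

where $\kappa_G = \sqrt{\operatorname{tr}(\rho_G^2) - \frac{1}{n}}$.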
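The construction can be sketched as follows. It assumes, as obtained by substituting $t_0$ into $\ell(t)$, that the projected point has one coordinate $\frac{1}{n} + \sqrt{\frac{n-1}{n}}\,\kappa$ and $n - 1$ coordinates $\frac{1}{n} - \frac{\kappa}{\sqrt{n(n-1)}}$ with $\kappa = \sqrt{\operatorname{tr}(\rho^2) - \frac{1}{n}}$. The function names and the complete-graph example are illustrative.

```python
import math


def radial_projection_point(purity, n):
    """Return (a, b): one coordinate a and n - 1 equal coordinates b of the
    point on the simplex that has the same distance from E as the spectrum."""
    kappa = math.sqrt(max(purity - 1.0 / n, 0.0))
    a = 1.0 / n + math.sqrt((n - 1.0) / n) * kappa
    b = 1.0 / n - kappa / math.sqrt(n * (n - 1.0))
    return a, b


def radial_projection_entropy(purity, n):
    """Shannon entropy of the projected point; needs only the purity and n."""
    def f(x):
        return -x * math.log(x) if x > 0.0 else 0.0
    a, b = radial_projection_point(purity, n)
    return f(a) + (n - 1) * f(b)


# Complete graph on 4 vertices with equal weights: spectrum (1/3, 1/3, 1/3, 0).
n = 4
purity_k4 = 1.0 / 3.0
a, b = radial_projection_point(purity_k4, n)
r_k4 = radial_projection_entropy(purity_k4, n)
```

The projected point keeps the purity of the original spectrum (`a * a + (n - 1) * b * b` equals `purity_k4` up to rounding), and on this example `r_k4` is roughly 1.243 compared with the exact entropy $\ln 3 \approx 1.099$.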

### II.5 Weighted mean

Denote the weighted mean of $\hat{H}$ and $\hat{T}$ with weight $t \in [0, 1]$ as $t \hat{H} + (1 - t) \hat{T}$. In the similar way we consider the weighted mean of $\hat{H}$ and $R$. By Theorem 2 (2), FINGER-$\hat{H}$ is always smaller than the exact von Neumann entropy. On the other hand, by Theorem 4, Modified Taylor-$\hat{T}$ is always greater than the exact von Neumann entropy. Even though it is not proved mathematically, Fig. 3, 5, 6 show that Radial Projection-$R$ is greater than the exact von Neumann entropy. The weighted mean of an underestimate and an overestimate can be computed to improve the approximations.

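We solve an optimization problem to find the optimal $t$. For example, consider the weighted mean of $\hat{H}$ and $\hat{T}$. Given a large quantity of real data sets, the approximations of the von Neumann entropy using $\hat{H}$ and $\hat{T}$ were calculated as the input values $x^{(i)}_{\hat{H}}$ and $x^{(i)}_{\hat{T}}$, while the actual von Neumann entropy values were calculated as the output values $y^{(i)}$ for $i = 1, \dots, N$ ($N$ is the number of data sets). Then the optimization problem is given as follows:

$$t^{*} = \operatorname*{argmin}_{0 \le t \le 1} J(t),$$

where the cost function is

$$J(t) = \frac{1}{N} \sum_{i=1}^{N} \Big( t\, x^{(i)}_{\hat{H}} + (1 - t)\, x^{(i)}_{\hat{T}} - y^{(i)} \Big)^2.$$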

We use the gradient descent method to find the optimal $t$. Initially, $t_0$ is given. In each step of gradient descent, $t$ is updated with the function:

$$t_j = t_{j-1} - \alpha J'(t_{j-1}),$$

where $j$ denotes the iteration number, and $\alpha$ is the step size, which is set to be .
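The update rule can be sketched as below. The starting value `t0=0.5`, the step size `step=0.1`, the iteration count, the clipping of $t$ to $[0, 1]$, and the three data points are all assumptions chosen for illustration; none of them are values reported in the text.

```python
def fit_weight(x_under, x_over, y, t0=0.5, step=0.1, iterations=500):
    """Gradient descent on J(t) = mean((t*x_under + (1-t)*x_over - y)^2).

    After every step t is clipped to [0, 1] to respect the constraint.
    """
    n_samples = len(y)
    t = t0
    for _ in range(iterations):
        gradient = (2.0 / n_samples) * sum(
            (t * u + (1.0 - t) * o - target) * (u - o)
            for u, o, target in zip(x_under, x_over, y)
        )
        t = min(1.0, max(0.0, t - step * gradient))
    return t


# Made-up example values: an underestimate, an overestimate, and a target.
x_under = [0.4, 0.9, 1.5]
x_over = [1.0, 1.5, 2.5]
y_exact = [0.8, 1.3, 2.1]
t_star = fit_weight(x_under, x_over, y_exact)
```

Because $J$ is a one-dimensional quadratic, its unconstrained minimizer also has the closed form $\sum_i (y^{(i)} - x^{(i)}_{\text{over}})(x^{(i)}_{\text{under}} - x^{(i)}_{\text{over}}) / \sum_i (x^{(i)}_{\text{under}} - x^{(i)}_{\text{over}})^2$, which gives a convenient check on the iterative result.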

Using the optimal values $t^{*}$ found by the gradient descent method, we call the resulting weighted means of $\hat{H}$ with $\hat{T}$ and of $\hat{H}$ with $R$ the Improved Modified Taylor and the Improved Radial Projection, respectively.

## III Experiments

In this section, results from various experiments with data sets are provided. All experiments were conducted in MATLAB R2016 on a 16-core machine with 128GB RAM.

### III.1 Random graphs

Three random graph models are considered: (i) the Erdős-Rényi (ER) model ER59; gilbert1959 - the ER model represents two closely related models that were introduced independently and simultaneously. Here we use the ER model which was proposed by Gilbert gilbert1959. $G(n, p)$ denotes a model with $n$ nodes in which each pair of nodes is linked independently with probability $p$; (ii) the Barabási-Albert (BA) model RevModPhys.74.47 - the BA model is a special case of Price's model. It can generate scale-free graphs in which the degree distribution follows a power law; and (iii) the Watts-Strogatz (WS) model watts1998collective - the WS model generates graphs with small-world properties. Given a network with $N$ nodes and mean degree $K$, the nodes are initially linked as a regular ring where each node is connected to $K/2$ nodes on each side, and then the edges are rewired with a given probability. The approximation error is defined as . The results are averaged over 10 random trials.

### III.2 Real-world datasets

Real-world datasets from various fields are considered nr; OPSAHL2009155; OPSAHL2013159. We use 137 unweighted networks and 48 weighted networks from different fields for simulations. The detailed information about the datasets in Fig. 5 and Fig. 6 can be found at https://github.com/Hang14/RDJ. Fig. 5 and Fig. 6 show scatter plots of the von Neumann entropy (y-axis) versus the quadratic approximations (x-axis) for the unweighted and weighted real-world datasets, respectively. They demonstrate that Modified Taylor and Radial Projection perform better than FINGER. Improved Modified Taylor and Improved Radial Projection show the strongest performance.

### III.3 Time comparison

Recall that computing the von Neumann graph entropy requires $O(n^3)$ computational complexity. In order to accelerate its computation, we use quadratic approximations of the function $f(x) = -x \ln x$. Then each approximation can be computed from the purity of the density matrix of a given graph. Lemma 1 shows that computing the purity requires $O(n + m)$ computational complexity, where $n = |V|$ and $m = |E|$. However, FINGER and Modified Taylor additionally need to compute the maximum eigenvalue, whose time complexity is $O(n + m)$ DBLP:books/daglib/0001349; DBLP:books/daglib/0086372. On the other hand, Radial Projection shows the best performance since it needs no maximum eigenvalue. When the original graph is a complete graph, $m = \frac{n(n-1)}{2}$, which is also the upper bound of $m$. However, in real-life scenarios complete graphs are rare and sparse graphs are more common; therefore, the time complexity remains close to linear in $n$.

## IV Applications

One major application of the von Neumann graph entropy (VNGE) is the computation of the Jensen-Shannon distance (JSdist) between any two graphs from a graph sequence articleehnail. Given a graph sequence, the Jensen-Shannon distance of any two graphs $G$ and $G'$ in it is defined as

$$\mathrm{JSdist}(G, G') = \sqrt{H(\bar{G}) - \frac{1}{2}\big[H(G) + H(G')\big]},$$

where $\bar{G}$ is the averaged graph of $G$ and $G'$ such that . The Jensen-Shannon distance has been proved to be a valid distance metric in DBLP:journals/tit/EndresS03; PhysRevA.79.052311.
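As a rough illustration of how a purity-based approximation could be plugged into this distance, the sketch below makes several assumptions that are not taken from the text: the averaged graph is represented by the mean of the two density matrices, both graphs share the vertex set `0..n-1`, each entropy is replaced by the Radial Projection approximation, and a negative value under the square root (possible because the entropies are approximate) is clipped to zero. Dense matrices are used for clarity only, so this sketch is not linear-time.

```python
import math


def density_matrix(n, edges):
    """rho = L / tr(L) as a dense list of lists; edges are (i, j, w) triples."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    trace = sum(lap[i][i] for i in range(n))
    return [[value / trace for value in row] for row in lap]


def purity(rho):
    """tr(rho^2) equals the squared Frobenius norm of a symmetric matrix."""
    return sum(value * value for row in rho for value in row)


def radial_projection_entropy(p, n):
    """Purity-only entropy approximation (Radial Projection)."""
    def f(x):
        return -x * math.log(x) if x > 0.0 else 0.0
    kappa = math.sqrt(max(p - 1.0 / n, 0.0))
    a = 1.0 / n + math.sqrt((n - 1.0) / n) * kappa
    b = 1.0 / n - kappa / math.sqrt(n * (n - 1.0))
    return f(a) + (n - 1) * f(b)


def approx_js_distance(n, edges_a, edges_b):
    """Approximate JS distance; assumes rho_avg = (rho_a + rho_b) / 2."""
    rho_a = density_matrix(n, edges_a)
    rho_b = density_matrix(n, edges_b)
    rho_avg = [[0.5 * (x + y) for x, y in zip(row_a, row_b)]
               for row_a, row_b in zip(rho_a, rho_b)]
    h_avg = radial_projection_entropy(purity(rho_avg), n)
    h_a = radial_projection_entropy(purity(rho_a), n)
    h_b = radial_projection_entropy(purity(rho_b), n)
    return math.sqrt(max(h_avg - 0.5 * (h_a + h_b), 0.0))


path = [(0, 1, 1.0), (1, 2, 1.0)]
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
d_same = approx_js_distance(3, path, path)
d_diff = approx_js_distance(3, path, triangle)
```

With exact entropies the quantity under the square root is non-negative when the average is taken over density matrices; with approximate entropies that guarantee is lost, which is why the clipping step is included here.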

The Jensen-Shannon distance has been applied in many fields including network analysis articleehnail and machine learning 7424294. In particular, it can be used in anomaly detection and bifurcation detection 8461400. 2018arXiv180511769C validated the use of FINGER-$\hat{H}$ for computing VNGE. Compared to the state-of-the-art graph similarity methods, FINGER-$\hat{H}$ yields superior and robust performance for anomaly detection in evolving Wikipedia networks and router communication networks, as well as bifurcation analysis in dynamic genomic networks. Note that the simulations show that our proposed methods perform better than FINGER-$\hat{H}$.

## V Conclusion

We proposed quadratic approximations for efficiently estimating the von Neumann entropy of large-scale graphs. They reduce the computation of VNGE from cubic complexity to linear complexity for a given graph. FINGER, Taylor, Modified Taylor, and Radial Projection were considered. We found several theoretical results for each approximation, including inequalities between the approximations and the exact von Neumann entropy. Moreover, a novel quadratic approximation, Radial Projection, was proposed; it does not even need the maximum eigenvalue. In addition, using the weighted mean with the optimal weight, we improved the approximations as well. Computational simulations demonstrated that the proposed methods outperform the state-of-the-art method for random graphs as well as real datasets.