An edge centrality measure based on the Kemeny constant

03/12/2022
by   D. Altafini, et al.

A new measure c(e) of the centrality of an edge e in an undirected graph G is introduced. It is based on the variation of the Kemeny constant of the graph after removing the edge e. The new measure is designed in such a way that the Braess paradox is avoided. A numerical method for computing c(e) is introduced and a regularization technique is designed in order to deal with cut-edges and disconnected graphs. Numerical experiments performed on synthetic tests and on real road networks show that this measure is particularly effective in revealing bottleneck roads whose removal would greatly reduce the connectivity of the network.

1 Introduction

In network analysis, several measures of the importance of an edge of a graph have been introduced, with different modelling interpretations and mathematical formulations. For instance, in [2, 13] the communicability between two nodes $i$, $j$ of a graph $G$ is defined as the $(i,j)$-th entry in the exponential of the adjacency matrix of $G$. The exponential of a matrix is also at the basis of the definition of importance given in [12]. Other measures based on the computation of matrix functions are introduced in [4], where a parameterized node centrality measure is proposed, and in [3], where directed networks are analyzed. In [11], the variation of the Kemeny constant when an edge is removed from a graph is considered.

In this paper, following [11], we introduce and analyze a new definition of centrality based on a modified variation of the Kemeny constant.

Let $P$ be the transition matrix of a finite irreducible Markov chain, and let $\pi$ be its invariant measure, so that $\pi^T P = \pi^T$ and $\pi^T \mathbf{1} = 1$, where $\mathbf{1} = (1, \ldots, 1)^T$. The Kemeny constant $\kappa(P)$ is defined as the average first-passage time from a predetermined state $i$ to a state $j$ drawn randomly according to the probability distribution $\pi$. It is a surprising but well-studied fact that this definition does not depend on $i$ [18].

Given a connected undirected graph $G = (V, E)$, where $V$ is the set of vertices and $E$ the set of edges (possibly with weights), denote by $A$ the associated adjacency matrix. The Kemeny constant $\kappa(G)$ of the graph $G$ is defined as $\kappa(P)$, where $P$ is the stochastic matrix $P = D^{-1} A$, with $D = \operatorname{diag}(d)$, $d = A \mathbf{1}$, and $\operatorname{diag}(d)$ denotes the diagonal matrix having the entries of the vector $d$ on the diagonal. The Kemeny constant gives a global measure of the non-connectivity of a network [5, 11, 20]. Indeed, if $G$ is not connected then the Kemeny constant cannot be defined or, in other words, it takes the value infinity.
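The definition above can be checked numerically on a small graph. The following is a minimal sketch, assuming NumPy, the random walk $P = D^{-1} A$, and its stationary distribution $\pi = d / (\mathbf{1}^T d)$; the function names are hypothetical and are not taken from the paper. It computes the mean first-passage times by solving one linear system per target state and averages them against $\pi$, so that the independence from the starting state can be observed directly.

```python
import numpy as np

def first_passage_times(P):
    """Return M with M[i, j] = expected steps to first reach j from i (M[j, j] = 0)."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        keep = [k for k in range(n) if k != j]
        # For i != j: m_ij = 1 + sum_{k != j} p_ik m_kj, a linear system in column j
        Q = P[np.ix_(keep, keep)]
        M[keep, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return M

def kemeny_from_definition(A):
    """Average first-passage time seen from every starting state i."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    P = A / d[:, None]        # P = D^{-1} A
    pi = d / d.sum()          # stationary distribution of the walk on an undirected graph
    return first_passage_times(P) @ pi

# Path graph on three vertices: 0 - 1 - 2
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
k = kemeny_from_definition(A_path)
print(k)  # the three entries coincide (1.5 each, up to rounding)
```

Solving $n$ dense systems is only meant to mirror the definition; it is not an efficient way to obtain the constant.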

Following the idea of [11], we may formally define the Kemeny-based centrality score of the edge $e \in E$ as

$$c(e) = \kappa(\widehat G) - \kappa(G), \qquad \widehat G = (V, E \setminus \{e\}),$$

i.e., the change of the connectivity of the graph, measured by the Kemeny constant, when the edge $e$ is removed from the graph itself. This quantity is well defined assuming that $\widehat G$ is also connected. Recall that an edge $e$ such that $\widehat G$ is disconnected is known as a cut-edge in graph theory.

We show that, in matrix form, the value of $c(e)$ can be given in terms of the eigenvalues of the symmetrized transition matrix $D^{-1/2} A D^{-1/2}$ and of the eigenvalues of the matrix obtained from it by adding a symmetric correction of rank 2. A drawback of this definition of centrality score is that there exist graphs where $c(e)$ is negative for some edge $e$; an elementary example is shown in Section 4. In the literature, this fact is known as the Braess paradox [11], [7]. Its matrix explanation is that the correction is not positive semi-definite.

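To overcome this drawback we propose a modified centrality measure $\tilde c(e)$, which is nonnegative for any graph $G$ and for any edge $e$. The underlying idea consists in modifying the correction in such a way that the new correction is a positive semi-definite matrix of rank 1. From the model point of view, this consists in replacing the edge $e = \{i, j\}$ with the loops $\{i, i\}$ and $\{j, j\}$. More precisely, the centrality score of $e$ is modified as follows

$$\tilde c(e) = \kappa(\widetilde P) - \kappa(P), \qquad \widetilde P = D^{-1} \widetilde A, \qquad \widetilde A = A + a_{ij} (e_i - e_j)(e_i - e_j)^T,$$

where $e_i$ denotes the $i$-th column of the identity matrix. Since the eigenvalues of the matrix $\widetilde P$ are greater than or equal to the corresponding eigenvalues of $P$, we have $\tilde c(e) \ge 0$ for any $e$ and for any graph. This guarantees that the Braess paradox is not encountered.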

This definition cannot be applied in the case where is a cut-edge, i.e., is not connected; in fact, in this case, the definition would yield . To overcome this drawback, we introduce the concept of regularized centrality score , depending on a regularization parameter . The idea is to replace the Laplacian matrix with the regularized Laplacian matrix in the formulas that give the Kemeny constant. If is not a cut-edge, then ; otherwise, if is a cut-edge then ; indeed, expressing the Kemeny constant in terms of the eigenvalues of , one sees that it contains a term . For cut-edges, the quantity is nonnegative and has a finite limit for ; this suggests the following definition of a filtered Kemeny-based centrality score

The modified measure defined in this way is always non-negative, and seems particularly effective in highlighting bottlenecks in road networks, or so-called weak ties [16] that bridge different clusters.

We provide efficient algorithms implementing the computation of the score either of a single edge, or of all the edges of a graph. The main tools in the algorithm design are the Sherman-Woodbury-Morrison formula and the Cholesky factorization of the regularized Laplacian matrix .

Our algorithms have been tested both on synthetic graphs and on graphs representing real road networks; in particular, we have considered the maps of Pisa and of the entire Tuscany. Our numerical experiments, reported in the paper, show that this measure is robust, effective, and realistic from the model point of view; moreover, its computation is sufficiently fast even for large road networks. Comparisons with other centrality measures from [13] have been performed. It turns out that our model, unlike the ones based on PageRank and Betweenness of the dual graph, succeeds in detecting bridges on the river Arno and overpasses over the railroad line as important roads in the Pisa road map. The edge betweenness and edge current-flow betweenness are the only two measures (among those considered) that succeed, even though only partially, in highlighting important bottleneck roads. The CPU time required for the computation of this measure is comparable with that of other betweenness-based measures on planar networks of roads. More details concerning applications of the Kemeny-based centrality measure to road networks can be found in [1].

The paper is organized as follows. In Section 2 we recall some properties of the Kemeny constant. In Section 3 the Kemeny-based centrality measure is introduced and a matrix analysis is performed, while in Section 4 a modified definition is proposed in order to avoid the Braess paradox. The regularized and filtered centrality scores are proposed in Section 5. Section 6 is devoted to computational issues and numerical experiments. Conclusions are drawn in Section 7.

2 The Kemeny constant

Let $P$ be the transition matrix of an irreducible finite Markov chain and let $\pi$ be its steady state vector. Denote by $\kappa(P)$ the Kemeny constant of $P$. We recall some properties which allow us to express the Kemeny constant in terms of the trace of a suitable matrix. Such expressions will be useful in the analysis performed in the next sections.

Lemma 1 ([23])

Let be column vectors with , , . Then, the inverse exists, and

independently of .

By setting , one gets the following corollary.

Corollary 1

Let be a column vector with ; then, exists, and

(1)

Since $P$ is an irreducible stochastic matrix, it has a simple eigenvalue equal to 1. The Kemeny constant can be expressed by means of the eigenvalues different from 1, according to the following result.

Corollary 2

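Let $\{1, \lambda_2, \ldots, \lambda_n\}$ be the spectrum of $P$. Then,

$$\kappa(P) = \sum_{i=2}^{n} \frac{1}{1 - \lambda_i}. \tag{2}$$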

Proof

Take a Jordan form with , , and (reordering if necessary). Then, one has

where is upper triangular with . Plugging this expression into (1), we get

3 A centrality measure based on the Kemeny constant

Given a connected undirected graph $G = (V, E)$ (possibly weighted), where $V$ denotes the set of vertices and $E$ the set of edges, one can define its Kemeny constant as $\kappa(G) = \kappa(P)$, with $P = D^{-1} A$, where $A$ is the adjacency matrix of the network, and $D = \operatorname{diag}(d)$, $d = A \mathbf{1}$. The Kemeny constant gives a global measure of the connectivity of a network; in fact, small values of the constant correspond to highly connected networks, and large values correspond to a low connectivity.

To obtain a relative measure that takes into account the importance of each edge $e \in E$, we can define the Kemeny-based centrality score as

$$c(e) = \kappa(\widehat G) - \kappa(G), \qquad \widehat G = (V, E \setminus \{e\}), \tag{3}$$

i.e., the change in $\kappa$ obtained by removing the edge $e$. This quantity is well defined assuming that $\widehat G$ is still connected, that is, $e$ is not a cut-edge.
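The score in (3) can be evaluated directly from its definition on small graphs. The sketch below assumes NumPy and uses the eigenvalue expression (2); to keep the eigenvalue computation symmetric it works with $D^{-1/2} A D^{-1/2}$, which is similar to $P = D^{-1} A$. The function names and the tolerance used to flag a disconnected graph are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def kemeny(A, tol=1e-12):
    """Kemeny constant of P = D^{-1} A via formula (2); inf if the graph is disconnected."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    if np.any(d <= 0):                      # isolated vertex: the graph is disconnected
        return np.inf
    s = 1.0 / np.sqrt(d)
    lam = np.linalg.eigvalsh(A * np.outer(s, s))   # ascending order, same spectrum as P
    if 1.0 - lam[-2] < tol:                 # eigenvalue 1 is repeated (heuristic test)
        return np.inf
    return float(np.sum(1.0 / (1.0 - lam[:-1])))

def edge_score(A, i, j):
    """c(e) for e = {i, j}: Kemeny constant after removing e minus the one before."""
    A_hat = np.array(A, dtype=float)
    A_hat[i, j] = A_hat[j, i] = 0.0
    return kemeny(A_hat) - kemeny(A)

# 4-cycle 0 - 1 - 2 - 3 - 0: removing any edge leaves a path on four vertices
A_cycle = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])
c_cycle = edge_score(A_cycle, 0, 1)
print(c_cycle)  # approximately 2/3

# Path 0 - 1 - 2: every edge is a cut-edge, so the score is infinite
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
print(edge_score(A_path, 0, 1))
```

Recomputing the full spectrum for each edge costs a dense eigenvalue decomposition per edge, so this is only a reference implementation for checking faster formulas.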

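Removing one edge $e = \{i, j\}$ corresponds to zeroing out the entries $a_{ij}$ and $a_{ji}$. This leads to the new adjacency matrix

$$\widehat A = A - a_{ij} (e_i e_j^T + e_j e_i^T), \tag{4}$$

where $e_i$ and $e_j$ are the $i$-th and the $j$-th columns of the identity matrix $I$, respectively. This removal changes the transition matrix $P$ into the matrix $\widehat P = \widehat D^{-1} \widehat A$, where $\widehat D = \operatorname{diag}(\widehat d)$, $\widehat d = \widehat A \mathbf{1}$, that differs from $P$ only in rows $i$ and $j$ since $\widehat d = d - a_{ij} (e_i + e_j)$. Hence we have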

(5)

where

(6)

with .

Theorem 1

Suppose edge is not a cut-edge. Then, for the centrality score defined in (3) we have

(7)

where , is defined in (6), and is as in Corollary 1.

Proof

We have

where we have used (5) and in the last step the Sherman-Woodbury-Morrison matrix identity [15, Section 2.1.4]. We now use (1) and write

using the identity  [19, Chapter 1, Exercise 5].

Theorem 1 allows us to compute the centrality score of one edge at essentially the cost of applying the matrix to four vectors.

3.1 A symmetrized formulation

Observe that is such that is a symmetric matrix having the same spectrum of . Therefore, in view of Corollary 1 we may write

Moreover, choosing and such that , yields

(8)

In the above expression, the matrix is real symmetric.

The symmetrization of the matrix can be easily obtained in a similar manner, that is,

(9)

Thus, we may write , where is a low rank symmetric matrix. This fact enables us to exploit the properties of the eigenvalues of symmetric matrices like the Courant-Fischer theorem [6, Chapter III].

3.2 Disconnected networks and cut-edges

If $P$ is reducible, according to our earlier definitions, say, definition (2), we would get $\kappa(P) = \infty$, since in this case $P$ has at least two eigenvalues equal to 1. Therefore, one cannot apply the definition of the Kemeny-based centrality score. However, we may extend this definition to reducible matrices by means of a continuity argument as follows.

Assume reducible and w.l.o.g. assume , where , , are irreducible stochastic matrices. Clearly, the matrix has eigenvalues , and for . Observe that the perturbed matrix is stochastic and irreducible for any , so that has only one eigenvalue equal to 1. Moreover, in view of the Brauer theorem [9], the remaining eigenvalues of are given by , . Therefore we have

Now consider the matrix where is obtained by removing the edge . Assume that this edge belongs to the block for some and that it is not a cut-edge. That is, the block obtained after removing the edge is still irreducible. Denote , the eigenvalues of so that , and for . By applying once again the Brauer theorem we find that has only one eigenvalue , and the remaining eigenvalues are , for . Therefore we have

so that

whence

Now recall that the removed entries and in belong to the block so that the eigenvalues of different from the eigenvalues of are those of the block , except for the eigenvalue 1. Therefore we have

From the above arguments it is natural to extend the definition of centrality score of an edge to the case of reducible matrices as follows.

Definition 1

Let , , be such that are irreducible stochastic matrices. Let belong to the set of indices of the block , and assume that the edge is not a cut-edge. The Kemeny-based centrality of the edge is defined as

where is the stochastic matrix obtained from by removing the edge , according to equation (5).

If $P$ is reducible, i.e., if the graph is disconnected, then we can identify its connected components, locate the block containing the edge $e$ and apply the above definition in order to evaluate $c(e)$.

If is a cut-edge, then clearly is reducible so that , consequently . Several graph-theoretical algorithms exist in literature to compute cut-edges in a graph in time  [22], and update the connected components of a graph after removing edges [21]. However, we prefer to deal with this issue by means of the regularization technique that we will describe later on.

4 A non-negative Kemeny-based centrality score

Intuitively, one expects that the connectivity of a graph should not increase if an edge is removed from the graph. Therefore, if the Kemeny constant properly describes the non-connectivity of a graph, then it should not decrease if an edge is removed. In terms of the definition of centrality score given in (3), we expect that $c(e) \ge 0$. Unfortunately, this is not always the case.

In fact, there are cases where the Kemeny constant of a graph can decrease if an edge is removed, like in the graph with edges , shown in Figure 1, on the left. Its Kemeny constant is . Removing the edge , we get the graph on the right, which has a smaller Kemeny constant, i.e., . That is, the centrality score of the edge in this graph is negative. This fact is known in the literature as the Braess paradox [11], [7].

Figure 1: The graph on the right is obtained from that on the left by removing the edge . The two graphs have Kemeny constants and , respectively.

In order to overcome this odd behavior of the model, where the measure $c(e)$ can take negative values, we propose a simple modification which also makes the computation of the score an easier task.

Observe that removing the edge $e = \{i, j\}$ from the graph consists in performing a correction of rank 2 to the adjacency matrix $A$ in order to obtain the new matrix $\widehat A$, compare with (4). This correction is such that the vector $\widehat d = \widehat A \mathbf{1}$ differs from the vector $d = A \mathbf{1}$ in the components $i$ and $j$. On the other hand, defining $\widetilde A$ in a different way, by means of the following expression

$$\widetilde A = A + a_{ij} (e_i - e_j)(e_i - e_j)^T, \tag{10}$$

has the effect of zeroing the entries $(i, j)$ and $(j, i)$ in $A$, and of adding $a_{ij}$ to the diagonal entries in position $(i, i)$ and $(j, j)$. In terms of graph, this correction consists in removing the edge $\{i, j\}$ and adding the two loops $\{i, i\}$ and $\{j, j\}$ with the same weight $a_{ij}$.

The advantage of this correction is that the vectors $d = A \mathbf{1}$ and $\widetilde d = \widetilde A \mathbf{1}$ satisfy the identity $\widetilde d = d$ since $(e_i - e_j)^T \mathbf{1} = 0$. This property allows us to prove that the centrality score, defined this way, always takes nonnegative values. In order to prove this property we need to recall the following classical result that is a consequence of the Courant-Fischer minimax theorem [6, Chapter III].

Lemma 2

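Let $X$, $Y$ be real symmetric $n \times n$ matrices such that $Y - X$ is positive semi-definite, and let $\xi_i$, $\eta_i$, $i = 1, \ldots, n$, be their eigenvalues, respectively, ordered in nondecreasing order. Then $\xi_i \le \eta_i$, for $i = 1, \ldots, n$.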

We are ready to prove the following result.

Theorem 2

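Let $A$ be the adjacency matrix of an undirected graph, let $i$, $j$ be such that the edge $e = \{i, j\}$ is not a cut-edge, and let $\widetilde A$ be the adjacency matrix defined in (10). Then for the centrality score defined as $\tilde c(e) = \kappa(\widetilde P) - \kappa(P)$ we have $\tilde c(e) \ge 0$, where $\widetilde P = D^{-1} \widetilde A$, $P = D^{-1} A$, $D = \operatorname{diag}(d)$, $d = A \mathbf{1}$.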

Proof

Write in terms of the symmetrized formulation according to (8) and (9), and get

(11)

where and , are the eigenvalues, sorted in non-increasing order, of the symmetric matrices

respectively. On the other hand, since and , we have . The matrix has eigenvalues equal to 0 and one eigenvalue equal to . Applying Lemma 2 with and yields the inequality . This implies that in view of (11).
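The statement of Theorem 2 can be illustrated numerically. The sketch below assumes NumPy and takes the modified adjacency matrix to be $A + a_{ij}(e_i - e_j)(e_i - e_j)^T$, i.e. the edge is removed and two loops of the same weight are added, as described before Lemma 2. The helper names are hypothetical. Both the eigenvalue monotonicity used in the proof and the invariance of the degree vector are checked with assertions.

```python
import numpy as np

def symmetrized_spectrum(A, d):
    """Eigenvalues (ascending) of D^{-1/2} A D^{-1/2}, equal to those of D^{-1} A."""
    s = 1.0 / np.sqrt(d)
    return np.linalg.eigvalsh(A * np.outer(s, s))

def modified_score(A, i, j, tol=1e-12):
    """Score of edge {i, j} when the edge is replaced by loops at i and j."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    v = np.zeros(A.shape[0])
    v[i], v[j] = 1.0, -1.0
    A_tilde = A + A[i, j] * np.outer(v, v)       # zero a_ij, a_ji; add a_ij to both diagonals
    assert np.allclose(A_tilde.sum(axis=1), d)   # the degree vector is unchanged
    lam = symmetrized_spectrum(A, d)
    lam_tilde = symmetrized_spectrum(A_tilde, d)
    # Monotonicity of the eigenvalues under a positive semi-definite correction
    assert np.all(lam_tilde >= lam - 1e-10)
    if 1.0 - lam_tilde[-2] < tol:                # eigenvalue 1 is repeated: cut-edge
        return np.inf
    return float(np.sum(1.0 / (1.0 - lam_tilde[:-1])) - np.sum(1.0 / (1.0 - lam[:-1])))

# 4-cycle 0 - 1 - 2 - 3 - 0
A_cycle = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])
score_cycle = modified_score(A_cycle, 0, 1)
print(score_cycle)  # approximately 2.5
```

The function assumes that the input graph is connected; for a disconnected input the second sum is not meaningful.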

Observe that can be interpreted as the incremental ratio for the increment at of the function , for , where , , . That is, for . An interesting question is to evaluate the derivative of at . We have the following result

Theorem 3

Under the assumptions of Theorem 2, let , for , where , , , . Let , be the eigenvalues of , where . Then .

Proof

Denote the eigenvalues of , where . We have

Since (compare with the proof of Theorem 2), taking the limit for yields

A similar inequality can be proved for . Observe that the upper bound to given in the above theorem coincides with the value up to within a constant factor independent of and . This value depends on the out degree of node and of node independently of the topology of the graph.

Computing the value of defined in (10) is cheaper than computing the quantity defined in (4). To this regard, we have the following

Theorem 4

Under the assumptions of Theorem 2 we have

where , . Moreover, , for .

Proof

By using symmetrization, we have . On the other hand, , so that . The expression for follows from the Sherman-Woodbury-Morrison identity.

The above result can be used to obtain an effective expression for computing . To this end, rewrite as

so that

Moreover, from the Sherman-Woodbury-Morrison formula we have

Whence in view of Theorem 4 we obtain

From the above result we obtain the following representation of

(12)

Observe that and in (12) can be rewritten as

(13)

Another observation is that the matrix is positive definite since it is invertible and is the sum of two semidefinite matrices. Therefore, it admits the Cholesky factorization .

The major computational effort in computing by means of (12) consists in solving the system . If one has to compute the centrality score of a single edge , then two strategies can be designed for this task. A first possibility consists in computing the Cholesky factorization of and solving the two triangular systems. This approach costs arithmetic operations, as the dominating cost is the one of the Cholesky factorization. A second possibility consists in applying an iterative method for solving the linear system with matrix , that exploits the low cost of the matrix-vector product, say, Richardson iteration or preconditioned conjugate gradient method. This approach costs operations per iteration, where is the number of nonzero entries of the adjacency matrix. Thus, it is cheaper than the former approach as long as the number of required iterations is less than .

A different conclusion holds in the case where the centrality scores of all edges must be computed. In fact, in this case, the cost is , by relying on the following computation that is based on (13):

  1. Compute and ;

  2. For all such that compute:

    1. ,

    2. ,

    3. .

The overall cost of the above approach is dominated by the cost of step 1, i.e., arithmetic operations. The drawback of this approach is that all the entries of the matrices and must be stored. This can be an issue if takes very large values.
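The two-step scheme above can be sketched as follows. This is an illustration under stated assumptions and not necessarily the exact formulas (12) and (13): it takes the symmetric positive definite matrix to be $M = I - S + u u^T$, with $S = D^{-1/2} A D^{-1/2}$ and $u = D^{1/2} \mathbf{1} / \| D^{1/2} \mathbf{1} \|$, writes the loop-based correction as $S + v v^T$ with $v = \sqrt{a_{ij}}\, D^{-1/2} (e_i - e_j)$, and applies the rank-one inversion formula, which gives the score $v^T M^{-2} v / (1 - v^T M^{-1} v)$. All names are hypothetical, NumPy is assumed, and dense matrices are used for simplicity.

```python
import numpy as np

def quad_form(X, i, j, w, s):
    """v^T X v for v = sqrt(w) * (s_i e_i - s_j e_j) and symmetric X."""
    return w * (X[i, i] * s[i] ** 2 + X[j, j] * s[j] ** 2 - 2.0 * X[i, j] * s[i] * s[j])

def all_modified_scores(A, tol=1e-12):
    """Loop-based scores of all edges of a connected graph from one Cholesky factorization."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    S = A * np.outer(s, s)
    u = np.sqrt(d / d.sum())                 # unit eigenvector of S for the eigenvalue 1
    M = np.eye(n) - S + np.outer(u, u)       # positive definite when the graph is connected
    L = np.linalg.cholesky(M)                # M = L L^T
    L_inv = np.linalg.inv(L)
    B = L_inv.T @ L_inv                      # step 1: B = M^{-1}
    C = B @ B                                #         C = M^{-2}
    scores = {}
    for i in range(n):                       # step 2: constant work per edge
        for j in range(i + 1, n):
            w = A[i, j]
            if w == 0.0:
                continue
            denom = 1.0 - quad_form(B, i, j, w, s)
            # denom vanishes exactly when {i, j} is a cut-edge
            scores[(i, j)] = quad_form(C, i, j, w, s) / denom if denom > tol else np.inf
    return scores

A_cycle = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])
cycle_scores = all_modified_scores(A_cycle)
print(cycle_scores)  # every edge of the 4-cycle gets approximately 2.5
```

As the text notes, both dense inverses must be stored, so this layout only suits graphs whose number of vertices allows two full matrices in memory.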

Another issue is the potentially large condition number of the matrix . A way to overcome this difficulty consists in applying a sort of regularization in the inversion of the matrix . This is the subject of the next section.

5 Regularized Kemeny-based centrality score

Let be a regularization parameter and, with the notation of the previous sections, define the regularized Kemeny constant as

where, for the second expression, we used the symmetrized version. Observe that, with respect to the standard definition, we have increased the diagonal entries of the matrix by the value of the regularization parameter. On the one hand, this modification reduces the condition number of the matrix to be inverted; on the other hand, it allows us to deal with situations where that matrix would otherwise be singular, for instance, in the case where the graph is not connected.

If the graph is connected, then , where are the eigenvalues of ordered in a non-increasing order, moreover . On the other hand, if is not connected and is formed by two connected components, then since in this case .
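The two regimes described above can be observed on small examples. The sketch below assumes NumPy and assumes that, in terms of eigenvalues, the regularized constant with parameter $r > 0$ reads $\sum_{i \ge 2} 1 / (r + 1 - \lambda_i)$, where $\lambda_1 = 1 \ge \lambda_2 \ge \cdots$ are the eigenvalues of the transition matrix; this form, the symbol $r$, and the function name are assumptions made for illustration. The graph is assumed to have no isolated vertices.

```python
import numpy as np

def regularized_kemeny(A, r):
    """Sum of 1 / (r + 1 - lambda_i) over all eigenvalues except one copy of the eigenvalue 1."""
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    lam = np.linalg.eigvalsh(A * np.outer(s, s))   # ascending; the last entry equals 1
    return float(np.sum(1.0 / (r + 1.0 - lam[:-1])))

# Connected graph (path 0 - 1 - 2): the value tends to the Kemeny constant 1.5 as r -> 0
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
# Two disjoint edges 0 - 1 and 2 - 3: the eigenvalue 1 is repeated, so a term 1/r appears
A_two = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])
for r in (1e-2, 1e-4, 1e-6):
    print(r, regularized_kemeny(A_path, r), r * regularized_kemeny(A_two, r))
```

In the printed table the second column stabilizes while the third column approaches 1, which reflects the growth like $1/r$ in the disconnected case.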

Similarly, we may define the regularized Kemeny-based centrality score of the edge

(14)

so that we have

where , , are the eigenvalues of the matrix , ordered in non-increasing order, for . Since (compare the proof of Theorem 2), the above equation implies that . Observe that if is disconnected, then it is formed by two connected components and the matrix is reducible and has two eigenvalues equal to 1 so that . Thus, we may write

(15)

In this case, we have and it turns out that the regularized centrality score of a cut-edge grows as when . Observe also that the quantity in (15) cannot exceed the value . In fact, we have the following result.

Theorem 5

If is a cut-edge, then for the regularized centrality score of (14) we have .