1 Introduction
In network analysis, several measures of the importance of an edge of a graph have been introduced, each with its own modelling interpretation and mathematical formulation. For instance, in [2, 13] the communicability between two nodes $i$, $j$ of a graph $G$ is defined as the $(i,j)$th entry of the exponential of the adjacency matrix of $G$. The exponential of a matrix also underlies the definition of importance given in [12]. Other measures based on the computation of matrix functions appear in [4], which proposes a parameterized node centrality measure, and in [3], which analyzes directed networks. In [11], the variation of the Kemeny constant when an edge is removed from a graph is considered.
In this paper, following [11], we introduce and analyze a new definition of centrality based on a modified variation of the Kemeny constant.
Let $P$ be the transition matrix of a finite irreducible Markov chain, and let $\pi$ be its invariant measure, so that $\pi^T P = \pi^T$ and $P\mathbf{1} = \mathbf{1}$, where $\mathbf{1} = (1, \ldots, 1)^T$. The Kemeny constant $\kappa(P)$ is defined as the average first-passage time from a predetermined state $i$ to a state $j$ drawn randomly according to the probability distribution $\pi$. It is a surprising but well-studied fact that this definition does not depend on $i$ [18].
Given a connected undirected graph $G = (V, E)$, where $V$ is the set of vertices and $E$ the set of edges (possibly with weights), denote by $A$ the associated adjacency matrix. The Kemeny constant of the graph $G$ is defined as $\kappa(P)$, where $P$ is the stochastic matrix $P = D^{-1}A$, with $D = \mathrm{diag}(d)$, $d = A\mathbf{1}$, and $\mathrm{diag}(d)$ denotes the diagonal matrix having the entries of the vector $d$ on the diagonal. The Kemeny constant gives a global measure of the non-connectivity of a network [5, 11, 20]. Indeed, if $G$ is not connected then the Kemeny constant cannot be defined or, in other words, it takes the value infinity.
Following the idea of [11], we may formally define the Kemeny-based centrality score of the edge $e \in E$ as
$$c(e) = \kappa(\widehat{P}) - \kappa(P),$$
where $\widehat{P}$ is the transition matrix of the graph obtained by removing $e$, i.e., the change of the connectivity of the graph measured by the Kemeny constant, when the edge $e$ is removed from the graph itself. This quantity is well defined provided that the graph obtained by removing $e$ is also connected. Recall that an edge whose removal disconnects the graph is known as a cut-edge in graph theory.
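The first-passage definition can be checked directly on small graphs. The sketch below uses only the Python standard library and exact rational arithmetic; it computes the average first-passage time from a given start state to a target drawn from the stationary distribution, and lets one observe that the result does not depend on the start state. It assumes integer edge weights and the convention that the first-passage time from a state to itself is zero; the names `solve` and `kemeny_from_state` are illustrative and do not come from the paper.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination (M is a list of rows)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        # Any row with a nonzero entry in this column can serve as pivot.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def kemeny_from_state(A, i):
    """Average first-passage time from state i to a target j drawn from pi.

    A is a symmetric adjacency matrix with integer weights; the random walk
    has P = D^{-1} A and stationary distribution pi_j = d_j / sum(d).
    """
    n = len(A)
    d = [sum(row) for row in A]
    total = sum(d)
    kappa = Fraction(0)
    for j in range(n):
        if j == i:
            continue  # assumed convention: zero passage time from j to itself
        others = [k for k in range(n) if k != j]
        # Mean hitting times of j satisfy (I - P restricted to 'others') m = 1.
        M = [[Fraction(int(k == l)) - Fraction(A[k][l], d[k]) for l in others]
             for k in others]
        m = solve(M, [1] * len(others))
        kappa += Fraction(d[j], total) * m[others.index(i)]
    return kappa

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
print([kemeny_from_state(path, i) for i in range(3)])  # each entry is Fraction(3, 2)
```

Exact fractions are used here only so that small examples can be verified by hand; for real networks a numerical linear algebra library would be the natural choice.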
We show that, in matrix form, the value of $c(e)$ can be given in terms of the eigenvalues of a symmetric matrix associated with the graph and of the eigenvalues of the same matrix after a symmetric correction of rank 2. A drawback of this definition of centrality score is that there exist graphs where $c(e)$ is negative for some $e$; an elementary example is shown in Section 3.2. In the literature, this fact is known as the Braess paradox [11], [7]. Its matrix explanation is that the correction is not positive semidefinite.
To overcome this drawback we propose a modified centrality measure $\widetilde{c}(e)$, which is nonnegative for any graph and for any edge $e$. The underlying idea consists in modifying the correction in such a way that the new correction is a positive semidefinite matrix of rank 1. From the model point of view, this consists in replacing the edge $\{i,j\}$ with the loops $\{i,i\}$ and $\{j,j\}$. More precisely, the centrality score of $e$ is modified as follows
$$\widetilde{c}(e) = \kappa(\widetilde{P}) - \kappa(P),$$
where $\widetilde{P}$ is the transition matrix of the graph in which the edge has been replaced by the two loops.
Since the eigenvalues of the symmetrized matrix after the rank-1 correction are greater than or equal to the corresponding eigenvalues of the original one, $\widetilde{c}(e) \ge 0$ for any $e$ and for any graph. This guarantees that the Braess paradox is not encountered.
This definition cannot be applied in the case where is a cut-edge, i.e., is not connected; in fact, in this case, the definition would yield . To overcome this drawback, we introduce the concept of regularized centrality score , depending on a regularization parameter . The idea is to replace the Laplacian matrix with the regularized Laplacian matrix in the formulas that give the Kemeny constant. If is not a cut-edge, then ; otherwise, if is a cut-edge then ; indeed, expressing the Kemeny constant in terms of the eigenvalues of , one sees that it contains a term . For cut-edges, the quantity is nonnegative and has a finite limit for ; this suggests the following definition of a filtered Kemeny-based centrality score
The modified measure defined in this way is always nonnegative, and seems particularly effective in highlighting bottlenecks in road networks, or so-called weak ties [16] that bridge different clusters.
We provide efficient algorithms for computing the score either of a single edge or of all the edges of a graph. The main tools in the algorithm design are the Sherman-Woodbury-Morrison formula and the Cholesky factorization of the regularized Laplacian matrix.
Our algorithms have been tested both on synthetic graphs and on graphs representing real road networks; in particular, we have considered the maps of Pisa and of the entire Tuscany. In our numerical experiments, reported in the paper, this measure turned out to be robust, effective, and realistic from the model point of view; moreover, its computation is sufficiently fast even for large road networks. Comparisons with other centrality measures from [13] have been performed. It turns out that our model, unlike the ones based on PageRank and betweenness of the dual graph, succeeds in detecting bridges on the river Arno and overpasses over the railroad line as important roads in the Pisa road map. The edge betweenness and edge current-flow betweenness are the only two measures (among those considered) that succeed, even though only partially, in highlighting important bottleneck roads. The CPU time required for the computation of this measure is comparable with that of other betweenness-based measures on planar networks of roads. More details concerning applications of the Kemeny-based centrality measure to road networks can be found in [1].
The paper is organized as follows. In Section 2 we recall some properties of the Kemeny constant. In Section 3 the Kemeny-based centrality measure is introduced and a matrix analysis is performed, while in Section 4 a modified definition is proposed in order to avoid the Braess paradox. The regularized and filtered centrality scores are proposed in Section 5. Section 6 is devoted to computational issues and numerical experiments. Conclusions are drawn in Section 7.
2 The Kemeny constant
Let $P$ be the transition matrix of an irreducible finite Markov chain and let $\pi$ be its steady state vector. Denote by $\kappa(P)$ the Kemeny constant of $P$. We recall some properties that allow us to express the Kemeny constant in terms of the trace of a suitable matrix. Such expressions will be useful in the analysis performed in the next sections.
Lemma 1 ([23])
Let be column vectors with , , . Then, the inverse exists, and
independently of .
By setting , one gets the following corollary.
Corollary 1
Let be a column vector with ; then, exists, and
(1) 
Since is an irreducible stochastic matrix, then it has a simple eigenvalue equal to 1. The Kemeny constant can be expressed by means of the eigenvalues different from 1, according to the following result.
Corollary 2
Let be the spectrum of . Then,
(2) 
Proof
Take a Jordan form with , , and (reordering if necessary). Then, one has
where is upper triangular with . Plugging this expression into (1), we get
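The trace expression of this section can be tried out numerically. The following sketch assumes the normalization $\kappa(P) = \mathrm{trace}\big((I - P + \mathbf{1}v^T)^{-1}\big) - 1$ for any vector $v$ with $v^T\mathbf{1} = 1$, which matches the zero self-passage-time convention; this exact form is an assumption, since the displayed formulas are not recoverable from the text above. The helper names `inverse` and `kemeny_trace` are illustrative.

```python
from fractions import Fraction

def inverse(M):
    """Exact inverse of a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        # StopIteration here would mean that M is singular.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kemeny_trace(A, v):
    """trace((I - P + 1 v^T)^{-1}) - 1 with P = D^{-1} A (assumed normalization).

    A: symmetric adjacency matrix with integer weights.
    v: vector whose entries sum to 1.
    """
    n = len(A)
    d = [sum(row) for row in A]
    # Entry (i, j) of the rank-one term 1 v^T is simply v[j].
    M = [[Fraction(int(i == j)) - Fraction(A[i][j], d[i]) + v[j] for j in range(n)]
         for i in range(n)]
    Z = inverse(M)
    return sum(Z[i][i] for i in range(n)) - 1

cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
uniform = [Fraction(1, 4)] * 4
first_unit = [1, 0, 0, 0]
# Both choices of v give the same value, as the independence statement predicts.
print(kemeny_trace(cycle4, uniform), kemeny_trace(cycle4, first_unit))
```

For the 4-cycle the eigenvalues of $P$ are $1, 0, 0, -1$, so under the assumed normalization the eigenvalue expression gives $1 + 1 + 1/2 = 5/2$, which is the value both calls return.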
3 A centrality measure based on the Kemeny constant
Given a connected undirected graph $G = (V, E)$, where $V$ denotes the set of vertices and $E$ the set of edges (possibly with weights), one can define its Kemeny constant as $\kappa(P)$, with $P = D^{-1}A$, where $A$ is the adjacency matrix of the network, and $D = \mathrm{diag}(d)$, $d = A\mathbf{1}$. The Kemeny constant gives a global measure of the connectivity of a network; in fact, small values of the constant correspond to highly connected networks, and large values correspond to low connectivity.
To obtain a relative measure that takes into account the importance of each edge $e \in E$, we can define the Kemeny-based centrality score as
(3) $c(e) = \kappa(\widehat{P}) - \kappa(P),$
i.e., the change in the Kemeny constant obtained by removing the edge $e$, where $\widehat{P}$ denotes the transition matrix of the graph without $e$. This quantity is well defined assuming that the graph without $e$ is still connected, that is, $e$ is not a cut-edge.
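Removing one edge $e = \{i, j\}$ corresponds to zeroing out the entries $a_{ij}$ and $a_{ji}$. This leads to the new adjacency matrix
(4) $\widehat{A} = A - a_{ij}(e_i e_j^T + e_j e_i^T),$
where $e_i$ and $e_j$ are the $i$th and the $j$th columns of the identity matrix $I$, respectively. This removal changes the transition matrix $P$ into the matrix $\widehat{P} = \widehat{D}^{-1}\widehat{A}$, where $\widehat{D} = \mathrm{diag}(\widehat{d})$, $\widehat{d} = \widehat{A}\mathbf{1}$, that differs from $P$ only in rows $i$ and $j$ since $\widehat{d} = d - a_{ij}(e_i + e_j)$. Hence we have
(5) 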
where
(6) 
with .
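The edge-removal score can be evaluated naively by computing the Kemeny constant before and after zeroing the two symmetric entries. The sketch below does exactly that with exact rational arithmetic; it is a brute-force illustration, not the low-rank update discussed in the text. It assumes integer weights, that the removed edge is not a cut-edge, and the normalization in which the Kemeny constant equals the trace of $(I - P + \mathbf{1}v^T)^{-1}$ minus 1 (the constant cancels in the difference anyway). All function names are illustrative.

```python
from fractions import Fraction

def inverse(M):
    """Exact inverse of a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kemeny(A):
    """Kemeny constant of the random walk P = D^{-1} A, using v = (1/n, ..., 1/n)."""
    n = len(A)
    d = [sum(row) for row in A]
    M = [[Fraction(int(i == j)) - Fraction(A[i][j], d[i]) + Fraction(1, n)
          for j in range(n)] for i in range(n)]
    Z = inverse(M)
    return sum(Z[i][i] for i in range(n)) - 1

def remove_edge(A, i, j):
    """Copy of A with the entries (i, j) and (j, i) set to zero."""
    B = [row[:] for row in A]
    B[i][j] = B[j][i] = 0
    return B

def edge_score(A, i, j):
    """Change of the Kemeny constant when the edge {i, j} is removed.

    Only meaningful when {i, j} is not a cut-edge; otherwise the reduced
    matrix is singular and the elimination fails.
    """
    return kemeny(remove_edge(A, i, j)) - kemeny(A)

cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(edge_score(cycle4, 0, 1))  # 2/3: the 4-cycle becomes a path on 4 vertices
```

On the 4-cycle the Kemeny constant is 5/2, while on the resulting path it is 19/6, which gives the printed score 2/3.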
Theorem 1
Proof
Theorem 1 allows us to compute the centrality score of one edge at essentially the cost of applying the matrix to four vectors.
3.1 A symmetrized formulation
Observe that is such that is a symmetric matrix having the same spectrum as . Therefore, in view of Corollary 1 we may write
Moreover, choosing and such that , yields
(8) 
In the above expression, the matrix is real symmetric.
The symmetrization of the matrix can be easily obtained in a similar manner, that is,
(9) 
Thus, we may write , where is a low-rank symmetric matrix. This fact enables us to exploit the properties of the eigenvalues of symmetric matrices like the Courant-Fischer theorem [6, Chapter III].
3.2 Disconnected networks and cut-edges
If $P$ is reducible, then according to our earlier definitions, say, definition (2), we would get $\kappa(P) = \infty$, since in this case $P$ has at least two eigenvalues equal to 1. Therefore, one cannot apply the definition of the Kemeny-based centrality score. However, we may extend this definition to reducible matrices by means of a continuity argument as follows.
Assume reducible and w.l.o.g. assume , where , , are irreducible stochastic matrices. Clearly, the matrix has eigenvalues , and for . Observe that the perturbed matrix is stochastic and irreducible for any , so that has only one eigenvalue equal to 1. Moreover, in view of the Brauer theorem [9], the remaining eigenvalues of are given by , . Therefore we have
Now consider the matrix where is obtained by removing the edge . Assume that this edge belongs to the block for some and that it is not a cut-edge. That is, the block obtained after removing the edge is still irreducible. Denote , the eigenvalues of so that , and for . By applying once again the Brauer theorem we find that has only one eigenvalue , and the remaining eigenvalues are , for . Therefore we have
so that
whence
Now recall that the removed entries and in belong to the block so that the eigenvalues of different from the eigenvalues of are those of the block , except for the eigenvalue 1. Therefore we have
From the above arguments it is natural to extend the definition of centrality score of an edge to the case of reducible matrices as follows.
Definition 1
Let , , be such that are irreducible stochastic matrices. Let belong to the set of indices of the block , and assume that the edge is not a cut-edge. The Kemeny-based centrality of the edge is defined as
where is the stochastic matrix obtained from by removing the edge , according to equation (5).
If is reducible, i.e., if the graph is disconnected, then we can identify its connected components, locate the block containing the edge and apply the above definition in order to evaluate .
If is a cut-edge, then clearly is reducible so that , consequently . Several graph-theoretical algorithms exist in the literature to compute cut-edges in a graph in linear time [22], and to update the connected components of a graph after removing edges [21]. However, we prefer to deal with this issue by means of the regularization technique that we will describe later on.
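For comparison with the regularization route, cut-edges can also be detected combinatorially. The sketch below is the textbook depth-first search with low-link values, which runs in time linear in the number of vertices and edges; it is given only as an illustration, and no claim is made that it coincides with the algorithm of [22]. The function name `find_cut_edges` and the edge-list input format are assumptions.

```python
def find_cut_edges(n, edges):
    """Return the cut-edges (bridges) of an undirected graph.

    n: number of vertices, labelled 0 .. n-1.
    edges: list of pairs (u, v); parallel edges are handled via edge indices.
    Uses recursion, so very large graphs may need an iterative variant.
    """
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc = [-1] * n   # discovery time of each vertex, -1 if unvisited
    low = [0] * n     # earliest discovery time reachable from the subtree
    bridges = []
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue  # do not walk back along the tree edge just used
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    # Nothing below v reaches u or above: the edge is a bridge.
                    bridges.append(edges[idx])
            else:
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return bridges

# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
print(find_cut_edges(6, edges))  # [(2, 3)]
```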
4 A nonnegative Kemeny-based centrality score
Intuitively, one expects that the connectivity of a graph should not increase if an edge is removed from the graph. Therefore, if the Kemeny constant properly describes the non-connectivity of a graph, then it should not decrease if an edge is removed. In terms of the definition of centrality score given in (3), we expect that $c(e) \ge 0$. Unfortunately, this is not always the case.
In fact, there are cases where the Kemeny constant of a graph can decrease if an edge is removed, like in the graph with edges , shown in Figure 1, on the left. Its Kemeny constant is . Removing the edge , we get the graph on the right, which has a smaller Kemeny constant, i.e., . That is, the centrality score of the edge in this graph is negative. This fact is known in the literature as the Braess paradox [11], [7].
In order to overcome this odd behavior of the model, where the measure $c(e)$ can take negative values, we propose a simple modification which also makes the computation of the score an easier task.
Observe that removing the edge $\{i,j\}$ from the graph consists in performing a rank-2 correction to the adjacency matrix $A$ in order to obtain the new matrix $\widehat{A}$, compare with (4). This correction is such that the vector $\widehat{d} = \widehat{A}\mathbf{1}$ differs from the vector $d = A\mathbf{1}$ in the components $i$ and $j$. On the other hand, defining $\widetilde{A}$ in a different way, by means of the following expression
(10) $\widetilde{A} = A + a_{ij}(e_i - e_j)(e_i - e_j)^T,$
has the effect of zeroing the entries $a_{ij}$ and $a_{ji}$ in $A$, and of adding $a_{ij}$ to the diagonal entries in positions $(i,i)$ and $(j,j)$. In terms of the graph, this correction consists in removing the edge $\{i,j\}$ and adding the two loops $\{i,i\}$ and $\{j,j\}$ with the same weight $a_{ij}$.
The advantage of this correction is that the vectors $d = A\mathbf{1}$ and $\widetilde{d} = \widetilde{A}\mathbf{1}$ satisfy the identity $\widetilde{d} = d$ since $(e_i - e_j)^T\mathbf{1} = 0$. This property allows us to prove that the centrality score, defined this way, always takes nonnegative values. In order to prove this property we need to recall the following classical result that is a consequence of the Courant-Fischer minimax theorem [6, Chapter III].
Lemma 2
Let be real symmetric matrices such that , and let , be their eigenvalues, respectively, ordered in nondecreasing order. Then , for .
We are ready to prove the following result.
Theorem 2
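Let $A$ be the adjacency matrix of an undirected graph, let $i, j$ be such that the edge $\{i,j\}$ is not a cut-edge, and let $\widetilde{A}$ be the adjacency matrix defined in (10). Then for the centrality score defined as $\widetilde{c}(e) = \kappa(\widetilde{P}) - \kappa(P)$ we have $\widetilde{c}(e) \ge 0$, where $P = D^{-1}A$, $\widetilde{P} = D^{-1}\widetilde{A}$, $D = \mathrm{diag}(d)$, $d = A\mathbf{1}$.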
Proof
Write in terms of the symmetrized formulation according to (8) and (9), and get
(11) 
where and , are the eigenvalues, sorted in nonincreasing order, of the symmetric matrices
respectively. On the other hand, since and , we have . The matrix has eigenvalues equal to 0 and one eigenvalue equal to . Applying Lemma 2 with and yields the inequality . This implies that in view of (11).
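The loop-based modification is easy to experiment with. The sketch below builds the modified adjacency matrix by zeroing the edge and adding its weight to the two diagonal entries, so that the degree vector is unchanged, and evaluates the resulting score as a difference of Kemeny constants in exact arithmetic. It assumes integer weights, a non-cut-edge, and the trace normalization with the constant 1 subtracted (which cancels in the difference); the function names are illustrative.

```python
from fractions import Fraction

def inverse(M):
    """Exact inverse of a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kemeny(A):
    """Kemeny constant of P = D^{-1} A; loops on the diagonal of A are allowed."""
    n = len(A)
    d = [sum(row) for row in A]
    M = [[Fraction(int(i == j)) - Fraction(A[i][j], d[i]) + Fraction(1, n)
          for j in range(n)] for i in range(n)]
    Z = inverse(M)
    return sum(Z[i][i] for i in range(n)) - 1

def replace_edge_by_loops(A, i, j):
    """Zero the edge {i, j} and add its weight as loops at i and at j."""
    B = [row[:] for row in A]
    a = A[i][j]
    B[i][j] = B[j][i] = 0
    B[i][i] += a
    B[j][j] += a
    return B

def modified_score(A, i, j):
    """Difference of Kemeny constants after replacing the edge by two loops."""
    return kemeny(replace_edge_by_loops(A, i, j)) - kemeny(A)

# 4-cycle with the chord {0, 2}: no edge is a cut-edge.
A = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0]]
scores = {(i, j): modified_score(A, i, j)
          for i in range(4) for j in range(i + 1, 4) if A[i][j]}
print(all(s >= 0 for s in scores.values()))  # True, in line with Theorem 2
```

Because the row sums of the modified matrix coincide with those of the original one, the same diagonal scaling is used before and after the correction, which is the property the proof above relies on.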
Observe that can be interpreted as the incremental ratio for the increment at of the function , for , where , , . That is, for . An interesting question is to evaluate the derivative of at . We have the following result
Theorem 3
Under the assumptions of Theorem 2, let , for , where , , , . Let , be the eigenvalues of , where . Then .
Proof
Denote the eigenvalues of , where . We have
Since (compare with the proof of Theorem 2), taking the limit for yields
A similar inequality can be proved for . Observe that the upper bound on given in the above theorem coincides with the value up to a constant factor independent of and . This value depends on the out-degree of node and of node independently of the topology of the graph.
Computing the value of defined in (10) is cheaper than computing the quantity defined in (4). In this regard, we have the following result.
Theorem 4
Proof
By using symmetrization, we have . On the other hand, , so that . The expression for follows from the Sherman-Woodbury-Morrison identity.
The above result can be used to obtain an effective expression for computing . To this end, rewrite as
so that
Moreover, from the Sherman-Woodbury-Morrison formula we have
Whence in view of Theorem 4 we obtain
From the above result we obtain the following representation of
(12)  
Observe that and in (12) can be rewritten as
(13)  
Another observation is that the matrix is positive definite since it is invertible and is the sum of two semidefinite matrices. Therefore, it admits the Cholesky factorization .
The major computational effort in computing by means of (12) consists in solving the system . If one has to compute the centrality score of a single edge , then two strategies can be designed for this task. A first possibility consists in computing the Cholesky factorization of and solving the two triangular systems. This approach costs arithmetic operations, as the dominating cost is that of the Cholesky factorization. A second possibility consists in applying an iterative method for solving the linear system with matrix that exploits the low cost of the matrix-vector product, say, Richardson iteration or the preconditioned conjugate gradient method. This approach costs operations per iteration, where is the number of nonzero entries of the adjacency matrix. Thus, it is cheaper than the former approach as long as the number of required iterations is less than .
A different conclusion holds in the case where the centrality scores of all edges must be computed. In fact, in this case, the cost is , by relying on the following computation that is based on (13):

Compute and ;

For all such that compute:

,

,

.

The overall cost of the above approach is dominated by the cost of step 1, i.e., arithmetic operations. The drawback of this approach is that all the entries of the matrices and must be stored. This can be an issue if takes very large values.
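The factorize-once idea can be sketched as follows. Since the explicit formulas are not recoverable from the text above, the sketch relies on a derivation that should be read as an assumption: with $s = \mathbf{1}^T d$ and $M = D - A + dd^T/s$ (symmetric positive definite for a connected graph), one has $\kappa(P) = \mathrm{trace}(M^{-1}D) - 1$; replacing the edge $\{i,j\}$ by two loops changes $M$ into $M - a_{ij}ww^T$ with $w = e_i - e_j$, and the Sherman-Morrison formula then gives the score $a_{ij}\, x^T D x / (1 - a_{ij}\, w^T x)$, where $Mx = w$. The code is written in plain Python with a dense Cholesky factorization, and, unlike the procedure listed above, it performs one pair of triangular solves per edge instead of storing full inverse matrices. All names are illustrative.

```python
import math

def cholesky(M):
    """Lower-triangular L with M = L L^T, for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def all_modified_scores(A):
    """Loop-based scores of all edges {i, j}, i < j, of a connected graph.

    Assumed formula: score = a * x^T D x / (1 - a * w^T x), with M x = w,
    M = D - A + d d^T / sum(d) and w = e_i - e_j. For a cut-edge the
    denominator vanishes (up to rounding), reflecting an infinite score.
    """
    n = len(A)
    d = [sum(row) for row in A]
    s = sum(d)
    M = [[(d[p] if p == q else 0.0) - A[p][q] + d[p] * d[q] / s for q in range(n)]
         for p in range(n)]
    L = cholesky(M)  # factorize once, reuse for every edge
    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            a = A[i][j]
            if a == 0:
                continue
            w = [0.0] * n
            w[i], w[j] = 1.0, -1.0
            x = cho_solve(L, w)
            num = a * sum(d[p] * x[p] * x[p] for p in range(n))
            den = 1.0 - a * (x[i] - x[j])
            scores[(i, j)] = num / den
    return scores

cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(all_modified_scores(cycle4))  # every edge scores 2.5 up to rounding
```

For large sparse road networks one would replace the dense routines by a sparse Cholesky factorization from a numerical library; the dense version is kept here only to make the algebra visible.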
Another issue is the potentially large condition number of the matrix . A way to overcome this difficulty consists in applying a sort of regularization in the inversion of the matrix . This is the subject of the next section.
5 Regularized Kemeny-based centrality score
Let be a regularization parameter and, with the notation of the previous sections, define the regularized Kemeny constant as
where, for the second expression, we used the symmetrized version. Observe that, with respect to the standard definition, we have increased the diagonal entries of the matrix by the quantity . On the one hand, this modification reduces the condition number of ; on the other hand, it allows us to deal with situations where is singular, for instance, when the graph is not connected.
If the graph is connected, then , where are the eigenvalues of ordered in a nonincreasing order, moreover . On the other hand, if is not connected and is formed by two connected components, then since in this case .
Similarly, we may define the regularized Kemeny-based centrality score of the edge
(14) 
so that we have
where , , are the eigenvalues of the matrix , ordered in nonincreasing order, for . Since (compare the proof of Theorem 2), the above equation implies that . Observe that if is disconnected, then it is formed by two connected components and the matrix is reducible and has two eigenvalues equal to 1 so that . Thus, we may write
(15) 
In this case, we have and it turns out that the regularized centrality score of a cut-edge grows as when . Observe also that the quantity in (15) cannot exceed the value . In fact, we have the following result.
Theorem 5
If is a cut-edge, then for the regularized centrality score of (14) we have .
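The behaviour of a regularized score on cut-edges and on ordinary edges can be explored numerically. The sketch below assumes one possible form of the regularization, namely adding a parameter $r > 0$ to the diagonal of $I - P + \mathbf{1}v^T$ before inversion, and takes the score as the difference of the two traces, before and after replacing the edge by loops. This specific form, and the names `regularized_trace` and `regularized_score`, are assumptions made for illustration and may differ from the definitions (14) and (15).

```python
from fractions import Fraction

def inverse(M):
    """Exact inverse of a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def regularized_trace(A, r):
    """trace((r I + I - P + 1 v^T)^{-1}) with P = D^{-1} A and uniform v (assumed form)."""
    n = len(A)
    d = [sum(row) for row in A]
    M = [[(r + 1) * int(i == j) - Fraction(A[i][j], d[i]) + Fraction(1, n)
          for j in range(n)] for i in range(n)]
    Z = inverse(M)
    return sum(Z[i][i] for i in range(n))

def regularized_score(A, i, j, r):
    """Regularized score of the edge {i, j}, with the edge replaced by two loops."""
    B = [row[:] for row in A]
    a = A[i][j]
    B[i][j] = B[j][i] = 0
    B[i][i] += a
    B[j][j] += a
    return regularized_trace(B, r) - regularized_trace(A, r)

r = Fraction(1, 10**6)
cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Ordinary edge: the score stays bounded and is close to the unregularized value 5/2.
print(float(regularized_score(cycle4, 0, 1, r)))
# Cut-edge: the score is finite for r > 0 but grows like 1/r, so r * score is close to 1.
print(float(r * regularized_score(path3, 0, 1, r)))
```

Under these assumptions the regularized matrix stays invertible even when removing the edge disconnects the graph, which is what makes the score computable for cut-edges.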