CFCC-maximization
Current flow closeness centrality (CFCC) has a better discriminating ability than the ordinary closeness centrality based on shortest paths. In this paper, we extend the notion of CFCC to a group of vertices in a weighted graph. For a graph with n vertices and m edges, the CFCC C(S) of a vertex group S is equal to the ratio of n to the sum of effective resistances from S to all other vertices. We then study the problem of finding a group S^* of k vertices such that the CFCC C(S^*) is maximized. We solve this problem equivalently by minimizing the reciprocal of C(S^*). We show that the problem is NP-hard, and prove that the objective function is monotone and supermodular. We propose two greedy algorithms with provable approximation guarantees. The first is a deterministic algorithm with an approximation factor (1-1/e) and O(n^3) running time, while the second is a randomized algorithm with a (1-1/e-ϵ)-approximation and Õ(kmϵ^{-2}) running time for any small ϵ>0, where the Õ(·) notation hides poly(log n) factors. Extensive experiments on model and real networks demonstrate that our algorithms are effective and efficient, with the second algorithm being scalable to massive networks with more than a million vertices.
A fundamental problem in network science and graph mining is to identify crucial vertices [LM12, LCR16]. It is an important tool in network analysis and has found numerous applications in various areas [New10]. The first step in finding central vertices is to define suitable indices measuring the relative importance of vertices. Over the past decades, many centrality measures have been introduced to characterize and analyze the roles of vertices in networks [WS03, BV14, BK15, BDFMR16]. Among them, a popular one is closeness centrality [Bav48, Bav50]: the closeness of a vertex is the reciprocal of the sum of shortest path distances between it and all other vertices. However, this metric considers only shortest paths and, more importantly, neglects contributions from other paths. Therefore, it can produce some odd effects, or even counterintuitive results [BWLM16]. To avoid this shortcoming, Brandes and Fleischer presented current flow closeness centrality [BF05] based on electrical networks [DS84], which takes into account contributions from all paths between vertices. Current flow based closeness has been shown to better discriminate vertices than its traditional counterparts [BWLM16].

While most previous works focus on measures and algorithms for the importance of individual vertices in networks [WS03, LCR16], the problem of determining a group of most important vertices arises frequently in data mining and graph applications. For example, in social networks, retailers may want to choose vertices as promoters of a product, such that the number of potentially influenced customers is maximized [KKT03]. Another example is P2P networks, where one wants to place resources on a fixed number of peers so that they are easily accessed by others [GMS06]. In order to measure the importance of a group of vertices, Everett and Borgatti [EB99] extended the idea of individual centrality to group centrality, and introduced concepts such as group closeness. Recently, some algorithms have been developed to compute or estimate group closeness [ZLTG14, CWW16, BGM18]. However, similar to the case of individual vertices, these notions of group centrality also disregard contributions from paths that are not shortest.

In this paper, we extend current flow closeness of individual vertices [BWLM16] by proposing current flow closeness centrality (CFCC) for groups of vertices. In a graph $\mathcal{G}=(V,E)$ with $n$ vertices and $m$ edges, the CFCC $C(S)$ of a vertex group $S\subset V$ is equal to the ratio of $n$ to the sum of effective resistances between $S$ and all vertices in $V\setminus S$. We then consider the optimization problem: how can we find a group $S^*$ of $k$ vertices so as to maximize $C(S^*)$? We solve this problem by considering the equivalent problem of minimizing the reciprocal of $C(S^*)$. We show that the problem is NP-hard in Section 4, but also prove in Section 5 that it is an instance of supermodular set function optimization with a cardinality constraint. The latter allows us to devise greedy algorithms, leading to two algorithms with provable approximation guarantees:
A deterministic algorithm with a $(1-1/e)$ approximation factor and $O(n^3)$ running time (Section 6);
A randomized algorithm with a $(1-1/e-\epsilon)$-approximation factor and $\tilde{O}(km\epsilon^{-2})$ running time for any small $\epsilon>0$ (Section 7). (We use the $\tilde{O}(\cdot)$ notation to hide $\mathrm{poly}(\log n)$ factors.)
A key ingredient of our second algorithm is the nearly linear time solvers for Laplacians and symmetric, diagonally dominant M-matrices (SDDM) [ST14, CKM14], which have been used in various optimization problems on graphs [DS08, KMP12, MP13].
We perform extensive experiments on model and real-world networks to evaluate our algorithms; representative results are reported in Section 8. These results show that both algorithms are effective. Moreover, the second algorithm is efficient and scales to large networks with more than a million vertices. Our code is available on GitHub at https://github.com/lchc/CFCC-maximization.
There exist various measures for the centrality of a group of vertices, based on graph structure or dynamic processes, such as betweenness [DEPZ09, FS11, Yos14, MTU16], absorbing random-walk centrality [LYHC14, MMG15, ZLX17], and grounding centrality [PS14, CHBP17]. Since the criterion for the importance of a vertex group is application dependent [GTLY14], many previous works focus on selecting (or deleting) a group of $k$ vertices (for some given $k$) in order to optimize related quantities. These quantities are often measures of vertex group importance motivated by the applications, including minimizing the leading eigenvalue of the adjacency matrix for vertex immunization [TPT10, CTP16], minimizing the mean steady-state variance for first-order leader-follower noisy consensus dynamics [PB10, CP11], maximizing average distance for identifying structural hole spanners [RLXL15, XRL17], and others.

Previous works on closeness centrality and related algorithms are most directly related to our focus on group closeness centrality in this paper. The closeness centrality of an individual vertex was proposed [Bav48] and formalized [Bav50] by Bavelas. For a given vertex, its closeness centrality is defined as the reciprocal of the sum of shortest path distances from the vertex to all the other vertices. Everett and Borgatti [EB99] extended individual closeness centrality to group closeness centrality, which measures how close a vertex group is to all other vertices. For a graph with $n$ vertices and $m$ edges, exactly computing the closeness centrality of a group of vertices involves calculating all-pairs shortest path lengths, for which the state-of-the-art algorithm [Joh] takes $O(nm+n^2\log n)$ time. To reduce the computational cost, various approximation algorithms have been developed. A greedy algorithm with approximation ratio $(1-1/e)$ was devised in [CWW16], and a sampling algorithm that scales better to large networks, but without an approximation guarantee, was also proposed in the same paper. Very recently, new techniques [BGM18] have been developed to speed up the greedy algorithm of [CWW16] while preserving its theoretical guarantees.
Conventional closeness centrality is based on the shortest paths, omitting the contributions from other paths. In order to overcome this drawback, Brandes and Fleischer introduced current flow closeness centrality for an individual vertex [BF05], which essentially considers all paths between vertices, but still gives large weight to short paths. Our investigation can be viewed as combining this line of current based centrality measures with the study of selecting groups of vertices. For the former, a subset of the authors of this paper (Li and Zhang) recently demonstrated that current flow centrality measures for single edges can be computed provably efficiently [LZ18]. Our approximation algorithm in Section 7 is directly motivated by that routine.
In this section, we briefly introduce some notations and tools that facilitate the description of our problem and algorithms.
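We use normal lowercase letters like $a,b,c$ to denote scalars in $\mathbb{R}$, normal uppercase letters like $A,B,C$ to denote sets, bold lowercase letters like $\mathbf{a},\mathbf{b},\mathbf{c}$ to denote vectors, and bold uppercase letters like $\mathbf{A},\mathbf{B},\mathbf{C}$ to denote matrices. We write $\mathbf{a}_{[i]}$ to denote the $i$th entry of vector $\mathbf{a}$ and $\mathbf{A}_{[i,j]}$ to denote entry $(i,j)$ of matrix $\mathbf{A}$. We also write $\mathbf{A}_{[i,:]}$ to denote the $i$th row of $\mathbf{A}$ and $\mathbf{A}_{[:,j]}$ to denote the $j$th column of $\mathbf{A}$.

We write sets in matrix subscripts to denote submatrices. For example, $\mathbf{A}_{[I,J]}$ denotes the submatrix of $\mathbf{A}$ with row indices in $I$ and column indices in $J$. To simplify notation, we also write $\mathbf{A}_{-i}$ to denote the submatrix of $\mathbf{A}$ obtained by removing the $i$th row and $i$th column of $\mathbf{A}$, and similarly $\mathbf{A}_{-S}$ for the submatrix obtained by removing all rows and columns indexed by a set $S$. For example, for an $n\times n$ matrix $\mathbf{A}$, $\mathbf{A}_{-n}$ denotes the submatrix $\mathbf{A}_{[1:n-1,1:n-1]}$.
Note that the precedence of matrix subscripts is the lowest. Thus, $\mathbf{A}^{-1}_{-i}$ denotes the inverse of $\mathbf{A}_{-i}$ instead of a submatrix of $\mathbf{A}^{-1}$.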
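For two matrices $\mathbf{A}$ and $\mathbf{B}$, we write $\mathbf{A}\preceq\mathbf{B}$ to denote that $\mathbf{B}-\mathbf{A}$ is positive semidefinite, i.e., $\mathbf{x}^\top\mathbf{A}\mathbf{x}\le\mathbf{x}^\top\mathbf{B}\mathbf{x}$ holds for every real vector $\mathbf{x}$.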
We use $\mathbf{e}_i$ to denote the $i$th standard basis vector of appropriate dimension, and $\mathbf{1}_S$ to denote the indicator vector of $S$.
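We write $\mathcal{G}=(V,E,w)$ to denote a positively weighted undirected graph with $n$ vertices, $m$ edges, and edge weight function $w:E\to\mathbb{R}_+$. The Laplacian matrix $\mathbf{L}$ of $\mathcal{G}$ is defined as $\mathbf{L}_{[u,v]}=-w(u,v)$ if $u\sim v$, $\mathbf{L}_{[u,v]}=\deg(u)$ if $u=v$, and $\mathbf{L}_{[u,v]}=0$ otherwise, where $\deg(u)=\sum_{u\sim v}w(u,v)$ is the weighted degree of $u$ and $u\sim v$ means $(u,v)\in E$. Let $w_{\max}$ and $w_{\min}$ denote, respectively, the maximum weight and minimum weight among all edges. If we orient each edge of $\mathcal{G}$ arbitrarily, we can also write its Laplacian as $\mathbf{L}=\mathbf{B}^\top\mathbf{W}\mathbf{B}$, where $\mathbf{B}\in\mathbb{R}^{m\times n}$ is the signed edge-vertex incidence matrix defined by $\mathbf{B}_{[e,u]}=1$ if $u$ is $e$'s head, $\mathbf{B}_{[e,u]}=-1$ if $u$ is $e$'s tail, and $\mathbf{B}_{[e,u]}=0$ otherwise, and $\mathbf{W}\in\mathbb{R}^{m\times m}$ is a diagonal matrix with $\mathbf{W}_{[e,e]}=w(e)$. It is not hard to show that quadratic forms of $\mathbf{L}$ can be written as
$$\mathbf{x}^\top\mathbf{L}\mathbf{x}=\sum_{(u,v)\in E}w(u,v)\left(\mathbf{x}_{[u]}-\mathbf{x}_{[v]}\right)^2,$$
which immediately implies that $\mathbf{L}$ is positive semidefinite, and that $\mathbf{L}$ has only one zero eigenvalue if $\mathcal{G}$ is a connected graph.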
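As a quick numerical sanity check of these identities, here is a minimal NumPy sketch; the 4-vertex weighted graph and the helper names `laplacian` and `incidence_and_weights` are hypothetical, chosen only for illustration. It builds $\mathbf{L}$ from the entrywise definition, compares it with $\mathbf{B}^\top\mathbf{W}\mathbf{B}$, evaluates the quadratic form as a sum over edges, and counts zero eigenvalues.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples: L[u,u] = deg(u), L[u,v] = -w(u,v)."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def incidence_and_weights(n, edges):
    """Signed m x n incidence matrix B (edge e oriented from head u to tail v) and W = diag(w)."""
    B = np.zeros((len(edges), n))
    for e, (u, v, _) in enumerate(edges):
        B[e, u] = 1.0   # u is the head of e
        B[e, v] = -1.0  # v is the tail of e
    W = np.diag([w for _, _, w in edges])
    return B, W


# Hypothetical connected example graph on 4 vertices.
n = 4
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 0.5), (0, 2, 1.5)]
L = laplacian(n, edges)
B, W = incidence_and_weights(n, edges)

# Quadratic form: x^T L x = sum over edges of w(u, v) * (x_u - x_v)^2.
x = np.array([0.3, -1.2, 2.0, 0.7])
quadratic_form = x @ L @ x
edge_sum = sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

# A connected graph has exactly one (numerically) zero Laplacian eigenvalue.
eigenvalues = np.linalg.eigvalsh(L)
num_zero = int(np.sum(np.abs(eigenvalues) < 1e-9))
```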
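The following fact shows that proper principal submatrices of the Laplacian of a connected graph are positive definite and inverse-nonnegative (with an entrywise positive inverse whenever the remaining vertices induce a connected subgraph).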
Let $\mathbf{L}$ be the Laplacian of a connected graph and let $\mathbf{X}$ be a nonnegative, diagonal matrix with at least one nonzero entry. Then, $\mathbf{L}+\mathbf{X}$ is positive definite, and every entry of $(\mathbf{L}+\mathbf{X})^{-1}$ is positive.
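Fact 2.1 can be spot-checked numerically. The sketch below (NumPy; the unit-weight 5-cycle and the matrix $\mathbf{X}$ are hypothetical examples) verifies positive definiteness and entrywise positivity of the inverse for $\mathbf{L}+\mathbf{X}$, and for the submatrix obtained by deleting one vertex, whose remaining vertices here still induce a connected path.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


# Hypothetical example: unit-weight 5-cycle 0-1-2-3-4-0 (connected).
n = 5
L = laplacian(n, [(u, (u + 1) % n, 1.0) for u in range(n)])

# X: nonnegative diagonal matrix with exactly one nonzero entry.
X = np.diag([0.0, 0.0, 0.7, 0.0, 0.0])
M = L + X
min_eigenvalue = np.linalg.eigvalsh(M).min()   # > 0 means positive definite
M_inv = np.linalg.inv(M)                        # every entry should be > 0

# Removing vertex 0 leaves the Laplacian of the path 1-2-3-4 plus a diagonal
# term for the edges into vertex 0, i.e. the same kind of matrix.
keep = [1, 2, 3, 4]
L_minus_0 = L[np.ix_(keep, keep)]
```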
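Let $0=\lambda_1<\lambda_2\le\cdots\le\lambda_n$ be the eigenvalues of the Laplacian $\mathbf{L}$ of a connected graph $\mathcal{G}$, and let $\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_n$ be the corresponding orthonormal eigenvectors. Then we can decompose $\mathbf{L}$ as $\mathbf{L}=\sum_{i=2}^{n}\lambda_i\mathbf{u}_i\mathbf{u}_i^\top$ and define its pseudoinverse as $\mathbf{L}^\dagger=\sum_{i=2}^{n}\frac{1}{\lambda_i}\mathbf{u}_i\mathbf{u}_i^\top$.
It is not hard to verify that if $\mathbf{L}_1$ and $\mathbf{L}_2$ are Laplacians of connected graphs supported on the same vertex set, then $\mathbf{L}_1\preceq\mathbf{L}_2$ implies $\mathbf{L}_2^\dagger\preceq\mathbf{L}_1^\dagger$.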
The pseudoinverse of Laplacian matrix can be used to define effective resistance between any pair of vertices [KR93].
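For a connected graph $\mathcal{G}$ with Laplacian matrix $\mathbf{L}$, the effective resistance between vertices $u$ and $v$ is defined as
$$R(u,v)=(\mathbf{e}_u-\mathbf{e}_v)^\top\mathbf{L}^\dagger(\mathbf{e}_u-\mathbf{e}_v).$$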
The effective resistance between two vertices can also be expressed in terms of the diagonal entries of the inverses of submatrices of $\mathbf{L}$: for $u\neq v$, $R(u,v)=(\mathbf{L}_{-v}^{-1})_{[u,u]}$.
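The following NumPy sketch computes $R(u,v)$ both from the pseudoinverse definition and from the $(u,u)$ diagonal entry of $\mathbf{L}_{-v}^{-1}$; the function names and the small example graph are hypothetical. Reading edge weights as conductances, the series rule predicts $1/2+1=1.5$ between vertices 0 and 2.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def resistance_pinv(L, u, v):
    """R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v) with the Moore-Penrose pseudoinverse."""
    b = np.zeros(L.shape[0])
    b[u], b[v] = 1.0, -1.0
    return b @ np.linalg.pinv(L) @ b


def resistance_grounded(L, u, v):
    """R(u, v) as the diagonal entry for u of the inverse of L_{-v} (row and column v removed)."""
    keep = [i for i in range(L.shape[0]) if i != v]
    inv = np.linalg.inv(L[np.ix_(keep, keep)])
    i = keep.index(u)
    return inv[i, i]


# Hypothetical example: conductances 2 and 1 on the path 0-1-2 (resistors of 0.5
# and 1 in series), plus a pendant vertex 3 attached to vertex 1.
L = laplacian(4, [(0, 1, 2.0), (1, 2, 1.0), (1, 3, 1.0)])
r_pinv = resistance_pinv(L, 0, 2)
r_grounded = resistance_grounded(L, 0, 2)
```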
The current flow closeness centrality was proposed in [BF05]. It is based on the assumption that information spreads efficiently like an electrical current.
To define current flow closeness, we treat the graph $\mathcal{G}$ as a resistor network by replacing every edge $e$ with a resistor of resistance $1/w(e)$. Let $p_{uv}(x)$ denote the voltage of vertex $x$ when a unit current enters the network at $u$ and leaves it at $v$.
The current flow closeness of a vertex $v$ is defined as
$$C(v)=\frac{n}{\sum_{u\in V}\left(p_{uv}(u)-p_{uv}(v)\right)}.$$
It has been proved [BF05] that the current flow closeness of vertex $v$ equals the ratio of $n$ to the sum of effective resistances between $v$ and the other vertices:
$$C(v)=\frac{n}{\sum_{u\in V}R(u,v)}.$$
Actually, current flow closeness centrality is equivalent to information centrality [SZ89].
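A minimal sketch of Fact 2.5 for all vertices at once, assuming NumPy and a dense pseudoinverse (adequate for small graphs only); the helper names and the star graph are illustrative assumptions. It uses $R(u,v)=\mathbf{L}^\dagger_{[u,u]}+\mathbf{L}^\dagger_{[v,v]}-2\mathbf{L}^\dagger_{[u,v]}$, which follows from the definition of effective resistance.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def current_flow_closeness(L):
    """C(v) = n / sum_u R(u, v) for every vertex v, with all R(u, v) taken from L^+."""
    n = L.shape[0]
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2 * Lp   # R[u, v] = L+[u,u] + L+[v,v] - 2 L+[u,v]
    return n / R.sum(axis=0)


# Hypothetical example: unit-weight star with center 0 and leaves 1, 2, 3.
L = laplacian(4, [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0)])
C = current_flow_closeness(L)
```

On this star the center has $\sum_u R(u,0)=3$, so $C(0)=4/3$, while each leaf has $1+2+2=5$, so $C=4/5$. On unit-weight trees effective resistance equals hop distance, so the two closeness notions only diverge once the graph has cycles.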
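We now give the definitions of monotone and supermodular set functions. For simplicity, we write $S+u$ to denote $S\cup\{u\}$ and $S-u$ to denote $S\setminus\{u\}$.
A set function $f:2^V\to\mathbb{R}$ is monotone (non-increasing) if $f(S)\ge f(T)$ holds for all $S\subseteq T\subseteq V$.
A set function $f:2^V\to\mathbb{R}$ is supermodular if $f(S)-f(S+u)\ge f(T)-f(T+u)$ holds for all $S\subseteq T\subseteq V$ and $u\in V\setminus T$.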
We follow the idea of [BF05] to define current flow closeness centrality (CFCC) of a group of vertices.
To define current flow closeness centrality for a vertex set $S\subset V$, we treat the graph $\mathcal{G}$ as a resistor network in which all vertices in $S$ are grounded. Thus, vertices in $S$ always have voltage $0$. For a vertex $u\in V\setminus S$, let $p_S(u)$ be the voltage of $u$ when a unit current enters the network at $u$ and leaves it at $S$ (i.e., the ground). Then, we define the current flow closeness of $S$ as follows.
Let $\mathcal{G}=(V,E,w)$ be a connected weighted graph. The current flow closeness centrality of a vertex group $S\subset V$ is defined as
$$C(S)=\frac{n}{\sum_{u\in V\setminus S}p_S(u)}.$$
Note that there are different variants of the definition of CFCC for a vertex group; for example, the sum of voltages could be normalized differently. Definition 3.1 adopts the same standard form as the classic group closeness centrality [CWW16].
We next show that $C(S)$ is in fact equal to the ratio of $n$ to a sum of effective resistances, as in Fact 2.5.
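Let $u\in V\setminus S$ be a fixed vertex. Suppose a unit current enters the network at $u$ and leaves it at $S$. Let $\mathbf{v}$ be the vector of voltages at vertices. By Kirchhoff's Current Law and Ohm's Law, we have
$$\mathbf{L}\mathbf{v}=\mathbf{e}_u-\sum_{s\in S}c_s\mathbf{e}_s,$$
where $c_s$ denotes the amount of current flowing out of the network at $s\in S$. Since vertices in $S$ all have voltage $0$, we can restrict this equation to vertices in $V\setminus S$ as
$$\mathbf{L}_{-S}\,\mathbf{v}_{[V\setminus S]}=\mathbf{e}_u,$$
which leads to
$$\mathbf{v}_{[V\setminus S]}=\mathbf{L}_{-S}^{-1}\mathbf{e}_u.$$
This gives the expression of the voltage at $u$ as
$$p_S(u)=\mathbf{e}_u^\top\mathbf{L}_{-S}^{-1}\mathbf{e}_u=(\mathbf{L}_{-S}^{-1})_{[u,u]}.$$
Now we can write the CFCC of $S$ as
$$C(S)=\frac{n}{\sum_{u\in V\setminus S}(\mathbf{L}_{-S}^{-1})_{[u,u]}}=\frac{n}{\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)}.$$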
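The derivation can be replayed numerically. The sketch below (NumPy; function names and the example graph are hypothetical) grounds $S$, solves $\mathbf{L}_{-S}\mathbf{v}=\mathbf{e}_u$ once per vertex $u\notin S$ to read off $p_S(u)$, and compares the resulting $C(S)$ with the closed form $n/\mathrm{Tr}(\mathbf{L}_{-S}^{-1})$.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def group_cfcc_by_voltages(L, S):
    """C(S) = n / sum_{u not in S} p_S(u): for each u, ground S (voltage 0),
    inject a unit current at u, and solve L_{-S} v = e_u for the other voltages."""
    n = L.shape[0]
    rest = [u for u in range(n) if u not in S]
    L_minus_S = L[np.ix_(rest, rest)]
    total = 0.0
    for i in range(len(rest)):
        e_u = np.zeros(len(rest))
        e_u[i] = 1.0
        v = np.linalg.solve(L_minus_S, e_u)
        total += v[i]                      # p_S(u): voltage at the injection vertex
    return n / total


def group_cfcc_by_trace(L, S):
    """Closed form: C(S) = n / Tr((L_{-S})^{-1})."""
    rest = [u for u in range(L.shape[0]) if u not in S]
    return L.shape[0] / np.trace(np.linalg.inv(L[np.ix_(rest, rest)]))


# Hypothetical example: weighted 6-cycle with the chord (1, 4), grounded set S = {0, 3}.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 0.5), (4, 5, 1.0), (5, 0, 1.5), (1, 4, 1.0)]
L = laplacian(6, edges)
S = {0, 3}
```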
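Note that the $u$th diagonal entry of $\mathbf{L}_{-S}^{-1}$ is exactly the effective resistance $R(u,S)$ between vertex $u$ and vertex set $S$ [CP11], with $R(u,S)=0$ for any $u\in S$. Then we have the following relation between $C(S)$ and $R(u,S)$.
$$C(S)=\frac{n}{\sum_{u\in V}R(u,S)}.$$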
Being able to define CFCC of a vertex set raises the problem of maximizing current flow closeness subject to a cardinality constraint, which we state below.
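Given a connected graph $\mathcal{G}=(V,E,w)$ with $n$ vertices, $m$ edges, and edge weight function $w$, and an integer $k$, find a vertex group $S^*\subset V$ with $|S^*|=k$ such that the CFCC $C(S^*)$ is maximized, that is,
$$S^*\in\mathop{\arg\max}_{S\subset V,\;|S|=k}C(S).$$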
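For tiny graphs, Problem 1 can be solved by brute force, which gives a ground truth against which greedy heuristics can be compared. The sketch below is an illustrative assumption (NumPy, hypothetical function names) and performs $\binom{n}{k}$ matrix inversions, so it is only usable for very small $n$.

```python
import itertools

import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def cfcc(L, S):
    """C(S) = n / Tr((L_{-S})^{-1})."""
    n = L.shape[0]
    rest = [u for u in range(n) if u not in S]
    return n / np.trace(np.linalg.inv(L[np.ix_(rest, rest)]))


def exhaustive_cfcm(L, k):
    """Solve Problem 1 exactly by enumerating all k-subsets (exponential; tiny graphs only).
    Assumes 1 <= k < n so that S and its complement are both nonempty."""
    n = L.shape[0]
    best = max(itertools.combinations(range(n), k), key=lambda S: cfcc(L, S))
    return set(best), cfcc(L, best)


# Hypothetical example: unit-weight path 0-1-2-3-4.
L = laplacian(5, [(i, i + 1, 1.0) for i in range(4)])
S_opt, value = exhaustive_cfcm(L, 1)
```

On this path with $k=1$ the optimum is the middle vertex: $\mathrm{Tr}(\mathbf{L}_{-2}^{-1})=2+1+1+2=6$, hence $C(\{2\})=5/6$.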
In this section, we prove that Problem 1 is NP-hard. We will give a reduction from vertex cover on 3-regular graphs (graphs whose vertices all have degree 3), which is an NP-complete problem [FHJ98]. The decision version of this problem is stated below.
Given a connected 3-regular graph $\mathcal{G}=(V,E)$ and an integer $k$, decide whether or not there is a vertex set $S\subset V$ such that $|S|\le k$ and $S$ is a vertex cover of $\mathcal{G}$ (i.e., every edge in $E$ is incident with at least one vertex in $S$).
An instance of this problem is denoted by VC3.
We then give the decision version of Problem 1.
Given a connected graph $\mathcal{G}=(V,E,w)$, an integer $k$, and a real number $r$, decide whether or not there is a vertex set $S\subset V$ such that $|S|\le k$ and $C(S)\ge r$.
An instance of this problem is denoted by CFCMD.
To give the reduction, we will need the following lemma.
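Let $\mathcal{G}=(V,E,w)$ be a connected 3-regular graph with all edge weights being $1$ (i.e., $w(e)=1$ for all $e\in E$). Let $S\subsetneq V$ be a nonempty vertex set, and $k=|S|$. Then, $C(S)\le 3n/(n-k)$, and the equality holds if and only if $S$ is a vertex cover of $\mathcal{G}$.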
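We first show that if $S$ is a vertex cover of $\mathcal{G}$ then $C(S)=3n/(n-k)$. When $S$ is a vertex cover, $V\setminus S$ is an independent set. Thus, $\mathbf{L}_{-S}$ is a diagonal matrix with all diagonal entries being $3$. So we have
$$C(S)=\frac{n}{\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)}=\frac{n}{(n-k)/3}=\frac{3n}{n-k}.$$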
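We then show that if $S$ is not a vertex cover of $\mathcal{G}$ then $C(S)<3n/(n-k)$. When $S$ is not a vertex cover, $V\setminus S$ is not an independent set. Thus, $\mathbf{L}_{-S}$ is a block diagonal matrix, with each block corresponding to a connected component of $\mathcal{G}[V\setminus S]$, the subgraph of $\mathcal{G}$ induced on $V\setminus S$. Let $\mathcal{G}[T]$ be a connected component of $\mathcal{G}[V\setminus S]$ such that $|T|\ge 2$. Then, the block of $\mathbf{L}_{-S}$ corresponding to $\mathcal{G}[T]$ is $\mathbf{L}_{[T,T]}$. For a vertex $u\in T$, let the $u$th column of $\mathbf{L}_{[T,T]}$ be $\begin{pmatrix}3\\-\mathbf{a}\end{pmatrix}$. Then, we can write $\mathbf{L}_{[T,T]}$ in block form as
$$\mathbf{L}_{[T,T]}=\begin{pmatrix}3&-\mathbf{a}^\top\\-\mathbf{a}&\mathbf{A}\end{pmatrix},$$
where $\mathbf{A}=\mathbf{L}_{[T-u,\,T-u]}$. By blockwise matrix inversion we have
$$\left(\mathbf{L}_{[T,T]}^{-1}\right)_{[u,u]}=\frac{1}{3-\mathbf{a}^\top\mathbf{A}^{-1}\mathbf{a}}.$$
Since $\mathbf{L}_{[T,T]}$ is positive definite, so is its Schur complement, i.e., $3-\mathbf{a}^\top\mathbf{A}^{-1}\mathbf{a}>0$. Since $\mathcal{G}[T]$ is a connected component with at least two vertices, $\mathbf{a}$ is not a zero vector, which coupled with the fact that $\mathbf{A}^{-1}$ is positive definite gives $\mathbf{a}^\top\mathbf{A}^{-1}\mathbf{a}>0$. Thus, $(\mathbf{L}_{[T,T]}^{-1})_{[u,u]}>1/3$. Since this holds for all $u\in T$, we have $\mathrm{Tr}(\mathbf{L}_{[T,T]}^{-1})>|T|/3$. Also, since $T$ can be any connected component of $\mathcal{G}[V\setminus S]$ with at least two vertices, and the block of an isolated vertex in $\mathcal{G}[V\setminus S]$ contributes exactly $1/3$ to $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})$, we have $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})>(n-k)/3$ for any $S$ which is not a vertex cover of $\mathcal{G}$, which implies $C(S)<3n/(n-k)$. ∎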
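Lemma 4.1 can be checked exhaustively on a small instance. The sketch below (NumPy; the choice of the 3-dimensional hypercube $Q_3$, a connected 3-regular graph on 8 vertices, is ours) enumerates every nonempty proper vertex subset $S$, verifies $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})\ge(n-k)/3$, and checks that equality holds exactly for vertex covers.

```python
import itertools

import numpy as np

# The 3-dimensional hypercube Q3: 8 vertices, u ~ v iff their labels differ in one bit.
n = 8
edges = [(u, u ^ (1 << b)) for u in range(n) for b in range(3) if u < u ^ (1 << b)]
L = np.zeros((n, n))
for u, v in edges:                        # unit weights
    L[u, u] += 1.0
    L[v, v] += 1.0
    L[u, v] -= 1.0
    L[v, u] -= 1.0


def is_vertex_cover(S):
    """True iff every edge has at least one endpoint in S."""
    return all(u in S or v in S for u, v in edges)


mismatches = []
for k in range(1, n):                     # nonempty S with V \ S nonempty
    for S in itertools.combinations(range(n), k):
        rest = [u for u in range(n) if u not in S]
        trace = np.trace(np.linalg.inv(L[np.ix_(rest, rest)]))
        bound = (n - k) / 3               # equality <=> C(S) = 3n / (n - k)
        tight = bool(np.isclose(trace, bound))
        if trace < bound - 1e-9 or tight != is_vertex_cover(set(S)):
            mismatches.append(S)
```

Since $Q_3$ is bipartite, each color class (for example the even-parity labels $\{0,3,5,6\}$) is a vertex cover of size 4 and attains the bound.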
The following theorem then follows by Lemma 4.1.
Maximizing current flow closeness subject to a cardinality constraint is NP-hard.
We give a polynomial reduction from instances of VC3 to instances of CFCMD. For a connected 3-regular graph $\mathcal{G}=(V,E)$ with $n$ vertices, we construct a weighted graph $\mathcal{H}=(V,E,w)$ with the same vertex set and edge set and an edge weight function $w$ mapping all edges to weight $1$. Then, we construct a reduction $p$ as
$$p(\mathcal{G},k)=\left(\mathcal{H},\,k,\,\frac{3n}{n-k}\right).$$
By Lemma 4.1, $p$ is a polynomial reduction from VC3 to CFCMD, which implies that CFCM is NP-hard. ∎
In this section, we prove that the reciprocal of current flow group closeness, i.e., $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})=n/C(S)$ up to the constant factor $n$, is a monotone supermodular function. Our proof uses the following lemma, which shows that $\mathbf{L}_{-S}^{-1}$ is entrywise supermodular.
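Let $u,v\in V$ be an arbitrary pair of vertices. Then, the entry $(\mathbf{L}_{-S}^{-1})_{[u,v]}$ is a monotone supermodular function of $S$. Namely, for a vertex $w$ and nonempty vertex sets $S\subseteq T\subseteq V$ such that $w\in V\setminus T$ and $u,v\in V\setminus(T+w)$,
$$\left(\mathbf{L}_{-S}^{-1}\right)_{[u,v]}\ge\left(\mathbf{L}_{-(S+w)}^{-1}\right)_{[u,v]}$$
and
$$\left(\mathbf{L}_{-S}^{-1}\right)_{[u,v]}-\left(\mathbf{L}_{-(S+w)}^{-1}\right)_{[u,v]}\ge\left(\mathbf{L}_{-T}^{-1}\right)_{[u,v]}-\left(\mathbf{L}_{-(T+w)}^{-1}\right)_{[u,v]}.$$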
To prove Lemma 5.1, we first define a linear relaxation of as
(1) |
We remark the intuition behind this relaxation . Let denote the indices of entries of equal to one, and let denote the indices of entries of less than one. Then, by the definition in (1), we can write into a block diagonal matrix as
where is itself a diagonal matrix. This means that if for some nonempty vertex set , the following statement holds:
The condition that every entry of is in coupled with Fact 2.1 also implies that all submatrices of are positive definite and inverse-positive.
Now for vertices and nonempty vertex set such that , we can write the marginal gain of a vertex as
(2) |
We can further write the matrix on the rhs of (2) as an integral by
(3) |
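where the second equality follows by the identity
$$\frac{\mathrm{d}}{\mathrm{d}t}\mathbf{A}(t)^{-1}=-\mathbf{A}(t)^{-1}\left(\frac{\mathrm{d}}{\mathrm{d}t}\mathbf{A}(t)\right)\mathbf{A}(t)^{-1}$$
for any invertible matrix $\mathbf{A}(t)$ that depends differentiably on $t$.
To prove Lemma 5.1, we will also need the following lemma, which shows the entrywise monotonicity of the inverse of the relaxation defined in (1).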
For , the following statement holds for any vertices and nonempty vertex sets such that :
For simplicity, we let and . We also write , , and . Due to the block diagonal structures of and , we have
(4) |
and
(5) |
Since and agree on entries with indices in , we can write the submatrix of in block form as
By blockwise matrix inversion, we have
where the second equality follows by negating both and . By definition the matrix is entrywise nonnegative. By Fact 2.1, every entry of and is also nonnegative. Thus, the matrix
is entrywise nonnegative, which coupled with (4) and (5), implies . ∎
The following theorem follows by Lemma 5.1.
The reciprocal of current flow group closeness centrality, i.e., $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})=n/C(S)$ up to the constant factor $n$, is a monotone supermodular function.
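Let $S\subseteq T\subseteq V$ be nonempty vertex sets and $w\in V\setminus T$ be a vertex.
For monotonicity, we have
$$\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)=\sum_{u\in V\setminus S}\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}\ge\sum_{u\in V\setminus(S+w)}\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}\ge\sum_{u\in V\setminus(S+w)}\left(\mathbf{L}_{-(S+w)}^{-1}\right)_{[u,u]}=\mathrm{Tr}\left(\mathbf{L}_{-(S+w)}^{-1}\right),$$
where the first inequality follows from the fact that $\mathbf{L}_{-S}^{-1}$ is entrywise nonnegative, and the second inequality follows from the entrywise monotonicity of $\mathbf{L}_{-S}^{-1}$.
For supermodularity, we have
$$\begin{aligned}\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)-\mathrm{Tr}\left(\mathbf{L}_{-(S+w)}^{-1}\right)&=\left(\mathbf{L}_{-S}^{-1}\right)_{[w,w]}+\sum_{u\in V\setminus(S+w)}\left[\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}-\left(\mathbf{L}_{-(S+w)}^{-1}\right)_{[u,u]}\right]\\&\ge\left(\mathbf{L}_{-T}^{-1}\right)_{[w,w]}+\sum_{u\in V\setminus(T+w)}\left[\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}-\left(\mathbf{L}_{-(S+w)}^{-1}\right)_{[u,u]}\right]\\&\ge\left(\mathbf{L}_{-T}^{-1}\right)_{[w,w]}+\sum_{u\in V\setminus(T+w)}\left[\left(\mathbf{L}_{-T}^{-1}\right)_{[u,u]}-\left(\mathbf{L}_{-(T+w)}^{-1}\right)_{[u,u]}\right]\\&=\mathrm{Tr}\left(\mathbf{L}_{-T}^{-1}\right)-\mathrm{Tr}\left(\mathbf{L}_{-(T+w)}^{-1}\right),\end{aligned}$$
where the first inequality follows from the entrywise monotonicity of $\mathbf{L}_{-S}^{-1}$ (which gives $(\mathbf{L}_{-S}^{-1})_{[w,w]}\ge(\mathbf{L}_{-T}^{-1})_{[w,w]}$ and makes every dropped term with $u\in T\setminus S$ nonnegative), and the second inequality follows from the entrywise supermodularity of $\mathbf{L}_{-S}^{-1}$. ∎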
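Theorem 5.4 and Lemma 5.1 can be stress-tested by brute force on a small graph. The following NumPy sketch (the random 6-vertex test graph and all helper names are hypothetical) enumerates all nonempty $S\subseteq T$ and $w\notin T$ with $V\setminus(T+w)$ nonempty, and records any violation of monotonicity or supermodularity, both for the trace and for every entry indexed by $u,v\in V\setminus(T+w)$.

```python
import itertools
from functools import lru_cache

import numpy as np

# Hypothetical test graph: a 6-cycle plus chords (0,3) and (1,4), random positive weights.
n = 6
rng = np.random.default_rng(0)
L = np.zeros((n, n))
for a, b in [(i, (i + 1) % n) for i in range(n)] + [(0, 3), (1, 4)]:
    weight = rng.uniform(0.5, 2.0)
    L[a, a] += weight
    L[b, b] += weight
    L[a, b] -= weight
    L[b, a] -= weight


@lru_cache(maxsize=None)
def grounded_inverse(S):
    """Return (kept vertices, (L_{-S})^{-1}) for a frozenset S."""
    rest = [u for u in range(n) if u not in S]
    return rest, np.linalg.inv(L[np.ix_(rest, rest)])


def entry(S, u, v):
    """Entry (u, v) of (L_{-S})^{-1}, indexed by original vertex labels."""
    rest, inv = grounded_inverse(S)
    return inv[rest.index(u), rest.index(v)]


def f(S):
    """Objective Tr((L_{-S})^{-1}) = n / C(S)."""
    return np.trace(grounded_inverse(S)[1])


tol = 1e-10
violations = []
# Nonempty S, T with at most n - 2 vertices, so V \ (T + w) is never empty.
proper = [frozenset(c) for k in range(1, n - 1) for c in itertools.combinations(range(n), k)]
for S, T in itertools.product(proper, proper):
    if not S <= T:
        continue
    for w in set(range(n)) - T:
        Sw, Tw = S | {w}, T | {w}
        # Theorem 5.4: the trace is monotone and supermodular.
        if f(S) < f(Sw) - tol or f(S) - f(Sw) < f(T) - f(Tw) - tol:
            violations.append(("trace", S, T, w))
        # Lemma 5.1: the same two properties hold entry by entry.
        for u, v in itertools.product(set(range(n)) - Tw, repeat=2):
            gain_S = entry(S, u, v) - entry(Sw, u, v)
            gain_T = entry(T, u, v) - entry(Tw, u, v)
            if gain_S < -tol or gain_S < gain_T - tol:
                violations.append(("entry", S, T, w, u, v))
```

A passing check on one graph is only evidence, not a proof; it is mainly useful for catching sign or indexing mistakes when implementing the objective.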
We note that [CP11] previously proved that $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})$ is monotone and supermodular by using the connection between effective resistance and commute time for random walks. In contrast, our proof is fully algebraic. Moreover, we prove a more general result: $\mathbf{L}_{-S}^{-1}$ is entrywise supermodular.
Theorem 5.4 indicates that one can obtain a $(1-1/e)$-approximation to the optimum by a simple greedy algorithm that picks the vertex with the maximum marginal gain each time [NWF78]. However, since computing $\mathrm{Tr}(\mathbf{L}_{-S}^{-1})$ involves matrix inversion, a naive implementation of this greedy algorithm takes $O(kn^4)$ time, assuming that one matrix inversion runs in $O(n^3)$ time. We will show in the next section how to implement this greedy algorithm in $O(n^3)$ time using blockwise matrix inversion.
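For reference, the naive greedy algorithm can be sketched as follows, assuming NumPy and a hypothetical helper `trace_grounded_inverse`; each candidate evaluation performs a fresh $O(n^3)$ inversion, which is where the $O(kn^4)$ bound comes from.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def trace_grounded_inverse(L, S):
    """Tr((L_{-S})^{-1}) = n / C(S), computed with a fresh O(n^3) inversion."""
    rest = [u for u in range(L.shape[0]) if u not in S]
    return np.trace(np.linalg.inv(L[np.ix_(rest, rest)]))


def naive_greedy(L, k):
    """k rounds; each round tries every remaining vertex u with one full inversion of
    L_{-(S+u)}, i.e. O(k n^4) time overall. Assumes a connected graph and 1 <= k < n."""
    n = L.shape[0]
    S = []
    for _ in range(k):
        candidates = [u for u in range(n) if u not in S]
        best = min(candidates, key=lambda u: trace_grounded_inverse(L, S + [u]))
        S.append(best)
    return S


# Hypothetical example: unit-weight path 0-1-...-6.
path7 = laplacian(7, [(i, i + 1, 1.0) for i in range(6)])
S = naive_greedy(path7, 2)
```

On this path the first pick is the center vertex 3. The second pick is an endpoint, 0 or 6, which tie by symmetry: grounding $\{3,0\}$ gives $\mathrm{Tr}=4/3+6\approx 7.33$, versus $7.5$ for $\{3,1\}$.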
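We now consider how to accelerate the naive greedy algorithm. Suppose that after the $i$th step, the algorithm has selected a set $S_i$ containing $i$ vertices. We next compute the marginal gain $\mathrm{Tr}(\mathbf{L}_{-S_i}^{-1})-\mathrm{Tr}(\mathbf{L}_{-(S_i+u)}^{-1})$ of each vertex $u\in V\setminus S_i$; below we write $S$ for $S_i$.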
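For a vertex $u\in V\setminus S$, let $\begin{pmatrix}d_u\\-\mathbf{a}\end{pmatrix}$ denote the $u$th column of the submatrix $\mathbf{L}_{-S}$, where $d_u=\mathbf{L}_{[u,u]}$ and rows are ordered so that $u$ comes first. Then we write $\mathbf{L}_{-S}$ in block form as
$$\mathbf{L}_{-S}=\begin{pmatrix}d_u&-\mathbf{a}^\top\\-\mathbf{a}&\mathbf{A}\end{pmatrix},$$
where $\mathbf{A}=\mathbf{L}_{-(S+u)}$. By blockwise matrix inversion, we have
$$\mathbf{L}_{-S}^{-1}=\begin{pmatrix}\frac{1}{s}&\frac{1}{s}\mathbf{a}^\top\mathbf{A}^{-1}\\\frac{1}{s}\mathbf{A}^{-1}\mathbf{a}&\mathbf{A}^{-1}+\frac{1}{s}\mathbf{A}^{-1}\mathbf{a}\mathbf{a}^\top\mathbf{A}^{-1}\end{pmatrix},\tag{7}$$
where $s=d_u-\mathbf{a}^\top\mathbf{A}^{-1}\mathbf{a}$. Then the marginal gain of $u$ can be further expressed as
$$\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)-\mathrm{Tr}\left(\mathbf{L}_{-(S+u)}^{-1}\right)=\mathrm{Tr}\left(\mathbf{L}_{-S}^{-1}\right)-\mathrm{Tr}\left(\mathbf{A}^{-1}\right)=\frac{1}{s}+\frac{1}{s}\mathrm{Tr}\left(\mathbf{A}^{-1}\mathbf{a}\mathbf{a}^\top\mathbf{A}^{-1}\right)=\frac{1+\mathbf{a}^\top\mathbf{A}^{-2}\mathbf{a}}{s}=\frac{\left(\mathbf{L}_{-S}^{-2}\right)_{[u,u]}}{\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}},$$
where the second equality and the fourth equality follow by (7), while the third equality follows by the cyclicity of trace.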
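The closed form $(\mathbf{L}_{-S}^{-2})_{[u,u]}/(\mathbf{L}_{-S}^{-1})_{[u,u]}$ can be compared with a direct evaluation of the marginal gain. In the NumPy sketch below (the example graph and the choice $S=\{0\}$ are hypothetical), all gains come from the single matrix $\mathbf{L}_{-S}^{-1}$ in $O(n^2)$ time, using the fact that for a symmetric matrix the $(i,i)$ entry of its square is the squared norm of column $i$.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


# Hypothetical weighted 6-cycle with one chord, and current set S = {0}.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 0.5), (4, 5, 1.0), (5, 0, 1.5), (1, 4, 1.0)]
L = laplacian(6, edges)
S = {0}
rest = [u for u in range(6) if u not in S]
M_inv = np.linalg.inv(L[np.ix_(rest, rest)])        # (L_{-S})^{-1}, symmetric

# Closed form for all candidates at once: (M^{-2})[i, i] / M^{-1}[i, i], where
# (M^{-2})[i, i] = sum_j M^{-1}[j, i]^2 because M^{-1} is symmetric.
gains_formula = (M_inv ** 2).sum(axis=0) / np.diag(M_inv)


def trace_without(u):
    """Tr(L_{-(S+u)}^{-1}) by a fresh inversion (reference only)."""
    keep = [x for x in rest if x != u]
    return np.trace(np.linalg.inv(L[np.ix_(keep, keep)]))


# Direct definition: Tr(L_{-S}^{-1}) - Tr(L_{-(S+u)}^{-1}) for each candidate u.
gains_direct = np.array([np.trace(M_inv) - trace_without(u) for u in rest])
```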
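By (7), we can also update the inverse upon adding a vertex $u$ to $S$ by
$$\mathbf{L}_{-(S+u)}^{-1}=\mathbf{A}^{-1}=\left(\mathbf{L}_{-S}^{-1}-\frac{\mathbf{L}_{-S}^{-1}\mathbf{e}_u\mathbf{e}_u^\top\mathbf{L}_{-S}^{-1}}{\left(\mathbf{L}_{-S}^{-1}\right)_{[u,u]}}\right)_{-u}.$$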
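A minimal NumPy sketch of this rank-one downdate, assuming $\mathbf{L}_{-S}^{-1}$ is stored densely with rows ordered as in a list `rest` (a hypothetical bookkeeping choice; the helper name is also ours); the result is checked against a fresh inversion.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def remove_from_inverse(M_inv, i):
    """Given M^{-1} = (L_{-S})^{-1} and the position i of a vertex u, return
    (L_{-(S+u)})^{-1}: subtract (M^{-1} e_i)(e_i^T M^{-1}) / M^{-1}[i, i], then delete
    row and column i. Costs O(n^2) instead of a fresh O(n^3) inversion."""
    col = M_inv[:, i]                          # M^{-1} e_i (equals row i by symmetry)
    updated = M_inv - np.outer(col, col) / M_inv[i, i]
    keep = np.arange(M_inv.shape[0]) != i
    return updated[np.ix_(keep, keep)]


# Hypothetical example: weighted 6-cycle with a chord, S = {0}, then add vertex 3.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 0.5), (4, 5, 1.0), (5, 0, 1.5), (1, 4, 1.0)]
L = laplacian(6, edges)
rest = [1, 2, 3, 4, 5]
M_inv = np.linalg.inv(L[np.ix_(rest, rest)])
fast = remove_from_inverse(M_inv, rest.index(3))
new_rest = [1, 2, 4, 5]
direct = np.linalg.inv(L[np.ix_(new_rest, new_rest)])
```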
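At the first step, we need to pick a vertex $u$ with minimum $\mathrm{Tr}(\mathbf{L}_{-u}^{-1})$, which can be done by computing $\mathrm{Tr}(\mathbf{L}_{-u}^{-1})$ for all $u\in V$ using the relation [BF13]
$$\mathrm{Tr}\left(\mathbf{L}_{-u}^{-1}\right)=\sum_{v\in V}R(u,v)=n\mathbf{L}^\dagger_{[u,u]}+\mathrm{Tr}\left(\mathbf{L}^\dagger\right).$$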
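This relation lets all $n$ first-step objectives be read off a single pseudoinverse. A NumPy sketch with hypothetical helper names and example graphs, using a direct per-vertex inversion as the reference:

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def all_single_vertex_traces(L):
    """Tr((L_{-u})^{-1}) for every u via n * L+[u, u] + Tr(L+): one O(n^3) pseudoinverse."""
    n = L.shape[0]
    Lp = np.linalg.pinv(L)
    return n * np.diag(Lp) + np.trace(Lp)


def single_vertex_trace_direct(L, u):
    """Reference value: invert L with row and column u removed."""
    keep = [i for i in range(L.shape[0]) if i != u]
    return np.trace(np.linalg.inv(L[np.ix_(keep, keep)]))


# Hypothetical weighted example graph.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 0.5), (4, 5, 1.0), (5, 0, 1.5), (1, 4, 1.0)]
L = laplacian(6, edges)
fast = all_single_vertex_traces(L)
direct = np.array([single_vertex_trace_direct(L, u) for u in range(6)])
```

On the unit-weight path 0-1-2-3-4 the values are the sums of hop distances, $(10,7,6,7,10)$, so the first greedy pick is vertex 2.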
We give the $O(n^3)$-time algorithm as follows.
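As an illustration, the following NumPy sketch assembles the steps above into one procedure: the first vertex comes from the pseudoinverse relation, then $k-1$ rounds score every candidate with the marginal-gain formula and apply the rank-one downdate. It is our reconstruction under these assumptions, not the authors' code (see the linked repository for that), and it includes an $O(kn^4)$ naive greedy as a cross-check. The pseudoinverse and the initial inverse cost $O(n^3)$ and each round costs $O(n^2)$, giving $O(n^3)$ overall.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian from (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def exact_greedy(L, k):
    """Sketch of the O(n^3) greedy described above (not the authors' code).
    Assumes L is the Laplacian of a connected graph and 1 <= k < n."""
    n = L.shape[0]
    # Step 1: pick u minimizing Tr(L_{-u}^{-1}) = n * L+[u, u] + Tr(L+).
    Lp = np.linalg.pinv(L)
    first = int(np.argmin(n * np.diag(Lp) + np.trace(Lp)))
    S = [first]
    rest = [u for u in range(n) if u != first]
    M_inv = np.linalg.inv(L[np.ix_(rest, rest)])      # (L_{-S})^{-1}
    for _ in range(k - 1):
        # Marginal gains (M^{-2})[i, i] / M^{-1}[i, i] for all candidates, O(n^2).
        gains = (M_inv ** 2).sum(axis=0) / np.diag(M_inv)
        i = int(np.argmax(gains))
        # Rank-one downdate, then drop row/column i, O(n^2).
        col = M_inv[:, i]
        M_inv = M_inv - np.outer(col, col) / M_inv[i, i]
        keep = np.arange(len(rest)) != i
        M_inv = M_inv[np.ix_(keep, keep)]
        S.append(rest.pop(i))
    return S


def naive_greedy_reference(L, k):
    """O(k n^4) reference: re-invert L_{-(S+u)} for every candidate u at every step."""
    n = L.shape[0]
    S = []
    for _ in range(k):
        def cost(u):
            rest = [x for x in range(n) if x not in S and x != u]
            return np.trace(np.linalg.inv(L[np.ix_(rest, rest)]))
        S.append(min((u for u in range(n) if u not in S), key=cost))
    return S


# Hypothetical test graph: a 12-cycle with random positive weights plus four chords.
rng = np.random.default_rng(7)
n = 12
edges = [(i, (i + 1) % n, rng.uniform(0.5, 2.0)) for i in range(n)]
edges += [(0, 5, 1.3), (2, 9, 0.7), (4, 10, 1.1), (3, 7, 0.9)]
L = laplacian(n, edges)
S_fast = exact_greedy(L, 4)
S_slow = naive_greedy_reference(L, 4)
```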
The performance of ExactGreedy is characterized in the following theorem.
The algorithm ExactGreedy takes an undirected positively weighted graph $\mathcal{G}=(V,E,w)$ with associated Laplacian $\mathbf{L}$ and an integer $k$, and returns a vertex set $S\subset V$ with $|S|=k$. The algorithm runs in $O(n^3)$ time. The vertex set $S$ satisfies
where and .
The running time is easy to verify. We only need to prove the approximation ratio.
By supermodularity, for any
which implies