Centrality metrics are indicators that identify the varying importance of nodes in complex networks [Lü et al.2016]. They have become a powerful tool in network analysis and have found wide application in network science [Newman2010]. Over the past years, a great number of centrality indices and corresponding algorithms have been proposed to analyze and understand the roles of nodes in networks [White and Smyth2003, Boldi and Vigna2014]. Among the various centrality indices, betweenness centrality and closeness centrality are probably the two most frequently used, especially in social network analysis. However, both indicators consider only shortest paths and exclude the contributions of longer paths. To overcome this drawback, current flow closeness centrality [Brandes and Fleischer2005, Newman2005] was introduced and proved to be exactly the information centrality [Stephenson and Zelen1989], which counts all possible paths between nodes and has better discriminating power than betweenness centrality [Newman2005] and closeness centrality [Bergamini et al.2016].
Centrality measures are recognized to be of great significance in complex networks, and having high centrality can have positive consequences for the node itself. In this paper, we consider the problem of adding a given number of edges incident to a designated node so as to maximize the centrality of that node. Our main motivation for studying this problem is that it has several application scenarios, including airport networks [Ishakian et al.2012] and recommendation systems [Parotsidis et al.2016], among others. For example, in airport networks, a node (airport) has an incentive to improve its centrality (transportation capacity) as much as possible by adding edges (direct flights) connecting itself to other nodes (airports) [Ishakian et al.2012]. Another example is the link recommendation problem of recommending to a user a given number of links, chosen from a set of candidate nonexistent links incident to that user, in order to minimize the shortest distance from the user to other nodes [Parotsidis et al.2016].
The problem of maximizing the centrality of a specific target node by adding edges incident to it has been widely studied. For example, some authors have studied the problem of creating edges linked to a node so that the centrality of that node is maximized with respect to a given centrality measure, e.g., betweenness centrality [Crescenzi et al.2015, D’Angelo et al.2016, Crescenzi et al.2016, Hoffmann et al.2018] and closeness centrality [Crescenzi et al.2015, Hoffmann et al.2018]. Similar optimization problems for a predefined node were also addressed for other node centrality metrics, including the average shortest distance between the node and the remaining nodes [Meyerson and Tagiku2009, Parotsidis et al.2016], the largest distance from the node to other nodes [Demaine and Zadimoghaddam2010], PageRank [Avrachenkov and Litvak2006, Olsen2010], and the number of different paths containing the node [Ishakian et al.2012]. However, previous works do not consider improving the information centrality of a node by adding new edges linked to it, despite the fact that it distinguishes different nodes better than betweenness [Newman2005] and closeness centrality [Bergamini et al.2016].
In this paper, we study the following problem: given a graph with $n$ nodes and $m$ edges, how should one create $k$ new edges incident to a designated node $v$ so that the information centrality $I_v$ of $v$ is maximized? Since $I_v$ is proportional to the reciprocal of the sum $\mathcal{R}_v$ of resistance distances between $v$ and all nodes, we reduce the problem to minimizing $\mathcal{R}_v$ by introducing $k$ edges connecting $v$. We demonstrate that the objective function is monotone and supermodular. To minimize the resistance distance $\mathcal{R}_v$, we present two greedy approximation algorithms that introduce the $k$ edges one by one. The former is a $(1-\frac{1}{e})$-approximation algorithm with $O(n^3)$ time complexity, while the latter is a $(1-\frac{1}{e}-\epsilon)$-approximation algorithm with $\tilde{O}(mk\epsilon^{-2})$ time complexity, where the $\tilde{O}(\cdot)$ notation hides $\mathrm{poly}(\log n)$ factors. We test our algorithms on several model and real networks; they substantially increase the information centrality of a given node and outperform several other edge-addition strategies.
Consider a connected undirected weighted network $\mathcal{G} = (V, E, w)$, where $V$ is the set of nodes, $E \subseteq V \times V$ is the set of edges, and $w: E \to \mathbb{R}_{+}$ is the edge weight function. We use $w_{\max}$ to denote the maximum edge weight. Let $n = |V|$ denote the number of nodes and $m = |E|$ denote the number of edges. For a pair of adjacent nodes $u$ and $v$, we write $u \sim v$ to denote $(u, v) \in E$. The Laplacian matrix of $\mathcal{G}$ is the symmetric matrix $L = D - A$, where $A$ is the weighted adjacency matrix of the graph and $D$ is the degree diagonal matrix.
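Let $e_i$ denote the $i$th standard basis vector, and $b_{u,v} = e_u - e_v$. We fix an arbitrary orientation for all edges in $\mathcal{G}$. For each edge $e \in E$, we define $b_e = b_{u,v}$, where $u$ and $v$ are the head and tail of $e$, respectively. It is easy to verify that the Laplacian of $\mathcal{G}$ can be written as $L = \sum_{e \in E} w(e)\, b_e b_e^\top$. $L$ is singular and positive semidefinite. Its pseudoinverse is $L^\dagger = \left(L + \frac{1}{n} J\right)^{-1} - \frac{1}{n} J$, where $J$ is the matrix with all entries being ones.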
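For network $\mathcal{G}$, the resistance distance [Klein and Randić1993] between two nodes $u, v$ is $R_{uv} = b_{u,v}^\top L^\dagger b_{u,v}$. The resistance distance of a node $v$ is the sum of resistance distances between $v$ and all nodes in $V$, that is, $\mathcal{R}_v = \sum_{u \in V} R_{uv}$, which can be expressed in terms of the entries of $L^\dagger$ as [Bozzo and Franceschet2013]
$$\mathcal{R}_v = n L^\dagger_{vv} + \mathrm{Tr}\left(L^\dagger\right). \tag{1}$$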
Let $L_v$ denote the submatrix of the Laplacian $L$ obtained by deleting the row and column corresponding to node $v$. For a connected graph $\mathcal{G}$, $L_v$ is invertible for any node $v$, and the resistance distance $R_{uv}$ between $v$ and another node $u$ is equal to $\left(L_v^{-1}\right)_{uu}$ [Izmailian et al.2013]. Thus, we have
$$\mathcal{R}_v = \mathrm{Tr}\left(L_v^{-1}\right). \tag{2}$$
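As an illustration of the identity above, the following minimal sketch computes the resistance distance of a node as the trace of the inverse of the Laplacian with that node's row and column deleted. It is not the authors' Julia code; the helper names `invert`, `laplacian`, and `resistance_distance_of_node` are hypothetical, and the dense Gauss-Jordan inversion is only meant for very small graphs.

```python
def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def laplacian(n, weighted_edges):
    """Build L = D - A for an undirected weighted graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def resistance_distance_of_node(n, weighted_edges, v):
    """Sum of resistance distances from v to all nodes, as the trace of the
    inverse of the Laplacian with row and column v removed."""
    L = laplacian(n, weighted_edges)
    keep = [i for i in range(n) if i != v]
    L_v = [[L[i][j] for j in keep] for i in keep]
    inv = invert(L_v)
    return sum(inv[i][i] for i in range(len(keep)))


# Unit-weight path 0 - 1 - 2: resistances from node 0 are 1 and 2.
path = [(0, 1, 1.0), (1, 2, 1.0)]
print(resistance_distance_of_node(3, path, 0))
```

On a tree, resistance distance coincides with path length, which makes the path graph a convenient sanity check.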
The resistance distance $\mathcal{R}_v$ can be used as a measure of the efficiency of node $v$ in transmitting information to other nodes, and is closely related to the information centrality introduced by Stephenson and Zelen to measure the importance of nodes in social networks [Stephenson and Zelen1989]. The information transmitted between $u$ and $v$ is defined as
$$I_{uv} = \frac{1}{C_{uu} + C_{vv} - 2 C_{uv}},$$
where $C = (L + J)^{-1}$. The information centrality of node $v$ is the harmonic mean of $I_{uv}$ over all nodes $u$ [Stephenson and Zelen1989].
For a connected graph $\mathcal{G}$, the information centrality $I_v$ of a node $v$ is defined as
$$I_v = \frac{n}{\sum_{u \in V} 1 / I_{uv}}.$$
It was shown [Brandes and Fleischer2005] that
$$I_v = \frac{n}{\mathcal{R}_v}. \tag{3}$$
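The relation between the two views of information centrality can be checked numerically on a toy graph. The sketch below assumes the Stephenson-Zelen form, in which the reciprocal of the information between $u$ and $v$ is $C_{uu} + C_{vv} - 2C_{uv}$ with $C = (L + J)^{-1}$, and compares the resulting centrality with $n$ divided by the sum of resistance distances. The function names are hypothetical and the dense inversion is for illustration only.

```python
def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def information_centrality(n, weighted_edges, v):
    """Harmonic-mean form: n / sum_u (C_uu + C_vv - 2 C_uv), with C = (L + J)^{-1}."""
    # Start from the all-ones matrix J and add the Laplacian L = D - A.
    B = [[1.0] * n for _ in range(n)]
    for a, b, w in weighted_edges:
        B[a][a] += w
        B[b][b] += w
        B[a][b] -= w
        B[b][a] -= w
    C = invert(B)
    # The term for u == v is zero, so it can be skipped.
    inverse_information = sum(C[u][u] + C[v][v] - 2.0 * C[u][v]
                              for u in range(n) if u != v)
    return n / inverse_information


# Unit-weight path 0 - 1 - 2. Resistance sums are 3 (end node) and 2 (middle
# node), so n / R_v gives 1.0 and 1.5 respectively.
path = [(0, 1, 1.0), (1, 2, 1.0)]
print(information_centrality(3, path, 0), information_centrality(3, path, 1))
```

The middle node of the path receives the larger value, as expected for a closeness-type measure.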
We next introduce some notation and tools that simplify the description of our algorithms, including $\epsilon$-approximation and supermodular functions.
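Let $a, b \geq 0$ be two nonnegative scalars. We say $a$ is an $\epsilon$-approximation [Peng and Spielman2014] of $b$ if $\exp(-\epsilon)\, a \leq b \leq \exp(\epsilon)\, a$. Hereafter, we use $a \approx_{\epsilon} b$ to represent that $a$ is an $\epsilon$-approximation of $b$.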
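Let $X$ be a finite set, and $2^X$ be the set of all subsets of $X$. Let $f: 2^X \to \mathbb{R}$ be a set function on $X$. We say $f$ is supermodular if, for any subsets $S \subseteq T \subseteq X$ and any element $a \in X \setminus T$, it satisfies $f(S) - f(S \cup \{a\}) \geq f(T) - f(T \cup \{a\})$. A function $f$ is submodular if $-f$ is supermodular. A set function $f$ is called monotone decreasing if $f(S) \geq f(T)$ holds for any subsets $S \subseteq T \subseteq X$.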
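For small ground sets, both properties can be checked by brute force. The sketch below assumes the standard form of the definitions: $f$ is supermodular if $f(S) - f(S \cup \{a\}) \geq f(T) - f(T \cup \{a\})$ whenever $S \subseteq T$ and $a \notin T$, and monotone decreasing if $f(S) \geq f(T)$ whenever $S \subseteq T$. The function names and the two toy set functions are illustrative only.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of the ground set as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_decreasing(f, ground, tol=1e-12):
    """Check f(S) >= f(T) for all S subset of T."""
    all_sets = list(subsets(ground))
    return all(f(S) >= f(T) - tol
               for S in all_sets for T in all_sets if S <= T)


def is_supermodular(f, ground, tol=1e-12):
    """Check f(S) - f(S + a) >= f(T) - f(T + a) for all S subset of T, a not in T."""
    ground = frozenset(ground)
    all_sets = list(subsets(ground))
    for S in all_sets:
        for T in all_sets:
            if not S <= T:
                continue
            for a in ground - T:
                drop_S = f(S) - f(S | {a})
                drop_T = f(T) - f(T | {a})
                if drop_S < drop_T - tol:
                    return False
    return True


ground = {"a", "b", "c", "d"}
shrinking_gain = lambda S: 1.0 / (1 + len(S))   # each new element helps less
growing_gain = lambda S: -float(len(S) ** 2)    # each new element helps more

print(is_supermodular(shrinking_gain, ground), is_supermodular(growing_gain, ground))
```

The check enumerates all pairs of subsets, so its cost grows exponentially with the size of the ground set; it is only useful as a sanity test on toy instances.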
3 Problem Formulation
For a connected undirected weighted network $\mathcal{G} = (V, E, w)$, given a set $S$ of weighted edges not in $E$, we use $\mathcal{G}(S)$ to denote the network augmented by adding the edges in $S$ to $\mathcal{G}$, i.e., $\mathcal{G}(S) = (V, E \cup S, w')$, where $w'$ is the new weight function. Let $L(S)$ denote the Laplacian matrix of $\mathcal{G}(S)$. Note that the information centrality of a node depends on the graph topology. If we augment a graph by adding a set of edges $S$, the information centrality of a node will change. Moreover, adding edges incident to some node $v$ can only increase its information centrality [Doyle and Snell1984].
Assume that there is a set of nonexistent edges incident to a particular node $v$, each with a given weight. We denote this candidate edge set as $E_v$. Consider choosing a subset $S$ of $k$ edges from the candidate set $E_v$ to augment the network so that the information centrality of node $v$ is maximized. Let $I_v(S)$ denote the information centrality of node $v$ in the augmented network. We define the following set function optimization problem:
$$\max_{S \subseteq E_v,\ |S| = k} I_v(S). \tag{4}$$
Since the information centrality of a node is proportional to the reciprocal of $\mathcal{R}_v$, the optimization problem (4) is equivalent to the following problem:
$$\min_{S \subseteq E_v,\ |S| = k} \mathcal{R}_v(S), \tag{5}$$
where $\mathcal{R}_v(S)$ is the resistance distance of $v$ in the augmented network $\mathcal{G}(S)$. Without ambiguity, we take $\mathcal{R}(S)$ to replace $\mathcal{R}_v(S)$ for simplicity.
4 Supermodularity of Objective Function
Let $2^{E_v}$ denote all subsets of the candidate edge set $E_v$. Then the resistance distance of node $v$ in the augmented network can be represented as a set function $\mathcal{R}_v: 2^{E_v} \to \mathbb{R}$. To provide effective algorithms for the above-defined problems, we next prove that the resistance distance of $v$ is a supermodular function.
Rayleigh’s monotonicity law [Doyle and Snell1984] shows that the resistance distance between any pair of nodes can only decrease when edges are added. Then, we have the following theorem.
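Theorem 4.1. $\mathcal{R}_v(S)$ is a monotonically decreasing function of the set of edges $S$. That is, for any subsets $S \subseteq T \subseteq E_v$, $\mathcal{R}_v(T) \leq \mathcal{R}_v(S)$.
We then prove the supermodularity of the objective function $\mathcal{R}_v(S)$.
Theorem 4.2. $\mathcal{R}_v(S)$ is supermodular. For any sets $S \subseteq T \subseteq E_v$ and any edge $e \in E_v \setminus T$, $\mathcal{R}_v(S) - \mathcal{R}_v(S \cup \{e\}) \geq \mathcal{R}_v(T) - \mathcal{R}_v(T \cup \{e\})$.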
Proof. Suppose that edge connects two nodes and , then , where is a square matrix with the th diagonal entry being one, and all other entries being zeros. By (2), it suffices to prove that
Since is a subset of , , where is a nonnegative diagonal matrix. For simplicity, in the following proof, we use to denote matrix . Then, we only need to prove
Define function , , as
Then, the above inequality holds if takes the minimum value at . We next show that is an increasing function by proving . Using the matrix derivative formula
we can differentiate function as
Let , and let be a nonnegative diagonal matrix with exactly one positive diagonal entry and all other entries being zeros. We now prove that for . Using Sherman-Morrison formula [Meyer1973], we have
Since is an M-matrix, every entry of is positive [Plemmons1977], it is the same with every entry of . In addition, the denominator is also positive, because is positive definite. Therefore, is a positive matrix, the entries of which are all greater than zero.
By repeatedly applying the above process, we conclude that is a positive matrix. Thus,
which completes the proof.
5 Simple Greedy Algorithm
Theorems 4.1 and 4.2 indicate that the objective function in (5) is monotone and supermodular. Thus, a simple greedy algorithm is sufficient to approximate problem (5) with provable optimality bounds. In the greedy algorithm, the augmented edge set $S$ is initially empty. Then $k$ edges are iteratively added to the augmented edge set from the set of candidate edges. At each iteration, an edge $e$ in the candidate edge set is selected to maximize the marginal decrease $\mathcal{R}_v(S) - \mathcal{R}_v(S \cup \{e\})$. The algorithm terminates when $|S| = k$.
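The greedy selection described above can be sketched as follows. This is the naive variant that re-evaluates the objective from scratch for every candidate edge, not the paper's Algorithm 1; the objective is computed as the trace of the inverse of the Laplacian with the target node's row and column removed, and all function names are hypothetical.

```python
def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resistance_of_node(n, weighted_edges, v):
    """Sum of resistance distances from v, via the Laplacian with v removed."""
    L = [[0.0] * n for _ in range(n)]
    for a, b, w in weighted_edges:
        L[a][a] += w
        L[b][b] += w
        L[a][b] -= w
        L[b][a] -= w
    keep = [i for i in range(n) if i != v]
    inv = invert([[L[i][j] for j in keep] for i in keep])
    return sum(inv[i][i] for i in range(len(keep)))


def greedy_add_edges(n, weighted_edges, v, candidates, k):
    """Pick up to k candidate edges, one at a time, each minimizing the
    resistance distance of v in the augmented graph (equivalently,
    maximizing the marginal decrease)."""
    current = list(weighted_edges)
    remaining = list(candidates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = min(remaining,
                   key=lambda e: resistance_of_node(n, current + [e], v))
        chosen.append(best)
        remaining.remove(best)
        current.append(best)
    return chosen, resistance_of_node(n, current, v)


# Unit-weight path 0 - 1 - 2 - 3 with target node 0 and two candidate edges.
path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
candidates = [(0, 2, 1.0), (0, 3, 1.0)]
print(greedy_add_edges(4, path, 0, candidates, 1))
```

On this path, connecting node 0 to the far end closes a 4-cycle and lowers the resistance distance from 6 to 2.5, whereas the edge to node 2 only lowers it to 3, so the greedy step picks the edge to node 3.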
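According to (1), the resistance distance $\mathcal{R}_v$ is equal to $n L^\dagger_{vv} + \mathrm{Tr}\left(L^\dagger\right)$. A naive algorithm that recomputes the pseudoinverse for every candidate edge in every round is prohibitively expensive. Below we show that the computation cost can be reduced to $O(n^3)$ by using the Sherman-Morrison formula [Meyer1973].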
For a connected weighted graph with weighted Laplacian matrix , let be a nonexistent edge with given weight connecting node . Then,
Let be a connected weighted graph with weighted Laplacian matrix . Let be a candidate edge with given weight incident to node . Then,
Lemma 5.2 yields a simple greedy algorithm, as outlined in Algorithm 1. The first step of this algorithm is to compute the pseudoinverse of $L$, which takes $O(n^3)$ time. The algorithm then works in $k$ rounds, each involving computations and updates with time complexity $O(n^2)$. Thus, the total running time of Algorithm 1 is $O(n^3)$.
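To illustrate how a rank-one update avoids a fresh inversion, the sketch below applies the Sherman-Morrison formula to the inverse of the reduced Laplacian (the Laplacian with the target node's row and column deleted) rather than to the pseudoinverse used in the lemmas above, so it is a variant under that assumption and not the paper's exact formula. Adding an edge of weight $w$ between the target node and node $u$ adds $w$ to the diagonal entry of $u$ in the reduced Laplacian; with $M$ the current inverse, the new inverse is $M - w\, M e_u e_u^\top M / (1 + w M_{uu})$, and the trace drops by $w \lVert M e_u \rVert^2 / (1 + w M_{uu})$. The function name is hypothetical.

```python
def add_edge_update(inv, i, w):
    """Sherman-Morrison update of inv (the inverse of the reduced Laplacian)
    after adding weight w to diagonal entry i.

    Returns the updated inverse and the decrease of its trace, i.e. the
    decrease of the resistance distance of the target node.
    """
    n = len(inv)
    col = [inv[r][i] for r in range(n)]          # inv @ e_i (inv is symmetric)
    denom = 1.0 + w * inv[i][i]                  # 1 + w * R_{u, target}
    gain = w * sum(x * x for x in col) / denom   # trace decrease
    updated = [[inv[r][c] - w * col[r] * col[c] / denom for c in range(n)]
               for r in range(n)]
    return updated, gain


# Unit-weight path 0 - 1 - 2 - 3 with target node 0. The inverse of the
# reduced Laplacian over nodes 1, 2, 3 has entries min(i, j).
inv = [[1.0, 1.0, 1.0],
       [1.0, 2.0, 2.0],
       [1.0, 2.0, 3.0]]

# Add a unit-weight edge between node 0 and node 3 (index 2 in the reduced matrix).
updated, gain = add_edge_update(inv, 2, 1.0)
print(gain, sum(updated[j][j] for j in range(3)))
```

Each update costs time quadratic in the matrix dimension, and the gain alone can be evaluated in linear time per candidate, which is what makes a greedy loop over many candidates affordable once a single inverse is available.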
6 Fast Greedy Algorithm
Although Algorithm 1 is faster than the naive algorithm, it is still computationally infeasible for large networks, since it involves computing the pseudoinverse of $L$. In this section, in order to avoid inverting the matrix $L$, we give an efficient approximation algorithm, which achieves a $(1 - \frac{1}{e} - \epsilon)$ approximation factor of the optimal solution to problem (5) in time $\tilde{O}(mk\epsilon^{-2})$.
We first consider the denominator in (6). Assume that the newly added edge $e$ connects nodes $u$ and $v$. Note that the term in the denominator is in fact the resistance distance between $u$ and $v$ in the network excluding $e$. It can be computed by the following approximation algorithm [Spielman and Srivastava2011].
Let $\mathcal{G} = (V, E, w)$ be a weighted connected graph. There is an algorithm that returns an estimate $\hat{R}_{uv}$ of $R_{uv}$ for all in time. With probability at least , holds for all .
The numerator of (6) includes two terms, and . The first term can be calculated by . The second term is the trace of an implicit matrix, which can be approximated by Hutchinson’s Monte-Carlo method [Hutchinson1989]. By generating $M$ independent random $\pm 1$ vectors $x_1, \ldots, x_M$ (i.e., with independent Bernoulli entries), $\frac{1}{M} \sum_{i=1}^{M} x_i^\top A x_i$ can be used to estimate the trace of a matrix $A$. Since $\mathbb{E}\left[x_i^\top A x_i\right] = \mathrm{Tr}(A)$, by the law of large numbers, $\frac{1}{M} \sum_{i=1}^{M} x_i^\top A x_i$ should be close to $\mathrm{Tr}(A)$ when $M$ is large. The following lemma [Avron and Toledo2011] provides a good estimation of $\mathrm{Tr}(A)$.
Let be a positive semidefinite matrix with rank . Let be independent random vectors. Let be scalars such that and . For any , the following statement holds with probability at least :
Thus, we have reduced the estimation of the numerator of (6) to the calculation of the quadratic form of . Computing the quadratic form directly would require first evaluating , which has a high time complexity. To avoid inverting , we utilize the nearly-linear time solver for Laplacian systems from [Kyng and Sachdeva2016], whose performance is characterized in the following lemma.
The algorithm takes a Laplacian matrix of a graph with nodes and edges, a vector and a scalar as input, and returns a vector such that with probability the following statement holds:
where . The algorithm runs in expected time .
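Hutchinson's estimator, as used above for the trace of an implicit matrix, can be sketched in a few lines. The sketch assumes the matrix is available only through a matrix-vector product `matvec` (a hypothetical callback; in the setting of this section it would wrap calls to a Laplacian solver), and it draws random vectors with independent $\pm 1$ entries. It does not reproduce the sample-size bound of the lemma.

```python
import random


def hutchinson_trace(matvec, dim, num_samples, seed=0):
    """Estimate Tr(A) as the average of x^T A x over random +/-1 vectors x.

    matvec(x) must return the product A x as a list of floats; A itself is
    never formed explicitly.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        y = matvec(x)
        total += sum(xi * yi for xi, yi in zip(x, y))
    return total / num_samples


def dense_matvec(A):
    """Wrap an explicit matrix as a matvec callback (for testing only)."""
    return lambda x: [sum(a * xi for a, xi in zip(row, x)) for row in A]


# For a diagonal matrix every sample equals the trace exactly, because
# x_i^2 = 1 for every entry of x.
diagonal = [[2.0, 0.0, 0.0],
            [0.0, 5.0, 0.0],
            [0.0, 0.0, 1.0]]
print(hutchinson_trace(dense_matvec(diagonal), 3, 10))

# For a general symmetric matrix the estimate fluctuates around the trace.
A = [[2.0, 1.0],
     [1.0, 3.0]]
print(hutchinson_trace(dense_matvec(A), 2, 1000))
```

Only the off-diagonal entries contribute variance, which is why the diagonal example is exact while the second estimate is merely close to the trace.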
6.2 Fast Algorithm for Objective Function
Algorithm 3 works in $k$ rounds (Lines 2-6). In every round, the call of VReffComp and the updates take time $\tilde{O}(m\epsilon^{-2})$. Thus, the total running time of Algorithm 3 is $\tilde{O}(mk\epsilon^{-2})$. The following theorem shows that the output of Algorithm 3 gives a $(1 - \frac{1}{e} - \epsilon)$-approximate solution to problem (5).
For any , the set returned by the greedy algorithm above satisfies
where is the optimal solution to problem (5), i.e.,
We omit the proof, since it is similar to that in [Badanidiyuru and Vondrák2014].
In this section, we experimentally evaluate the effectiveness and efficiency of our two greedy algorithms on some model and real networks. All algorithms in our experiments are implemented in Julia. Our algorithms use the LaplSolve solver [Kyng and Sachdeva2016], a Julia implementation of which is available at https://github.com/danspielman/Laplacians.jl. All experiments were conducted on a machine with a 4.2 GHz Intel i7-7700 CPU and 32 GB of RAM.
We run our experiments on two popular model networks, the Barabási-Albert (BA) network and the Watts–Strogatz (WS) network, and on a large collection of realistic networks from KONECT [Kunegis2013] and SNAP (https://snap.stanford.edu). Table 1 provides information on these networks, where real-world networks are listed in increasing order of the number of nodes in the original networks.
|Zachary karate club||34||78||34||78|
7.1 Effectiveness of Greedy Algorithms
To show the effectiveness of our algorithms, we compare their results with the optimum solutions on two small model networks, a BA network and a WS network, and two small real-world networks, the Zachary karate club network and the Windsurfers contact network. Since these networks are small, we are able to compute the optimal edge set.
For each network, we randomly choose 20 target nodes. For each target node $v$, the candidate edge set is composed of all nonexistent edges incident to it, each with unit weight $w = 1$. For each designated $k$, we add $k$ edges connecting $v$ to its non-neighboring nodes. We then compute the average information centrality of the 20 target nodes for each $k$. We also compute the solutions for the random scheme, which adds $k$ edges to randomly selected non-neighboring nodes. The results are reported in Figure 1. We observe that there is little difference between the solutions of our greedy algorithms and the optimal solutions: their approximation ratio is always greater than 0.98, which is far better than the theoretical guarantees. Moreover, our greedy schemes outperform the random scheme in all four networks.
To further demonstrate the effectiveness of our algorithms, we compare the results of our methods with the random scheme and two other baseline schemes, Top-degree and Top-cent, on four other real-world networks. In the Top-degree scheme, the added edges are simply the edges connecting the target node and its nonadjacent nodes with the highest degree in the original network, while in the Top-cent scheme, the added edges are those connecting the target node and its nonadjacent nodes with the largest information centrality in the original network.
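The Top-degree baseline described above is simple enough to state as code. The sketch below assumes unweighted degree (the number of incident edges) and breaks ties by node index; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
def top_degree_edges(n, edges, v, k):
    """Return k new edges from v to its nonadjacent nodes of highest degree.

    edges is a list of (a, b) pairs over nodes 0..n-1 in the original graph.
    """
    degree = [0] * n
    neighbors = set()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        if a == v:
            neighbors.add(b)
        if b == v:
            neighbors.add(a)
    # Candidates are nodes other than v that are not yet adjacent to v.
    candidates = [u for u in range(n) if u != v and u not in neighbors]
    # Highest degree first; ties broken by smaller node index (an assumption).
    candidates.sort(key=lambda u: (-degree[u], u))
    return [(v, u) for u in candidates[:k]]


edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
print(top_degree_edges(5, edges, 0, 2))
```

The Top-cent baseline has the same shape, with the sort key replaced by the information centrality of each candidate node in the original network.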
Since the results may vary depending on the initial information centrality of the target node , for each of the four real networks, we select 10 different target nodes at random. For each target node, we first compute its original information centrality and increase it by adding up to new edges, using our two greedy algorithms and the three baselines. Then, we compute and record the information centrality of the target node after insertion of every edge. Finally, we compute the average information centrality of all the 10 target nodes for each , which is plotted in Figure 2. We observe that for all the four real-world networks our greedy algorithms outperform the three baselines.
7.2 Efficiency Comparison of Greedy Algorithms
Although both of our greedy algorithms are effective, we will show that their efficiency differs greatly. To this end, we compare the efficiency of the greedy algorithms on several real-world networks. For each network, we randomly choose 20 target nodes, and for each of them we create new edges incident to it to maximize its information centrality according to Algorithms 1 and 3. We compute the average information centrality of 10 target nodes for each network and record the average running times. Table 2 reports the average information centrality and average running time of our greedy algorithms. We observe that the ApproxiSM algorithm is faster than the ExactSM algorithm, especially for large networks, while their final information centrality scores are close. More interestingly, ApproxiSM applies to massive networks. For example, for the com-DBLP and roadNet-TX networks, ApproxiSM computes their information centrality in half an hour, while ExactSM fails due to its high time complexity.
|Network||Time (seconds)||Information centrality|
In this paper, we considered the problem of maximizing the information centrality of a designated node $v$ by adding $k$ new edges incident to it. This problem is equivalent to minimizing the resistance distance $\mathcal{R}_v$ of node $v$. We proposed two approximation algorithms for computing $\mathcal{R}_v$ when $k$ edges are repeatedly inserted in a greedy way. The first one gives a $(1 - \frac{1}{e})$ approximation of the optimum in time $O(n^3)$, while the second one returns a $(1 - \frac{1}{e} - \epsilon)$ approximation in time $\tilde{O}(mk\epsilon^{-2})$. Since the considered problem has never been addressed before, there are no existing algorithms to compare against, so we compare our algorithms with potential alternative strategies. Extensive experimental results on model and realistic networks show that our algorithms can often compute an approximately optimal solution. In particular, our second algorithm achieves a good approximate solution very quickly, making it applicable to massive networks.
- [Avrachenkov and Litvak2006] K. Avrachenkov and N. Litvak. The effect of new links on Google PageRank. Stochastic Models, 22(2):319–331, 2006.
- [Avron and Toledo2011] H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM, 58(2):8:1–8:34, 2011.
- [Badanidiyuru and Vondrák2014] A. Badanidiyuru and J. Vondrák. Fast algorithms for maximizing submodular functions. In SODA, pages 1497–1514. SIAM, 2014.
- [Bergamini et al.2016] E. Bergamini, M. Wegner, D. Lukarski, and H. Meyerhenke. Estimating current-flow closeness centrality with a multigrid Laplacian solver. In CSC, pages 1–12, 2016.
- [Boldi and Vigna2014] P. Boldi and S. Vigna. Axioms for centrality. Internet Mathematics, 10(3-4):222–262, 2014.
- [Bozzo and Franceschet2013] E. Bozzo and M. Franceschet. Resistance distance, closeness, and betweenness. Social Networks, 35(3):460–469, 2013.
- [Brandes and Fleischer2005] U. Brandes and D. Fleischer. Centrality measures based on current flow. In STACS, volume 3404, pages 533–544. Springer-Verlag, 2005.
- [Crescenzi et al.2015] P. Crescenzi, G. D’Angelo, L. Severini, and Y. Velaj. Greedily improving our own centrality in a network. In SEA, pages 43–55. Springer, 2015.
- [Crescenzi et al.2016] P. Crescenzi, G. D’Angelo, L. Severini, and Y. Velaj. Greedily improving our own closeness centrality in a network. ACM Transactions on Knowledge Discovery from Data, 11(1):9, 2016.
- [D’Angelo et al.2016] G. D’Angelo, L. Severini, and Y. Velaj. On the maximum betweenness improvement problem. Electronic Notes in Theoretical Computer Science, 322:153–168, 2016.
- [Demaine and Zadimoghaddam2010] E. D. Demaine and M. Zadimoghaddam. Minimizing the diameter of a network using shortcut edges. In SWAT, pages 420–431. Springer-Verlag, 2010.
- [Doyle and Snell1984] P. G. Doyle and J. L. Snell. Random Walks and Electric Networks. Mathematical Association of America, 1984.
- [Hoffmann et al.2018] C. Hoffmann, H. Molter, and M. Sorge. The parameterized complexity of centrality improvement in networks. In SOFSEM, pages 111–124. Springer, 2018.
- [Hutchinson1989] M. F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.
- [Ishakian et al.2012] V. Ishakian, D. Erdös, E. Terzi, and A. Bestavros. A framework for the evaluation and management of network centrality. In SDM, pages 427–438. SIAM, 2012.
- [Izmailian et al.2013] N. Sh. Izmailian, R. Kenna, and F. Y. Wu. The two-point resistance of a resistor network: a new formulation and application to the cobweb network. Journal of Physics A: Mathematical and Theoretical, 47(3):035003, 2013.
- [Klein and Randić1993] D. J. Klein and M. Randić. Resistance distance. Journal of Mathematical Chemistry, 12(1):81–95, 1993.
- [Kunegis2013] J. Kunegis. KONECT: the Koblenz network collection. In WWW, pages 1343–1350. ACM, 2013.
- [Kyng and Sachdeva2016] R. Kyng and S. Sachdeva. Approximate Gaussian elimination for Laplacians - fast, sparse, and simple. In FOCS, pages 573–582. IEEE, 2016.
- [Lü et al.2016] L. Lü, D. Chen, X. Ren, Q. Zhang, Y. Zhang, and T. Zhou. Vital nodes identification in complex networks. Physics Reports, 650:1–63, 2016.
- [Meyer1973] C. D. Meyer, Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.
- [Meyerson and Tagiku2009] A. Meyerson and B. Tagiku. Minimizing average shortest path distances via shortcut edge addition. In APPROX, pages 272–285. Springer-Verlag, 2009.
- [Nemhauser et al.1978] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
- [Newman2005] M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27(1):39–54, 2005.
- [Newman2010] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010.
- [Olsen2010] M. Olsen. Maximizing PageRank with new backlinks. In ICAC, pages 37–48. Springer-Verlag, 2010.
- [Parotsidis et al.2016] N. Parotsidis, E. Pitoura, and P. Tsaparas. Centrality-aware link recommendations. In WSDM, pages 503–512. ACM, 2016.
- [Peng and Spielman2014] R. Peng and D. A. Spielman. An efficient parallel solver for SDD linear systems. In STOC, pages 333–342. ACM, 2014.
- [Plemmons1977] R. J. Plemmons. M-matrix characterizations. I nonsingular M-matrices. Linear Algebra and its Applications, 18(2):175–188, 1977.
- [Spielman and Srivastava2011] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.
- [Stephenson and Zelen1989] K. Stephenson and M. Zelen. Rethinking centrality: Methods and examples. Social Networks, 11(1):1–37, 1989.
- [White and Smyth2003] S. White and P. Smyth. Algorithms for estimating relative importance in networks. In KDD, pages 266–275. ACM, 2003.