Centrality metrics (Lü et al., 2016) provide a ubiquitous Network Science tool for the identification of the “important” nodes in a graph.
They have been widely applied in a range of domains such as early detection of epidemic outbreaks (Chung et al., 2009), viral marketing (Leskovec et al., 2007), trust assessment in virtual communities (Agreste et al., 2015b), preventing catastrophic outage in power grids (Albert et al., 2004) and analyzing heterogeneous networks (Agreste et al., 2015a).
The notion of importance of a node can be defined in a number of ways (Newman, 2010; Boldi and Vigna, 2014; Boldi et al., 2017; Vigna, 2016; Boldi, 2015). Some centrality metrics define the importance of a node $i$ in a graph $G$ as a function of the distance of $i$ to other nodes in $G$: for instance, in Degree Centrality, the importance of $i$ is defined as the number of nodes which are adjacent to $i$, i.e., which are at distance one from $i$. Analogously, Closeness Centrality (Newman, 2010) classifies as important those nodes which are few hops away from any other node in $G$.
Another class of centrality metrics looks at walk/path structures in $G$: a walk is a sequence of adjacent nodes; its length is defined as the number of edges it contains; a path is a walk without repeated nodes, and a shortest path connecting two nodes is also called a geodesic path. For instance, the Betweenness Centrality (Newman, 2010) of $i$ is the ratio of the number of geodesic paths from any node $j$ to any node $q$ which pass through $i$ to the number of geodesic paths running from $j$ to $q$ and, thus, nodes with the largest betweenness centrality scores are those which intercept most of the geodesic paths in $G$.
A further popular metric is the Katz Centrality Score (Katz, 1953), which is understood as the weighted number of walks terminating in $i$: here, the weighting factor is inversely related to walk length and, thus, long (resp., short) walks have a small (resp., large) weight.
For a suitable choice of the weighting factor, the Katz centrality score converges to the Eigenvector Centrality (Benzi and Klymko, 2015; Boldi and Vigna, 2014) or the popular PageRank (Brin and Page, 1998; Boldi and Vigna, 2014).
To the best of our knowledge, however, there is no previous work in which the centrality of a node is closely related to the notion of navigability: roughly speaking, we say that $G$ is navigable if it is possible to successfully route a message to any node in $G$ via a short chain of intermediary nodes, regardless of the node which generates the message.
Navigability is one of the most important features for a broad range of natural and artificial systems which have the transportation of information (e.g., a computer network) or the trade of goods (e.g., a road network) as their primary purpose. In general, if the topology of the graph $G$ underlying the above-mentioned systems were perfectly specified, then any source node $j$ could discover all shortest paths starting from (or terminating in) $j$ and could make use of the discovered paths to efficiently route messages.
In practice, nodes in $G$ are often able to efficiently route messages even if they do not have a global view of the topology of $G$, and this has encouraged many researchers to seek a better understanding of why graphs arising in real applications are navigable. Early studies on graph navigability were inspired by the seminal work of Travers and Milgram (Travers and Milgram, 1967) on the “small world” property.
In a celebrated experiment, randomly chosen Nebraska residents were asked to send a booklet to a complete stranger in Boston. Selected individuals were required to forward the booklet to any of their acquaintances whom they deemed likely to know the recipient or who at least might know people who did. In some cases, the booklet actually reached the target recipient by means, on average, of 5.2 intermediate contacts, thus suggesting an intriguing feature of human societies: in large, even planetary-scale, social networks, pairs of individuals are connected through short chains of intermediaries and ordinary people are able to uncover these chains (Dodds et al., 2003; Kleinberg, 2000; Goel et al., 2009; Leskovec and Horvitz, 2008).
Several empirical studies have verified the small-world phenomenon in diverse domains such as metabolic and biological networks (Jeong et al., 2000), the Web graph (Broder et al., 2000), collaboration networks among scientists (Newman, 2001) as well as social networks (Dodds et al., 2003; Watts and Strogatz, 1998).
So far, centrality metrics and navigability have been investigated in parallel, yet their research tracks are disconnected. Thus, an important (and still unanswered) direction of inquiry is the introduction of centrality metrics that are related to the navigability of a node, i.e., the ease with which it is possible to reach a target node regardless of the node chosen as source node.
In this article we tackle the questions above by extending previous work by Fenner et al. (Fenner et al., 2008) to the realm of social networks. The main output of our research is an index, called the potential gain, which ranks nodes in a network on the basis of their ability to find a target.
The potential gain of a node $i$ depends on the number $w_k(j,i)$ of walks of length $k$ that connect $i$ with any other node $j$. The underlying idea is that, for a fixed $k$, the larger $w_k(j,i)$, the higher the chance that $j$ will reach $i$ regardless of the specific navigation strategy. In the computation of the potential gain, we take the small-world phenomenon as axiomatic: we consider an agent that starts from $j$ and looks for short walks to reach $i$.
We observe that the value a walk has for the agent decreases with its length and there is a threshold length beyond which the agent has to abandon that walk. To formalize the intuition above, we introduce a weighting factor $\phi(k)$ which monotonically decreases with $k$ to penalize long walks.
We have developed two variants of the potential gain of (Fenner et al., 2008), namely:
the geometric potential gain, in which $\phi(k)$ decays as $\delta^{k-1}$, where $\delta$ is a parameter ranging between $0$ and the inverse of the spectral radius $\lambda_1$ of $G$ (the spectral radius of $G$ is defined as the largest eigenvalue of the adjacency matrix of $G$), and
the exponential potential gain, in which $\phi(k)$ decays in exponential fashion.
Both the geometric and exponential gain of $i$ can be thought of as the product of one index (Degree Centrality) related to the popularity of $i$ and another (Katz Centrality score, for the geometric potential gain, and Communicability Index (Benzi and Klymko, 2013; Estrada and Rodriguez-Velazquez, 2005) for the exponential potential gain) which reflects the degree of similarity of $i$ with all other nodes in the network. In this sense, the geometric and the exponential potential gain are composite centrality metrics, i.e., they constitute a novel class of centrality metrics which combine popularity and similarity to rank nodes in graphs. The combination of popularity and similarity has proven to closely resemble the way humans navigate large social networks (Simsek and Jensen, 2008) or attempt to locate information in large information networks such as Wikipedia (West et al., 2009; West and Leskovec, 2012; Helic et al., 2013).
Our formalisation applies the Neumann series expansion (Horn and Johnson, 2013) to approximate both the geometric and exponential gain efficiently and accurately. Both theoretical and experimental analyses show that our approach is appropriate for accurately computing the geometric and exponential potential gain in large real-life graphs consisting of millions of nodes and edges, even with modest hardware resources.
We validated our approach on three large datasets: Facebook (a graph of friendships among Facebook users), DBLP (a graph describing scientific collaboration among researchers in Computer Science) and YouTube (a graph mapping friendship relationships among YouTube users). The experimental results will be in the full version of this article.
In this section we introduce some basic terminology for graphs that will be used extensively throughout this article.
Let a graph $G = \langle N, E \rangle$ be an ordered pair where $N$ is a set of nodes, here also called vertices, and $E$ is the set of edges. As usual, $G$ is undirected if edges are unordered pairs of nodes and directed otherwise. In this article we will consider only undirected graphs.
Also, let $n = |N|$ be the number of nodes and $m = |E|$ the number of edges of $G$. For any given node $i$, its neighborhood $N(i)$ is the set of nodes directly connected to it; its degree $d_i$ is the number of edges incident onto it, i.e., $d_i = |N(i)|$.
A walk of length $k$ (with $k$ a non-negative integer) is a sequence of nodes $\langle i_0, i_1, \ldots, i_k \rangle$ such that consecutive nodes are directly connected: $\langle i_\ell, i_{\ell+1} \rangle \in E$ for $\ell = 0, \ldots, k-1$. Also, we use the term path for walks that do not have repeated vertices. A walk will be closed if it starts and ends at the same node.
We will represent graphs by their associated adjacency matrix $\mathbf{A}$, defined as usual with $a_{ij} = 1$ if $\langle i, j \rangle \in E$ and 0 otherwise. Sometimes we may slightly simplify notation with
The adjacency matrix provides a compact formalism to describe many graph properties: for instance, the matrix $\mathbf{A}^2$, where $(\mathbf{A}^2)_{ij} = \sum_{q \in N} a_{iq}\, a_{qj}$, gives the number of walks of length two going from $i$ to $j$. Inductively, for any positive integer $k$, the matrix $\mathbf{A}^k$ will give the number of closed (resp., distinct) walks of length $k$ between any two nodes $i$ and $j$ if $i = j$ (resp., if $i \neq j$) (Cvetkovic et al., 1997).
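The walk-counting property of adjacency matrix powers can be checked on a toy graph. The following is a minimal sketch in plain Python; the helper names `mat_mul` and `walk_counts` and the three-node path graph `PATH3` are illustrative assumptions, not part of the original article.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(adj, k):
    """Return A^k, whose (i, j) entry counts the walks of length k from i to j."""
    n = len(adj)
    # A^0 is the identity: one walk of length zero from each node to itself.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, adj)
    return power


# Hypothetical example: the undirected path graph 0 - 1 - 2.
PATH3 = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# Entry (1, 1) of A^2 is 2: the closed walks 1-0-1 and 1-2-1.
print(walk_counts(PATH3, 2))
```

Repeated dense multiplication costs on the order of $n^3$ operations per step, so this form is only meant for small illustrative graphs.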
It is a well-known fact that the adjacency matrix $\mathbf{A}$ of any undirected graph is symmetric and, hence, all its eigenvalues are real. The largest eigenvalue $\lambda_1$ of $\mathbf{A}$ is also called its principal eigenvalue or spectral radius of $G$. Moreover, the eigenvectors of $\mathbf{A}$ can be chosen to form an orthonormal basis in $\mathbb{R}^n$ (Strang, 1993). Eigenpairs $(\lambda_i, \mathbf{v}_i)$ are formed by the eigenvalue $\lambda_i$ and the corresponding eigenvector $\mathbf{v}_i$.
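The spectral radius can be estimated numerically without a full eigendecomposition. The sketch below uses power iteration, which is a standard technique and not necessarily the method used by the authors. It assumes a connected undirected graph with at least one edge, and it iterates on $\mathbf{A} + \mathbf{I}$ (an implementation choice made here) so that the iteration does not oscillate on bipartite graphs, whose spectrum is symmetric around zero. The function name `spectral_radius` is hypothetical.

```python
import math


def spectral_radius(adj, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric adjacency matrix."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n  # positive start vector
    estimate = 0.0
    for _ in range(iterations):
        # y = (A + I) x; the shift by I avoids oscillation on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        # Rayleigh quotient y^T A y of the unit vector y.
        new_estimate = sum(
            y[i] * sum(adj[i][j] * y[j] for j in range(n)) for i in range(n)
        )
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        x, estimate = y, new_estimate
    return estimate


# Hypothetical example: a triangle, which is 2-regular.
TRIANGLE = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
print(spectral_radius(TRIANGLE))
```

For a $d$-regular graph the all-ones vector is an eigenvector with eigenvalue $d$, which gives a quick sanity check for the estimate.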
3. A model of network navigability
In this section we introduce our new centrality metrics, called the geometric and exponential potential gain.
As we will see, they share a common physical interpretation which is based on the notion of graph navigability: roughly speaking, we say that a graph $G$ is navigable if, for any target node $i$ in $G$, it is possible to reach $i$ via short paths/walks, independently of the node $j$ (called the source) from which we choose to start exploring.
In the light of previous research on graph navigability, we informally define the navigability score of a node $i$ as a measure of the “easiness” with which it is possible to reach $i$ independently of the source node $j$. In this way, the navigability score of a node can be interpreted as a centrality metric.
To define the navigability score we borrow some ideas from previous work by Fenner et al. (Fenner et al., 2008), who formulated the problem of identifying a “good” page from which a user should start exploring the Web.
A page is classified as a good starting point if it satisfies the following criteria: (1) it is relevant, i.e., its content closely matches the user’s information goals, (2) it is central, i.e., its distance to other Web pages in the Web graph is as low as possible and (3) it is connected, in the sense that it is able to reach a maximum number of other pages via its outlinks.
A key difference between the approach of Fenner et al. (Fenner et al., 2008) and the current one is that they defined the navigability score for $i$ as the ability of $i$ to act as the source node for reaching all the other nodes. In our setting, instead, we think of the node $i$ as the target node we wish to reach.
So, let us fix a source node $j$ and a target node $i$ and provide an estimate $P(j,i)$ of how “easy” it will be for $i$ to be reached if we choose $j$ as source node. Intuitively, the larger the number of walks from $j$ to $i$, the easier it is for $i$ to be reached from $j$; in addition, we assume that the task of exploring a graph is costly and such cost increases as the length of the walks/paths we use for exploration purposes increases. Therefore, shorter walks should be preferred to longer ones.
By combining the requirements above, we obtain:
$$P(j,i) = \sum_{k=1}^{\infty} \phi(k)\, w_k(j,i) \qquad (1)$$
here $w_k(j,i)$ is the number of walks of length $k$ going from $j$ to $i$ and the non-increasing function $\phi(k)$ acts as penalty for longer walks. If we sum over all possible source nodes $j$, we obtain a global centrality index for $i$:
$$PG(i) = \sum_{j \in N} P(j,i) = \sum_{j \in N} \sum_{k=1}^{\infty} \phi(k)\, w_k(j,i) \qquad (2)$$
In analogy to Fenner et al. (Fenner et al., 2008; Levene and Wheeldon, 2004), we will call $PG(i)$ the potential gain of $i$.
Depending on the choice of the penalty function $\phi$ we obtain two variants of the potential gain, namely the geometric and the exponential potential gain (see Section 3.1).
3.1. The geometric and exponential potential gain
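Given the above specifications, we first define the potential gain in matrix notation. For the base case, consider walks of length $k=1$, i.e., direct connections. Only the neighbours of a node $i$ will contribute to the potential gain of $i$, which leads to the trivial conclusion that, at $k=1$, nodes with the largest degree are also those with the largest potential gain.
We define the vector $\mathbf{pg}^{(1)}$ such that for every node $i$:
$$pg^{(1)}_i = \phi(1) \sum_{j \in N} a_{ji} = \phi(1)\, (\mathbf{A}\mathbf{1})_i$$
where $\mathbf{1}$ denotes the vector of all ones.
If we include walks of length two, then we have to consider the squared adjacency matrix $\mathbf{A}^2$. So, we add a contribution $\phi(2)\, \mathbf{A}^2 \mathbf{1}$ to the potential gain.
By induction, nodes capable of reaching $i$ through walks of length $k$ provide a contribution to the potential gain equal to $\phi(k)\, (\mathbf{A}^k \mathbf{1})_i$. By summing over all possible values of $k$ we get the following expression for the vector $\mathbf{pg}$ of potential gains:
$$\mathbf{pg} = \sum_{k=1}^{\infty} \phi(k)\, \mathbf{A}^k \mathbf{1} \qquad (3)$$
To attenuate the effect of the walks’ length, we will consider two weighting functions, namely:
Geometric: $\phi(k) = \delta^{k-1}$ with $\delta \in (0, 1/\lambda_1)$. So we define the geometric potential gain, $\mathbf{g}$:
$$\mathbf{g} = \sum_{k=1}^{\infty} \delta^{k-1}\, \mathbf{A}^k \mathbf{1} \qquad (4)$$
Exponential: $\phi(k) = \frac{1}{(k-1)!}$. So we define the exponential potential gain, $\mathbf{e}$:
$$\mathbf{e} = \sum_{k=1}^{\infty} \frac{1}{(k-1)!}\, \mathbf{A}^k \mathbf{1} \qquad (5)$$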
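The weighted sum over walk lengths can be approximated by truncating it at a maximum length and accumulating the products $\mathbf{A}^k \mathbf{1}$ with repeated matrix-vector multiplications. The sketch below assumes the geometric weight $\delta^{k-1}$ and the exponential weight $1/(k-1)!$. All function names, the star graph `STAR4`, the value $\delta = 0.2$ and the truncation at 60 terms are illustrative assumptions rather than the authors' algorithm.

```python
import math


def mat_vec(adj, x):
    """Return the product A x for a list-of-lists matrix and a list vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def potential_gain(adj, phi, max_len):
    """Truncated potential gain: sum over k = 1..max_len of phi(k) * A^k 1."""
    n = len(adj)
    walks = [1.0] * n  # A^0 1
    gain = [0.0] * n
    for k in range(1, max_len + 1):
        walks = mat_vec(adj, walks)  # now A^k 1
        weight = phi(k)
        gain = [g + weight * w for g, w in zip(gain, walks)]
    return gain


def geometric_phi(delta):
    """Geometric penalty delta^(k-1); delta must stay below 1 / spectral radius."""
    return lambda k: delta ** (k - 1)


def exponential_phi(k):
    """Exponential penalty 1 / (k-1)!."""
    return 1.0 / math.factorial(k - 1)


# Hypothetical example: a star graph with hub 0 and leaves 1..3.
# Its spectral radius is sqrt(3), so delta = 0.2 is admissible.
STAR4 = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

geo = potential_gain(STAR4, geometric_phi(0.2), max_len=60)
exp_gain = potential_gain(STAR4, exponential_phi, max_len=60)
print(geo)
print(exp_gain)
```

With `max_len=1` both variants reduce to the degree vector, which matches the base case discussed above; in both variants the hub of the star receives the largest score.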
4. Potential Gain as centrality
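The geometric and the exponential potential gain introduced above yield a ranking of network nodes and, therefore, it is instructive to compare them with popular centrality metrics. Recall that we defined the spectral radius $\lambda_1$ of $G$ as the largest eigenvalue of $\mathbf{A}$.
As for the geometric potential gain, if we let $\delta < 1/\lambda_1$, the following expansion holds:
$$\mathbf{g} = \mathbf{A}\mathbf{1} + \delta \mathbf{A}^2 \mathbf{1} + \delta^2 \mathbf{A}^3 \mathbf{1} + \ldots = \mathbf{A} \left( \sum_{k=0}^{\infty} \delta^k \mathbf{A}^k \right) \mathbf{1} = \mathbf{A} \left( \mathbf{I} - \delta \mathbf{A} \right)^{-1} \mathbf{1}$$
in which we make use of the Neumann series (Horn and Johnson, 2013) $\sum_{k=0}^{\infty} \delta^k \mathbf{A}^k = \left( \mathbf{I} - \delta \mathbf{A} \right)^{-1}$.
At this point, the term $\left( \mathbf{I} - \delta \mathbf{A} \right)^{-1} \mathbf{1}$ is exactly the Katz centrality score (Katz, 1953; Leicht et al., 2006), a popular centrality metric that defines the importance of a node as a function of its similarity with other nodes in $G$. Hence, we can say that the geometric potential gain combines two kinds of contributions: popularity, as captured by node degree, and similarity, as captured by Katz’s similarity score.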
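The closed form can be evaluated without inverting a matrix: the Katz vector $\mathbf{x} = (\mathbf{I} - \delta\mathbf{A})^{-1}\mathbf{1}$ is the fixed point of $\mathbf{x} \leftarrow \mathbf{1} + \delta \mathbf{A}\mathbf{x}$, and this iteration converges when $\delta < 1/\lambda_1$. The sketch below is one possible way to do this, under the assumption that the geometric weight is $\delta^{k-1}$; the function names, tolerance and example graph are hypothetical.

```python
def mat_vec(adj, x):
    """Return the product A x for a list-of-lists matrix and a list vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def katz_scores(adj, delta, tol=1e-12, max_iter=10_000):
    """Solve (I - delta*A) x = 1 by the fixed-point iteration x <- 1 + delta*A x."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new_x = [1.0 + delta * v for v in mat_vec(adj, x)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence: is delta below 1 / spectral radius?")


def geometric_potential_gain(adj, delta):
    """Geometric potential gain computed as A times the Katz vector."""
    return mat_vec(adj, katz_scores(adj, delta))


# Hypothetical example: the complete graph on four nodes (3-regular),
# whose spectral radius is 3, so delta = 0.1 is admissible.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(geometric_potential_gain(K4, 0.1))
```

On a $d$-regular graph every Katz score equals $1/(1 - \delta d)$, so every node of `K4` gets a geometric potential gain of $3/0.7$, which is a convenient check of the implementation.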
It is also instructive to consider what happens for extreme values of $\delta$: if $\delta \to 0$, then the geometric potential gain tends to $\mathbf{A}\mathbf{1}$, i.e., it coincides with degree. In contrast, if $\delta \to 1/\lambda_1$, then the Katz centrality score converges to eigenvector centrality (Benzi and Klymko, 2015), another popular metric adopted in Network Science. Boldi et al. (Boldi and Vigna, 2014; Boldi et al., 2017; Vigna, 2016; Boldi, 2015) show that the Katz Centrality score is also strictly related to the PageRank. More specifically, the PageRank vector $\mathbf{p}$ coincides with the Katz Centrality score provided that the adjacency matrix $\mathbf{A}$ is replaced by its row-normalized version $\tilde{\mathbf{A}}$:
$$\mathbf{p}^{T} = \frac{1-\alpha}{n}\, \mathbf{1}^{T} \left( \mathbf{I} - \alpha \tilde{\mathbf{A}} \right)^{-1}$$
Here, the parameter $\alpha$ is the so-called PageRank damping factor. Let us now concentrate on the exponential potential gain. We rewrite Equation 5 as follows:
$$\mathbf{e} = \mathbf{A} \left( \sum_{k=0}^{\infty} \frac{\mathbf{A}^k}{k!} \right) \mathbf{1} = \mathbf{A}\, e^{\mathbf{A}}\, \mathbf{1}$$
where $e^{\mathbf{A}}$ is the exponential of $\mathbf{A}$ (Higham, 2008).
Specifically, the entry $(e^{\mathbf{A}})_{ij}$ measures how easy it is to send a unit of flow from a node $i$ to a node $j$ and vice versa. Such a parameter is known as communicability and it can be regarded as a measure of similarity between a pair of nodes. Communicability has been successfully used to discover communities in networks (Estrada et al., 2012). The product $e^{\mathbf{A}}\mathbf{1}$ yields a centrality metric which defines the importance of a node as a function of its ability to communicate with all other nodes in the network. In turn, the diagonal entry $(e^{\mathbf{A}})_{ii}$ of the matrix exponential defines a further centrality metric called subgraph centrality (Estrada and Rodriguez-Velazquez, 2005). As a result of the rewriting above, we clearly see the exponential potential gain as dependent on two factors: popularity of $i$ (i.e., its degree) and similarity of $i$ with all other nodes in the network.
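The exponential potential gain can be approximated without forming the matrix exponential explicitly, by applying the Taylor series of $e^{\mathbf{A}}$ directly to the all-ones vector and then multiplying once more by $\mathbf{A}$. The sketch below assumes the form $\mathbf{A}\, e^{\mathbf{A}}\, \mathbf{1}$; the function names and the fixed number of series terms are illustrative choices, and graphs with a large spectral radius would need more terms or a dedicated matrix-function routine.

```python
import math


def mat_vec(adj, x):
    """Return the product A x for a list-of-lists matrix and a list vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def exp_times_ones(adj, terms=60):
    """Approximate e^A 1 by the truncated Taylor series sum of A^m 1 / m!."""
    term = [1.0] * len(adj)  # A^0 1 / 0!
    total = term[:]
    for m in range(1, terms + 1):
        # A^m 1 / m! = A (A^(m-1) 1 / (m-1)!) / m
        term = [v / m for v in mat_vec(adj, term)]
        total = [t + v for t, v in zip(total, term)]
    return total


def exponential_potential_gain(adj, terms=60):
    """Exponential potential gain computed as A (e^A 1)."""
    return mat_vec(adj, exp_times_ones(adj, terms))


# Hypothetical example: a triangle (2-regular), where e^A 1 = e^2 * 1
# and hence every node has exponential potential gain 2 * e^2.
TRIANGLE = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
print(exponential_potential_gain(TRIANGLE), 2 * math.exp(2))
```

The vector returned by `exp_times_ones` is the communicability-based score $e^{\mathbf{A}}\mathbf{1}$ discussed above, so the same helper exposes both factors of the product.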
The computation of the geometric (resp., exponential) potential gain for all nodes in $G$ needs the specification of the full adjacency matrix $\mathbf{A}$; in this sense, the geometric and the exponential potential gain should be considered as global centrality metrics, on par with the Katz centrality score and Subgraph centrality.
We have introduced the potential gain, an index to rank nodes in graphs that captures the ability of a node to act as a target point for navigation within the network. We have defined two variants of the potential gain, the geometric and exponential potential gain. We then proposed two iterative algorithms that compute the geometric and exponential potential gain and proved their convergence. We evaluated the scalability of our algorithms on three real large datasets.
We have discovered connections between the geometric potential gain and other, well-known, centrality metrics; GPG provides a new, mixed global-local centrality measure. Indeed, the PG as a centrality index has several merits:
it unifies Katz and Communicability into a single framework;
in its definition in terms of the PG it allows us to provide novel and efficient approximations of these indices;
it provides an instance of a novel class of composite indices, in this case DC*Katz, and
as each vertex has clear visibility of its neighbours, the realisation that PG is a combination of local (Degree) and global (Katz) centrality makes complete sense, in our opinion.
It is also possible that these results will open the door to a new interpretation of social phenomena related to Travers-Milgram’s “small world” experiment (Travers and Milgram, 1967).
One question that could be discussed at this point is which of the two new measures could be considered the best analysis tool for large networks. Early experimental results indicate different rates of convergence but no clear “winner.”
From a computational standpoint, the geometric potential gain is clearly superior. So, for the analysis of very large networks and/or with modest hardware resources it is the navigability score of choice. One practical difference, however, remains. The exponential potential gain is parameter-free and can be applied directly. On the other hand, the geometric potential gain is parametric in $\delta$; thus it requires a careful tuning of the algorithm.
Another topic for future work is the investigation of the relationship between network robustness and network navigability. To this end, we intend to design an experiment in which graph nodes are ranked on the basis of their geometric/exponential potential gain and then are progressively removed from the graph. Basic properties of graph topology, such as the number and size of connected components, shall be re-evaluated upon node deletion. We also plan to study how adding edges can increase the geometric/exponential potential gain of a target group of nodes.
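The node-removal experiment outlined above could be prototyped along the following lines. This is only a sketch of one possible design, not the authors' protocol: the graph is assumed to be a dictionary mapping each node to a list of neighbours, the scores are assumed to be precomputed potential gain values, and the names `connected_components` and `removal_experiment` are hypothetical.

```python
from collections import deque


def connected_components(adjacency, removed):
    """Sizes of the connected components once the nodes in `removed` are deleted."""
    seen = set(removed)
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def removal_experiment(adjacency, scores):
    """Delete nodes from highest to lowest score, recording component sizes."""
    order = sorted(scores, key=scores.get, reverse=True)
    removed = set()
    history = []
    for node in order:
        removed.add(node)
        history.append((node, connected_components(adjacency, removed)))
    return history


# Hypothetical example: a star graph with made-up scores that favour the hub.
STAR = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
SCORES = {0: 4.0, 1: 1.8, 2: 1.8, 3: 1.8}
for node, sizes in removal_experiment(STAR, SCORES):
    print(node, sizes)
```

Recomputing the components from scratch after each deletion keeps the sketch short; a larger study would probably recompute the scores after each removal as well, which this version does not do.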
- Agreste et al. (2015a). Analysis of a heterogeneous social network of humans and cultural objects. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45 (4), pp. 559–570.
- Agreste et al. (2015b). Trust Networks: Topology, Dynamics, and Measurements. IEEE Internet Computing 19 (6), pp. 26–35.
- Albert et al. (2004). Structural vulnerability of the North American power grid. Physical Review E 69 (2), pp. 025103.
- Benzi and Klymko (2013). Total communicability as a centrality measure. Journal of Complex Networks 1 (2), pp. 124–149.
- Benzi and Klymko (2015). On the limiting behavior of parameter-dependent network centrality measures. SIAM Journal on Matrix Analysis and Applications 36 (2), pp. 686–706.
- Boldi et al. (2017). Rank monotonicity in centrality measures. Network Science 5 (4), pp. 529–550.
- Boldi and Vigna (2014). Axioms for centrality. Internet Mathematics 10 (3-4), pp. 222–262.
- Boldi (2015). Large-scale network analytics: diffusion-based computation of distances and geometric centralities. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015, Companion Volume, Gangemi et al. (Eds.), pp. 1313.
- Brin and Page (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30 (1-7), pp. 107–117.
- Broder et al. (2000). Graph structure in the Web. Computer Networks 33 (1-6), pp. 309–320.
- Chung et al. (2009). Distributing antidote using PageRank vectors. Internet Mathematics 6 (2), pp. 237–254.
- Cvetkovic et al. (1997). Eigenspaces of graphs. Cambridge University Press.
- Dodds et al. (2003). An experimental study of search in global social networks. Science 301 (5634), pp. 827–829.
- Estrada et al. (2012). The physics of communicability in complex networks. Physics Reports 514 (3), pp. 89–119.
- Estrada and Rodriguez-Velazquez (2005). Subgraph centrality in complex networks. Physical Review E 71 (5), pp. 056103.
- Fenner et al. (2008). Modelling the navigation potential of a Web page. Theoretical Computer Science 396 (1-3), pp. 88–96.
- Gangemi et al. (Eds.) (2015). Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015, Companion Volume. ACM.
- Goel et al. (2009). Social search in “small-world” experiments. In Proc. of the International Conference on World Wide Web (WWW 2009), Madrid, Spain, pp. 701–710.
- Helic et al. (2013). Models of human navigation in information networks based on decentralized search. In Proc. of the ACM Conference on Hypertext and Social Media, Paris, France, pp. 89–98.
- Higham (2008). Functions of matrices: theory and computation. Vol. 104, SIAM.
- Horn and Johnson (2013). Matrix analysis. 2nd edition, Cambridge Univ. Press.
- Jeong et al. (2000). The large-scale organization of metabolic networks. Nature 407 (6804), pp. 651.
- Katz (1953). A new status index derived from sociometric analysis. Psychometrika 18 (1), pp. 39–43.
- Kleinberg (2000). The small-world phenomenon: an algorithmic perspective. In Proc. of the ACM Symposium on Theory of Computing (STOC 2000), pp. 163–170.
- Leicht et al. (2006). Vertex similarity in networks. Physical Review E 73 (2), pp. 026120.
- Leskovec et al. (2007). The dynamics of viral marketing. ACM Transactions on the Web 1 (1), pp. 5.
- Leskovec and Horvitz (2008). Planetary-scale views on a large instant-messaging network. In Proc. of the International Conference on World Wide Web (WWW 2008), Beijing, China, pp. 915–924.
- Levene and Wheeldon (2004). Navigating the World Wide Web. In Web Dynamics, M. Levene and A. Poulovassilis (Eds.), pp. 117–151.
- Lü et al. (2016). Vital nodes identification in complex networks. Physics Reports 650, pp. 1–63.
- Newman (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98 (2), pp. 404–409.
- Newman (2010). Networks: an introduction. Oxford University Press.
- Simsek and Jensen (2008). Navigating networks by using homophily and degree. Proceedings of the National Academy of Sciences 105 (35), pp. 12758–12762.
- Strang (1993). Introduction to linear algebra. Vol. 3, Wellesley-Cambridge Press, Wellesley, MA.
- Travers and Milgram (1967). The small world problem. Psychology Today 1 (1), pp. 61–67.
- Vigna (2016). Spectral ranking. Network Science 4 (4), pp. 433–445.
- Watts and Strogatz (1998). Collective dynamics of small-world networks. Nature 393 (6684), pp. 440.
- West and Leskovec (2012). Automatic versus human navigation in information networks. In Proc. of the International Conference on Weblogs and Social Media (ICWSM 2012), Dublin, Ireland.
- West et al. (2009). Wikispeedia: An Online Game for Inferring Semantic Distances between Concepts. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, USA, pp. 1598–1603.