I Computing profile closeness
Consider a large network with nodes and links. Because the network is very large, we modify the definition of the profile: it is no longer defined as a multiset. For convenience, we define a profile as a weighted subset of nodes, in which each vertex of the network that belongs to the profile carries a rank based on its priority.
The network may contain disconnected components. When two nodes are not connected, the distance between them is infinite, and we exclude such pairs from the computation. Given a node, its total distance with respect to the profile is therefore accumulated only over the profile nodes that lie at a finite distance from it.
We now define the profile closeness of a node as the normalized inverse of its total distance.
As with ordinary closeness centrality, nodes with higher profile closeness are the ones with better access to the profile nodes. The median of the network is the set of nodes with maximum profile closeness.
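The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary, assumes that the total distance is the rank-weighted sum of hop distances, and assumes that the normalizing constant is the total rank of the reachable profile nodes (a choice that reduces to ordinary closeness when every node is in the profile with unit rank). The function names `bfs_distances`, `profile_closeness` and `median` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def profile_closeness(adj, profile, v):
    """Profile closeness of v under the assumed rank-weighted definition.

    adj     : dict mapping each node to an iterable of its neighbours
    profile : dict mapping each profile node to its (positive) rank
    """
    dist = bfs_distances(adj, v)
    # Unreachable profile nodes (infinite distance) are skipped, as is v itself.
    reachable = [u for u in profile if u != v and u in dist]
    total_distance = sum(profile[u] * dist[u] for u in reachable)
    if total_distance == 0:
        return 0.0
    # Assumed normalization: total rank of the contributing profile nodes.
    return sum(profile[u] for u in reachable) / total_distance


def median(adj, profile):
    """Set of nodes with maximum profile closeness."""
    scores = {v: profile_closeness(adj, profile, v) for v in adj}
    best = max(scores.values())
    return {v for v, s in scores.items() if s == best}


# Path a - b - c plus an isolated node d that belongs to the profile.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
profile = {"a": 2, "c": 1, "d": 5}
print(profile_closeness(adj, profile, "b"))
print(median(adj, profile))
```

In this toy network the isolated profile node never contributes to any other node's total distance, which mirrors the rule that infinite distances are ignored.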
I-A Choosing rank function
The degree of a node is the number of edges incident on it. A high-degree node has a direct influence on a larger part of the network (see Opsahl et al.). It can therefore act as an important decision-maker in the consensus problem, and such nodes should be given higher priority. We can do this by setting the rank of each node equal to its degree.
However, the choice of the rank function depends on the problem at hand. In spreading dynamics, such as information (rumour) dissemination or an epidemic outbreak, node influence is an excellent candidate for the rank function. One example is the epidemic impact discussed by Šikić et al.
I-B Choosing a profile
The relevance of a profile depends on the fraction of high-rank nodes included in it. Suppose the profile consists of prominent nodes (say, hubs) from different disconnected components of the network. Then profile closeness effectively captures the relative closeness of a node to the key nodes in the network. A high profile closeness indicates that the node can act as a critical access point to the vital areas of the network. There are many ways to identify a set of vital nodes in a network; see Lu et al. for a state-of-the-art review of vital node identification.
Detecting a set of vital nodes can help in adopting budget-constrained methods to enhance the security of a network. However, this does not hold when the identified set is itself very large. In that case, we need to find a small number of nodes that have easy access to this set, and profile closeness serves this purpose. We take the set of vital nodes as the profile, rank these nodes by their vitality, compute the profile closeness of every node, and identify the nodes with the highest values. If the budget allows at most a fixed number of nodes to be secured, then that many nodes with the highest profile closeness are the efficient candidates for protection.
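The budget-constrained selection described above amounts to a top-k query over the profile closeness scores. The sketch below assumes the scores have already been computed and stored in a dictionary; the function name `top_k_candidates` and the sample values are hypothetical.

```python
import heapq


def top_k_candidates(scores, k):
    """Return the k nodes with the highest scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical profile closeness values for four nodes and a budget of two.
scores = {"a": 0.5, "b": 1.0, "c": 0.75, "d": 0.0}
print(top_k_candidates(scores, 2))
```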
II Closeness and profile closeness
As discussed in the introduction, the profile closeness of a node reduces to its closeness centrality when the profile is the entire node set and every node has unit rank.
In 1979, Freeman  introduced the concept of centralization of a graph (or network) to compare the relative importance of its nodes. Centralization is also a way to compare different graphs based on respective centrality scores.
To compute a centralization score, we need the maximum centrality value attained in the graph and the total deviation of the centrality of the individual nodes from this maximum. The centralization index is the ratio of this deviation to its maximum possible value for a graph containing the same number of nodes.
Freeman observed that closeness centralization attains its maximum if and only if the graph is a star; this was later proven by Everett et al. The minimum value is attained when the graph is a complete graph or a cycle.
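As an illustration of the centralization index for closeness, the following sketch computes it for a connected, unweighted, undirected graph with at least three nodes. It uses the normalized closeness (n - 1) divided by the sum of distances, for which the maximum possible deviation, attained by the star, is (n - 1)(n - 2)/(2n - 3). The function names are hypothetical, and the adjacency-dictionary representation is an assumption.

```python
from collections import deque


def closeness(adj, v):
    """Normalized closeness (n - 1) / (sum of distances) in a connected graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())


def closeness_centralization(adj):
    """Freeman-style centralization index for normalized closeness."""
    n = len(adj)
    scores = [closeness(adj, v) for v in adj]
    c_max = max(scores)
    deviation = sum(c_max - c for c in scores)
    # Deviation of the star on n nodes, the extremal case.
    max_deviation = (n - 1) * (n - 2) / (2 * n - 3)
    return deviation / max_deviation


# A star on five nodes and a complete graph on four nodes.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
complete = {i: [j for j in range(4) if j != i] for i in range(4)}
print(closeness_centralization(star), closeness_centralization(complete))
```

The star and the complete graph reproduce the two extreme cases mentioned in the text.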
The profile closeness attains its maximum value when the profile is the entire vertex set of the graph. In this case, the profile closeness of every node equals its closeness centrality. Therefore, the centralization of profile closeness coincides with that of closeness centrality.
However, we need to compare the performance of closeness centrality and profile closeness in the intended applications of the latter. Since closeness centrality is a global measure whereas profile closeness is highly localised to the profile, we also need a local comparison. We therefore make two comparisons: one with the global closeness centrality, and the other with a local closeness measure known as cluster closeness. The only difference between cluster closeness and profile closeness is that the former lacks the priority ranking of group members, which is an essential feature of the latter.
We generate random scale-free networks and identify their clusters. We then calculate the global closeness of each node, and the cluster closeness of a node as its closeness to its parent cluster. We also construct a profile from these clusters, in which the rank of a node is the number of its neighbors within the cluster. Thus, a node with a large number of connections within its cluster has higher priority in the profile. We compute profile closeness with these profiles and compare it with global closeness and cluster closeness over all the generated networks, using the correlation between the measures.
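The comparison step can be sketched as follows. The text does not state which correlation coefficient was used, so this example assumes the Pearson coefficient; the function name `pearson_correlation` and the sample scores are hypothetical. Both measures are assumed to be stored as dictionaries keyed by node.

```python
from math import sqrt


def pearson_correlation(scores_a, scores_b):
    """Pearson correlation between two centrality measures over shared nodes."""
    nodes = sorted(set(scores_a) & set(scores_b))
    xs = [scores_a[v] for v in nodes]
    ys = [scores_b[v] for v in nodes]
    n = len(nodes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either measure is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of two measures on the same three nodes.
global_closeness = {"a": 0.2, "b": 0.4, "c": 0.6}
profile_scores = {"a": 0.1, "b": 0.2, "c": 0.3}
print(pearson_correlation(global_closeness, profile_scores))
```

A rank-based coefficient could be substituted if only the ordering of nodes matters.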
We ran simulations on random scale-free networks of different sizes and average degrees. The correlation results are shown in tables I and II. The value in each cell is the average correlation between the measures, and the range of the correlation (max-min) is shown in brackets below each value.
Table I shows the correlation between closeness centrality and profile closeness for the generated random networks. The two measures are positively correlated, and the correlation is reasonably strong. An important observation is that closeness centrality in large networks is highly correlated with profile closeness. This is interesting because computing profile closeness requires less data than computing closeness centrality. If both measures give the same ranking of nodes in a large network, we can use the computationally cheaper profile closeness for the closeness ranking of its nodes. However, this needs more research: the experiment must be repeated on very large networks to confirm this capability of profile closeness.
Table II shows the correlation between cluster closeness and profile closeness for the generated random networks. The average correlations are high, which indicates a strong relationship between the two measures. The average correlation also increases steadily with network size, for sparse as well as dense networks.
III Application: Community closeness
When the profile under consideration is a community, we call it a community profile. The relative importance of community members differs with their influence on other community members and on the network as a whole. Some related works are discussed below.
Guimerá and Amaral (2005) studied the pattern of intra-community connections in metabolic networks. They analysed the degree of nodes within the community (within-module degree) to determine whether the community is centralised or decentralised. A community is centralised if its members differ in within-module degree.
Wang et al. (2011) proposed two kinds of important nodes in communities: community cores and bridges. Community cores are the most central nodes within the community, whereas bridges act as connectors between communities. Han et al. (2004) have also given a similar characterisation of the nodes important to a community as party hubs and date hubs, where party hubs resemble community cores and date hubs resemble bridges.
Gupta et al. (2016) proposed a community-based centrality known as Comm Centrality to find influential nodes in a network. Computing this centrality does not require global information about the entire network, only the intra- and inter-community links of a node.
The above works give evidence that communities, and especially the relative importance of their members, greatly influence the overall behaviour of the network. A community profile captures the relative importance of community members: the nodes are not treated as homogeneous, and nodes such as community cores and bridges are prioritized in the profile.
The application of community profile is two-fold.
- Prioritize the community cores and bridges of all the communities in one profile. The profile closeness then determines the accessibility of these vital nodes from every part of the network.
- Construct a community profile from a single community, with priority given to its vital members. The profile closeness then predicts the new nodes that may join the community and the members that may be on the verge of leaving it.
The first application gives a way to measure the global accessibility of the network. (We do not explore this direction further.) The second concerns local accessibility to a community. We describe it in more detail in the next section.
III-A Constructing community profile
The first step in constructing a community profile is to identify the communities in the network. Once the communities are detected, we rank the members of each community. The ranking is based on intra-modular degree; other relevant community-based measures, such as Comm Centrality, can also be used for ranking. The community profile then consists of the community members together with their ranks.
The construction of a community profile is described in algorithm 1, Gen_.
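The construction step can be illustrated with a short sketch. It is not a transcription of algorithm 1: it assumes that the communities have already been detected, that the network is an adjacency dictionary, and that the rank of a member is its intra-modular degree. The function name `community_profile` is hypothetical.

```python
def community_profile(adj, community):
    """Map each community member to its intra-modular degree (its rank)."""
    members = set(community)
    return {v: sum(1 for w in adj[v] if w in members) for v in members}


# Two triangles joined by the edge 2 - 3, with a pendant node 6 attached to 0.
adj = {
    0: [1, 2, 6], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4], 6: [0],
}
communities = {"A": [0, 1, 2, 6], "B": [3, 4, 5]}
profiles = {name: community_profile(adj, nodes) for name, nodes in communities.items()}
print(profiles["A"])
```

Only edges that stay inside the community count towards the rank, so node 2 is not credited for its link to node 3.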
III-B Computing community closeness
Algorithm 2 computes the community closeness of every node in the network.
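One way to obtain the closeness of every node to a community profile is to run a breadth-first search from each profile node and accumulate the rank-weighted distances. The sketch below is an illustration under stated assumptions, not a transcription of algorithm 2: the network is unweighted and undirected, the total distance is the rank-weighted sum of finite hop distances, and the normalizing constant is the total rank of the reachable profile nodes. The function name `closeness_to_profile` is hypothetical.

```python
from collections import deque


def closeness_to_profile(adj, profile):
    """Closeness of every node to a profile, using one BFS per profile node."""
    total_distance = {v: 0 for v in adj}
    total_rank = {v: 0 for v in adj}
    for p, rank in profile.items():
        dist = {p: 0}
        queue = deque([p])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        # Only nodes reachable from p receive a contribution from p.
        for v, d in dist.items():
            if v != p:
                total_distance[v] += rank * d
                total_rank[v] += rank
    return {
        v: total_rank[v] / total_distance[v] if total_distance[v] else 0.0
        for v in adj
    }


# Path a - b - c plus an isolated profile node d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
profile = {"a": 2, "c": 1, "d": 5}
print(closeness_to_profile(adj, profile))
```

With this organisation the number of searches equals the size of the profile rather than the size of the network, which is the source of the saving when the profile is small.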
III-C Predicting community members
Given a node and a profile in the network, algorithm 2 computes the node’s closeness to the community corresponding to the profile. A community is stable when all of its nodes have comparable closeness values; it is unstable when the intra-community closeness of its nodes shows drastic variations. Nodes with higher values are likely to remain in the community, whereas those with very low values may leave it in future. We ran experiments on networks with first-hand information on their ground-truth communities, and the empirical evidence supports this observation. We also observed that nodes exhibiting large closeness towards an external community tend to join that community in future. Profile closeness is thus a useful indicator of how communities evolve in a network. The quality of this prediction depends on the design of the community profile.
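The text does not give an explicit decision rule, so the following sketch shows one possible heuristic: flag a node when its closeness to some external community exceeds its closeness to its own community. The function name `likely_movers`, the data layout and the sample values are all hypothetical.

```python
def likely_movers(closeness, membership):
    """Return {node: external community} for nodes closer to another community.

    closeness  : dict mapping a community label to {node: closeness to it}
    membership : dict mapping each node to the label of its own community
    """
    movers = {}
    for node, home in membership.items():
        # Community whose profile the node is closest to.
        best = max(closeness, key=lambda c: closeness[c].get(node, 0.0))
        own = closeness[home].get(node, 0.0)
        if best != home and closeness[best].get(node, 0.0) > own:
            movers[node] = best
    return movers


# Hypothetical closeness values of two nodes to two communities.
closeness = {
    "I": {"x": 0.9, "y": 0.2},
    "III": {"x": 0.3, "y": 0.8},
}
membership = {"x": "III", "y": "III"}
print(likely_movers(closeness, membership))
```

A threshold on the gap between the two values could be added to avoid flagging nodes whose closeness values are nearly equal.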
III-D Empirical evidence - On networks with ground-truth communities
Research on community detection has been very active for the past two decades, and many community detection techniques have been devised. The Girvan-Newman method of community detection (Girvan and Newman, 2002), based on edge betweenness, was one novel approach. Later, the same team introduced modularity, a measure of the quality of a community structure (Girvan and Newman, 2004). Modularity is defined as the difference between the fraction of edges that fall within communities and the fraction expected in a random network. Girvan and Newman also reported the range in which this measure typically falls for networks with a robust community structure. Therefore, modularity optimization can lead to better community detection. However, this is an NP-complete problem (see Brandes et al.). Different approximation techniques based on modularity optimization produce community structures of high quality with very low time requirements (of the order of the network size). A recent survey by Zhao et al. gives a clear picture of the state of the art.
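The modularity definition can be made concrete with a small example. The sketch assumes a simple undirected graph without self-loops, stored as a symmetric adjacency dictionary, and uses the standard form in which each community contributes its fraction of internal edges minus the squared fraction of edge ends attached to it. The function name `modularity` is hypothetical.

```python
def modularity(adj, communities):
    """Modularity of a partition of a simple undirected graph."""
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    q = 0.0
    for community in communities:
        members = set(community)
        degree_sum = sum(len(adj[v]) for v in members)
        # Each internal edge is seen from both of its endpoints.
        internal = sum(1 for v in members for w in adj[v] if w in members) / 2
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles joined by the single edge 2 - 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(modularity(adj, [[0, 1, 2], [3, 4, 5]]))
```

Splitting the graph into its two triangles gives a modularity of 5/14, while placing all nodes in one community gives zero.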
In our study, we used the Louvain method of modularity optimization (Blondel et al.) for detecting communities. It is an agglomerative technique that starts with each node assigned to its own community. The algorithm works in multiple passes until the best partition is achieved. Each pass consists of two phases: in the first phase, a node is moved to a neighbour’s community if the move yields a gain in modularity, and in the second phase, a new network is built from the communities detected in the first phase.
First, we tested our approach on two real-world networks in which the community structure is evident: Zachary’s karate club network and the American college football network. See table III.
III-D1 Zachary’s karate club network
We did our primary study on the well-known karate club network data, collected and studied by Zachary in 1977. Zachary closely observed the internal conflicts in a 34-member group (a university-based karate club) over a period of years. The conflicts led to a fission of the club into two groups. See table IV. He modeled the fission process as a network in which the nodes represent the club members and the edges represent their interactions outside the club. Zachary predicted this fission with high accuracy and argued that his observations are applicable to any bounded social group. Many researchers have used this network as a primary testbed for their studies on community formation in complex networks.
We identified the communities in the network using the Louvain method. See table V.
We used the intra-module degree of the nodes to construct the profile: nodes with a higher intra-module degree were given higher priority. The profile closeness was then computed for each community member. See figure 1. Different colors represent members of different communities, and the relative size of a node represents its profile closeness with respect to its own community.
The profile closeness of one node in its own community is very low. We interpret this as a higher tendency of that node to leave its community. We also compared the profile closeness of all nodes with respect to Community I. See figure 2. Nodes external to Community I are colored blue. Among them, the same node has a higher profile closeness to Community I. This high value, together with the low value in its own community, indicates that the node has more affinity towards Community I than towards its own community, Community III.
This observation is relevant in light of the node’s original community membership as noted by Zachary. Furthermore, Zachary observed that this member was a weak supporter of the second faction but joined the first faction after the fission. Our method reproduces this fact.
III-D2 American college football network
The second network chosen for our study was the American college football network, from the dataset collected by Newman. The nodes in this network represent college football teams in the U.S., and the edges represent the games between them in the year 2000. The teams are grouped into conferences, and most of the matches were between teams belonging to the same conference. The inherent community structure in this network therefore corresponds to these conferences. These ground-truth communities are given in table VI.
| Atlantic Coast | Flora. St. | N. Caro. St. | Virginia |
| | Georg. Tech | Duke | N. Caro. |
| Independents | Notre Dame | Utah St. | |
| Mid | Akron | Bowl. Green St. | Buffalo |
| | Ohio | N. Illin. | W. Michigan |
| | Ball St. | C. Michigan | Toledo |
| Big | Virg. Tech | Boston Coll. | W. Virg. |
| Conference | Alabama Birm. | E. Caro. | S. Missis. |
| | Missis. St. | Louis. St. | Missis. |
| W. Athletic | Louis. Tech | Fresno St. | Rice |
| | S. Method. | Nevada | San Jose St. |
| | T. El Paso | Tulsa | Hawaii |
| Sun Belt | Louis. Monroe | Louis. Lafay. | Mid. Tenn. St. |
| | N. Texas | Arkansas St. | Idaho |
| | New Mex. St. | | |
| Pac | Oreg. St. | S. Calif. | UCLA |
| Mountain | Brigh. Y. | New Mex. | San Diego St. |
| | Nev. Las Vegas | Air Force | |
| | Texas A & M | Oklahoma | Kansas St. |
In the community detection step, we identified 10 communities (see table VII). Four of them correspond to the ground-truth communities Atlantic Coast, Pac 10, Big 10 and Big 12, and one is a combination of two actual communities, Mountain West and Sun Belt.
| I | Flora St. | N. Caro. St. | Virginia |
| | Georg. Tech | Duke | N. Caro. |
| | Bowl. Green St. | Buffalo | Kent |
| | N. Illin. | W. Mich. | Ball St. |
| | C. Mich. | E. Mich. | |
| III | Virg. Tech | Boston Coll. | W. Virg. |
| IV | Alabama Birm. | E. Caro. | S. Missis. |
| | Missis. St. | Louis. St. | Missis. |
| | Louis. Monroe | Mid. Tennes. St. | Louis. Lafay. |
| | Louis. Tech | C. Flora | |
| | San Jose St. | T. El Paso | Tulsa |
| | Hawaii | Fresno St. | T. Christ. |
| VII | Oregon St. | S. Calif. | UCLA |
| VIII | Brigham Y. | New Mex. | San Diego St. |
| | N Las Vegas | Air Force | Boise St. |
| | N. Texas | Arkansas St. | New Mex. St. |
| | Texas A & M | Oklahoma | Kansas St. |
We then examined the profile closeness of all nodes to one of the detected communities. See figure 4. We observed that Central Florida has a higher closeness to this community. This agrees with the ground truth that the Central Florida team played many matches against teams such as Connecticut.
III-D3 Dolphins network
Another network with ground-truth communities is the dolphins network, from the dataset collected by Lusseau et al. (2003) of the University of Otago Marine Mammal Research Group. Lusseau and Newman (2004) used this data to study the social network of bottlenose dolphins. They observed a fission of the network into two groups when one individual (SN100) temporarily left the place. These communities are shown in table VIII.
We checked the closeness to one of these communities. See figure 6. DN63 and Knit clearly have higher chances of grouping with this community, which agrees with the observation made by Lusseau and Newman.
We proposed profile closeness centrality, which is suited to solving consensus problems in complex networks. A profile is a set of nodes with assigned priorities (ranks). Some salient features of profile closeness are:
- The rank assigned to a profile node depends on the extent of its influence on the network. For example, high-degree nodes, which directly influence a large part of the network, are ranked high.
- The choice of the rank function depends on the problem domain.
- It is suitable for budget-constrained network problems.
- It correlates closely with global closeness centrality for large networks, and may therefore help reduce computation time when determining the closeness ranking in a network.
- It aids in predicting community evolution.
The main takeaway of this work is that the relative importance of community members plays a key role in attracting new nodes or repelling existing ones. However, more investigation is needed to find alternative techniques for assigning member priorities. A promising direction for future work is the use of profile closeness in studying the temporal evolution of communities.
This work was supported by the National Post Doctoral Fellowship (N-PDF) No. PDF/2016/002872 from Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India.
The authors are grateful to Prof. Animesh Mukherjee (IIT Kharagpur) for providing valuable comments on the work.
-  H. J. Bandelt, J. P. Barthélemy, “Medians in median graphs,” Disc. Appl. Math. 8, 1984, pp. 131 – 142.
-  K. Balakrishnan, M. Changat, H. M. Mulder, “Median computation in graphs using consensus strategies,” (No. EI 2007-34). Report / Econometric Institute, Erasmus University Rotterdam, 2007.
-  K. Balakrishnan, M. Changat, H. M. Mulder, “Plurality strategy in graphs,” Australas. J. Combin. 46, 2010, pp. 191–202.
-  U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, D. Wagner, “On Modularity Clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 172-188, Feb. 2008.
-  V. D. Blondel, J-L. Guillaume, R. Lambiotte, E. Lefebvre, “Fast unfolding of communities in large networks,” J. Statistical Mechanics: Theory and Experiment (10), P10008, 12 pages, 2008.
-  M. Changat, D. S. Lekha, A. R. Subhamathi, “Algorithms for the remoteness function, and the median and antimedian sets in -graphs,” International Journal of Computing Science and Mathematics; 6(5), 2015, pp. 480–491.
-  M. G. Everett, P. Sinclair, P. Dankelmann, “Some centrality results new and old,” J. Math. Sociol., Volume 28, 2004, pp. 215–227.
-  L. Freeman, “Centrality in Social Networks,” Social Networks, 1, 1978, pp. 215–239.
-  M. Girvan, M. E. J. Newman, “Community structure in social and biological networks,” Proc. Natl. Acad. Sci. USA 99, 2002, pp. 7821–7826.
-  M. Girvan, M. E. J. Newman, “Finding and evaluating community structure in networks,” Phys. Rev. E 69, 026113, 16 pages 2004.
-  M. Granovetter, “Threshold Models of Collective Behavior,” The American J. Sociology, Vol. 83, 1978, pp. 1420–1443.
-  N. Gupta, A. Singh, H. Cherifi, “Centrality measures for networks with community structure,” Physica A, 452, 2016, pp. 46–59.
-  R. Guimerá, L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, Volume 433, 2005, pp. 895–900.
-  J. D. J. Han, N. Bertin, T. Hao, D. S. Goldberg, G. F. Berriz, L. V. Zhang, D. Dupuy, A. J. M. Walhout, M. E. Cusick, F. P. Roth, M. Vidal, “Evidence for dynamically organized modularity in the yeast protein–protein interaction network,” Nature, 430, 2004, pp. 88–93.
-  L. Lu, D. Chen, X. Ren, Q. Zhang, Y. Zhang, T. Zhou, “Vital nodes identification in complex networks,” Physics Reports, Volume 650, 2016, pp. 1–63.
-  D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, S. M. Dawson, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations - Can geographic isolation explain this unique trait?,” Behavioral Ecology and Sociobiology 54, 2003, pp. 396–405.
-  D. Lusseau, M. E. J. Newman, “Identifying the role that animals play in their social networks,” Proceedings. Biological sciences vol. 271 Suppl 6, 2004, pp. S477–S481.
-  H. M. Mulder, “Majority strategy on graphs,” Discrete Appl. Math. 80, 1997, pp. 97–105.
-  F. Morone, H. A. Makse, “Influence maximization in complex networks through optimal percolation,” Nature, vol. 524, 2015, pp. 65–68.
-  T. Opsahl, F. Agneessens, J. Skvoretz, “Node centrality in weighted networks: Generalizing degree and shortest paths,” Social Networks, Volume 32, Issue 3, 2010, pp. 245–251.
-  M. Šikić, A. Lančić, N. Antulov-Fantulin, H. Štefančić,“Epidemic centrality — is there an underestimated epidemic impact of network peripheral nodes?,” Eur. Phys. J. B 86:440, 2013, 13 pages.
-  M. Takaffoli, R. Rabbany, O. R. Zaïane, “Community evolution prediction in dynamic social networks,” 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), Beijing, 2014, pp. 9–16.
-  Y. Wang, Z. Di, Y. Fan, “Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph,” PLoS ONE, 6(11), e27418, 2011.
-  W. W. Zachary, “An Information Flow Model for Conflict and Fission in Small Groups,” J. Anthropological Research, 33, 4, 1977, pp. 452–473.
-  Z. Zhao, S. Zheng, C. Li, J. Sun, L. Chang, F. Chiclana, “A comparative study on community detection methods in complex networks,” J. Intelligent & Fuzzy Systems, vol. 35, no. 1, 2018, pp. 1077–1086.