Communities in networks are commonly considered as subgraphs with dense internal but sparse external connections (cf., e.g., Girvan and Newman (2002), Bagrow and Bollt (2005), and Fortunato (2010)). In other words, a community should be a highly cohesive subgraph which is well separated from the rest of the network. Maximum cohesion of nodes is reached in fully connected subgraphs (cliques), maximum separation in subgraphs without external connections (components). In many practical cases the two essential features of communities cannot both be maximised at the same time, which is why some methods for the construction of communities seek a compromise (for reviews of community finding, see Fortunato (2010), Xie et al. (2013), and Amelio and Pizzuti (2014)). When a compromise is not appropriate for the problem to be solved, it might be advantageous to separate the two criteria. In this paper, we explore such an approach by defining communities as well separated subgraphs which can have one or more cohesive cores surrounded by less cohesive peripheries. Due to the size bias of cohesion measures we favour separation as the defining feature. Methods for finding core-periphery structures were reviewed by Csermely et al. (2013).
We apply this idea to link communities in networks. Link clustering was introduced by Evans and Lambiotte (2009) and by Ahn et al. (2010). The aim of our paper is to operationalise the argument for separating the evaluation of cohesion and separation for link communities, and to propose an algorithm that identifies core-periphery structures in well separated link communities. In Section 2 we discuss the conceptual problems of simultaneously maximising cohesion and separation. In Section 3 a method for determining core-periphery structures of link communities is derived. In Section 4 it is applied to results of a local link clustering exercise (Havemann et al., 2017) and to the karate-club network (Zachary, 1977).
2 Cohesion and Separation
If communities in networks are considered as highly cohesive and well separated subgraphs, all cliques without external links are ideal but trivial and very rare communities of nodes. In nearly all cases we have to content ourselves with imperfect communities. In this section, we discuss three problems of community construction, namely (A) the existence of real-world problems for which the maximisation of cohesion is likely to create artifacts, (B) the necessity to compromise between maximising cohesion and separation for all other real-world problems, and (C) the size bias of most measures of cohesion.
(A) Maximisation can produce artifacts: Some real-world problems are represented by communities that contain the boundaries of other communities. This is the case when communities form a hierarchy, i.e. when larger communities contain smaller ones. In this case, the smallest communities of a hierarchy can be rather cohesive, but supercommunities can be cohesive only if their subcommunities are not very well separated (Ravasz and Barabási, 2003; Rezvani et al., 2018). A second case is communities overlapping pervasively (i.e. not only in boundary nodes). Here, too, boundaries of one community run through another, thereby lowering its cohesion and violating the demand that communities should be hard to split (Kannan et al., 2004; Leskovec et al., 2010; Yang and Leskovec, 2013). Applying a cohesion-maximising algorithm to these problems might lead to important communities being excluded from consideration, or to artefactual communities being included.
(B) Compromising: The best way to compromise between cohesion and separation may be difficult to determine. This problem occurs especially with approaches that evaluate single communities. If whole networks are partitioned into disjoint communities, the compromise is often built into the algorithm and cannot be considered separately (e.g. in the case of modularity-maximising algorithms; Newman and Girvan, 2004). If an algorithm evaluates single communities, cohesion and separation are unlikely to be maximal for the same subgraph, which necessitates a compromise. This raises the question of what such a compromise should look like. Pizzuti (2009) introduced a bi-objective optimisation which allows an appropriate compromise between the two features to be chosen. She used a genetic algorithm to maximise internal and to minimise external connectivity of a partition’s communities. Kannan et al. (2004) proposed to solve this problem by setting a minimum level for one feature and maximising the other under this minimum condition. However, it depends on the real-world problem to be solved whether and how a suitable compromise can be found.
(C) Size bias of cohesion measures: One of the major, and so far underappreciated, problems of community construction is the size bias of most cohesion measures. In general, global cohesion of a set can be measured by the ratio of the number of directly connected element pairs to the theoretical maximum of this number. (For the cohesion of nodes in monopartite topological graphs this ratio equals link density, which is maximal for cliques. For link communities, we derived an analogue to link density named connectedness density of links; see Appendix, Cohesion of Link Sets. It is maximal for star subgraphs.) In imperfect communities, links may be so unevenly distributed that the communities contain well separated and cohesive subgraphs. This is why some authors demand that, in addition to being highly cohesive, a community should be “hard to split” and measure cohesion by internal conductance, i.e. the minimal conductance of all possible splits (Kannan et al., 2004; Leskovec et al., 2010; Yang and Leskovec, 2013). Although internal conductance is expected to have no size bias, its calculation depends on the identification of the best split, which creates significant problems for community construction. Furthermore, this demand cannot be upheld for the problems described under (A) above. For some practical problems it is sufficient that members of communities have a high local cohesion; cf., e.g., Xu et al. (2007). Local cohesion of nodes can be measured, e.g., by their degree or the local clustering coefficient. Most cohesion measures based solely on network topology scale with size in sparse networks. Link density tends to be smaller for larger subgraphs (cf. Schaeffer (2007), p. 50). (Similar to the link density of nodes, connectedness density of links scales with size: larger link communities tend to have lower values.) When average internal degree is used to evaluate the cohesion of a community, the opposite size bias is observed.
For scale-free networks the average clustering coefficient decreases with size (Ravasz and Barabási, 2003).
A further option to measure cohesion seems to be to relate the sum of internal degrees of the nodes in a community $C$, $k^{\mathrm{in}}(C) = \sum_{i \in C} k_i^{\mathrm{in}}(C)$, to the sum of their total degrees, $k(C) = \sum_{i \in C} k_i$. This ratio equals the probability that a random walker found in node community $C$ stays within $C$ in the next step and is therefore called persistence probability (Piccardi, 2011; Rossa et al., 2013):
$$P(C) = \frac{k^{\mathrm{in}}(C)}{k(C)}.$$
$P(C)$ appears to measure cohesion of nodes but is insensitive to the distribution of connections. Two subgraphs with the same number of external and of internal links have the same persistence probability but can have a rather different cohesion of nodes measured by their link density or their internal conductance. The random walker needs many internal and few external links to walk within $C$ for a while, but the internal structure is not relevant. For example, $C$ can also be a chain of nodes or even be disconnected. This means that persistence probability measures separation rather than cohesion as defined here. With increasing external degree, persistence probability decreases. This can be made explicit when we rewrite it: $P(C) = 1 - k^{\mathrm{out}}(C)/k(C)$, where $k^{\mathrm{out}}(C)/k(C)$ equals the probability of a random walker found in $C$ to leave the community in the next step, also called escape probability and denoted here by $P_{\mathrm{esc}}(C)$ (Fortunato, 2010; cf. Appendix, The Random Walker and Separation of Communities).
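The insensitivity of persistence probability to internal structure can be checked on two toy subgraphs. The following is a minimal sketch, assuming persistence probability is the sum of internal degrees divided by the sum of total degrees, and link density is the number of internal links divided by the number of node pairs. The function names and the two example graphs are hypothetical and chosen only for illustration.

```python
def persistence_probability(edges, community):
    """Sum of internal degrees over sum of total degrees of the community's nodes."""
    community = set(community)
    internal = 0  # link ends in the community whose other end is also inside
    total = 0     # all link ends attached to community nodes
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in community:
                total += 1
                if b in community:
                    internal += 1
    return internal / total


def link_density(edges, community):
    """Internal links divided by the number of node pairs in the community."""
    community = set(community)
    n = len(community)
    internal_links = sum(1 for u, v in edges if u in community and v in community)
    return internal_links / (n * (n - 1) / 2)


# Hypothetical example 1: a 4-clique (6 internal links) with 2 external links.
clique_nodes = [0, 1, 2, 3]
clique_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 100), (1, 101)]

# Hypothetical example 2: a chain of 7 nodes (6 internal links) with 2 external links.
chain_nodes = list(range(7))
chain_edges = [(i, i + 1) for i in range(6)] + [(0, 100), (6, 101)]

p_clique = persistence_probability(clique_edges, clique_nodes)
p_chain = persistence_probability(chain_edges, chain_nodes)
d_clique = link_density(clique_edges, clique_nodes)
d_chain = link_density(chain_edges, chain_nodes)
print(p_clique, p_chain, d_clique, d_chain)
```

Both subgraphs have six internal and two external links, so their persistence probabilities coincide (12/14), while the link density of the clique is 1 and that of the chain is 6/21.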
Piccardi (2011) pointed to the fact that the persistence probability $P(C)$ of a node community $C$ is related to the definition of communities in the weak sense given by Radicchi et al. (2004) with the criterion $k^{\mathrm{in}}(C) > k^{\mathrm{out}}(C)$, where $k^{\mathrm{in}}(C)$ and $k^{\mathrm{out}}(C)$ are the sums of internal and external degrees of the nodes in $C$. If $P(C) > 1/2$ then this criterion is fulfilled. The strong definition of communities demands that each node has more internal than external links. If the escape probability $1 - P(C)$ is small, both definitions tolerate communities which can be split. Consider, for example, a subgraph comprising two 4-cliques with one external link per clique and one link between both cliques. Both cliques are also communities in the strong and weak sense. Escape probability is used in cut-based measures of separation such as conductance and normalised cut. In the Appendix (The Random Walker and Separation of Communities), we discuss these measures and also normalised node-cut, a measure of separation for link communities proposed by us recently (Havemann et al., 2017).
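The example of the two 4-cliques can be verified directly. The sketch below assumes the weak criterion compares the sums of internal and external degrees of a node set, and the strong criterion compares them node by node. The node labels and helper names are hypothetical.

```python
from itertools import combinations


def internal_external_degrees(edges, community):
    """Return per-node internal and external degrees for the given node set."""
    community = set(community)
    k_in = {v: 0 for v in community}
    k_out = {v: 0 for v in community}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in community:
                if b in community:
                    k_in[a] += 1
                else:
                    k_out[a] += 1
    return k_in, k_out


def is_weak_community(edges, community):
    """Weak sense: sum of internal degrees exceeds sum of external degrees."""
    k_in, k_out = internal_external_degrees(edges, community)
    return sum(k_in.values()) > sum(k_out.values())


def is_strong_community(edges, community):
    """Strong sense: every node has more internal than external links."""
    k_in, k_out = internal_external_degrees(edges, community)
    return all(k_in[v] > k_out[v] for v in k_in)


clique_a = [0, 1, 2, 3]
clique_b = [4, 5, 6, 7]
edges = (
    list(combinations(clique_a, 2))
    + list(combinations(clique_b, 2))
    + [(3, 4)]          # the single link between both cliques
    + [(0, 8), (7, 9)]  # one external link per clique (nodes 8 and 9 lie outside)
)

results = {
    name: (is_weak_community(edges, nodes), is_strong_community(edges, nodes))
    for name, nodes in [
        ("clique_a", clique_a),
        ("clique_b", clique_b),
        ("union", clique_a + clique_b),
    ]
}
print(results)
```

In this toy graph the union of both cliques and each clique on its own satisfy both criteria, which illustrates that neither definition rules out communities that can be split.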
Our discussion here is limited to topological networks. We transcend network topology if a suitable measure of node distance can be defined. Then global cohesion can be defined as some aggregate of distances between a subgraph’s nodes. As a suitable measure we consider a distance which is not maximal for unlinked nodes. If all unlinked nodes have the same distance the ends of a long chain would have the same distance as two nodes in the chain with a third node between them. Distance should also not strongly depend on the position of individual links which is the case for length of the shortest path and its derivatives.
In summary, maximising cohesion of communities in topological graphs is problematic when (A) the maximisation itself, (B) the compromise between cohesion and separation, or (C) a size bias of cohesion measures may lead to the disregard of subgraphs that are important to solving specific real-world problems. In these cases, separation and cohesion can be measured for different objects. This can be achieved if we introduce the notion of cohesive community cores. Like whole networks, communities can have a core-periphery structure (Yang and Leskovec, 2014; Kojaku and Masuda, 2017). A cohesive core can be linked to many peripheral nodes, which means that it is not well separated. Separation can be improved by including the core’s periphery in the community, which simultaneously diminishes its internal cohesion. In order to realise separate measurements, we propose to consider communities in networks as well separated connected subgraphs and to reserve the feature of high cohesion for community cores. In other words, we propose to change the common notion of communities in networks in those cases where it is unnecessary, but we retain separation and cohesion as aims of optimisation. In the language of Kannan et al. (2004), we maximise the subgraph’s separation while the minimum condition for its cohesion is its connectedness. We then look for cohesive cores of communities. We propose to define a cohesive core and its periphery not in absolute terms but as a sequence of nested subgraphs with decreasing cohesion.
3 Core-Periphery Structures in Link Sets
There are several methods for finding cohesive cores of graphs or node communities (Borgatti and Everett, 2000; Zhang et al., 2015). We construct core-periphery structures in link communities as nested subgraphs with decreasing connectedness density (see Appendix, Cohesion of Link Sets). We start from subgraphs with local maxima of local density of links which are sufficiently distant from other subgraphs with local maxima, analogously to methods for node-community finding proposed by Liu et al. (2017) and by Wang et al. (2017). The simplest way to translate local node density used by these authors into the world of link clustering is to define local density of links as the number of neighbouring links attached to a node. Thus, local density of links equals local node density. Large stars as link sets with maximal connectedness density then also have a high local density of links. Therefore, we start from the largest star of a link community for constructing its core-periphery structures. We apply the same definition of local density but differ from Wang et al. (2017) and Liu et al. (2017), who construct disjoint clusters of nodes, by clustering stars as link sets and allowing for overlap in nodes and links. (Both groups use community centres as seeds for a local expansion. Wang et al. (2017) apply a greedy algorithm maximising persistence probability, which they assume to measure subgraph density (cf. discussion above). Liu et al. (2017) propagate the labels of centres to nodes.) For bipartite networks our algorithm can also be seen as an adaptation of the method proposed by Carmi et al. (2008) for partitioning a network into basins of attraction to hubs. Da Fontoura Costa (2004) also used high-degree nodes (hubs) as centres of communities, which he constructed by a simple expansion process starting from a predefined number of hubs. Different from us, Zhou et al. (2017) directly translate local density from the world of node clustering to links to obtain a link-clustering method.
Our aim is to determine hierarchical core-periphery structures (named towns, for short) in a given connected subgraph induced by a link set. We define a town as a hierarchy of stars where two stars are never indirectly connected with each other via smaller stars only. Two stars are directly connected if they share a link or one of their outer nodes. A star is connected to a town if it shares a link or a minimum number of outer nodes with the set of town stars of equal or larger size; otherwise it becomes the centre of an independent town. The minimum number of outer nodes is determined by a resolution parameter $\theta$ between 0 and 1, which is used as a minimum threshold of relative overlap for a star to be attached to a town. If $\theta = 0$, one common node of star and town is enough to unite both link sets. If $\theta = 0.5$, more than half of the star’s outer nodes have to be already inside the town. If a star is connected to two or more towns, it is split and its parts are attached to these towns.
The algorithm for finding cores and peripheries in link communities (CPLC algorithm) can be described as follows (cf. Algorithm 1). All star subgraphs of the community are ranked with regard to their size. A town is initialised with the largest star. The next star on the rank list is attached to a town $T$ with node set $V_T$ if the number of the star’s outer nodes shared with the town fulfils the minimum condition given by the resolution parameter $\theta$: $|N(c) \cap V_T| > \theta\,|N(c)|$, where $N(c)$ denotes the adjacent nodes of the star’s central node $c$. A direct link between two star centres is also a sufficient condition for being included in the town. If a star could be united with two or more existing towns, then we add to each town the links the star shares with this town. Its remaining links are united with all towns involved. Then we delete from the list of candidates all (mostly small) stars which now have no links outside any town. We skip candidate stars that share all links to towns with these towns. We found this feature useful in our experiments with link communities in a nearly bipartite network where different kinds of nodes can be centres of candidate stars. Skipping these stars also works in the unipartite karate-club network.
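The ranking of stars and the attachment test described above can be sketched as follows. This is not the CPLC algorithm itself but a minimal illustration of two of its steps under stated assumptions: the overlap condition is taken as a strict inequality on the share of a star's outer nodes already in the town, ties in star size are broken by node label, and all function and variable names are hypothetical. Splitting stars between several towns and pruning candidates are omitted.

```python
from collections import defaultdict


def rank_stars(edges):
    """Return (centre, neighbours) pairs sorted by star size, largest first.

    Every node with its attached links is treated as a candidate star.
    Breaking ties by node label is an assumption made for reproducibility.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))


def attaches_to_town(star_neighbours, town_nodes, town_centres, resolution):
    """Decide whether a candidate star is attached to a town.

    Assumption: a star is attached if its centre is directly linked to the
    centre of a town star, or if the share of its outer nodes already in the
    town is strictly larger than the resolution parameter.
    """
    star_neighbours = set(star_neighbours)
    if star_neighbours & set(town_centres):
        return True
    shared = star_neighbours & set(town_nodes)
    return len(shared) > resolution * len(star_neighbours)


# Hypothetical toy graph: star around node 1 and star around node 7 share nodes 5 and 6.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (7, 5), (7, 6), (7, 8), (7, 9)]
ranked = rank_stars(edges)

# Initialise a town with the largest star.
first_centre, first_neighbours = ranked[0]
town_nodes = {first_centre} | first_neighbours
town_centres = {first_centre}

# Test the second-largest star at two resolution levels.
second_centre, second_neighbours = ranked[1]
joined_at_zero = attaches_to_town(second_neighbours, town_nodes, town_centres, 0.0)
joined_at_half = attaches_to_town(second_neighbours, town_nodes, town_centres, 0.5)
print(first_centre, second_centre, joined_at_zero, joined_at_half)
```

In this toy graph the star around node 7 shares two of its four outer nodes with the town around node 1. Under the assumed strict inequality it joins the town at zero resolution and becomes the centre of a separate town at resolution 0.5.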
The number of towns obtained depends on resolution . Instead of voluntarily setting parameter we explore its whole range. We start with minimal resolution and obtain a structure of the subgraph with the minimal number of towns. As long as we then recursively increase to a value at which it is possible to obtain at least one town more. Therefore, the next threshold is taken from the smallest portion of nodes shared by a star and a town with which the star was united. This guarantees that in the next run of CPLC this star and all stars with the same relative overlap to any of the towns are not united with these towns.
To select resolution levels at which relatively well separated towns are obtained, we calculated the normalised node-cut of the link set of each town (cf. Equation 5 in the Appendix; here the set of all edges is the link community analysed). Because towns are not optimised with regard to separation, we also calculated the normalised node-cut of each town’s link set reduced by all links in overlaps between towns. For each town we chose the better (lower) of both values and evaluated the resolution level by the worst (largest) value of any town. If there were two or more levels with the same worst value and the same number of towns, we selected the lowest level with minimum link overlap between towns.
The karate club analysed by Zachary (1977) has only one town for lowest resolution level with node 34 as the centre. We obtain two towns for resolution with centres 1 and 34. For their link overlap reduces to three links: , , and . Besides the nodes of these three links, the two towns overlap in nodes 20 and 32. Thus, the two towns are compatible with the final splitting of the karate club due to conflicts between the two leaders 1 and 34. For the town with centre node 1, and for the other town. A better value (in fact the best we have found) can be obtained for a disjoint link splitting of the karate-club network where the towns’ link overlap is split up (s. Fig. 1).
Searching for link communities with locally minimal -values in a nearly bipartite network of 14,770 papers published 2010 in astronomy and astrophysics journals (including also geophysical papers) and their cited sources we found 127 overlapping link communities which cover the network and form a poly-hierarchy Havemann et al. (2017). All sources cited only once were omitted. Fig. 2 shows the small community which is an example of a well separated subgraph () but it is not very cohesive (it can be split into two subcommunities). Here we have two towns already at zero resolution. Increasing it to causes their overlap to decrease from 16 to 2 links ( and ) and of towns within the subgraph reaches a minimal value of 0.0187 if the two overlapping links are deleted from the town on the right-hand side. For we find solutions with four and more towns but relatively bad separation (worst ). The centres of the two towns are the (deep red) large stars with central nodes 15 and 31. The two towns correspond to two subcommunities we had found with our search algorithm. The ten papers in the town on the right-hand side of the subgraph deal with lightnings and similar electromagnetic phenomena in the atmosphere, the seven papers on the left-hand side mainly deal with effects of seismic activities measured in the ionosphere.
The number of stars and the number of towns both increase with the number of links . We therefore expect run-time of CPLC to scale with which is confirmed by experiments with 151 communities found in the astrophysical citation network (including the 127 communities mentioned above and the whole network with ). Due to space limitation we cannot present further statistics of results.
5 Summary and Discussion
If a real-world problem is likely to be represented by a hierarchy of communities or overlapping communities or by communities of varying and unknown sizes, or if it is difficult to determine the best compromise between cohesion and separation, it seems advantageous to separate the maximisation of cohesion and separation. In this paper, we propose a strategy that starts from communities as well separated subgraphs and identifies cohesive cores of such subgraphs. We applied this strategy to the analysis of link communities. To determine core-periphery structures as hierarchically nested subgraphs with decreasing cohesion in a link community, we start from local maxima of local link density, i.e., from the largest stars. The examples presented here demonstrate that the algorithm we have tested is able to separate core-periphery structures (towns) if there are two or more such structures in a (sub)graph. Our next task is to evaluate each town with regard to its distinctness. If all stars have nearly the same size, it would be difficult to speak of a hierarchy with centre and periphery. A further task is to assess the correspondence of core-periphery structures with the real-world problem of research topic detection. Towns of communities are expected to correspond to sub-topics of the larger topic represented by the community.
Besides high cohesion, another often assumed feature of cores is their network centrality (Csermely et al., 2013, p. 94), e.g., indicated by low average distance to peripheral nodes. We are also interested in non-central cores of link communities.
Towns of the whole graph can be used as seeds for local link clustering. Towns of communities found can recursively serve as seeds for finding smaller communities.
Cohesion of Link Sets
Let $k_i^{\mathrm{in}}(L)$ be the number of links in set $L$ attached to node $i$, also called its internal degree. The number of links in $L$ that a link is connected to via node $i$ equals $k_i^{\mathrm{in}}(L) - 1$. For the total number of (ordered) pairs of directly connected links in $L$ we find
$$\kappa(L) = \sum_i k_i^{\mathrm{in}}(L)\,\bigl(k_i^{\mathrm{in}}(L) - 1\bigr).$$
In the sum each node $i$ occurs $k_i^{\mathrm{in}}(L)$ times with connections from one link to $k_i^{\mathrm{in}}(L) - 1$ others. $\kappa(L)$ is an absolute measure for cohesion of link sets. It is not maximal if the links form a clique of nodes. Indeed, for a clique of four nodes connected by six links we have $\kappa = 4 \cdot 3 \cdot 2 = 24$. If the six links form a star we obtain a higher value, $\kappa = 6 \cdot 5 = 30$. In the star graph all links are directly connected, but in cliques of at least four nodes they are not. This corresponds to the fact that the line graph of a star is a clique. If $L$ has the form of a star and $c$ denotes its central node then $k_c^{\mathrm{in}}(L) = |L|$. For all other nodes we have $k_i^{\mathrm{in}}(L) = 1$, i.e., $k_i^{\mathrm{in}}(L) - 1 = 0$. Link sets are maximally connected if all their links are directly connected by a node, i.e., stars are maximally connected link sets.
We can define a relative measure of cohesion of link sets by relating absolute node connectedness of links to its maximum reached by stars. That means, as connectedness density of a link set we can define
$$\rho(L) = \frac{\kappa(L)}{|L|\,(|L| - 1)}.$$
This measure is useful for both one-mode and two-mode networks (where there are no cliques). In both types of networks stars are the most cohesive link sets. Analogously to this measure, link similarity as defined by Ahn et al. (2010) is not maximal for all link pairs in a clique of nodes but in a star of links.
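The comparison between a clique and a star can be reproduced numerically. The following sketch assumes that the absolute measure counts ordered pairs of links sharing a node, summed over nodes as internal degree times internal degree minus one, and that the density divides this count by its value for a star with the same number of links. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def connectedness(links):
    """Number of ordered pairs of links in the set that share a node."""
    internal_degree = Counter()
    for u, v in links:
        internal_degree[u] += 1
        internal_degree[v] += 1
    return sum(k * (k - 1) for k in internal_degree.values())


def connectedness_density(links):
    """Connectedness relative to a star with the same number of links.

    Requires at least two links, otherwise the maximum is zero.
    """
    m = len(links)
    return connectedness(links) / (m * (m - 1))


# Six links forming a clique of four nodes.
clique_links = list(combinations(range(4), 2))
# Six links forming a star with central node 0.
star_links = [(0, leaf) for leaf in range(1, 7)]

print(connectedness(clique_links), connectedness_density(clique_links))
print(connectedness(star_links), connectedness_density(star_links))
```

For these two link sets the counts are 24 for the clique and 30 for the star, so the star reaches density 1 while the clique reaches 0.8.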
The Random Walker and Separation of Communities
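Supposing that a random walker is on any node in set $C$ (in an unweighted and undirected network), his probability to be on node $i$ equals $k_i / k(C)$, where $k(C) = \sum_{j \in C} k_j$ is the sum of the degrees of all nodes in $C$. He leaves $C$ in the next step with probability $k_i^{\mathrm{out}}(C) / k_i$. Then his probability to leave $C$ from node $i$ is the product of both probabilities, $k_i^{\mathrm{out}}(C) / k(C)$, and the probability to leave $C$ from any node (escape probability) is $P_{\mathrm{esc}}(C) = \sum_{i \in C} k_i^{\mathrm{out}}(C) / k(C) = k^{\mathrm{out}}(C) / k(C)$.
Escape probability equals conductance of $C$ for $k(C) \le m$ with $m$ the number of all links (Fortunato, 2010). For $k(C) > m$ the cut $k^{\mathrm{out}}(C)$ is normalised by $k(V \setminus C)$ (with $V$ the set of all vertices) because subgraphs larger than half the whole graph tend to have smaller cuts $k^{\mathrm{out}}(C)$. A smoother normalisation which takes this tendency into account is achieved in normalised cut, defined by Shi and Malik (2000) as
$$\mathrm{Ncut}(C) = \frac{k^{\mathrm{out}}(C)}{k(C)} + \frac{k^{\mathrm{out}}(C)}{k(V \setminus C)}.$$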
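The relation between escape probability, conductance and normalised cut can be illustrated on a small graph. The sketch below assumes the standard definitions: the cut is the number of links leaving the node set, escape probability divides the cut by the degree sum of the set, conductance divides it by the smaller of the degree sums of the set and its complement, and normalised cut adds both ratios. The function names and the toy graph are hypothetical.

```python
def degree_sum_and_cut(edges, nodes):
    """Return the degree sum of the node set and the number of links leaving it."""
    nodes = set(nodes)
    degree_sum = 0
    cut = 0
    for u, v in edges:
        u_inside, v_inside = u in nodes, v in nodes
        degree_sum += u_inside + v_inside  # each link end inside the set counts once
        if u_inside != v_inside:
            cut += 1
    return degree_sum, cut


def escape_probability(edges, nodes):
    degree_sum, cut = degree_sum_and_cut(edges, nodes)
    return cut / degree_sum


def conductance(edges, nodes):
    degree_sum, cut = degree_sum_and_cut(edges, nodes)
    complement_sum = 2 * len(edges) - degree_sum  # degree sum of the remaining nodes
    return cut / min(degree_sum, complement_sum)


def normalised_cut(edges, nodes):
    degree_sum, cut = degree_sum_and_cut(edges, nodes)
    complement_sum = 2 * len(edges) - degree_sum
    return cut / degree_sum + cut / complement_sum


# Hypothetical toy graph: two triangles joined by one bridging link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
triangle = [0, 1, 2]

p_esc = escape_probability(edges, triangle)
phi = conductance(edges, triangle)
ncut = normalised_cut(edges, triangle)
print(p_esc, phi, ncut)
```

The triangle has a degree sum of 7, which does not exceed the number of links (7), so escape probability and conductance coincide at 1/7, and the normalised cut is 2/7.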
In the case of link communities we have to cut not links but nodes to separate a link set from the rest of the graph. Normalised node-cut is a measure of separation of link communities derived from normalised cut by Havemann et al. (2017). It is given by
where runs through all nodes but for all nodes which are not attached to a link in . Set includes all edges and is the number of all nodes. Note, that and . Evans and Lambiotte (2009) introduced a random walker who jumps from a link to one of its nodes with probability 1/2 and then chooses one of the links attached to this node. The ratio equals the escape probability of such a link-node-link random walker: The probability of a link-node-link random walker to start at any link in set and to arrive on node is That means, his probability to leave from is and the escape probability is where and (cf. Eq. 5). 444This is a new derivation of function used by us for defining normalised node-cut which now appears as with and not 1, as stated by us. Both probabilities reach a maximum of 1 for a ring graph where each second link is in .
- Ahn et al. (2010) Ahn, Y.-Y., J. P. Bagrow, and S. Lehmann (2010). Link communities reveal multi-scale complexity in networks. Nature 466, 761–764.
- Amelio and Pizzuti (2014) Amelio, A. and C. Pizzuti (2014). Overlapping Community Discovery Methods: A Survey. Social Networks: Analysis and Case Studies, 105.
- Bagrow and Bollt (2005) Bagrow, J. P. and E. M. Bollt (2005). Local method for detecting communities. Physical Review E 72(4), 046108.
- Borgatti and Everett (2000) Borgatti, S. P. and M. G. Everett (2000). Models of core/periphery structures. Social Networks 21(4), 375–395.
- Carmi et al. (2008) Carmi, S., P. L. Krapivsky, and D. ben-Avraham (2008, December). Partition of networks into basins of attraction. Physical Review E 78(6), 066111.
- Csermely et al. (2013) Csermely, P., A. London, L.-Y. Wu, and B. Uzzi (2013). Structure and dynamics of core/periphery networks. Journal of Complex Networks 1(2), 93–123.
- Da Fontoura Costa (2004) Da Fontoura Costa, L. (2004). Hub-based community finding. arXiv 0405022.
- Evans and Lambiotte (2009) Evans, T. S. and R. Lambiotte (2009). Line graphs, link partitions, and overlapping communities. Physical Review E 80(1), 16105.
- Fortunato (2010) Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486, 75–174.
- Girvan and Newman (2002) Girvan, M. and M. E. J. Newman (2002). Community structure in social and biological networks. PNAS 99, 7821–7826.
- Havemann et al. (2017) Havemann, F., J. Gläser, and M. Heinz (2017). Memetic search for overlapping topics based on a local evaluation of link communities. Scientometrics, 1–30.
- Kannan et al. (2004) Kannan, R., S. Vempala, and A. Vetta (2004, May). On Clusterings: Good, Bad and Spectral. J. ACM 51(3), 497–515.
- Kojaku and Masuda (2017) Kojaku, S. and N. Masuda (2017). Finding multiple core-periphery pairs in networks. Physical Review E 96(5), 052313.
- Leskovec et al. (2010) Leskovec, J., K. J. Lang, and M. Mahoney (2010). Empirical comparison of algorithms for network community detection. In Proc. of the 19th international conference on World wide web, WWW ’10, New York, pp. 631–640.
- Liu et al. (2017) Liu, D., Y. Su, X. Li, and Z. Niu (2017). A Novel Community Detection Method Based on Cluster Density Peaks. In Natural Language Processing and Chinese Computing, Lecture Notes in Computer Science, pp. 515–525. Springer.
- Newman and Girvan (2004) Newman, M. E. J. and M. Girvan (2004). Finding and evaluating community structure in networks. Physical Review E 69, 026113.
- Piccardi (2011) Piccardi, C. (2011). Finding and testing network communities by lumped Markov chains. PLoS ONE 6(11), e27028.
- Pizzuti (2009) Pizzuti, C. (2009). A Multi-objective Genetic Algorithm for Community Detection in Networks. In 21st IEEE International Conference on Tools with Artificial Intelligence, pp. 379–386. IEEE.
- Radicchi et al. (2004) Radicchi, F., C. Castellano, F. Cecconi, V. Loreto, and D. Parisi (2004). Defining and identifying communities in networks. PNAS 101, 2658–2663.
- Ravasz and Barabási (2003) Ravasz, E. and A.-L. Barabási (2003, February). Hierarchical organization in complex networks. Physical Review E 67(2), 026112.
- Rezvani et al. (2018) Rezvani, M., Q. Wang, and W. Liang (2018). Fast Algorithm for Detecting Cohesive Hierarchies of Communities in Large Networks. pp. 486–494. ACM.
- Rossa et al. (2013) Rossa, F. D., F. Dercole, and C. Piccardi (2013, March). Profiling core-periphery network structure by random walkers. Scientific Reports 3, 1467.
- Schaeffer (2007) Schaeffer, S. E. (2007). Graph clustering. Computer Science Review 1(1), 27–64.
- Shi and Malik (2000) Shi, J. and J. Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888 –905.
- Wang et al. (2017) Wang, X., G. Liu, J. Li, and J. P. Nees (2017, January). Locating Structural Centers: A Density-Based Clustering Method for Community Detection. PLOS ONE 12(1), e0169355.
- Xie et al. (2013) Xie, J., S. Kelley, and B. K. Szymanski (2013, August). Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Comput. Surv. 45(4), 43:1–43:35.
- Xu et al. (2007) Xu, X., N. Yuruk, Z. Feng, and T. A. J. Schweiger (2007). SCAN: A Structural Clustering Algorithm for Networks. In Proceedings of the 13th ACM SIGKDD, KDD ’07, New York, NY, USA, pp. 824–833. ACM.
- Yang and Leskovec (2013) Yang, J. and J. Leskovec (2013). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42(1), 181–213.
- Yang and Leskovec (2014) Yang, J. and J. Leskovec (2014). Overlapping communities explain core-periphery organization of networks. Proc. of the IEEE 102(12), 1892–1902.
- Zachary (1977) Zachary, W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33(4), 452–473.
- Zhang et al. (2015) Zhang, X., T. Martin, and M. E. J. Newman (2015, March). Identification of core-periphery structure in networks. Physical Review E 91(3), 032803.
- Zhou et al. (2017) Zhou, X., Y. Liu, J. Wang, and C. Li (2017). A density based link clustering algorithm for overlapping community detection in networks. Physica A: Statistical Mechanics and its Applications 486, 65–78.