Dense and sparse vertex connectivity in networks

06/10/2020 ∙ by Djellabi Mehdi, et al. ∙ CNRS IRIT Université Toulouse 1 Capitole

The different approaches developed to analyze the structure of complex networks have generated a large number of studies. In the field of social networks at least, studies mainly address the detection and analysis of communities. In this paper, we challenge these approaches and focus on nodes that have meaningful local interactions able to identify the internal organization of communities or the way communities are assembled. We propose an algorithm, ItRich, to identify this type of nodes, based on the decomposition of a graph into successive, less and less dense, layers. Our method is tested on synthetic and real data sets and meshes well with other methods such as community detection or k-core decomposition.

1 Introduction

The work initiated in the 2000s, which led to the creation of network science, has shown that many large real-world networks share similar structural properties. The two main ones are the “small world” phenomenon [71] and the power-law distributions of a number of key measures [2, 18, 26, 57, 31]. Many other shared patterns have been highlighted; for more in-depth reviews, readers can refer to [45, 3, 16, 46]. The first generic network models, produced mainly before the 2000s (Erdős-Rényi networks and regular grids, to name just two extreme examples), are not well suited to these large networks because they are largely homogeneous, whereas real-world networks are highly heterogeneous. A lot of research has thus been carried out to upgrade them or to design new ones. In that context, precursor works on the configuration model [43, 7], which allows a network to be drawn at random with a prescribed degree distribution, are back in the forefront [20]. Another model, proposed by Watts and Strogatz [72], is based on a random rewiring of a regular network. Many other models have been published, and some of them are based on the concept of evolving networks [5, 55].

Nowadays, however, it is widely agreed that these shared topological properties are not sufficient to explain the diverse and complex architectures of real-world networks. In various fields of application, the existence of a modular functional organization is observed or reasonably assumed [76, 65, 32, 39, 30], and assumptions are made about the existence of links between the topological structure of a network and the function of the system. Concerning complex networks as a whole, a new idea has emerged, that of a natural division of networks into modules, each of which presents a relative structural-functional homogeneity and whose integration is carried out at the level of the entire network with a relative complexity [64, 22, 66]. To support these views, theoretical results have shown that structural parameters of the modules may account for some network properties [44]. In addition, the dynamic interconnection of the modules can sometimes largely explain the global dynamics of the system and is therefore an important aid in understanding and modelling complex networks [17]. Hence, the concept of community has gradually gained momentum due to the widespread acceptance that a community is a subset of nodes dense in connections with respect to the remaining part of the network, even if some other definitions have been suggested [54, 70]. Automatic methods for searching communities in networks have been a very active field of research in the last few years and are often based on the optimization of combinatorial or statistical criteria ([27, 19, 37, 54], to name a few). Depending on the calculation methods, communities may overlap or not, and evaluating the quality of a community organization generated by an algorithm is a real challenge. The current evaluation processes can be classified into two categories: those relying on artificial networks and those relying on well-known real-world networks. Artificial networks contain ad hoc built structural modules [23, 35, 33] and algorithms are tested on their ability to identify these modules. Concerning real-world networks, the situation is more complex because the observed modularity is often both functional and structural. Recent works focus on the bias that may result from the confusion between metadata and ground truth for community detection algorithms [25]. There is neither a universal ground truth nor a universal community detection algorithm [59], so the validation of community detection algorithms against metadata must ensure that the information conveyed by the metadata is well represented by structural configurations [52].

The recurring observation that nodes with similar functions have a higher chance of being linked to one another than random pairs of nodes has given rise to a number of valuable studies whose basic principle, inspired by statistical methods, is to consider as relevant those configurations that deviate most from a random null model to be defined. The modularity measure [49, 48], which has received a considerable amount of attention, is based on this comparison, with the configuration model as the null model. Other examples can be found in [36, 6]. The choice of the null model is both very flexible and crucial. Such flexibility made it clear that very different configurations could be valid and reinforced the existence of a complexity that remains to be explored. The way networks are built impacts the type of null model to be considered, e.g., technological and social networks will have to be treated differently [50, 13]. Besides, one should revisit the assumption that a network can simply be partitioned into homogeneous communities, and specific methodologies should be developed in order to bring heterogeneity into and between structural communities, by taking into account, for instance, sparser parts of the network or the multi-scale aspects of complex networks. These points have already been addressed in several recent papers. Some works suggest differentiating the types of nodes in a given community according to their position in the community: either “centered” or “relay” with other communities [61, 29, 69]. The sets of nodes overlapping several communities have been studied in more detail, leading to changes in the definition of a community [54]. Whereas one could intuitively expect these sets to be less dense in connections, [75] shows the opposite. With more and more data available at different scales, the measures we have on the networks should help to integrate these different levels from local to global. In [28], the authors modify the betweenness centrality measure so as to have two different terms, one depending on the entire network (standard part of the betweenness) and the other one only depending on the immediate neighborhood of the node. Along the same lines, [11, 74] modify the expression of Newman’s modularity in order to take into account the neighborhood configurations and thus solve the resolution limit issue. The concept of modularity is also adapted to hierarchical patterns, which play a crucial part in the organization of large systems [41, 64, 56, 55, 34, 42].

A network without a clear modular organization (for example because many nodes are artificially attached to communities) may present a “core/periphery” structure [8, 14, 58, 24]. That type of meso-scale configuration has been less studied than community structures but is complementary [73] and somehow more general if one takes into account several core models. A core was originally defined in [8] as a set of nodes that are both largely connected to each other and to peripheral nodes, the latter being poorly connected with one another. Several cores may exist, and a core may not be central for the whole network while still occupying a central position for one part of it. Let us note that a rich club [12, 78] is slightly different from a core and refers to the fact that nodes of higher degree (hubs) are more densely connected than smaller degree nodes. A rich club may exist even if the remaining network does not have the properties inherent to a periphery.

Our approach is part of the current trend which explores the new facets of topological network structures. Starting from some studies which have shown that between a core and a periphery [67], or between a rich club and the rest of the network [9], there may well be meaningful configurations of nodes and links worth exploring, we have chosen to focus on nodes with relatively few connections in their neighborhoods. Our considerations are inspired by the work of Burt [10], an expert in network sociology who, starting from the notion of “structural hole”, establishes bridges between personal networks on the one hand and complete networks on the other hand. So, we define a local weight measurement that takes into account both the degree of a node and the properties of the edges attached to it. A multi-scale approach is obtained by successive deletions of sets of highly weighted nodes. These sets are a kind of rich club [78] with respect to this weight measurement. Whether such a rich club exists is decided by comparison with a null model to be defined. The nodes that remain after all the successive deletions of rich clubs are neither in the cores of communities nor in their overlaps. Such nodes are hardly highlighted by betweenness centrality analyses, although they can be regarded as a kind of backbone supporting the architecture of the network. The method implemented in this study is assessed on artificial networks and some well-known real networks. The paper is organized as follows. Section 2 provides a quick overview of the different topological measures that can be attached to an edge and to a node and introduces the topological strength measure that will be used. The distributions of this measure on some standard networks will be presented. Then, in Section 3, the rich-club idea of [78] will be adapted to a weighted network with topological weights, and the null model relative to this measure will be defined. Section 4 will describe the iterative rich-club extraction algorithm that can be used to assign to each node a measure of the quality of its belonging to the sparse part. Section 5 will summarize the results of the method on synthetic and standard real network models. Concluding remarks in Section 6 will address some general perspectives.

2 Topological edge weights and the topological strength measure

2.1 Some notations

The networks are finite, undirected and without any loop or multiple edges. We designate by $G=(V,E)$ a network with a node set $V$ and an edge set $E$. By default, $n$ will stand for the number of nodes of $G$. We write $i \sim j$ when $i$ is adjacent to $j$, i.e. $\{i,j\} \in E$. Please note that as $G$ is not looped, we always have $i \nsim i$. The (open) neighborhood of $i$ in $G$ is $N_G(i) = \{j \in V : i \sim j\}$. For any node $i$, we denote by $d_G(i) = |N_G(i)|$ its degree. If no confusion can arise, we write these quantities $d_i$ for the degree and $N_i$ for the neighborhood.
Let us recall that the adjacency matrix $A$ of $G$ is defined by $A_{ij} = 1$ if $i \sim j$ and $A_{ij} = 0$ otherwise. A clique of a network $G$ is a maximal complete subnetwork of $G$, in which all the possible edges do exist.

2.2 Definition of the topological strength

In line with the latest methods of network analysis, the method explored in this paper is a multiscale method aimed at representing the network as an arrangement of different local configurations. In order to obtain a finer description of the network than its mere division into communities (in the standard sense of densely connected subsets of nodes), we first consider nodes that are in the “core” of the communities. This means that most of the edges incident to such a node remain within the community. We consider that an edge is within a community if both the degrees and the relative number of common neighbors of its extremities are relatively high. Hence, writing $d_i$ for the degree and $N_i$ for the neighborhood of a node $i$, we define the weight of a couple of nodes $i$ and $j$ by:

$$w_{ij} = \frac{1}{\alpha}\, A_{ij}\, d_i\, d_j\, \mathrm{DCS}(i,j)$$

where $\alpha$ is a normalization factor, $A_{ij}$ is $1$ if $i \sim j$ and $0$ otherwise, and

$$\mathrm{DCS}(i,j) = \frac{2\,|N_i \cap N_j|}{d_i + d_j}$$

is the Dice-Czekanowski-Sørensen [15] similarity index. The measure $w_{ij}$ is then equal to the product of the degrees weighted by a topological overlap measurement.

Local configurations are densely connected clusters whose possible internal heterogeneity will not be touched upon in this paper since several methods are already available. We will however focus on understanding the structure of the overall arrangement of these configurations. To that end, each node will be weighted by an indicator of its local topology called “topological strength” and the arrangement will be identified on a weighted network (cf. Section 4).

We define the strength of a node $i$ by the sum of the weights of its adjacent edges:

$$s_i = \sum_{j \in N_i} w_{ij} \qquad (1)$$

By convention, $s_i = 0$ if $i$ is an isolated node, that is $d_i = 0$. It is easy to show that

$$s_i = \frac{1}{\alpha} \sum_{j \in N_i} H(d_i, d_j)\, |N_i \cap N_j| \qquad (2)$$

where $H(d_i, d_j) = \frac{2\, d_i\, d_j}{d_i + d_j}$ is the harmonic mean of $d_i$ and $d_j$. Let us note that, within Equation 2, the harmonic mean makes the contribution of an edge between a node of low degree and a node of high degree low. Moreover, the weight of an edge is in $[0,1]$ and is equal to $0$ when nodes $i$ and $j$ have no common neighbors and to $1$ if, and only if, $i$ and $j$ are neighbors with all other nodes of the graph, i.e. $d_i = d_j = n-1$.

Hence, $s_i = n-1$ if, and only if, every edge incident to $i$ has a weight equal to $1$, which is equivalent to the case where the graph is a clique $K_n$. In the general case $0 \le s_i \le n-1$. If we consider the mean edge weight $\bar{w}$, the mean degree $\bar{d}$ and the mean node weight $\bar{s}$, we have $\bar{s} = \bar{w}\,\bar{d}$.
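As an illustration of the edge weight and node strength described in this section, here is a minimal Python sketch. It is not the authors' implementation. It assumes the weight is the harmonic mean of the two degrees times the number of common neighbors, and it assumes the normalization factor is $(n-1)(n-2)$, chosen so that an edge between two nodes adjacent to every other node has weight 1. The function name `topological_strength` and the adjacency-dictionary input are hypothetical.

```python
def topological_strength(adj):
    """Return (edge_weights, node_strengths) for a simple undirected graph.

    adj maps each node (assumed to be an int) to the set of its neighbours.
    Assumption: the normalization factor is (n - 1) * (n - 2), so n >= 3.
    """
    n = len(adj)
    alpha = (n - 1) * (n - 2)  # assumed normalization factor
    weights = {}
    for i in adj:
        for j in adj[i]:
            if i < j:  # visit each undirected edge once
                d_i, d_j = len(adj[i]), len(adj[j])
                common = len(adj[i] & adj[j])  # number of common neighbours
                harmonic = 2 * d_i * d_j / (d_i + d_j)
                weights[(i, j)] = harmonic * common / alpha
    # The strength of a node is the sum of the weights of its incident edges.
    strengths = {i: 0.0 for i in adj}
    for (i, j), w in weights.items():
        strengths[i] += w
        strengths[j] += w
    return weights, strengths


if __name__ == "__main__":
    # Complete graph on 4 nodes: every edge weight is 1, every strength is 3.
    k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
    print(topological_strength(k4))
```

On a path graph no two adjacent nodes share a neighbor, so all weights and strengths are zero under these assumptions.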

2.3 Distribution of the topological strength values for some selected networks

Graph metrology has produced many measurement indices that apply to nodes and some indices applying to edges [47, 53]. The degree centrality and the clustering coefficient measures are particularly essential in network analysis and closely related to the definition of the topological strength. In order to illustrate the difference between these two measures and the topological strength, we compare their distributions for two different types of networks: a network produced by a Watts and Strogatz random model [72] and a network produced by a stochastic block model (SBM) [23].

(a) Heat scale for the Watts and Strogatz model.
(b) Heat scale for the stochastic block model.
(c) Adjacency matrix for the Watts and Strogatz model.
(d) Adjacency matrix for the stochastic block model.
Figure 1: Adjacency matrices and heat scales representing the distributions of topological strength, degree and clustering coefficients for: Watts-Strogatz model (a,c) and Stochastic Block-Model (b,d).

The SBM model (or planted partition model) used in this study consists of 15 blocks connected by a tree-like architecture. Each block contains 100 nodes with an internal probability , whereas the nodes of the “neighboring” blocks are connected with a probability (Figure 1(d)). The model is constructed so that the architecture connecting the blocks together is of rooted tree type, all its leaves being at a distance of 3 from the root. The root is of degree 2 and all other internal nodes have a degree 3, with parameters and , as can be seen in Figure 1(d). Each block is characterized by given average values of topological strength, degree and clustering coefficient. It is thus possible to use these average values to differentiate the various types of blocks. For example, in Figure 1(b), each distribution of the degree and of the clustering coefficient has 3 modes, which correspond to 3 types of blocks: the root block, the internal blocks and the leaf blocks. Yet, 4 modes can be observed for the topological strength distribution: two corresponding to the root block and to the leaf blocks and two corresponding to the internal blocks, one type of block being linked to the root (two blocks) and one type being linked to the leaves (four blocks). This distinction shows that the nodes inside the internal blocks have on average the same degree and clustering coefficient, but a different topological strength, depending on their neighboring blocks. This is due to the fact that the topological strength takes into account the degree of each block (by degree of a block, we mean the number of other blocks to which its nodes can be linked with a non-zero probability) as well as the degree of its neighboring blocks, while the average clustering coefficient within a given block is just expressed as a function of its degree. The formal expressions of the topological strength and of the clustering coefficient are stated in the Appendix.

The Watts-Strogatz model is made of 500 nodes designed from a regular ring lattice on which each node is linked to the 10 nearest neighbors and with a rewiring probability of the edges (Figure 1(c)). Contrary to the stochastic block model, the Watts-Strogatz model makes it more difficult to identify the different types of nodes, although the topological strength distribution shows two prevailing modes as well as several other modes of lesser magnitude that cannot be traced when observing the distribution of the clustering coefficient. As for the degree distribution, it does not contain enough information, because all the nodes have a degree that is distributed around the initial value, namely 10, with a slight variance due to randomization.
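To make the construction of such a small-world network concrete, the following Python sketch builds a ring lattice and rewires one endpoint of each lattice edge with a given probability. It is an illustrative sketch, not the code used in the paper. The function name `watts_strogatz` and the example rewiring probability `p=0.1` are assumptions, since the value used in the study is not given in the text above.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbours
    (k even, n > k), with each lattice edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Build the regular ring lattice.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewire the far endpoint of each lattice edge with probability p,
    # avoiding self-loops and multiple edges.
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if not candidates:
                    continue
                new = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj


if __name__ == "__main__":
    # 500 nodes and 10 nearest neighbours as in the text; p = 0.1 is hypothetical.
    g = watts_strogatz(500, 10, 0.1, seed=42)
    degrees = [len(nbrs) for nbrs in g.values()]
    print(min(degrees), max(degrees), sum(degrees) / len(degrees))
```

Rewiring moves edges without creating or destroying them, so the mean degree stays at 10 and only the spread around that value changes.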

For both random models, Figure 1 shows that the distribution of the topological strength is different from the other two distributions and may contain more or different information.

3 Weight in the context of the rich club phenomenon

3.1 Rich club phenomenon and existing null models

The rich club phenomenon refers to a topological effect in complex networks, which can be defined as the tendency of high degree nodes (rich nodes) to be interconnected, thus forming a rich club subset. It was first formalized in a quantitative form in [78]. An algorithm for automatic detection was then proposed in [12].

The rich club effect is quantified by the following parameter $\phi(k)$:

$$\phi(k) = \frac{2\,E_{>k}}{N_{>k}\,(N_{>k}-1)} \qquad (3)$$

where $E_{>k}$ is the number of edges among the $N_{>k}$ nodes having degree higher than $k$. Thus, $\phi(k)$ is the edge density of the subnetwork induced by nodes with degrees greater than $k$. For a network with the above-mentioned rich club properties, the curve $\phi(k)$ may have a peak around a given value of $k$. In order to avoid the bias due to the fact that high degrees are more likely to be linked together, [12] proposed a normalized version of $\phi(k)$:

$$\rho(k) = \frac{\phi(k)}{\phi_{\mathrm{null}}(k)} \qquad (4)$$

where $\phi_{\mathrm{null}}(k)$ is the rich club coefficient of a null model. The rich club is then obtained for the value of $k$ such that $\phi(k)$ is significantly higher than $\phi_{\mathrm{null}}(k)$, i.e. when $\rho(k)$ is maximum. For unweighted networks, the null model preserves the degree distribution and breaks the degree-degree correlations between neighboring nodes. The algorithm presented in [12] is widely used in the case of undirected unweighted networks, and consists in choosing the configuration model as a null model, by preserving the degree sequence of the original network, while randomly rewiring the endpoints of each edge. In the case of weighted networks, however, there is no consensus on the rich club coefficient nor on the choice of a null model, even if [4] proposes a unified framework. Indeed, the degree distribution is no longer the only quantity that should be kept in the null model. Ideally one should preserve the distribution of weights and the distribution of strengths, the strength of a node being the sum of the weights of all the edges attached to it. Finding an algorithm that provides a null model meeting all these requirements is not an easy task.
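The unweighted rich club coefficient described above is the edge density of the subnetwork induced by the nodes whose degree exceeds a threshold. A minimal Python sketch, with a hypothetical function name and an adjacency-dictionary input chosen for illustration, could look like this:

```python
def rich_club_coefficient(adj, k):
    """Edge density among the nodes of degree strictly greater than k.

    adj maps each node to the set of its neighbours (simple undirected graph).
    Returns None when fewer than two nodes qualify, since the density is
    undefined in that case.
    """
    rich = {v for v, nbrs in adj.items() if len(nbrs) > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None
    # Each edge inside the rich set is counted from both endpoints.
    n_edges = sum(len(adj[v] & rich) for v in rich) // 2
    return 2 * n_edges / (n_rich * (n_rich - 1))


if __name__ == "__main__":
    # A triangle (nodes 0, 1, 2) with one pendant leaf attached to each corner.
    g = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5}, 3: {0}, 4: {1}, 5: {2}}
    print(rich_club_coefficient(g, 1))  # nodes of degree > 1 form the triangle
    print(rich_club_coefficient(g, 0))  # all six nodes
```

The normalized version is then the ratio of this value to the same quantity computed on a null model, typically averaged over several randomized copies of the network.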

In [79], the authors construct a null model with a given strength distribution, but change both the weight and the degree sequences. In [63], the rich club coefficient is calculated by replacing the degree of a node with the sum of the weights of its incident edges and by normalizing the result using one of the functions listed in [4]. The authors of [51] propose a randomized directed network as a null model by replacing each non-directed edge with two directed edges, one in each direction, and by reshuffling the endpoints of the outgoing edges of each node among its neighbors. Although this null model preserves the degree, weight and strength sequences, its main limitation is that it does not break the degree-degree correlations. This becomes a major issue when the strength is correlated with the degree because, in such a case, the null model will also have a strength-strength correlation. Another null model for weighted networks is proposed in [62]. It generates a network with degree and strength distributions that converge towards some given distributions when the network is large enough. We have made empirical observations on real world networks suggesting that this size limitation has a greater impact when the studied networks are characterized by heavy tailed degree and/or strength distributions.

3.2 An adapted “null model” for weighted networks

Given a network, we consider the associated weighted network whose adjacency matrix contains the edge weights defined in Section 2.2. This auxiliary network thus carries these topological weights, in such a way that the sum of the weights of the edges attached to a node is equal to its topological strength.

It is a priority for our study to choose a null model that does not contain any strength-strength or degree-degree correlations between neighboring nodes, in order to ensure that the rich nodes (i.e. nodes of high topological strength) are linked together only by chance. But this model must also preserve the degree and the weight distributions. The null model applied in this study has been extensively studied in [38], and is provided by the following algorithm:

  1. Starting from the auxiliary weighted network, we first modify the topology by randomly rewiring its links, in order to break the degree-degree correlations, and thus obtain a configuration model of the network, which no longer contains degree-degree correlations.

  2. Then, the weights of the auxiliary network are randomly redistributed on the edges of this configuration model, which yields a null model that has neither degree-degree nor strength-strength correlations.

By construction, this null model preserves the degree and the weight distributions, and therefore the average value of the topological strength, but does not preserve the distribution of the topological strength. This involves a modification in the computation of the rich club coefficient, which is detailed in the next section.
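The two steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure of [38]: the rewiring step is implemented here with degree-preserving double edge swaps, a common way to randomize a network while keeping its degree sequence, and the function name `null_model`, the `n_swaps` parameter and the edge-dictionary input are hypothetical.

```python
import random


def null_model(weighted_edges, n_swaps, seed=None):
    """Randomize a weighted simple graph in two steps.

    weighted_edges maps (u, v) tuples with u < v to edge weights.
    Step 1: degree-preserving double edge swaps (assumed rewiring scheme).
    Step 2: the original weights are shuffled over the rewired edges.
    """
    rng = random.Random(seed)
    edges = list(weighted_edges)
    edge_set = set(edges)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * max(n_swaps, 1):
        attempts += 1
        a, b = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[a], edges[b]
        if rng.random() < 0.5:
            x, y = y, x  # pick one of the two possible re-pairings
        if u == y or x == v:
            continue  # the swap would create a self-loop
        e1 = (min(u, y), max(u, y))
        e2 = (min(x, v), max(x, v))
        if e1 in edge_set or e2 in edge_set or e1 == e2:
            continue  # the swap would create a multiple edge
        edge_set.difference_update((edges[a], edges[b]))
        edge_set.update((e1, e2))
        edges[a], edges[b] = e1, e2
        done += 1
    # Step 2: redistribute the original weights at random over the new edges.
    weights = list(weighted_edges.values())
    rng.shuffle(weights)
    return dict(zip(edges, weights))


if __name__ == "__main__":
    g = {(0, 1): 0.1, (1, 2): 0.2, (2, 3): 0.3, (3, 4): 0.4,
         (4, 5): 0.5, (0, 5): 0.6, (0, 3): 0.7, (1, 4): 0.8}
    print(null_model(g, n_swaps=20, seed=1))
```

Each swap replaces two edges by two others on the same four endpoints, so every degree is unchanged, and shuffling keeps the multiset of weights. The strength of an individual node, however, generally changes, which matches the remark that the strength distribution is not preserved.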

4 An iterative extraction algorithm

4.1 A new approach for weighted rich clubs

In this section we present an algorithm for the extraction of the high weighted subsets of nodes, using a weighted rich club approach.

Since the null model does not preserve the topological strength sequence computed on the original network, we need to filter out the nodes following a new strategy, different from the one that consists in selecting the nodes whose strength is larger than a certain threshold value.

So, given a network, let us consider the set consisting of the $r$ nodes of highest topological strength (if several nodes have the same strength, we randomly choose the order in which they are added; in practice, this situation is very unlikely and the choice order has a negligible impact in large real networks) and define the weighted rich club coefficient in the following way:

(5)

Comparing this weighted rich club coefficient to that obtained from the null model (by analogy with the case of unweighted networks) this definition of ensures that the two quantities are calculated from networks with the same number of nodes , as it is the case in Equation 3.

Now we define our weighted rich club parameter as

(6)

where is the weighted rich club coefficient computed from the null model. We then define a high weighted subset of nodes in , with respect to the definition of density, as the set of nodes that maximizes the weighted rich club parameter:

(7)

Compared to the classical method for unweighted networks, note that we consider the difference instead of the ratio . This choice is justified when we have and , and at the same time a high value for the ratio . This case occurs for small values of and is encountered for networks whose auxiliary model contains a strong strength-strength correlation among neighboring nodes, while having a heavy-tailed strength distribution. This is due to the presence of a small number of nodes whose strength is much higher than the average, and that are also linked to one another, as emphasized by the strength-strength correlations. The consequence of such a configuration is both a rapid growth of when is small and a slow growth of due to the randomization process. As a consequence, the ratio reaches its maximum for small values of , which results in a small weighted rich club, sometimes containing only two nodes and one edge, which is not very relevant to us. Another justification for this choice is practical: the iterative algorithm we use takes significantly less calculation time when we evaluate the difference rather than the ratio.

4.2 The shortcomings of a single execution step

Let us highlight the fact that a single execution of the above algorithm is not sufficient to provide full information about the overall structure of a network. To show this, we introduce a toy network model made of a random tree and several blocks of the same size, each block being an Erdős-Rényi network with different parameters. The nodes of the tree are then connected to the blocks with a certain probability.

The parameters of this model are:

  • : the number of blocks in the network

  • : the number of nodes inside each block , ,

  • : the probability to create a link between any two nodes inside the same block ,

  • : the number of nodes contained in the random tree,

  • the probability to connect any node of the tree to any node of the block .

This generates a network with nodes. We first generate a random tree as the spanning tree of an Erdős-Rényi random network whose number of nodes is . Then, we generate the blocks independently and connect each node of the tree to each node of the block with the probability .

To ensure that the toy network generated in this way is connected, it is sufficient to choose so that the average number of links connecting the tree to the blocks is very large compared to 1. It is easy to verify that and then if .
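The generation procedure described above can be sketched in Python as follows. Several choices here are assumptions made for illustration only: the random tree is built by attaching each new node to a uniformly chosen earlier node (a simpler substitute for the spanning tree of an Erdős-Rényi network), a single tree-to-block probability `p_out` is used for all blocks, and the function and parameter names are hypothetical.

```python
import random
from itertools import combinations


def toy_network(block_size, p_in, n_tree, p_out, seed=None):
    """Random tree plus Erdos-Renyi blocks attached to it.

    block_size: number of nodes in each block
    p_in:       list of internal link probabilities, one per block
    n_tree:     number of nodes in the random tree
    p_out:      probability of linking a tree node to a block node
                (assumed identical for all blocks)
    Returns (edges, blocks), with edges a set of (u, v) tuples, u < v.
    """
    rng = random.Random(seed)
    edges = set()
    # Random tree on nodes 0 .. n_tree - 1 (assumed construction).
    for v in range(1, n_tree):
        edges.add((rng.randrange(v), v))
    blocks = []
    start = n_tree
    for p in p_in:
        block = list(range(start, start + block_size))
        start += block_size
        blocks.append(block)
        # Erdos-Renyi links inside the block.
        for u, v in combinations(block, 2):
            if rng.random() < p:
                edges.add((u, v))
        # Links between the tree and the block.
        for t in range(n_tree):
            for v in block:
                if rng.random() < p_out:
                    edges.add((t, v))
    return edges, blocks


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to run the sketch.
    edges, blocks = toy_network(30, [0.8, 0.6, 0.4, 0.2], 20, 0.01, seed=0)
    print(len(edges), [len(b) for b in blocks])
```

With these conventions the total number of nodes is the tree size plus the number of blocks times the block size, and the expected number of tree-to-block links per block is `n_tree * block_size * p_out`.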

In the following, we give an example and a graphical representation (using a spring force positioning algorithm) of the described toy network, with blocks of nodes in each block, a tree of nodes, and internal probabilities: . Here we take , which fulfills the connectivity prerequisite.

Once the toy network has been generated, the procedure described in the previous section is applied; the results are presented in Figure 2. The maximum of is reached for and the corresponding nodes are those of the cluster with the probability . This example shows that it is necessary to iterate the algorithm more than once otherwise nodes that are in other clusters may be counted in the sparse part.

(a)
(b)
Figure 2: (a) The evolution of the weighted rich club parameters and the weighted rich club coefficient for both the toy network and its corresponding null model (see Section 4.2 for definitions of the toy network model). (b) The result of a single iteration algorithm. The nodes detected as the dense subset are in blue and represent only one of four dense parts.

4.3 Iterative algorithm and quality measure

We repeat the above-described process iteratively, while deleting at each iteration , the weighted rich club calculated using Equation 7 at iteration , and keeping the same weights on the remaining links after deletion. This makes it possible to extract the high weighted subsets of nodes one by one, and this process is repeated until the stopping criterion of the algorithm is reached or there is no more weight in the remaining network. Once these iterations are completed, we obtain a series of weighted rich clubs and the element can be evaluated using the difference between the weighted rich club coefficient on the studied network and its null model, normalized so that it is always lower than 1:

(8)

where is the weighted network obtained after deleting the first weighted rich clubs from and is the number of nodes of . Let us remind that the weights of are not recalculated but inherited from .

This measure represents the average value of over all possible series of nodes of selected in decreasing order of . The measure generally goes as follows: decreasing from its maximum value, which is the quality of the first extracted weighted rich club, to lower values referring to the qualities of weighted rich clubs with lower ranks. This decrease is valid for networks having high weighted subsets of nodes; otherwise, the measure may exhibit unpredictable behavior (for example, if the network does not have a particular structure, as in the case of an Erdős-Rényi network). In the example of the toy model described in Section 4.2, the quality measure falls to (almost) zero once all its clusters have been extracted, because not only will the weight of the links that remain in the network (after cluster extraction) be low, but there will also be no more clusters left in the network. This entails the decrease of values and of the quality measure.

The measure is used in order to accept or reject the weighted rich club to be part of the dense part: it is accepted if, and only if, . So, is a resolution parameter of the dense/sparse parts that has to be chosen. The higher the threshold, the smaller the dense part and the larger the sparse part. It can be constant during the whole process or recalculated at each step of the algorithm. The best choice of varies according to the type of data being studied but, in practice, a value proportional to is often a good choice. In the following, for the real networks we study, we set as a default value. If we have any a priori information on the structure of the dense (or sparse) part, we can choose a more appropriate threshold to make the method more efficient. For example, for the artificial networks we consider below, the dense part has a modular organization that we can more efficiently retrieve using a variable threshold based on an Erdös-Rényi model.

The whole process is summarized in an algorithm named ItRich for iterative weighted rich clubs (Figure 3).

Figure 3: ItRich algorithm

The complexity of ItRich is proportional to the number of times the main loop is executed. It is not necessary to repeat the computation of the loop until the remaining network has a null weight, if the value of is set beforehand (e.g. ). Note that the higher the threshold, the shorter the computation time. The main loop contains 4 computations, the first two of which can easily be made in parallel. These are the most expensive. The first one consists in calculating a null model. This is bounded by the calculation of the configuration model, which can be obtained in with the number of links in the network. In practice, it is better to calculate a number of null models and estimate the results based on their averages (as is done in Section 5.1 and Section 5.2). The second one consists in sorting the list of nodes in decreasing order of (it takes ), then, in parallel, constructing the sub-networks induced by for and evaluating their weights ( for each sub-network). If the decomposition stops after the finding of rich clubs and , then ItRich has order of time complexity. Let us note that the computing of the null model is the expensive part of the algorithm.

5 Results and discussion

5.1 Artificial networks

We first apply our algorithm to a set of synthetic networks partly used in [36]. We measure the output of our algorithm ItRich by calculating the values of Recall and Specificity (also called True Positive rate and True Negative rate respectively). Finally, we compare these performances with those obtained by OSLOM [36], an algorithm widely used in this field.
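For reference, Recall and Specificity can be computed from a ground-truth classification and a predicted one as in the short Python sketch below. It assumes that the “dense part” is treated as the positive class, which is a choice made here for illustration; the function name and the set-based inputs are hypothetical.

```python
def recall_specificity(true_dense, predicted_dense, all_nodes):
    """Recall (true positive rate) and specificity (true negative rate),
    taking the dense part as the positive class (assumption)."""
    true_dense = set(true_dense)
    predicted_dense = set(predicted_dense)
    true_sparse = set(all_nodes) - true_dense
    tp = len(true_dense & predicted_dense)   # dense nodes found as dense
    fn = len(true_dense - predicted_dense)   # dense nodes missed
    tn = len(true_sparse - predicted_dense)  # sparse nodes left as sparse
    fp = len(true_sparse & predicted_dense)  # sparse nodes labelled dense
    recall = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return recall, specificity


if __name__ == "__main__":
    nodes = range(10)
    truth = {0, 1, 2, 3, 4}
    found = {0, 1, 2, 3, 5}
    print(recall_specificity(truth, found, nodes))
```

In this example four of the five dense nodes are recovered and four of the five sparse nodes are left out of the dense part, so both rates equal 0.8.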

For these data, we choose a variable threshold equal to the quality measure obtained on an Erdős-Rényi random network with the same number of vertices as in the remaining network at step of the algorithm ItRich (i.e. the initial graph where the first rich clubs have been removed) and a probability parameter . This value of ensures that the Erdős-Rényi network and the network evaluated at the iteration have the same value of (see Appendix). As previously noted, the advantage of using an Erdős-Rényi network is, on the one hand, its lack of modular structure, which ensures that it does not contain any subsets whose density of links is significantly higher compared to the rest of the network. On the other hand, it is related to the synthetic data studied in this paper. As seen below, the construction of the data is such that the subnetwork induced by the sparse part is close to an Erdős-Rényi network.

Experimental data

Our synthetic model is based on a Lancichinetti-Fortunato-Radicchi (LFR) benchmark [35] to which nodes have been added as detailed below. Recall that in an LFR benchmark network, each node is assigned to a community with a control on some characteristics, such as the exponent of the power-law distribution of the degrees, the exponent of the power-law distribution of the community sizes, and the proportion of links a node shares with nodes outside its own community.

Given an LFR network, we add nodes so as to create two classes: a first one, called the “dense part”, composed of nodes from the original LFR model, and a second one, the “sparse part”, containing the nodes that have been added. These added nodes are connected so that their weights are lower than those of the LFR network. This a priori classification is then used as a ground truth to evaluate the efficiency of our algorithm in separating the “dense part” from the “sparse part”.

In order to connect the added nodes (the noise) to the original LFR network, we use the method of [36]: the degree of a newly added vertex is drawn from a distribution that is the same as that of the LFR network, and the vertex is connected to the network by preferential attachment. Figure 4(a) shows the ordered values of the logarithm of , obtained from an LFR model of 1000 nodes (, , ) to which 1000 other nodes have been added.
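The attachment step can be sketched as follows. This is one possible reading of the procedure, not the exact method of [36]: the function name is hypothetical, node labels are assumed to be integers, the degree of a new node is resampled from the degrees of the original nodes, and targets are restricted to original nodes chosen with probability proportional to their original degree.

```python
import random


def add_noise_nodes(adjacency, n_new, rng):
    """Attach n_new nodes to a graph stored as {node: set(neighbours)}.

    Assumptions of this sketch: integer node labels, degrees resampled from
    the original degree list, targets drawn among the original nodes only,
    with weights frozen at their original degrees.
    """
    original = list(adjacency)
    degrees = [len(adjacency[v]) for v in original]
    n_candidates = sum(1 for d in degrees if d > 0)
    first_label = max(original) + 1
    for new in range(first_label, first_label + n_new):
        adjacency[new] = set()
        # Cap the drawn degree so that the loop below always terminates.
        wanted = min(rng.choice(degrees), n_candidates)
        while len(adjacency[new]) < wanted:
            target = rng.choices(original, weights=degrees, k=1)[0]
            adjacency[new].add(target)
            adjacency[target].add(new)
    return adjacency
```

Freezing the weights at the original degrees keeps the sketch short; updating them after each attachment would be closer to a growing preferential-attachment process.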

We can observe that the average value of restricted to the nodes of the sparse part (under the red dashed line) is much lower than that restricted to the dense part (above the black dashed line), making the classification quite simple. Indeed, Figure 4(a) shows that there is a gap between the minimum value of in the LFR network and the maximum value of in the set of the added nodes. To make the classification more challenging, we randomly add links between the added nodes, which has the effect of reducing the value of the gap. The new gap is given as a function of the initial gap and a parameter . When no links are added, and when the initial gap is entirely filled. Figure 4(b) gives the example of a total gap reduction.

In Figure 4, we can observe that the part of the curve above the red dashed line is the same before and after filling the gap, whereas this is not the case for the noise nodes (lower part of the curve), meaning that the added links change the shape of the distribution of the nodes in the sparse part, rather than only shifting it. This effect is due to the fact that links are added between randomly selected pairs of nodes, and not by especially targeting those with high values, which produces a data set for which the separation between the dense and the sparse part is less obvious to determine. One can point out that the larger the value of , the more severely the distribution is impacted when increases.

(a)
(b)
Figure 4: The logarithm of of each node vs. its rank. The red dashed lines represent the maximum value of in the noise nodes, whereas the black dashed lines represent the minimum value of in the original LFR (). (a) When , there is a gap in the values of separating the nodes of the initial LFR from the noise nodes. (b) When , this gap is bridged by adding links between added nodes, so that the minimum value of in the initial LFR network is also equal to the maximum value of in the sparse part.

Experimental protocol

Three experiments are performed, each with a different value of the mixing parameter . For each experiment, we perform sets of calculations, each set for a different value of , from to . For each of the values of , the initial LFR networks have nodes, and . We execute our algorithm for ranging from to , in increments of 10 nodes. The algorithm is thus applied 100 times for each value of . Each calculation is moreover based on an average of 50 different null models.

Let be the set of nodes from the initial LFR model, the set of added nodes, and (resp. ) the set of nodes detected as the dense (resp. sparse) part by the algorithm. We use the following standard metrics to evaluate the results:

  • Recall (or True positive rate):

  • Specificity (or True negative rate):
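As a minimal sketch of these two metrics, the function below uses the standard definitions of the true positive and true negative rates, taking the LFR nodes as the positive class and the added nodes as the negative class. The argument names are hypothetical and do not reproduce the paper's symbols.

```python
def recall_specificity(lfr_nodes, added_nodes, detected_dense, detected_sparse):
    """Return (recall, specificity) for a dense/sparse classification.

    Positive class: nodes of the initial LFR network (expected in the dense part).
    Negative class: added nodes (expected in the sparse part).
    """
    lfr, added = set(lfr_nodes), set(added_nodes)
    dense, sparse = set(detected_dense), set(detected_sparse)
    true_pos = len(lfr & dense)      # LFR nodes placed in the dense part
    false_neg = len(lfr & sparse)    # LFR nodes placed in the sparse part
    true_neg = len(added & sparse)   # added nodes placed in the sparse part
    false_pos = len(added & dense)   # added nodes placed in the dense part
    recall = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return recall, specificity
```

For instance, with 10 LFR nodes of which 8 are detected as dense, and 5 added nodes of which 4 are detected as sparse, both values are 0.8.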

The average values of and over the networks generated for the different values of are calculated for the three experiments. The results are then compared with those of the OSLOM algorithm [36]. OSLOM is one of the few algorithms that rely on the statistical properties of clusters, while bringing an “added value” to standard community detection algorithms, i.e. the possibility to have a set of “homeless” nodes assigned to no community whatsoever. We will consider these nodes as noise, and compare them to the nodes of the sparse part obtained by ItRich.

Results
Only the results for and and are plotted (Figure 5) and discussed here; more complete plots can be found in Appendix 2.

(a) ItRich
(b) OSLOM
(c) ItRich
(d) OSLOM
Figure 5: Values of ItRich and OSLOM performance measurements against the number of added nodes. For all cases , . When (resp. ) the gap between the values of for the nodes of the LFR network and the added nodes is the highest (resp. null).

We confirm the results of [36] that OSLOM correctly separates the clusters and the noise as long as the noise is not too significant. ItRich is much more effective at separating the LFR network from the noise, even when the noise has a high density of links ().

When , ItRich yields a Specificity equal to , which remains constant as we increase the number of added nodes. This means that all these nodes have been correctly classified in the sparse part, regardless of their number, which ranges from to . Except for a few outliers, this is also the case when and the number of added nodes is less than half the number of the LFR nodes. By contrast, with the OSLOM algorithm, the value of Specificity drops significantly and continuously after about nodes have been added, no matter whether or . This suggests that the larger the number of added nodes, the more often some of them are assigned to a community by OSLOM, resulting in a lower ratio of added nodes correctly assigned to the sparse part.

As for the Recall , its values calculated by ItRich are high but less than from the beginning to the end of the calculation, even when there are no added nodes at all and whatever or . This is due to the fact that there are some nodes within the LFR network that are initially classified as sparse, and that remain so when noise is added. In comparison, OSLOM also keeps a constant value of with an average of %, and a standard deviation of 1%. This shows that this algorithm assigns a community to almost all nodes of the LFR network, even when .

Finally, two phenomena may be noted. The first is for OSLOM when and around 200 nodes have been added. There is a threshold effect that causes a jump in the values of the Specificity, from values lower than 0.2 to values around 0.8. This is explained by the fact that as long as the number of added nodes is small enough, OSLOM incorrectly groups them into a single community. The second is for ItRich when and around 500 nodes have been added. The Specificity seems to take random values between 0 and 1 and the Recall between 0.85 and 0.95. We have no clear explanation for this.

Concerning the experiments for and and 9 other values of between and (see Appendix 2), the best results of ItRich (for these three values of ) are obtained for , and this can be accounted for by the fact that they correspond to the case in which the perturbations are the lowest. Indeed, by increasing the mixing parameter , we decrease the average value of in the initial LFR network, which decreases the value of the gap and allows it to be filled without adding too many random links (i.e. the variance between the networks generated from to decreases with ). We generally observe better results on average for ItRich than for OSLOM, especially regarding the Specificity measure, which has a minimum average value of for ItRich when , . It means that, in the worst case, our algorithm manages to identify on average at least of the “noise” nodes. On the other hand, the Recall is better for OSLOM, but it often means misclassifying the added nodes in the dense part with the nodes of the initial LFR.

5.2 Real world networks

We apply ItRich to three data sets that are widely studied in the field of network science: Lusseau’s bottlenose dolphins [40, 39], American political blogs [1] and American College football [21]. The first one has only 62 nodes and can thus be analyzed finely. The second one is a bigger network with 1490 nodes. Finally, the third one has all of its 115 nodes but one in the 8-shell. (All the datasets have been downloaded from https://www.cc.gatech.edu/dimacs10/archive/clustering.shtml.) For all these networks, we choose the default value .

(a) Dolphins
(b) College football
(c) Political blogs
Figure 6: The values of for the three studied networks. The red dotted line represents the value of . For each network, the null model is computed 100 times and the error bars represent the standard deviations of the quality measure.

5.2.1 Lusseau’s bottlenose dolphins

The dolphin network is constructed from observations of a community of 62 bottlenose dolphins over a period of 7 years. Nodes in the network represent the dolphins, and ties between nodes represent associations between dolphin pairs occurring more often than expected by chance.

In Figure 7(b), green, blue and purple nodes correspond respectively to the first, second and third weighted rich clubs identified by our algorithm. These results, compared to the community structure proposed in [39] (Figures 7(a) and 7(b)), illustrate the difference between ItRich and a community detection algorithm. Note that ItRich may put nodes of different communities in the same weighted rich club.

We observe that the first weighted rich club is composed of highly connected nodes. The second and third weighted rich clubs are composed of several linked vertices mostly belonging to the neighbourhood of the first weighted rich club. These observations invite us to compare the distribution of the vertices within the weighted rich clubs and the distribution of the vertices within the shells [60]. There are different shells in the dolphin network (Figure 7(a)). The sparse part contains the periphery which is equal to the 2-core. Symmetrically, the dense part is included in the union of the 3-shell and the 4-shell.
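For readers who want to reproduce the shell decomposition used in this comparison, the following sketch computes the shell index of every node by the usual peeling procedure. It is a generic textbook routine with hypothetical names, not code from the paper; the graph is assumed to be undirected and stored as a dictionary of neighbour sets.

```python
def shell_index(adjacency):
    """Return {node: k}, where k is the largest k such that node is in the k-core."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k cannot be in the (k+1)-core.
        to_remove = [v for v in remaining if degree[v] <= k]
        if not to_remove:
            k += 1
            continue
        while to_remove:
            v = to_remove.pop()
            if v not in remaining:
                continue  # already peeled through a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        to_remove.append(u)
    return shell
```

On a triangle with one pendant vertex, the pendant vertex falls in the 1-shell and the three triangle vertices in the 2-shell.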

Among the vertices of the network, are in the sparse part. These vertices are of two different types: those that have a null value of and those whose value of sharply decreases at each iteration of ItRich so that it remains lower than the minimal value required to be in a weighted rich club. These second-type vertices are Ripplefluke, MN60, SN100, TSN103, DN16, Shmuddel, Haecksel, Thumper, Bumper (Figure 7(a)). The first among them get a null value of as soon as the first weighted rich club is removed. Vertices TSN103, SN100 and Haecksel are in the 4-shell. Vertex TSN103 has neighbors, each of them of high degree but weakly linked together. In fact, TSN103 is at the intersection of two sets of vertices of two different communities, and so has a very particular intermediary position. Vertex SN100 is the vertex with the highest betweenness in the network and the smallest non-zero clustering. It is also in a central position between distinct groups of vertices and has a relatively high degree. Lastly, Haecksel has a value that gradually decreases after removing the first and the second weighted rich clubs. It is largely linked to vertices of the first weighted rich club, which, when removed after the first iteration of ItRich, leads to a configuration in which Haecksel has a null clustering. So, within the 4-shell, the position of Haecksel is somewhat special.

This first real world network study suggests that, when several k-cores exist, there may be a high overlap between the sparse (resp. dense) part and the k-cores for low (resp. high) values of . The differences between both decompositions are worth looking at closely. Moreover, the vertices of the sparse part of a network can be divided into two categories: those with a low value that are, from the beginning, at the periphery of the network, and the others. The latter show medium values of but are linked to high- vertices. These vertices seem to occupy a rather specific position in the organisation of the core of the network. Such information is new compared to that obtained from the analyses conducted so far, and it could lead to more in-depth interpretations by biologists.

(a)
(b)
Figure 7: Lusseau’s bottlenose dolphins (a) The shell decomposition of the network. (b) The decomposition obtained from ItRich, the width of each edge is proportional to its weight . The layouts are obtained with a force-based algorithm.

5.2.2 American political blogs

This dataset is composed of a set of 1490 nodes representing American blogs discussing political issues. Each blog has been labeled by Lada Adamic [1] as either liberal (758) or conservative (732) and two blogs are linked if at least one refers to the other. The authors concluded that most links are within the two separate communities, with far fewer cross-links between them. Another interesting pattern was that conservative bloggers were more likely to link to other blogs (other conservative blogs but also liberal ones).

For , the dense part has three rich clubs which account for of the nodes and of the links of the total graph. The distribution of nodes and links in the various rich clubs and the sparse part is given in Table 1. The three rich clubs have comparable numbers of nodes and, within each rich club, the density of links between the liberal nodes on the one hand and between the conservative nodes on the other are almost equal. However, the 1st rich club has a higher proportion of liberals, while the reverse is true for the other two rich clubs and for the sparse part.

             liberal           conservative      total
Rich club 1  113/3052 (48.2)   84/1682 (48.2)    197/5260 (27.2)
Rich club 2  80/283 (9.0)      134/710 (8.0)     214/1083 (4.8)
Rich club 3  50/42 (3.4)       96/116 (2.5)      146/165 (1.6)
Sparse part  515/84 (0.06)     418/50 (0.05)     933/152 (0.03)
Total        758/7302 (2.5)    732/7841 (2.9)    1490/16718 (1.5)
Table 1: Distribution of nodes and links within the American political blogs network: is for nodes, links and the density of links of the induced subgraph (ie. ). The links are not directed, that is, one link corresponds to an arc from a node to a node , or from to , or to a pair of arcs with opposite directions between and .
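The density values in Table 1 can be checked with a short computation. The sketch below assumes that the density of an induced subgraph with n nodes and m undirected links is 2m / (n(n-1)), expressed as a percentage. This formula is inferred from the tabulated numbers, which it reproduces up to rounding; the function and variable names are hypothetical.

```python
def link_density_percent(n_nodes, n_links):
    """Density of an undirected simple graph, as a percentage of possible links."""
    return 100.0 * 2 * n_links / (n_nodes * (n_nodes - 1))


# (nodes, links, density reported in the "total" column of Table 1)
total_column = {
    "Rich club 1": (197, 5260, 27.2),
    "Rich club 2": (214, 1083, 4.8),
    "Rich club 3": (146, 165, 1.6),
    "Sparse part": (933, 152, 0.03),
    "Total": (1490, 16718, 1.5),
}

for name, (n, m, reported) in total_column.items():
    print(f"{name}: computed {link_density_percent(n, m):.2f}, reported {reported}")
```

The same formula also matches the per-label columns, for example 113 liberal nodes and 3052 links in the first rich club give about 48.2.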

As in the previous data set, we observe a correlation between the -core and the values of , with an overlap of the values for the dense and the sparse parts (Figure 8(a)). However, the large number of -cores (from the 15-core to the 34-core) that contain nodes of both the dense part and the sparse part makes the -core decomposition inefficient at differentiating the weighted rich clubs from the sparse part of the network.

If we pay particular attention to the nodes whose is in the overlap of the -values covered by both the dense and the sparse parts, Figure 8(b) confirms that the -values of the neighbors of a node () are of particular importance to differentiate the two types of nodes of the overlap: for the same -shell, the nodes of the sparse part have neighbours with average values higher than those of nodes of the dense part. As many of these neighbours belong to one weighted rich club, the values of these vertices partially or totally collapse after removing some rich clubs, and this explains why, in fine, they are categorised in the sparse part of the network.

However, let us note that the knowledge of and , without using ItRich, is not sufficient to provide any information to characterize the weighted rich clubs or the sparse part, or even to find their overlap (Figure 8(c)).

(a)
(b)
(c)
Figure 8: American political blogs (a) Plot of -core vs. with nodes of the dense (resp. sparse) part in blue (resp. red). (b,c) The average of calculated on the neighbourhood of the nodes vs. their value on a logarithmic scale. For a given , only the mean (bold curve) and standard deviation are given. In (b), the nodes of the dense and sparse parts are distinguished, which is not the case in (c). The overlapping area between dense and sparse parts is shown by vertical lines.

5.2.3 American College Football

We will now examine a network representing the games between different American college football teams during the 2000 season [21]. The nodes represent the participating teams and a link connects two teams which played against each other during the season. Each team is labeled by a conference that contains from 8 to 12 teams. For most conferences, internal matches are more frequent than external matches, giving the network a modular structure. We identify, however, two properties that make this data set particularly suitable for testing our algorithm. The first is that all the nodes are part of the , except one which is in the . This implies that the results cannot be found by a core decomposition. The second property is that there are 5 teams that are not part of any conference, which have been given the label “independent”. In [21] it is also stated that the seven teams in the Sun Belt conference played almost as many games against teams in the Western Athletic conference as against teams in their own conference. They also played a large portion of their inter-conference games against teams from the Mid-American conference.

Figure 9 shows the rich club of each team and the conference it belongs to.

Figure 9: American college football. In the decomposition obtained from ItRich, each node is represented by a different marker representing its conference, along with a different color according to its classification by ItRich. The width of each edge is proportional to its weight . The layout is obtained with a force-based algorithm.

ItRich reveals four rich clubs, the first two being composed of respectively 58 and 42 nodes, and the last two being smaller with respectively 6 and 4 nodes. Five nodes remain in the sparse part.

It can be noted that out of the 4 rich clubs, the first two mainly contain teams that play the majority of their games against teams from their own conference, while the last two contain teams which tend to diversify their opponents’ conferences (e.g. the Sun Belt conference). All but one of the teams of the first rich club are correctly classified in [21], in the sense that the composition of the community each team belongs to is exactly the composition of its conference (Texas Christian is in the first rich club but misclassified in [21]). The last rich clubs contain teams that play a large number of inter-conference games, including teams from the Sun Belt conference, and some teams from the Western Athletic conference, which are, according to [21], teams whose conference does not really form a community, in the sense that there are few intra-conference games. We also notice that the third and the fourth rich clubs are equal to some communities found by Girvan and Newman’s algorithm. This is explained by the fact that they both induce small cliques in the network (with 6 and 4 nodes respectively).

To quantify the information carried by the links between teams and conferences, we use an empirical measure based on Shannon’s entropy. Let be the degree of the node , and the set of the 11 conferences in which the teams play, plus the set of independent teams. For each node we call the ratio between the number of neighbours of the node playing in the conference and the total number of neighbours of . We have

(9)

This measure is zero if all the neighbours of play in the same conference, and has a maximum value of reached for a node when all its neighbours are equally distributed across the conferences. Figure 10 plots versus .
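A minimal sketch of such an entropy-based measure is given below. It assumes the standard Shannon form, minus the sum over conferences of p log p, where p is the share of a node's neighbours playing in a given conference. The natural logarithm and the function name are assumptions of this sketch, since the original formula is not reproduced here.

```python
import math
from collections import Counter


def neighbour_conference_entropy(neighbour_conferences):
    """Shannon entropy of the conference labels of a node's neighbours.

    neighbour_conferences: one label per neighbour (independent teams can
    share a single label). Natural logarithm is an assumption of this sketch.
    """
    counts = Counter(neighbour_conferences)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

With this convention the value is zero when all neighbours share one label, and it equals log 12 when the neighbours are spread evenly over the 12 possible labels (11 conferences plus the independent teams).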
The first two rich clubs are characterized by teams with low values of and high values of . It reflects the fact that these teams mainly play intra-conference matches. On the contrary, teams of the last two rich clubs and those in the sparse part mainly play matches against teams of varied profiles. The sparse part is composed of 5 nodes, 4 of them (Navy, Central Florida, Notre Dame and Connecticut) are independent teams, and have not been classified among any community in [21]. The sparse part covers all the independent teams, except the Utah State team which is classified in the 3rd rich club, as it belongs to a clique.

The only node in the sparse part that does not share these properties is the Miami Florida team (from the Big East conference), whose is high and is low. This node is in the situation described above, namely that, despite its high weight, it does not reach that of its neighbors, who are almost all in the first rich club. Remember that it is in the 8-core.

Figure 10: The quantities of information of each node versus its value. The nodes are distinguished by their colours according to the weighted rich club they belong to.

6 Conclusion

This paper proposes a new viewpoint on network structure analysis in order to provide both an alternative and an additional approach to standard methods, such as the k-core decomposition or community detection methods. We defined a new density measure for each vertex, called , taking into account both the degree of the vertex and of its neighbors and the ratio of common neighbors between the vertex and its neighbors. We used this measure in the particular context of weighted rich clubs to develop an algorithm capable of providing several hierarchical layers of nodes, which altogether constitute what we call the “dense part”. The set of vertices that are not in any of the layers is called the “sparse part”. Experiments on both synthetic and real networks show that the dense part largely intersects with the -core for a high enough . The sparse part, on the other hand, contains peripheral nodes but also nodes of the core that have a special position in the core configuration and that are not screened by other methods. These vertices have properties different from those of their neighbors while not being on the periphery and often remaining within the core of the network. So, they constitute a kind of backbone of the network that meshes well with the partition into communities or into -cores, which helps to better understand the topological organization of the network. How to connect these nodes with vertices representative of their community and with specific peripheral nodes is one of the themes of our future studies. The time complexity of our algorithm ItRich can be reduced to using parallel computations.

7 Appendix

7.1 Appendix 1

The purpose of this appendix is to give formal expressions of and clustering coefficient for the planted partition model (PPM) with a tree-like architecture used in Section 2.3. The PPM we consider is a random network defined by:

  • a set of vertices partitioned into disjoint subsets of vertices each;

  • a matrix where is the adjacency matrix of a tree with nodes, and .

In such a model, the s are called communities or blocks. Matrix is the adjacency matrix of the tree linking the blocks and we say that two blocks and are adjacent if . Two nodes within the same block have a probability to share one edge, whereas this probability is between nodes of different but adjacent blocks and otherwise. Since nodes of the same block share the same properties on average, in the following we refer to average values. For example we note the average degree of a node in the block , and the number of blocks adjacent to the block . We have:
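The model described above can be sketched as follows. The parameter names (block_size, p_in, p_out) are hypothetical, and the sketch assumes that nodes of non-adjacent blocks are never linked, which is consistent with the later remark that a triangle cannot span three distinct blocks. Under these assumptions, the expected degree of a node is (block_size - 1) * p_in plus block_size * p_out for each adjacent block.

```python
import random
from itertools import combinations


def sample_tree_ppm(tree_edges, n_blocks, block_size, p_in, p_out, rng):
    """Sample a planted partition model whose blocks are linked along a tree.

    tree_edges: pairs of block indices forming a tree on range(n_blocks).
    Assumption: the link probability between non-adjacent blocks is zero.
    """
    adjacent = {frozenset(edge) for edge in tree_edges}
    block_of = {v: v // block_size for v in range(n_blocks * block_size)}
    edges = []
    for u, v in combinations(block_of, 2):
        bu, bv = block_of[u], block_of[v]
        if bu == bv:
            p = p_in
        elif frozenset((bu, bv)) in adjacent:
            p = p_out
        else:
            p = 0.0
        if rng.random() < p:
            edges.append((u, v))
    return block_of, edges


def expected_degree(block_size, p_in, p_out, n_adjacent_blocks):
    """Average degree of a node whose block has n_adjacent_blocks tree neighbours."""
    return (block_size - 1) * p_in + block_size * n_adjacent_blocks * p_out
```

For example, with blocks of 10 nodes, p_in = 0.5, p_out = 0.1 and two adjacent blocks, the expected degree is 4.5 + 2.0 = 6.5.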

The local clustering coefficient of a node is the ratio between the number of edges in its neighborhood and the number of pairs of vertices in its neighborhood. It is easy to see that the number of pairs is for . In order to evaluate the number of edges in the neighborhood of , we consider disjoint cases:

  • and : their number is

  • and : their number is

  • and : their number is

Since is the adjacency matrix of a tree, the three vertices cannot be located in three distinct blocks. It follows:

(10)
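The definition of the local clustering coefficient used in this derivation (edges among the neighbours divided by the number of neighbour pairs) can be checked numerically on any sampled network. The helper below is a generic sketch with a hypothetical name; returning 0 for nodes with fewer than two neighbours is a convention chosen here, not one stated in the text.

```python
from itertools import combinations


def local_clustering(adjacency, v):
    """Edges among the neighbours of v divided by the number of neighbour pairs.

    adjacency: {node: set(neighbours)} for an undirected simple graph.
    Convention of this sketch: the coefficient is 0 when v has fewer than 2 neighbours.
    """
    neighbours = adjacency[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)
```

Averaging this quantity over the nodes of one block of a sampled network gives an empirical value that can be compared with the analytical expression.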

To calculate , observe that

Remember that , where is the harmonic mean of the degrees and . It is easy to verify that .

Let :

  • if : there are common neighbors of and in and common neighbors of and not in . Moreover, as , we have . So, given , we have:

    by noticing that .

  • if : there are common neighbors of and that are in or and there cannot be other common neighbors (i.e. neither in nor in ) since is the adjacency matrix of a tree.

To summarize,

(11)

7.2 Appendix 2

The following plots show the results of ItRich and OSLOM for different values of the mixing parameter and of the gap parameter . See Section 5.1 for details on the experimental protocol.

Each of the 11 points of the curves is an average over 100 iterations of the algorithm (ItRich or OSLOM) with a varying number of added nodes. The experiment is carried out for and .

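Figure: panels (a) to (f), results of ItRich and OSLOM for the different values of the mixing parameter and of the gap parameter.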

References

  • [1] Adamic, L. A. & Glance, N. (2005) The political blogosphere and the 2004 U.S. election: divided they blog. pp. 36–43.
  • [2] Albert, R., Jeong, H. & Barabási, A. L. (1999) Internet: Diameter of the World-Wide Web. Nature, 401, 130–131.
  • [3] Albert, R. & Barabási, A. L. (2002) Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97.
  • [4] Alstott, J., Panzarasa, P., Rubinov, M., Bullmore, E. & Vértes, P. E. (2015) A unifying framework for measuring weighted rich clubs by integrating randomized controls. Scientific Reports, 4, 7258.
  • [5] Barabási, A. L. & Albert, R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
  • [6] Bianconi, G., Pin, P. & Marsili, M. (2009) Assessing the relevance of node features for network structure. Proceedings of the National Academy of Sciences, 106, 11433–11438.
  • [7] Bollobás, B. (1980) A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur. J. Comb., 1, 311–316.
  • [8] Borgatti, S. P. & Everett, M. G. (2000) Models of core/periphery structures. Social Networks, 21(4), 375 – 395.
  • [9] Boulet, R., Jouve, B., Rossi, F. & Villa, N. (2008) Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7), 1257–1273.
  • [10] Burt, R. S. (1992) Structural Holes: The Social Structure of Competition. Cambridge: Harvard University Press.
  • [11] Chen, S., Wang, Z. Z., Tang, L., Tang, Y., Gao, Y., Li, H. J., Xiang, J. & Zhang, Y. (2018) Global vs. local modularity for network community detection. Plos One, 13, 1–21.
  • [12] Colizza, V., Flammini, A., Serrano, M. A. & Vespignani, A. (2006) Detecting rich-club ordering in complex networks. Nature Physics, 2, 110–115.
  • [13] Crespelle, C. (2017) Structures of complex networks and of their dynamics. Habilitation à Diriger des Recherches, University of Lyon 1.
  • [14] Csermely, P., London, A., Wu, L. Y. & Uzzi, B. (2013) Structure and dynamics of core/periphery networks. Journal of Complex Networks, 1, 93–123.
  • [15] Dice, L. R. (1945) Measures of the amount of ecologic association between species. Ecology, 26, 297–302.
  • [16] Dorogovtsev, S. N. & Mendes, J. F. F. (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford: Oxford University Press.
  • [17] Estrada, E. (2011) The Structure of Complex Networks: Theory and Applications. Oxford University Press.
  • [18] Faloutsos, M., Faloutsos, P. & Faloutsos, C. (1999) On power-law relationships of the Internet topology. SIGCOMM Comput. Commun. Rev., 29(4), 251–262.
  • [19] Fortunato, S. & Hric, D. (2016) Community detection in networks: A user guide. Physics Reports, 659, 1–44.
  • [20] Fosdick, B. K., Larremore, D. B., Nishimura, J. & Ugander, J. (2016) Configuring random graph models with fixed degree sequences. SIAM Review, 60, 315–355.
  • [21] Girvan, M. & Newman, M. E. J. (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
  • [22] Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. (1999) From molecular to modular cell biology. Nature, 402, 47–52.
  • [23] Holland, P. W., Laskey, K. B. & Leinhardt, S. (1983) Stochastic blockmodels: First steps. Social Networks, 5(2), 109 – 137.
  • [24] Holme, P. (2005) Core-periphery organization of complex networks. Physical review. E, Statistical, nonlinear, and soft matter physics, 72, 046111.
  • [25] Hric, D., Darst, R. K. & Fortunato, S. (2014) Community detection in networks: Structural communities versus ground truth. Physical review. E, Statistical, nonlinear, and soft matter physics, 90, 1–19.
  • [26] Huberman, B. A. & Adamic, L. A. (1999) Growth dynamics of the World-Wide Web. Nature, 399, 131.
  • [27] Javed, A. M., Younis, M. S., Latif, S., Qadir, J. & Baig, A. (2018) Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications, 108, 87 – 111.
  • [28] Jensen, P., Morini, M., Karsai, M., Venturini, T., Vespignani, A., Jacomy, M., Cointet, J. P., Mercklé, P. & Fleury, E. (2015) Detecting global bridges in networks. Journal of Complex Networks, 4(3), 319 – 329.
  • [29] Kelley, S., Goldberg, M., Magdon-Ismail, M., Mertsalov, K. & Wallace, A. (2012) Defining and discovering communities in social networks. in Handbook of Optimization in Complex Networks: Theory and Applications, pp. 139–168. Boston: Springer US.
  • [30] Krause, A. E., Frank, K. A., Mason, D. M., Ulanowicz, R. E. & Taylor, W. W. (2003) Compartments revealed in food-web structure. Nature, pp. 282–285.
  • [31] Kumar, R., Novak, J. & Tomkins, A. (2006) Structure and evolution of online social networks. in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 611–617. New York: ACM.
  • [32] Kumar, R., Raghavan, P., Rajagopalan, S. & Tomkins, A. (1999) Trawling the Web for emerging cyber-communities. Comput. Netw., 31(11-16), 1481–1493.
  • [33] Lancichinetti, A. & Fortunato, S. (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80, 016118.
  • [34] Lancichinetti, A., Fortunato, S. & Kertész, J. (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J. Phys., 11(3), 033015.
  • [35] Lancichinetti, A., Fortunato, S. & Radicchi, F. (2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78, 046110.
  • [36] Lancichinetti, A., Radicchi, F., Ramasco, J. & Fortunato, S. (2011) Finding Statistically Significant Communities in Networks. Plos One, 6, e18961.
  • [37] Leskovec, J., Lang, K. J. & Mahoney, M. (2010) Empirical comparison of algorithms for network community detection. in Proceedings of the 19th international conference on World wide web, pp. 631–640. New York: ACM.
  • [38] Liu, B., Xu, S., Li, T., Xiao, J. & Xu, X. K. (2018) Quantifying the Effects of Topology and Weight for Link Prediction in Weighted Complex Networks. Entropy, 20, 363.
  • [39] Lusseau, D. & Newman, M. E. J. (2004) Identifying the role that animals play in their social networks. Proceedings of the Royal Society B: Biological Sciences, 271, S477–S481.
  • [40] Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. & Dawson, S. M. (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54, 396–405.
  • [41] Meunier, D., Lambiotte, R. & Bullmore, E. T. (2010) Modular and Hierarchically Modular Organization of Brain Networks. Frontiers in Neuroscience, 4, 200.
  • [42] Mones, E., Vicsek, L. & Vicsek, T. (2012) Hierarchy Measure for Complex Networks. Plos One, 7, e33799.
  • [43] Moreno, J. L. & Jennings, H. H. (1938) Statistics of Social Configurations. Sociometry, 1, 342–374.
  • [44] Moriya, S., Yamamoto, H., Akima, H., Hirano-Iwata, A., Kubota, S. & Sato, S. (2019) Mean-field analysis of directed modular networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(1), 013142.
  • [45] Newman, M., Barabási, A. L. & Watts, D. J. (2006) The Structure and Dynamics of Networks. Princeton University Press.
  • [46] Newman, M. E. J. (2003) The structure and function of complex networks. SIAM Review, 45, 167–256.
  • [47] Newman, M. E. J. (2010) Networks: An Introduction. Oxford University Press.
  • [48] Newman, M. E. J. (2016) Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E, 94, 052315.
  • [49] Newman, M. E. J. & Girvan, M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
  • [50] Newman, M. E. J. & Park, J. (2003) Why social networks are different from other types of networks. Phys. Rev. E, 68(3), 8.
  • [51] Opsahl, T., Colizza, V., Panzarasa, P. & Ramasco, J. J. (2008) Prominence and Control: The Weighted Rich-Club Effect. Phys. Rev. Lett., 101, 168702.
  • [52] Peel, L., Larremore, D. B. & Clauset, A. (2017) The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.
  • [53] Qian, Y., Li, Y., Zhang, M., Ma, G. & Lu, F. (2017) Quantifying edge significance on maintaining global connectivity. Scientific Reports, 7, 45380.
  • [54] Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. & Parisi, D. (2004) Defining and identifying communities in networks. PNAS, 101, 2658–2663.
  • [55] Ravasz, E. & Barabási, A. L. (2003) Hierarchical organization in complex networks. Phys. Rev. E, 67, 026112.
  • [56] Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A. L. (2002) Hierarchical organization of modularity in metabolic networks. Science, 297, 1551.
  • [57] Redner, S. (1998) How popular is your paper? An empirical study of the citation distribution. European Physical Journal B, 4, 131–134.
  • [58] Rombach, M. P., Porter, M. A., Fowler, J. H. & Mucha, P. J. (2014) Core-Periphery Structure in Networks. SIAM Journal on Applied Mathematics, 74, 167–190.
  • [59] Schaub, M. T., Delvenne, J. C., Rosvall, M. & Lambiotte, R. (2017) The many facets of community detection in complex networks. Applied Network Science, 2.
  • [60] Seidman, S. B. (1983) Network structure and minimum degree. Social Networks, 5, 269–287.
  • [61] Seifi, M., Junier, I., Rouquier, J. B., Iskrov, S. & Guillaume, J. L. (2013) Stable community cores in complex networks. in Complex Networks, pp. 87–98. Berlin: Springer.
  • [62] Serrano, M. Á., Boguñá, M. & Pastor-Satorras, R. (2006) Correlations in weighted networks. Phys. Rev. E, 74, 055101.
  • [63] Serrano, M. A. (2008) Rich-club vs rich-multipolarization phenomena in weighted networks. Phys. Rev. E, 78, 026101.
  • [64] Simon, H. (1962) The Architecture of Complexity. Proc. of the American Philosophical Society, 106, 467–482.
  • [65] Spirin, V. & Mirny, L. A. (2003) Protein complexes and functional modules in molecular networks. PNAS, 100, 12123–12128.
  • [66] Swindale, N. V. (1990) Is the cerebral cortex modular? Trends in Neurosciences, 13, 487–492.
  • [67] Verma, T., Araújo, N. A. M. & Herrmann, H. J. (2014) Revealing the structure of the world airline network. Scientific Reports, 4, 5638.
  • [68] Vijaymeena, M. K. & Kavitha, K. (2016) A Survey on Similarity Measures in Text Mining. Machine Learning and Applications: An International Journal, 3, 19–28.
  • [69] Wang, Y., Di, Z. & Fan, Y. (2011) Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph. PLoS ONE, 6(11), e27418.
  • [70] Wasserman, S. & Faust, K. (1995) Social network analysis: methods and applications. Cambridge: Cambridge University Press.
  • [71] Watts, D. (1999) Small worlds: The dynamics of networks between order and randomness. Princeton: Princeton University Press.
  • [72] Watts, D. J. & Strogatz, S. H. (1998) Collective dynamics of “small-world” networks. Nature, 393, 440–442.
  • [73] Xiang, B. B., Bao, Z. K., Ma, C., Zhang, X., Chen, H. S. & Zhang, H. F. (2018) A unified method of detecting core-periphery structure and community structure in networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(1), 013122.
  • [74] Xiang, J. & Hu, K. (2012) Limitation of multi-resolution methods in community detection. Physica A: Statistical Mechanics and its Applications, 391, 4995–5003.
  • [75] Yang, J. & Leskovec, J. (2012) Community-Affiliation Graph Model for Overlapping Network Community Detection. in ICDM, ed. by M. J. Zaki, A. Siebes, J. X. Yu, B. Goethals, G. I. Webb, & X. Wu, pp. 1170–1175. IEEE Computer Society.
  • [76] Young, M. P. (1992) Objective analysis of the topological organization of the primate cortical visual system. Nature, 358, 152–155.
  • [77] Zachary, W. W. (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452–473.
  • [78] Zhou, S. & Mondragón, R. J. (2004) The rich-club phenomenon in the Internet topology. IEEE Communications Letters, 8, 180–182.
  • [79] Zlatic, V., Bianconi, G., Diaz-Guilera, A., Garlaschelli, D., Rao, F. & Caldarelli, G. (2009) On the rich-club effect in dense and weighted networks. The European Physical Journal B, 67, 271–275.