Detecting Local Community Structures in Social Networks Using Concept Interestingness

02/05/2019
by   Mohamed-Hamza Ibrahim, et al.
UQO

One key challenge in Social Network Analysis is to design an efficient and accurate community detection procedure as a means to discover intrinsic structures and extract relevant information. In this paper, we introduce a novel strategy called COIN, which exploits COncept INterestingness measures to detect communities based on the concept lattice construction of the network. Thus, unlike off-the-shelf community detection algorithms, COIN leverages relevant conceptual characteristics inherited from Formal Concept Analysis to discover substantial local structures. In the first stage of COIN, we extract the formal concepts that capture all the cliques and bridges in the social network. In the second stage, we use the stability index to remove noisy bridges between communities and then percolate relevant adjacent cliques. Our experiments on several real-world social networks show that COIN can quickly detect communities more accurately than existing prominent algorithms such as Edge betweenness, Fast greedy modularity, and Infomap.


1 Introduction

Let $G = (V, E)$ be a social network, where the node set $V$ includes the objects in the social network and the edge set $E$ denotes the relationships between objects. The problem is to detect all possible communities by dividing the network into groups of nodes (i.e., meaningful connected components) based on the hidden relevant relationships in $G$.

The discovery of cohesive groups, cliques and communities inside a network is one of the most studied topics in social network analysis. It has attracted many researchers in sociology, biology, computer science, physics, criminology, and so on. Community detection [1, 2] aims at finding clusters as sub-graphs within a given network. A community is then a cluster where many edges connect nodes of the same group and few edges link nodes of different groups. For instance, a community in the social network LinkedIn may represent members with a similar professional profile.

Community detection algorithms can be mainly categorized into two groups [1]: (i) agglomerative procedures, in which nodes/groups are iteratively merged if they are similar, and (ii) divisive algorithms, in which clusters are iteratively decomposed by cutting the edges between less similar vertices. A finer categorization of community detection algorithms includes the following main kinds: hierarchical clustering, modularity maximization, clique (and variants such as $k$-clique and $k$-plex) identification, block-modeling, and spectral graph partitioning [3, 4, 1].

In this paper, we leverage Formal Concept Analysis (FCA) and the stability index of identical concepts to find relevant cliques and irrelevant bridges. Formal Concept Analysis is a mathematical formalism for data analysis [5] that uses a formal context as input to construct a set of formal concepts organized in a concept lattice. It has been successfully used in several areas of computer science to discover patterns such as homogeneous groups or association rules. In [6], Freeman was the first to use FCA for community detection in one-mode data social networks. His FCA-based method starts with an adjacency matrix where objects are individuals and attributes are maximal cliques above a minimum size, constructs the concept lattice, and then identifies and eliminates special cliques and edges to finally get the communities. This opened the door to a promising research area that uses cliques to detect communities in social networks. In this context, several clique-based methods have been introduced in the literature, including clique percolation methods [7, 8, 9]. More recently, Hao et al. [10] defined a new method that identifies $k$-equiconcepts to further generate $k$-cliques in social networks. Our COIN method can be seen as akin to these clique-based methods. A distinctive aspect of COIN, however, is the use of Formal Concept Analysis theory to better understand the network topology and the exploitation of concept interestingness measures, such as the stability index, to discover communities by identifying relevant and irrelevant parts of the social network.

The paper is organized as follows. Section 2 gives a background about social network analysis and FCA, while Section 3 describes our method for community detection in one-mode data networks using FCA and interestingness measures. In Section 4 we provide an empirical evaluation of our method against existing ones. We finally conclude the paper and describe further work in Section 5.

2 Background

This section briefly reviews the main concepts that support the comprehension of our COIN community detection method by using an illustrative example, which is an excerpt of a LinkedIn connection network and contains members of the LARIM team at University of Quebec in Outaouais. As shown in Figure 1, the network is modeled as an undirected graph $G = (V, E)$, where $V$ is a set of nodes representing members, and $E$ is a set of edges where an edge $(u, v) \in E$ connects two members $u, v \in V$ if they have a link on LinkedIn. Let us now express our basic notation.

Figure 1: Undirected graph as an excerpt of the LinkedIn network.

2.1 Basic Notation and Definitions

Definition 1 (Clique).

Let $G = (V, E)$ be an undirected graph defined over the objects $V$. A clique of size $k$ in $G$ is a subset $S \subseteq V$ with $|S| = k$ such that for any two distinct nodes (i.e., objects) $u, v \in S$, there exists an edge (i.e., a binary relation) $(u, v) \in E$.

In the sequel, we will express a clique by a set of nodes without reference to the edges. For instance, the set represents a clique of size .

Definition 2 (Maximal clique).

A clique is maximal if it cannot be extended to include one more adjacent object node.

For example, is a maximal clique of size .

Definition 3 (Isolated maximal clique).

A maximal clique $S$ is isolated if for all $u \in S$ and all $v \in V \setminus S$ we have $(u, v) \notin E$. That is, there is no edge that connects an object in the maximal clique to any object outside it.

For example, $\{13, 14, 15\}$ represents an isolated maximal clique in Figure 1.

Definition 4 (Bridge or Cut-edge).

An edge $(u, v) \in E$ is a bridge iff it is not contained in any cycle, so that its removal increases the number of connected components in the graph $G$.

Definition 5 (Non-trivial Bridge).

A bridge $(u, v)$ is non-trivial iff its end vertices $u$ and $v$ both have a degree (i.e., number of neighbors) greater than 1.

For example, the edge $(4, 5)$ is a non-trivial bridge since the end vertices 4 and 5 have a degree equal to 4 and 3, respectively.
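Definitions 4 and 5 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the edge list is transcribed from the example formal context given later in Table 1 (self-loops omitted), the helper names are hypothetical, and the degree threshold of 1 for non-trivial bridges is an assumption, since the threshold did not survive in the text above.

```python
from collections import deque

# Edge list transcribed from the example context in Table 1 (assumption:
# it mirrors the graph of Figure 1; diagonal entries are not edges).
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
    (5, 6), (5, 7), (6, 7), (6, 12),
    (8, 9), (8, 10), (8, 11), (8, 12), (9, 10), (9, 11), (10, 12),
    (13, 14), (13, 15), (14, 15),
]

def build_adjacency(edges):
    """Map every node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connected(adj, source, target, removed):
    """Breadth-first search from source to target, ignoring the edge `removed`."""
    blocked = {removed, removed[::-1]}
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj[node]:
            if (node, nxt) not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def bridges(edges):
    """Edges whose removal disconnects their two end vertices (Definition 4)."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in edges if not connected(adj, u, v, (u, v))]

def non_trivial_bridges(edges):
    """Bridges whose end vertices both have degree > 1 (assumed threshold)."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in bridges(edges)
            if len(adj[u]) > 1 and len(adj[v]) > 1]

print(bridges(EDGES))
print(non_trivial_bridges(EDGES))
```

The removal test runs one search per edge, which is adequate for small examples; a linear-time bridge-finding algorithm would be preferable on large networks.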

2.2 Formal Concept Analysis

In the following we recall key notions of FCA that will be used in this paper.

Definition 6 (Formal context).

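It is a triple $\mathbb{K} = (\mathcal{G}, \mathcal{M}, \mathcal{I})$, where $\mathcal{G}$ is a set of objects, $\mathcal{M}$ a set of attributes, and $\mathcal{I}$ a binary relation between $\mathcal{G}$ and $\mathcal{M}$ with $\mathcal{I} \subseteq \mathcal{G} \times \mathcal{M}$. For $g \in \mathcal{G}$ and $m \in \mathcal{M}$, $(g, m) \in \mathcal{I}$ holds (i.e., the corresponding table entry is 1) iff the object $g$ has the attribute $m$, and otherwise $(g, m) \notin \mathcal{I}$ (i.e., the entry is 0).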

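Given arbitrary subsets $A \subseteq \mathcal{G}$ and $B \subseteq \mathcal{M}$, the following derivation operators are defined:

$$A' = \{m \in \mathcal{M} \mid \forall g \in A,\ (g, m) \in \mathcal{I}\}, \qquad B' = \{g \in \mathcal{G} \mid \forall m \in B,\ (g, m) \in \mathcal{I}\}$$

where $A'$ is the set of attributes common to all objects of $A$ and $B'$ is the set of objects sharing all attributes from $B$. The closure operator $(\cdot)''$ is the double application of $(\cdot)'$, and it is extensive, idempotent and monotone. The subsets $A$ and $B$ are closed when $A = A''$ and $B = B''$.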

Definition 7 (Formal concept).

The pair $c = (A, B)$ is called a formal concept of $\mathbb{K}$ with extent $A$ and intent $B$ if both $A$ and $B$ are closed and $A' = B$, and $B' = A$.
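To make the derivation operators and Definition 7 concrete, here is a small sketch on a toy context. The context data and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Toy formal context (hypothetical data): each object maps to its attributes.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"b", "c"},
}
all_objects = set(context)
all_attributes = set().union(*context.values())

def common_attributes(objects):
    """Derivation of an object set: attributes shared by every object in it."""
    result = set(all_attributes)
    for g in objects:
        result &= context[g]
    return result

def common_objects(attributes):
    """Derivation of an attribute set: objects that have every attribute in it."""
    return {g for g in all_objects if set(attributes) <= context[g]}

def is_formal_concept(extent, intent):
    """A pair is a formal concept iff each side is the derivation of the other."""
    return (common_attributes(extent) == set(intent)
            and common_objects(intent) == set(extent))

print(common_attributes({"o1", "o2"}))
print(common_objects({"b", "c"}))
print(is_formal_concept({"o1", "o2"}, {"a", "b"}))
```

Note that the derivation of the empty object set is the full attribute set, which is why `common_attributes` starts from `all_attributes`.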

For a finite intent (or extent) set $X$ of $n$ elements, we use $\mathcal{P}(X)$ to denote its power set, i.e., the set of all its subsets, including the empty set and the set itself, with a number of subsets equal to $2^n$.

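Definition 8 (Partial order relation $\preceq$).

A concept $c_1 = (A_1, B_1) \preceq c_2 = (A_2, B_2)$ if:

$$A_1 \subseteq A_2 \iff B_2 \subseteq B_1 \qquad (1)$$

In this case, $c_2$ is called a superconcept (or upper neighbor or successor) of $c_1$, and $c_1$ is called a subconcept (or lower neighbor or predecessor) of $c_2$. The set of all concepts of the formal context $\mathbb{K}$ is expressed by $\mathcal{C}(\mathbb{K})$ or simply $\mathcal{C}$.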

Definition 9 (Concept Lattice).

The concept lattice of a formal context $\mathbb{K}$, denoted by $\mathcal{L}$, is a Hasse diagram that represents all formal concepts together with the partial order that holds between them. In $\mathcal{L}$, each node represents a concept with its extent and its intent, while the edges represent the partial order between concepts.

There are several methods (cf. [5, 11, 12, 13, 14]) that build the lattice, i.e., compute all the concepts together with the partial order.

One-mode data networks contain only one type of nodes and relations. Hence, we can simply adapt the formal context (in Definition 6) to define a one-mode data context as follows.

Definition 10 (One-mode formal context).

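It is a formal context $\mathbb{K} = (\mathcal{G}, \mathcal{G}, \mathcal{I})$ in which the two sets of objects and attributes are identical, i.e., $\mathcal{M} = \mathcal{G}$, and $\mathcal{I}$ is a set of relations defined on $\mathcal{G}$ with $\mathcal{I} \subseteq \mathcal{G} \times \mathcal{G}$. For $g_i, g_j \in \mathcal{G}$, $(g_i, g_j) \in \mathcal{I}$ holds iff object $g_i$ is linked to $g_j$ or $g_i = g_j$.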

2.3 Concept interestingness

Interestingness (quality) measures of a formal concept are commonly used to assess its relevancy. While several interestingness measures have been introduced to select relevant concepts [15, 16, 17, 18], the stability index $\sigma(c)$ of a concept $c$ has been found to be the most prominent for selecting relevant concepts [19].

Definition 11 (Stability Index).

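Let $\mathbb{K}$ be a formal context and $c = (A, B)$ a formal concept of $\mathbb{K}$. The intensional stability $\sigma(c)$ can be computed as [16, 20, 15]:

$$\sigma(c) = \frac{|\{e \in \mathcal{P}(A) \mid e' = B\}|}{2^{|A|}} \qquad (2)$$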

In Equation (2), intensional stability measures the strength of dependency between the intent $B$ and the objects of the extent $A$. More precisely, it expresses the probability of keeping $B$ closed when a subset of noisy objects in $A$ is deleted with equal probability. In fact, this measure quantifies the amount of noise in the extent $A$ and overfitting in the intent $B$. The numerator of Eq. (2) can be computed exactly by identifying and counting the minimal generators of the concept [17, 21]. Such a computation has a cost that grows with the size of the concept lattice [17, 21], and it requires the lattice to be constructed first [22, 23]. However, we have recently designed an efficient method to approximate the stability index using the low-discrepancy sampling (LDS) approach [24]. The LDS method takes a set of uniformly distributed samples from the intent powerset of a concept and uses them to estimate the stability index of that concept, with a running time and a convergence rate (i.e., sampling error) that depend on the number of samples.
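As an illustration of the stability index, the sketch below computes it exactly by enumerating every subset of the extent and counting those whose derivation equals the intent. This brute-force enumeration is exponential in the extent size and is neither the minimal-generator method nor the LDS approximation mentioned above; the toy context and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attributes.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"b", "c"},
}

def derive(objects, context):
    """Attributes shared by all given objects (all attributes for no objects)."""
    result = set().union(*context.values())
    for g in objects:
        result &= context[g]
    return result

def intensional_stability(extent, intent, context):
    """Fraction of subsets of the extent whose derivation equals the intent."""
    members = sorted(extent)
    hits = 0
    for size in range(len(members) + 1):
        for subset in combinations(members, size):
            if derive(subset, context) == set(intent):
                hits += 1
    return Fraction(hits, 2 ** len(members))

# Concept with extent {o1, o2} and intent {a, b}.
print(intensional_stability({"o1", "o2"}, {"a", "b"}, context))
```

For the concept above, only the subsets containing `o1` reproduce the intent, so two of the four subsets count.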

3 COIN for Community Detection

At a conceptual level, our overall COIN method consists of the following key elements. First, we build the formal context and construct the concept lattice of the social network. Second, we extract from the lattice the whole set of concepts that represent all the cliques and bridges in the social graph. Third, we use a concept interestingness measure to identify noisy bridges and relevant cliques. Finally, we remove the most noisy bridges and percolate (merge) the adjacent relevant cliques to detect communities.

3.1 Building a Formal Context for a Social Network

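In COIN, the first task is to build the one-mode formal context of the social network by computing the symmetrical modified adjacency matrix $\mathcal{I}$ [10] as follows:

$$\mathcal{I}(i, j) = \begin{cases} 0 & \text{if } i \neq j \text{ and } (v_i, v_j) \notin E \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$

In Eq. (3), we assign 0 to the element of $\mathcal{I}$ in the $i$-th row and $j$-th column if the object (node) $v_i$ is not connected to the object $v_j$ in the graph $G$. Otherwise, we assign 1 to it. Note that the diagonal elements are assigned the value 1.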

For example, the constructed formal context of our example is represented in Table 1.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
5 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
6 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0
7 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0
9 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
10 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0
11 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0
12 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
Table 1: The formal context for the network of Figure 1.
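The construction of Table 1 can be reproduced with a few lines of code. The sketch below is illustrative only: the edge list is transcribed from the off-diagonal ones of Table 1, and `modified_adjacency` is a hypothetical helper name.

```python
# Undirected edges read off the off-diagonal ones of Table 1.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
    (5, 6), (5, 7), (6, 7), (6, 12),
    (8, 9), (8, 10), (8, 11), (8, 12), (9, 10), (9, 11), (10, 12),
    (13, 14), (13, 15), (14, 15),
]
N = 15  # number of objects (nodes), labeled 1..15

def modified_adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix with ones forced on the diagonal."""
    matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    return matrix

context = modified_adjacency(N, EDGES)
for label, row in enumerate(context, start=1):
    print(label, *row)
```

Forcing ones on the diagonal is what lets a clique appear as a square block of ones, which the next subsection relies on.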

We then build the concept lattice of our example as shown in Figure 2.

Figure 2: The concept lattice .

3.2 Identifying Interesting cliques and Irrelevant bridges

With a little more analysis of the constructed lattice $\mathcal{L}$, it is possible to identify the concepts in which the intent is equal to the extent.

Definition 12 (Identical concept).

A formal concept $c = (A, B)$, with extent $A$ and intent $B$, is called an identical concept if $A = B$, i.e., its extent and intent are identical.

We use $\mathcal{C}_I$ to denote the set of all the identical concepts.

Proposition 1.

Given a social graph $G$ and its corresponding concept lattice $\mathcal{L}$, an identical concept $c = (A, B)$ with $A = B$ and $|A| = k$ actually represents a $k$-clique in $G$.

Proof.

Since an identical concept is a maximal square in the formal context, it represents a unit (all-ones) square matrix of size $k \times k$, as a sub-matrix of the modified adjacency matrix, and hence a $k$-clique. Suppose now that $S$ is a $k$-clique of $G$ with $|S| = k$. Then, from Definition 1, for any two object nodes in $S$, there exists an edge in $E$ that connects the two objects. Using Eq. (3), the constructed modified adjacency matrix that expresses the clique $S$ clearly defines a $k \times k$ matrix consisting of all 1s. Such a matrix coincides with the identical concept in which both extent and intent contain only the object nodes of $S$. This implies that a $k$-clique whose node set is $S$ is equivalent to an identical concept $c = (A, B)$ such that $A = B = S$. ∎

For example, the identical concept of the concept lattice in Figure 2 captures the 3-clique with nodes of the network in Figure 1.
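Proposition 1 suggests a direct, if naive, way to list the identical concepts of the running example: enumerate node subsets and keep those that equal their own derivation in the one-mode context. The sketch below does this by brute force over all subsets of the 15 nodes; it is meant as an illustration under that assumption, not as the lattice-based extraction used by COIN. The edge list is transcribed from Table 1 and the function names are hypothetical.

```python
# Undirected edges read off the off-diagonal ones of Table 1.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
    (5, 6), (5, 7), (6, 7), (6, 12),
    (8, 9), (8, 10), (8, 11), (8, 12), (9, 10), (9, 11), (10, 12),
    (13, 14), (13, 15), (14, 15),
]
NODES = range(1, 16)

# One-mode context: every node is related to itself and to its neighbors.
NEIGHBORS = {v: {v} for v in NODES}
for u, v in EDGES:
    NEIGHBORS[u].add(v)
    NEIGHBORS[v].add(u)

def derive(nodes):
    """Nodes related to every node of the given set."""
    result = set(NODES)
    for v in nodes:
        result &= NEIGHBORS[v]
    return result

def identical_concepts():
    """All non-empty node sets that equal their own derivation."""
    found = []
    for mask in range(1, 1 << len(NODES)):
        subset = {v for v in NODES if mask >> (v - 1) & 1}
        if derive(subset) == subset:
            found.append(sorted(subset))
    return sorted(found)

IDENTICAL = identical_concepts()
for extent in IDENTICAL:
    print(extent)
```

A set equals its own derivation exactly when it is a clique that no outside node can extend, so the output lists the maximal cliques of the example, with the 2-element ones corresponding to edges that belong to no larger clique.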

Therefore, from Proposition 1, we can extract the cliques of the network by identifying their corresponding identical concepts in $\mathcal{L}$. For instance, the identical concepts appear in 'yellow' in Figure 2, and represent all the cliques and bridges as shown in Figure 3-1. Now, the role of concept interestingness comes into play. That is, we can measure how much noise exists in a clique by computing the stability index of its corresponding identical concept. The noise of a clique indicates how its objects are cohesive to each other and separable from other objects in the graph. Now, the question is how clique cohesion (looking for tightly cohesive groups) and separation (seeking highly separated cliques) are inherited from the noise of their corresponding identical concepts. At a high level, this could be illustrated as follows. First, the stability captures the noise of an identical concept by estimating how its objects depend on the removal of each individual object. In fact, this measure of noisiness quantifies the connectivity among these objects in the corresponding clique. That is, much noise in the clique means that many objects are not cohesive and need to be removed to disconnect the clique. Second, the stability estimates the specificity of the object-object links of the identical concept with respect to the one-mode formal context. Thus, it assesses how these objects, in the corresponding clique, are influenced by the ties that hold between each individual object and other objects in the graph. That is, it approximately quantifies how these objects are strongly connected with other objects outside the clique. We call a clique relevant if it contains a very small amount of noise, i.e., its objects have a high cohesion and a low separability.

Consequently, the stability value of an identical concept approximates the probability that its corresponding clique is a portion of a potential community. For instance, if an identical concept has the highest stability value, then the involved objects of its corresponding clique are highly cohesive and completely separable from other objects in the graph. This, in fact, renders such a clique an isolated maximal one that likely forms a stand-alone community. On the contrary, an identical concept of size 2, which has a low stability value, could identify a very noisy 2-clique (and hence, a non-trivial bridge in the graph) that is probably not a part of any potential community.

Proposition 2.

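Given a social graph $G$ and its corresponding concept lattice $\mathcal{L}$, an identical concept $c = (A, B)$ with $A = B$ and $|A| = k$ represents a corresponding isolated maximal $k$-clique $S$ in $G$ where $c$ has the highest value of the stability index:

$$\sigma(c) = \frac{2^k - 1}{2^k} \qquad (4)$$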
Proof.

The proposition holds once we prove that: (1) $S$ is represented by an identical concept; (2) this identical concept has the highest stability index.

(1) Suppose that $S$ is an isolated maximal clique of size $k$ in $G$. Since every isolated maximal $k$-clique has all the properties of a $k$-clique, then, from Proposition 1, it can be easily demonstrated that $S$ has a modified adjacency matrix that defines a $k \times k$ all-ones matrix, and therefore is equivalent to an identical concept $c = (A, B)$ such that $A = B = S$.

(2) From Definitions 2 and 3 of a maximal and isolated clique, we know that there is no edge that connects any object inside $S$ to any other object outside $S$. Thus, the $k \times k$ all-ones matrix of $S$ defines a sub-matrix of the whole one-mode formal context $\mathbb{K}$, in which all elements that relate the objects of $S$ to the objects outside $S$ are zeros. From the definition of the unit matrix that defines $S$, we have:

$$\forall e \in \mathcal{P}(A) \setminus \{\emptyset\}: \; e' = B \qquad (5)$$

That is, except the empty set, all the other elements of the powerset $\mathcal{P}(A)$ satisfy the stability condition in the numerator of Eq. (2). This implies that the numerator of Eq. (2), in the stability of $c$, is equal to the size of the powerset after excluding only the empty set. Thus, we have:

$$\sigma(c) = \frac{|\mathcal{P}(A) \setminus \{\emptyset\}|}{2^{|A|}} = \frac{2^k - 1}{2^k} \qquad (6)$$

This implies that the stability of the identical concept, with $|A| = k$, is equal to $(2^k - 1)/2^k$, and hence increases with the size of its corresponding $k$-clique. ∎

For example, the identical concept $(\{13, 14, 15\}, \{13, 14, 15\})$ in Figure 2 captures the isolated maximal 3-clique shown in Figure 1, and has its highest stability value $7/8 = 0.875$.

Proposition 3.

Given a social graph $G$ and its corresponding concept lattice $\mathcal{L}$, an identical concept $c = (A, B)$, with $|A| = |B| = 2$, represents a corresponding non-trivial bridge in $G$, and has the following stability index:

$$\sigma(c) = \frac{1}{4} \qquad (7)$$
Proof.

The proposition holds once we prove that: (1) a bridge is represented by an identical concept with an extent and intent involving only the two objects of the bridge; (2) this identical concept has a stability value of $1/4$.

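(1) Let $(u, v)$ be a non-trivial bridge between two components $S_1$ and $S_2$ of $G$ such that $u \in S_1$ and $v \in S_2$. From Eq. (3), the modified adjacency matrix of $(u, v)$ defines a $2 \times 2$ unit matrix. Now, since each object of the bridge belongs to a different component, its matrix is also a sub-matrix of the whole one-mode formal context $\mathbb{K}$ such that we have the following two properties:

  1. there exists $w_1 \in S_1 \setminus \{u\}$ such that $(u, w_1) \in \mathcal{I}$ and $(v, w_1) \notin \mathcal{I}$

  2. there exists $w_2 \in S_2 \setminus \{v\}$ such that $(v, w_2) \in \mathcal{I}$ and $(u, w_2) \notin \mathcal{I}$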

The modified adjacency matrix of the bridge can be used to extract, from the one-mode formal context, an identical concept where both its intent and extent contain the two objects (nodes) of the bridge.

(2) Let $u$ and $v$ be the two objects of the bridge, so that $\mathcal{P}(A) = \{\emptyset, \{u\}, \{v\}, \{u, v\}\}$ is the powerset of the extent of the identical concept. Based on the definition and properties of the modified adjacency matrix of the bridge, only one subset, $\{u, v\}$, satisfies the stability condition in the numerator of Eq. (2), while the other subsets, i.e., $\emptyset$, $\{u\}$ and $\{v\}$, do not. Thus, the numerator of Eq. (2) contains only one subset. This implies that the stability of the identical concept is equal to $1/4$. ∎

For example, in Figure 2, the identical concept has a stability and captures the non-trivial bridge in the graph of Figure 1. Object belongs to while object is an element of .
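Propositions 2 and 3 can be checked numerically on the running example. The sketch below computes the exact stability of a few identical concepts by brute-force subset enumeration (not the LDS approximation), with the edge list transcribed from Table 1 and hypothetical helper names. Exact fractions are used so that the comparisons with $7/8$ and $1/4$ are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Undirected edges read off the off-diagonal ones of Table 1.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
    (5, 6), (5, 7), (6, 7), (6, 12),
    (8, 9), (8, 10), (8, 11), (8, 12), (9, 10), (9, 11), (10, 12),
    (13, 14), (13, 15), (14, 15),
]
NODES = range(1, 16)
NEIGHBORS = {v: {v} for v in NODES}  # reflexive one-mode context
for u, v in EDGES:
    NEIGHBORS[u].add(v)
    NEIGHBORS[v].add(u)

def derive(nodes):
    """Nodes related to every node of the given set."""
    result = set(NODES)
    for v in nodes:
        result &= NEIGHBORS[v]
    return result

def stability(extent):
    """Exact stability of the identical concept whose extent is `extent`."""
    extent = set(extent)
    members = sorted(extent)
    hits = sum(
        1
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
        if derive(subset) == extent
    )
    return Fraction(hits, 2 ** len(members))

print(stability({13, 14, 15}))  # isolated maximal 3-clique
print(stability({4, 5}))        # 2-clique lying on a non-trivial bridge
print(stability({6, 12}))       # the other non-trivial bridge
print(stability({5, 6, 7}))     # a 3-clique that is not isolated
```

The non-isolated 3-clique lands strictly between the two extremes, which is the situation the percolation stage is meant to handle.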

Figure 3: How COIN algorithm works on the illustrative example. (1) Extracting the identical concepts that represent cliques and bridges. (2) Using the approximated stability index of identical concepts to cut noisy bridges, e.g., , and detect isolated maximum cliques, e.g., . (3) Percolating the remaining relevant cliques, e.g., and to get the final predicted communities.

3.3 COIN Algorithm For Detecting Communities

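Input: Set $\mathcal{C}_I$ of all identical concepts in the concept lattice $\mathcal{L}$.

Output: Set $\Omega$ of all communities in network $G$.

1: $\Omega \leftarrow \emptyset$
2:// Extract isolated cliques and remove bridges
3:for each concept $c = (A, B) \in \mathcal{C}_I$ do
4:     $\sigma(c) \leftarrow$ approximate stability of $c$ using LDS [24]
5:     if $\sigma(c) = (2^{|A|} - 1)/2^{|A|}$ then
6:// $c$ is an isolated maximal clique
7:         $\Omega \leftarrow \Omega \cup \{A\}$ // as a community
8:         $\mathcal{C}_I \leftarrow \mathcal{C}_I \setminus \{c\}$
9:     end if
10:     if $|A| = 2$ and $\sigma(c) = 1/4$ then
11:// Cut $c$, which is a noisy non-trivial bridge
12:         $\mathcal{C}_I \leftarrow \mathcal{C}_I \setminus \{c\}$
13:     end if
14:end for
15:// Percolate adjacent relevant cliques
16:for $c_i = (A_i, B_i),\ c_j = (A_j, B_j) \in \mathcal{C}_I$ do
17:     $k \leftarrow \min(|A_i|, |A_j|)$
18:     if $|A_i \cap A_j| \geq k - 1$ then
19:         $c \leftarrow (A_i \cup A_j,\ B_i \cup B_j)$
20:         $\mathcal{C}_I \leftarrow \mathcal{C}_I \setminus \{c_i, c_j\}$
21:         $\mathcal{C}_I \leftarrow \mathcal{C}_I \cup \{c\}$
22:     end if
23:end for
24: $\Omega \leftarrow \Omega \cup \mathcal{C}_I$
25:return $\Omega$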
Algorithm 1 The COIN algorithm for detecting communities.

Algorithm 1 gives the pseudo-code for the COIN community detection algorithm. For clarity, Figure 3 shows the steps of COIN applied to the example in Figure 1. The algorithm takes as input the set of all identical concepts that capture all the cliques and bridges. The COIN algorithm then goes through two stages. At the first stage, as shown in Figure 3-2, it uses the efficient low-discrepancy sampling (LDS) method of [24] to approximate the stability of each identical concept (line 4). Then, it distinguishes two types of identical concepts based on the estimated stability value. The first type is an identical concept that has its highest stability value. From Proposition 2, such an identical concept represents an isolated community, and we therefore detect it as a community and move it into the set of final communities (lines 5-9). The second type is an identical concept that contains only two objects and has a stability value of $1/4$ (line 10). According to Proposition 3, this identical concept represents a noisy non-trivial bridge between two potential communities. Thus, we cut this bridge by removing it from the set of identical concepts (line 12). At this stage, the set contains only the identical concepts that were neither detected as isolated communities nor cut as bridges. Such concepts capture the relevant cliques.

At the second stage, the algorithm iteratively applies a pairwise percolation of every two neighboring relevant cliques if they share at least $k - 1$ common objects (lines 16-23), where $k$ is the smallest number of objects in the extents of the two cliques. Finally, the algorithm moves the components, obtained after the completion of percolation, into the set of final communities (line 24), and it then returns the final set of detected communities as shown in Figure 3-3.
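The two stages described above can be sketched as follows on the running example. This is a hedged illustration, not the authors' implementation, and it rests on several assumptions: stability is computed exactly by subset enumeration instead of being approximated with LDS; an isolated clique of size $k$ is recognized by a stability of $(2^k - 1)/2^k$ and a noisy bridge by a size of 2 with stability $1/4$, following Propositions 2 and 3; two relevant cliques are merged when they share at least $k - 1$ objects, where $k$ is the smaller extent size; and merging is done with a union-find structure over the original cliques. The identical concepts are hard-coded from the example, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
    (5, 6), (5, 7), (6, 7), (6, 12),
    (8, 9), (8, 10), (8, 11), (8, 12), (9, 10), (9, 11), (10, 12),
    (13, 14), (13, 15), (14, 15),
]
NODES = range(1, 16)
NEIGHBORS = {v: {v} for v in NODES}  # reflexive one-mode context
for u, v in EDGES:
    NEIGHBORS[u].add(v)
    NEIGHBORS[v].add(u)

# Extents of the identical concepts of the example (its maximal cliques).
IDENTICAL = [{1, 2, 3, 4}, {4, 5}, {5, 6, 7}, {6, 12},
             {8, 9, 10}, {8, 9, 11}, {8, 10, 12}, {13, 14, 15}]

def derive(nodes):
    result = set(NODES)
    for v in nodes:
        result &= NEIGHBORS[v]
    return result

def stability(extent):
    """Exact stability by enumeration (stand-in for the LDS estimate)."""
    members = sorted(extent)
    hits = sum(1 for size in range(len(members) + 1)
               for subset in combinations(members, size)
               if derive(subset) == extent)
    return Fraction(hits, 2 ** len(members))

def coin(identical):
    communities, relevant = [], []
    # Stage 1: detect isolated cliques and cut noisy non-trivial bridges.
    for extent in identical:
        k, sigma = len(extent), stability(extent)
        if sigma == Fraction(2 ** k - 1, 2 ** k):
            communities.append(set(extent))
        elif k == 2 and sigma == Fraction(1, 4):
            continue  # bridge: dropped from further processing
        else:
            relevant.append(set(extent))
    # Stage 2: percolate cliques sharing at least k - 1 objects (assumed rule).
    parent = list(range(len(relevant)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(relevant)), 2):
        k = min(len(relevant[i]), len(relevant[j]))
        if len(relevant[i] & relevant[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(relevant):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(c) for c in communities + list(groups.values()))

print(coin(IDENTICAL))
```

On this input the sketch returns four groups: the isolated triangle, the 4-clique, the triangle between the two cut bridges, and the merged group formed by the three overlapping triangles.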

Assume now that the network contains an additional component that is a star graph with one central node and three leaves.

Then, three 2-cliques (equivalent to identical concepts) are identified, one per edge of the star, each with a stability equal to 0.5. Since they do not represent noisy non-trivial bridges, they will be merged at the second step of the algorithm to appear as a (star) community.

Complexity analysis. The first-stage of COIN has a time complexity of , while the second-stage has time complexity, where is a set of the identical concepts and is the time needed to approximate the stability index of an identical concept. Thus, the total time complexity of COIN algorithm is .

4 Experimental Evaluation

The main goal of our experimental evaluation is to investigate the following key questions:

  • (Q1) Is COIN more accurate than the state-of-the-art community detection algorithms?

  • (Q2) Is COIN scalable compared to other prominent algorithms for detecting communities in social networks?

4.1 Methodology

We started our experiments by first selecting four datasets of real-life social networks:

  1. Zachary’s karate club (Karate) [25], which describes members of a karate club and shows pairwise connections between members who interacted outside the club. A conflict arose between the instructor (member No. 0) and the administrator (member No. 33), which led to the split of the club into two non-overlapping communities around these two members;

  2. American College football (Football) [26], a network that represents the schedule of games between college football teams in a single season;

  3. Bottlenose Dolphin (Dolphins) [27], a network of frequent associations between dolphins in a community living off Doubtful Sound in New Zealand;

  4. Books about US politics (PolBooks) [28], a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com.

Table 2 briefly summarizes these datasets, which are publicly available at http://www-personal.umich.edu/~mejn/netdata/.

Name      Nodes  Edges  Communities  Description of group nature
Karate    34     78     2            Membership after the division
Football  115    615    12           Team scheduling
Dolphins  62     159    2            Group of male/female dolphins
PolBooks  105    441    3            Group of books about US politics
Table 2: A brief description of the tested social networks. Nodes is the number of object nodes, Edges is the number of edges, and Communities is the number of ground truth communities.

To answer these questions, we empirically evaluate our proposed COIN algorithm by comparing its results with the following state-of-the-art community detection algorithms:

  1. Louvain [29], which is an algorithm that uses a heuristic to maximize the modularity;

  2. Clique percolation method (CPM) [7, 8, 9], which builds up the communities by percolating adjacent $k$-cliques, where two $k$-cliques are considered adjacent if they share $k - 1$ object nodes;

  3. Girvan-Newman (GN, also called Edge-betweenness) [4, 26], which is a hierarchical decomposition algorithm that progressively removes edges in decreasing order of their betweenness scores;

  4. WalkTrap [30], which performs random walks, relying on the idea that walks are more likely to stay within the same community because only a few edges lead outside a given community;

  5. Fast greedy modularity (FGM) [31], which is the fastest greedy community analysis algorithm that detects the set of communities based on the optimization of the modularity score;

  6. InfoMAP [32, 33], which performs random walks to analyze the information flow through a network, and detects the community set that minimizes the expected description length of a random walker trajectory.

We then assess the accuracy and scalability of the results by recording the average elapsed time and calculating the following Normalized Mutual Information (NMI) metric [34]:

$$\mathrm{NMI} = \frac{-2 \sum_{i=1}^{C_T} \sum_{j=1}^{C_P} N_{ij} \log\left(\frac{N_{ij}\, n}{N_{i\cdot}\, N_{\cdot j}}\right)}{\sum_{i=1}^{C_T} N_{i\cdot} \log\left(\frac{N_{i\cdot}}{n}\right) + \sum_{j=1}^{C_P} N_{\cdot j} \log\left(\frac{N_{\cdot j}}{n}\right)} \qquad (8)$$

where $n$ is the number of object nodes and $N$ is a confusion matrix whose rows correspond to the $C_T$ ground-truth communities and whose columns correspond to the $C_P$ predicted communities found by a given community detection algorithm. Each element $N_{ij}$ is the number of object nodes in the $i$-th ground-truth community that appear in the $j$-th predicted community. $N_{i\cdot}$ is the sum over row $i$ of $N$, and $N_{\cdot j}$ is the sum over column $j$. The NMI metric lies in $[0, 1]$: if the predicted communities are identical to the ground-truth ones, then the NMI metric is equal to 1; on the contrary, if they are totally independent, then the NMI metric is 0.
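The NMI computation from a confusion matrix can be sketched in a few lines. The version below follows the commonly used confusion-matrix form of NMI (twice the mutual information divided by the sum of the two partition entropies), which is assumed here to be the form intended by [34]; the function name and the handling of the degenerate single-community case are illustrative choices.

```python
from math import log

def nmi(confusion):
    """Normalized mutual information from a confusion matrix.

    confusion[i][j] = number of nodes of ground-truth community i that
    were placed in predicted community j.
    """
    n = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    # Empty cells contribute nothing (0 * log 0 is taken as 0).
    numerator = -2.0 * sum(
        n_ij * log(n_ij * n / (row_sums[i] * col_sums[j]))
        for i, row in enumerate(confusion)
        for j, n_ij in enumerate(row)
        if n_ij > 0
    )
    denominator = (
        sum(r * log(r / n) for r in row_sums if r > 0)
        + sum(c * log(c / n) for c in col_sums if c > 0)
    )
    # Assumption: both partitions being a single community counts as a match.
    return numerator / denominator if denominator != 0 else 1.0

print(nmi([[17, 0], [0, 17]]))   # identical partitions
print(nmi([[5, 5], [5, 5]]))     # statistically independent partitions
print(nmi([[16, 1], [0, 17]]))   # one misplaced node
```

The first two calls illustrate the two extremes of the metric, and the third shows how a single misplaced node lowers the score.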

Furthermore, to guarantee a fair comparison, we conducted our empirical study with the chosen algorithms under the following setting:

  • Re-run all tested algorithms times, and report the average result

  • Vary the parameter in the k-cliques percolation process of CPM, and report the average result

  • Use WalkTrap to serve as a good baseline for assessing the accuracy and scalability

  • Tune the resolution parameter $\gamma$ of the Louvain and FGM algorithms over a range of values, and record the best results. Traditionally, $\gamma$ is assigned the value 1 by default. But one can set $\gamma$ to a value greater than 1 to find a larger number of smaller communities, and assign it a value less than 1 to detect a smaller number of larger communities.

All the experiments were run on an Intel(R) Core(TM) i7-2600 CPU @ 3.20GHz computer with 16GB of memory under Windows. We implemented our COIN algorithm as an extension to the FCA Concepts 0.7.11 package implemented by Sebastian Bank (publicly available at https://pypi.python.org/pypi/concepts).

4.2 Results

Methods Social networks
Karate Football Dolphins PolBooks
Louvain
CPM
COIN 0.837 0.978 1.00 0.885
GN
WalkTrap
FGM (3)
InfoMAP
Table 3: NMI-score of the considered community detection algorithms on the tested social networks. The value between (.) is the number of predicted communities.

In terms of accuracy, the results in Table 3 illustrate that COIN is the most accurate of all the compared algorithms, achieving the best NMI score on the four tested social networks. On the Karate dataset, it produces two communities that are approximately correct compared to the two ground-truth ones, and in which only the object node is wrongly detected in the ‘red’ community instead of the ‘yellow’ one as shown in Figure 5. For the Dolphins dataset, COIN detects the two communities that are identical to the ground truth ones as shown in Figure 6. On the Football dataset, COIN predicts the communities that are approximately correct, where only two object nodes (‘nebraska’) and (‘memphis’) are wrongly detected in ‘lime’ and ‘magenta’ communities instead of ‘green’ and ‘orange’ true communities respectively (see Figure 7). It also obtains the three communities of Polbooks dataset with accuracy, where as shown in Figure 8, the object nodes ‘allies’, ‘the bushes’ and ‘bush at war’ predicted to be in ‘blue’ and ‘red’ communities instead of the ‘yellow’ true one.

InfoMAP and CPM come close behind COIN on Football, but considerably further behind on both Dolphins and PolBooks. On both Karate and PolBooks, WalkTrap was marginally less accurate than Louvain, but more accurate than CPM. The latter outperforms both FGM and Louvain on the Football dataset. Remarkably, CPM has poor results on both the Karate and Dolphins datasets, and both InfoMAP and Louvain clearly outperform FGM on the Football dataset.

Figure 4: The average elapsed time of the underlying community detection algorithms on the tested social networks.

In terms of computational time, the results in Figure 4 show that COIN is faster than FGM, which is the quickest of the other tested community detection algorithms, on each of the four datasets. InfoMAP comes close behind FGM on all datasets, and both of them are considerably faster than the baseline WalkTrap on all underlying social networks. Aside from FGM, COIN clearly prevails over CPM, Louvain and WalkTrap on all datasets by a significant margin, while running more than twice as fast as GN on the Karate and Dolphins datasets. In practice, this is due to the fact that COIN mainly detects communities based on the set of identical concepts, which is frequently small compared to the sets of objects and edges that are used by all other tested community detection algorithms.

Apart from COIN, FGM and InfoMAP, none of the remaining algorithms (CPM, WalkTrap, and Louvain) shows clear superiority over the others. Remarkably, Louvain is very competitive with WalkTrap, and CPM is roughly on par with the GN algorithm on all datasets.

Overall, the results in Figures 4-8 and Table 3 show that COIN can detect communities more accurately than existing prominent algorithms, and with less computational time.

Figure 5: The predicted communities of Karate network obtained by COIN algorithm.
Figure 6: The predicted communities of Dolphin network obtained by COIN algorithm.
Figure 7: The predicted communities of Football network obtained by COIN algorithm.
Figure 8: The predicted communities of PolBooks network obtained by COIN algorithm.

5 Conclusion

We have proposed COIN, a two-stage method that exploits Formal Concept Analysis and concept stability to efficiently detect communities in one-mode social networks. All the identical concepts that capture cliques and bridges are first extracted from the concept lattice. Then, the stability index of these concepts is used to identify relevant cliques and cut irrelevant bridges between communities. Finally, cliques with particular features are merged to obtain the final communities. Our method has been tested on different networks with different shapes of components (e.g., star, fully or strongly connected components, chains) and led to accurate results. The empirical study on real-life social networks (see Section 4) shows that COIN can detect communities in a more accurate and efficient manner than other state-of-the-art methods.
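The role of stability in separating cliques from bridges can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes the formal context is the adjacency relation with every node also related to itself, so that concepts whose extent equals their intent coincide with maximal cliques, and it uses the standard intensional stability of Kuznetsov [16], i.e. the fraction of subsets of the extent that derive to the same intent. The function names and the brute-force enumeration are hypothetical choices for illustration and only suit very small graphs.

```python
from itertools import combinations


def closed_neighbourhoods(edges):
    """Map each node to itself plus its neighbours (assumed reflexive context)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def derive(nodes, nbrs):
    """Derivation operator: nodes related to every member of `nodes`.

    The context is symmetric, so the same operator serves for both
    object sets and attribute sets. The empty set derives to all nodes.
    """
    shared = set(nbrs)
    for n in nodes:
        shared &= nbrs[n]
    return frozenset(shared)


def identical_concepts(nbrs):
    """Brute-force every concept whose extent equals its intent."""
    found = set()
    all_nodes = sorted(nbrs)
    for size in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, size):
            intent = derive(subset, nbrs)
            extent = derive(intent, nbrs)
            if extent == intent:
                found.add(intent)
    return found


def stability(extent, nbrs):
    """Fraction of subsets of the extent that derive to the concept's intent."""
    members = sorted(extent)
    intent = derive(members, nbrs)
    hits = 0
    for size in range(len(members) + 1):
        for subset in combinations(members, size):
            if derive(subset, nbrs) == intent:
                hits += 1
    return hits / 2 ** len(members)


# Toy graph (hypothetical): two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
nbrs = closed_neighbourhoods(edges)
for concept in sorted(identical_concepts(nbrs), key=sorted):
    print(sorted(concept), stability(concept, nbrs))
```

On this toy graph the two triangles each obtain a stability of 0.75, whereas the bridge {2, 3} obtains only 0.25, which is the kind of gap that a stability-based filter can exploit to cut bridges before merging cliques. The actual thresholds and merging rules of COIN are those defined in the earlier sections and are not reproduced here.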

In the future, we plan to generalize COIN to detect overlapping communities in one-mode and two-mode data, and even in multi-layer networks. We also intend to propose an online (incremental) variant of the COIN algorithm to tackle dynamic social networks and capture the evolution of their communities over time. Finally, we intend to increase the accuracy of COIN by exploiting other concept interestingness measures (e.g., separation index and concept probability [19]) and to investigate intensively the efficiency of COIN on big and dense social data networks.

References

  • [1] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.
  • [2] Santo Fortunato and Darko Hric. Community detection in networks: A user guide. Physics Reports, 659:1–44, 2016.
  • [3] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the national academy of sciences, 99(12):7821–7826, 2002.
  • [4] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical review E, 69(2):026113, 2004.
  • [5] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., 1999. Translator-C. Franzke.
  • [6] Linton C. Freeman. Cliques, galois lattices, and the structure of human social groups. Social Networks, 18(3):173–187, 1996.
  • [7] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814, 2005.
  • [8] Tim S Evans. Clique graphs and overlapping communities. Journal of Statistical Mechanics: Theory and Experiment, 2010(12):P12037, 2010.
  • [9] Balázs Adamcsek, Gergely Palla, Illés J Farkas, Imre Derényi, and Tamás Vicsek. Cfinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22(8):1021–1023, 2006.
  • [10] Fei Hao, Geyong Min, Zheng Pei, Doo-Soon Park, and Laurence T Yang. k-clique community detection in social networks based on formal concept analysis. IEEE Systems Journal, 11(1):250–259, 2017.
  • [11] Lhouari Nourine and Olivier Raynaud. A fast algorithm for building lattices. Information processing letters, 71(5-6):199–204, 1999.
  • [12] Christian Lindig. Fast concept analysis. Working with Conceptual Structures-Contributions to ICCS, 2000:152–161, 2000.
  • [13] Petko Valtchev, Rokia Missaoui, and Pierre Lebrun. A partition-based approach towards constructing galois (concept) lattices. Discrete Mathematics, 256(3):801–829, 2002.
  • [14] Vicky Choi. Faster algorithms for constructing a concept (galois) lattice. In Clustering Challenges In Biological Networks, pages 169–186. World Scientific, 2009.
  • [15] Aleksey Buzmakov, Sergei O Kuznetsov, and Amedeo Napoli. Is concept stability a measure for pattern selection? Procedia Computer Science, 31:918–927, 2014.
  • [16] Sergei O Kuznetsov. On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49(1):101–115, 2007.
  • [17] Camille Roth, Sergei Obiedkov, and Derrick G Kourie. On succinct representation of knowledge community taxonomies with formal concept analysis. International Journal of Foundations of Computer Science, 19(02):383–404, 2008.
  • [18] Mikhail Klimushkin, Sergei A Obiedkov, and Camille Roth. Approaches to the selection of relevant concepts in the case of noisy data. In ICFCA, volume 20, pages 255–266. Springer, 2010.
  • [19] Sergei O. Kuznetsov and Tatyana P. Makhalova. Concept interestingness measures: a comparative study. In Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, Clermont-Ferrand, France, October 13-16, 2015., pages 59–72. CLA, 2015.
  • [20] Mikhail A. Babin and Sergei O. Kuznetsov. Approximating concept stability. In Formal Concept Analysis - 10th International Conference, ICFCA 2012, Leuven, Belgium, May 7-10, 2012. Proceedings, pages 7–15. Springer, 2012.
  • [21] Hui-lai Zhi. On the calculation of formal concept stability. Journal of Applied Mathematics, 2014:1–6, 2014.
  • [22] Jirapond Muangprathub. A novel algorithm for building concept lattice. Applied Mathematical Sciences, 8(11):507–515, 2014.
  • [23] Sergei O Kuznetsov. Learning of simple conceptual graphs from positive and negative examples. In PKDD, volume 99, pages 384–391. Springer, 1999.
  • [24] Mohamed-Hamza Ibrahim and Rokia Missaoui. An efficient approximation of concept stability using low-discrepancy sampling. In Graph-Based Representation and Reasoning - 23rd International Conference on Conceptual Structures, ICCS 2018, Edinburgh, UK, June 20-22, 2018, Proceedings, pages 24–38. Springer, 2018.
  • [25] Wayne W Zachary. An information flow model for conflict and fission in small groups. Journal of anthropological research, 33(4):452–473, 1977.
  • [26] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the national academy of sciences, 99(12):7821–7826, 2002.
  • [27] David Lusseau, Karsten Schneider, Oliver J Boisseau, Patti Haase, Elisabeth Slooten, and Steve M Dawson. The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396–405, 2003.
  • [28] Valdis Krebs. A network of books about recent us politics sold by the online bookseller amazon.com. Unpublished, http://www.orgnet.com, 2008.
  • [29] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
  • [30] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In International symposium on computer and information sciences, pages 284–293. Springer, 2005.
  • [31] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004.
  • [32] Martin Rosvall and Carl T Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008.
  • [33] Martin Rosvall, Daniel Axelsson, and Carl T Bergstrom. The map equation. The European Physical Journal Special Topics, 178(1):13–23, 2009.
  • [34] Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008, 2005.