As a key problem in graph theory and social network analysis, the mining of cohesive subgraphs, such as the k-core, k-truss and clique, has found many important applications in real life [Cohen2008, Tsourakakis et al.2013, Wen et al.2016, Yu et al.2013]. The mined cohesive subgraph can serve as an important metric to evaluate the properties of a network, such as network engagement. In this paper, we use the k-truss model to measure the cohesiveness of a social network. Unlike the k-core, the k-truss not only emphasizes the users' engagement (i.e., their number of friends), but also requires strong connections among users. That is, the k-truss of a graph $G$ is the maximal subgraph in which each edge is involved in at least $k-2$ triangles. Note that the triangle is an important building block for the analysis of social network structure [Xiao et al.2017, Cui et al.2018]. Thus, the number of edges in the k-truss can be used to measure the stability of the network structure.
The breakdown of a strong connection may affect other relationships: it can leave certain relationships involved in fewer than $k-2$ triangles, so that they are removed from the k-truss. This can eventually lead to a cascading breakdown of relationships. To identify the critical edges, in this paper we investigate the k-truss minimization problem. Given a social network $G$ and a budget $b$, k-truss minimization aims to find a set $B$ of $b$ edges whose deletion breaks the largest number of edges in the k-truss.
Figure 1 shows a toy social network with 10 users. Suppose $k = 4$. Then only the blue and red edges belong to the 4-truss. Deleting one particular edge in Figure 1 affects the connections among other users and removes all the blue edges from the 4-truss, since they no longer meet the truss requirement. The deletion of one single edge can thus seriously collapse the social network. The k-truss minimization problem has many real-life applications. For instance, given a social network, we can reinforce the community by paying more attention to the critical relationships. We can also strengthen the important connections to enhance the stability of a communication network, or detect vital connections in an enemy's network for military purposes.
The main challenges of this problem lie in the following two aspects. Firstly, we prove that the problem is NP-hard, so no polynomial-time algorithm can solve it exactly unless P = NP. Secondly, the number of edges in a social network is usually very large. Even if we only need to consider the edges in the k-truss as candidates, there are still a large number of edges to explore. To the best of our knowledge, we are the first to investigate the k-truss minimization problem through edge deletion. We formally define the problem and prove its hardness. Novel pruning rules are developed to reduce the search space. To further speed up the computation, an upper bound based strategy is proposed.
2.1 Problem Definition
We consider a social network as an undirected graph $G = (V, E)$. Given a subgraph $S$, we use $V(S)$ (resp. $E(S)$) to denote the set of nodes (resp. edges) in $S$. $N(u, S)$ is the set of neighbors of $u$ in $S$, and $deg(u, S) = |N(u, S)|$ denotes the degree of $u$ in $S$. $|S|$ denotes the number of edges in $S$. Assuming the length of each edge equals 1, a triangle is a cycle of length 3 in the graph. For $e \in E$, a containing-e-triangle is a triangle that contains $e$.
Definition 1 (k-core).
Given a graph $G$, a subgraph $S$ is the k-core of $G$, denoted as $C_k(G)$, if (i) $S$ satisfies the degree constraint, i.e., $deg(u, S) \ge k$ for every $u \in V(S)$; and (ii) $S$ is maximal, i.e., any supergraph of $S$ cannot be a k-core.
Definition 2 (edge support).
Given a subgraph $S$ and an edge $e \in E(S)$, the edge support of $e$ is the number of containing-e-triangles in $S$, denoted as $sup(e, S)$.
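As an illustration of Definition 2, the following minimal Python sketch (our own, not the authors' C++ code) computes $sup(e, S)$ for every edge of a small graph given as an edge list. The function name `edge_supports` is hypothetical, and edges are assumed to be keyed as `(u, v)` tuples with `u < v`.

```python
from collections import defaultdict


def edge_supports(edges):
    """Return {(u, v): sup((u, v), S)} for the graph S given as an edge list, with u < v."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    # every common neighbour w of u and v closes one containing-e-triangle (u, v, w)
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}


# K4 on nodes 1..4 plus a pendant edge (4, 5)
sup = edge_supports([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)])
print(sup[(1, 2)], sup[(4, 5)])  # 2 0
```

Each edge of the K4 lies in two triangles, while the pendant edge lies in none.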
Definition 3 (k-truss).
Given a graph $G$, a subgraph $S$ is the k-truss of $G$, denoted by $T_k(G)$, if (i) $sup(e, S) \ge k-2$ for every edge $e \in E(S)$; (ii) $S$ is maximal, i.e., any supergraph of $S$ cannot be a k-truss; and (iii) $S$ is non-trivial, i.e., there is no isolated node in $S$.
Definition 4 (trussness).
The trussness of an edge $e$, denoted as $t(e)$, is the largest integer $k$ that satisfies $e \in E(T_k(G))$ and $e \notin E(T_{k+1}(G))$.
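Trussness can be computed with the standard support-peeling idea: for $k = 2, 3, \dots$, repeatedly remove the edges whose support in the remaining graph is at most $k-2$ and assign them trussness $k$. The sketch below is a minimal, unoptimized Python illustration of Definition 4 under that assumption; `trussness` is a hypothetical name, and it is not an optimized truss decomposition algorithm.

```python
from collections import defaultdict


def trussness(edges):
    """Return {(u, v): t((u, v))} via support peeling: edges peeled at level k get t = k."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    t = {}
    k = 2
    while sup:
        # edges with support <= k - 2 cannot be in the (k+1)-truss, so their trussness is k
        stack = [e for e, s in sup.items() if s <= k - 2]
        while stack:
            e = stack.pop()
            if e not in sup:  # already peeled via another triangle
                continue
            u, v = e
            for w in adj[u] & adj[v]:  # triangles (u, v, w) that are still present
                for f in ((min(u, w), max(u, w)), (min(v, w), max(v, w))):
                    sup[f] -= 1
                    if sup[f] <= k - 2:
                        stack.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del sup[e]
            t[e] = k
        k += 1
    return t


# K4 on 1..4, a triangle (4, 5, 6) hanging off node 4, and a pendant edge (6, 7)
G = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
t = trussness(G)
print(t[(1, 2)], t[(4, 5)], t[(6, 7)])  # 4 3 2
```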
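Based on the definitions of k-core and k-truss, we can see that the k-truss not only requires a sufficient number of neighbors, but also imposes a strict constraint on the strength of edges. A k-truss is at least a $(k-1)$-core: every node of a k-truss lies on an edge whose $k-2$ triangles contribute $k-2$ further neighbors, so its degree is at least $k-1$. Therefore, to compute the k-truss, we can first compute the $(k-1)$-core and then find the k-truss within the $(k-1)$-core by iteratively removing all the edges that violate the k-truss constraint. The time complexity is $O(m^{1.5})$ [Wang and Cheng2012], where $m = |E|$. Given a set $B$ of edges in $G$, we use $T_k(G \setminus B)$ to denote the k-truss after deleting $B$, and $|T_k(G \setminus B)|$ to denote the number of edges in $T_k(G \setminus B)$. We define the followers of $B$, denoted as $F(B, G)$, as the edges that are removed from $T_k(G)$ due to the deletion of $B$; for a single edge $e$ we write $F(e)$ for $F(\{e\}, G)$. Then our problem can be formally defined as follows.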
Given a graph $G$ and a budget $b$, the k-truss minimization problem aims to find a set $B$ of $b$ edges such that $|T_k(G \setminus B)|$ is minimized. Equivalently, it finds an edge set $B$ that maximizes $|F(B, G)|$, i.e.,
$$B^* = \arg\max_{B \subseteq E,\ |B| = b} |F(B, G)|.$$
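The peeling procedure and the objective above can be made concrete with a small Python sketch. It is a simplified illustration under our own assumptions: it peels the whole graph directly instead of first computing the $(k-1)$-core, it does not count the deleted edges in $B$ among their own followers, and it finds an optimal $B$ by brute force over the $b$-subsets of $E(T_k(G))$ (deleting an edge outside the k-truss leaves $T_k(G)$ unchanged), which is exponential in $b$. The names `k_truss`, `followers` and `exact_truss_minimization` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def norm(u, v):
    return (u, v) if u < v else (v, u)


def k_truss(edges, k):
    """Edge set of T_k: peel edges whose support is below k - 2 until none is left."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    stack = [e for e, s in sup.items() if s < k - 2]
    while stack:
        e = stack.pop()
        if e not in sup:  # already removed through another triangle
            continue
        u, v = e
        for w in adj[u] & adj[v]:  # every surviving triangle (u, v, w) loses e
            for f in (norm(u, w), norm(v, w)):
                sup[f] -= 1
                if sup[f] < k - 2:
                    stack.append(f)
        adj[u].discard(v)
        adj[v].discard(u)
        del sup[e]
    return set(sup)


def followers(edges, B, k):
    """F(B, G): edges of T_k(G) that leave the k-truss once B is deleted (B excluded)."""
    B = {norm(u, v) for u, v in B}
    remaining = [e for e in edges if norm(*e) not in B]
    return k_truss(edges, k) - k_truss(remaining, k) - B


def exact_truss_minimization(edges, k, b):
    """Brute force: the b-subset B of E(T_k(G)) minimising |T_k(G minus B)|."""
    best_B, best_size = None, None
    for B in combinations(sorted(k_truss(edges, k)), b):
        removed = set(B)
        size = len(k_truss([e for e in edges if norm(*e) not in removed], k))
        if best_size is None or size < best_size:
            best_B, best_size = removed, size
    return best_B, best_size


# K4 on 1..4, a triangle (4, 5, 6) hanging off node 4, and a pendant edge (6, 7)
G = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
print(sorted(followers(G, [(1, 2)], 4)))  # [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(exact_truss_minimization(G, 3, 2)[1])  # 3
```

With $k = 3$ and $b = 2$, deleting two disjoint K4 edges such as (1, 2) and (3, 4) turns the K4 into a triangle-free 4-cycle, so only the triangle (4, 5, 6) survives.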
For $k \ge 3$, the k-truss minimization problem is NP-hard.
Proof. For $k \ge 3$, we sketch the proof for $k = 5$; a similar construction can be applied for the other values of $k$. When $k = 5$, we reduce the maximum coverage problem [Karp1972], which aims to find $b$ sets that cover the largest number of elements for a given budget $b$, to the k-truss minimization problem. We consider an instance of the maximum coverage problem with $s$ sets $A_1, A_2, \dots, A_s$ over the $q$ elements of $U = \bigcup_{i=1}^{s} A_i$. We assume that the maximum number of elements inside a set $A_i$ is $\theta$. Then we construct a corresponding instance of the k-truss minimization problem in a graph $G$ as follows. Figure 2(a) shows a constructed example.
We divide $G$ into three parts, $G_1$, $G_2$ and $G_3$. 1) $G_1$ consists of $s$ parts $M_1, \dots, M_s$; each part $M_i$ corresponds to the set $A_i$ of the maximum coverage instance. 2) $G_2$ consists of $q$ parts $N_1, \dots, N_q$; each part $N_j$ corresponds to the $j$-th element of $U$. 3) $G_3$ is a dense subgraph in which the support of every edge is no less than 3. Specifically, suppose $A_i$ consists of $x$ elements; the numbers of nodes and edges of $M_i$ are then determined by $x$. To construct $M_i$, we first construct an $x$-polygon. Then, we add a node $c_i$ in the center of the polygon and add edges between $c_i$ and all the polygon vertices. Finally, we add a few further nodes and edges. With this construction, the edges in $M_i$ have support no larger than 3, and we use $G_3$ to provide support for the edges in $M_i$ so that their support is exactly 3. Each part $N_j$ is a list of triangles, as shown in Figure 2(a). For each element of $A_i$, we add two triangles between $M_i$ and the corresponding $N_j$ to make them triangle connected; this structure is shown in Figure 2(b). Note that each edge in $M_i$ and $N_j$ can be used at most once. The edges in $G_2$ then have support no larger than 3. Finally, we use $G_3$ to provide support for the edges in $G_2$ so that their support is exactly 3, which completes the construction. Figure 2(c) shows a further part of the construction.
With this construction, we can guarantee that 1) deleting any edge in $M_i$ removes from the truss all the edges in $M_i$ and the edges in $G_2$ that are connected with $M_i$; 2) only the edges in $G_1$ need to be considered as candidates; and 3) apart from their followers in $G_2$, all parts $M_i$ have the same number of followers. In Figure 2(a), deleting any edge of a part $M_i$ removes the same number of edges outside $G_2$. Consequently, an optimal solution of the k-truss minimization instance yields an optimal solution of the maximum coverage instance. Since the maximum coverage problem is NP-hard, the theorem holds. ∎
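The objective function $f(B) = |F(B, G)|$ is monotonic but not submodular.
Proof. Suppose $B \subseteq B'$. Every edge in $F(B, G)$ will also be deleted from the k-truss when deleting $B'$. Thus $f(B) \le f(B')$ and $f$ is monotonic. Given two sets $B \subseteq B'$ and an edge $e \notin B'$, if $f$ is submodular, it must hold that $f(B \cup \{e\}) - f(B) \ge f(B' \cup \{e\}) - f(B')$. We show that this inequality does not hold by constructing a counterexample. In Figure 1, for $k = 4$, there exist sets $B \subseteq B'$ and an edge $e$ for which $f(B \cup \{e\}) - f(B) < f(B' \cup \{e\}) - f(B')$. Hence the inequality does not hold, and $f$ is not submodular. ∎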
2.2 Baseline Algorithm
For the k-truss minimization problem, a naive solution is to enumerate all the possible edge sets of size $b$ and return the best one. However, real-world social networks are usually very large, and the number of combinations is too large to enumerate. Due to the hardness and the non-submodularity of the problem, we resort to the greedy framework. Algorithm 1 shows the baseline greedy algorithm. It is easy to verify that we only need to consider the edges in the k-truss as candidates. The algorithm iteratively finds the edge with the largest number of followers in the current k-truss (Line 3), and terminates when $b$ edges are found. Its running time is dominated by computing, in each of the $b$ iterations, the followers of every candidate edge in the current k-truss.
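Algorithm 1 itself is not reproduced in this text, so the following Python function is only a sketch of the greedy framework described above, under our own assumptions: the candidates are the edges of the current k-truss, the edge whose deletion leaves the smallest k-truss (equivalently, the one with the most followers) is chosen, and ties are broken by the smallest edge tuple. The names `k_truss` and `greedy_baseline` are hypothetical.

```python
from collections import defaultdict


def norm(u, v):
    return (u, v) if u < v else (v, u)


def k_truss(edges, k):
    """Edge set of T_k obtained by peeling edges whose support is below k - 2."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    stack = [e for e, s in sup.items() if s < k - 2]
    while stack:
        e = stack.pop()
        if e not in sup:
            continue
        u, v = e
        for w in adj[u] & adj[v]:
            for f in (norm(u, w), norm(v, w)):
                sup[f] -= 1
                if sup[f] < k - 2:
                    stack.append(f)
        adj[u].discard(v)
        adj[v].discard(u)
        del sup[e]
    return set(sup)


def greedy_baseline(edges, k, b):
    """Greedy sketch: b times, delete the edge of the current k-truss with the most followers."""
    truss = k_truss(edges, k)
    chosen = []
    for _ in range(b):
        if not truss:  # the k-truss is already empty
            break
        best_e, best_rest = None, None
        for e in sorted(truss):  # only edges of the current k-truss are candidates
            # T_k(G minus e) equals T_k(T_k(G) minus e), so peeling the current truss suffices
            rest = k_truss(truss - {e}, k)
            if best_rest is None or len(rest) < len(best_rest):
                best_e, best_rest = e, rest  # followers of e: len(truss) - 1 - len(rest)
        chosen.append(best_e)
        truss = best_rest
    return chosen, truss


G = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
chosen, rest = greedy_baseline(G, 3, 2)
print(chosen, len(rest))  # [(4, 5), (1, 2)] 5
```

On this toy graph with $k = 3$ and $b = 2$, the sketch first breaks the triangle (4, 5, 6) and then one K4 edge, leaving 5 edges, whereas deleting two disjoint K4 edges such as (1, 2) and (3, 4) would leave only 3; greedy selection is a heuristic.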
3 Group Based Solution
In this section, we develop novel pruning techniques to accelerate the search in the baseline algorithm.
3.1 Candidate Reduction
Before introducing the pruning rules, we first present some related definitions.
Definition 5 (triangle adjacency).
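Given two triangles $\triangle_1, \triangle_2$ in $G$, they are triangle adjacent if $\triangle_1$ and $\triangle_2$ share a common edge, which means $\triangle_1 \cap \triangle_2 \neq \emptyset$.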
Definition 6 (triangle connectivity).
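Given two triangles $\triangle_s, \triangle_t$ in $G$, they are triangle connected, denoted as $\triangle_s \leftrightarrow \triangle_t$, if there exists a sequence of $n$ triangles $\triangle_s = \triangle_1, \dots, \triangle_n = \triangle_t$ in $G$ ($n \ge 2$) such that, for $1 \le i < n$, $\triangle_i$ and $\triangle_{i+1}$ are triangle adjacent.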
For two edges $e_1$ and $e_2$, we say they are triangle adjacent if $e_1$ and $e_2$ belong to the same triangle. As shown in the baseline algorithm, we only need to consider the edges in $T_k$ as candidates. Lemma 1 shows that we only need to explore a subset $C$ of these edges.
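Lemma 1. Given a k-truss $T_k$, let $E_{\min} = \{e \in E(T_k) \mid sup(e, T_k) = k-2\}$. If an edge $e$ has at least one follower, then $e$ must be in $C = \{e \in E(T_k) \mid \exists\, e' \in E_{\min} \text{ such that } e \text{ and } e' \text{ are triangle adjacent}\}$.
Proof. We prove the lemma by showing that the edges outside $C$ have no followers. We divide these edges into two sets. 1) An edge with trussness less than $k$ is deleted during the k-truss computation anyway, so deleting it does not change $T_k$. 2) If an edge $e$ in $E(T_k) \setminus C$ is not triangle adjacent with any edge in $E_{\min}$, then every edge $e'$ triangle adjacent with $e$ satisfies $sup(e', T_k) > k-2$. Since $e$ and $e'$ share exactly one triangle, deleting $e$ decreases $sup(e', T_k)$ by exactly 1, so all the edges triangle adjacent with $e$ still have support at least $k-2$ in $T_k$. Thus, $e$ has no follower, and the lemma holds. ∎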
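Lemma 1 translates into a simple filter: compute the supports inside $T_k$, collect $E_{\min}$, and keep the other two edges of every triangle that contains an edge of $E_{\min}$. The Python sketch below assumes that its input is already the edge list of $T_k$; `lemma1_candidates` is a hypothetical name.

```python
from collections import defaultdict


def lemma1_candidates(truss_edges, k):
    """Candidate set C of Lemma 1, computed inside the k-truss T_k given as an edge list."""
    adj = defaultdict(set)
    for u, v in truss_edges:
        adj[u].add(v)
        adj[v].add(u)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    tight = [e for e, s in sup.items() if s == k - 2]  # E_min: support exactly k - 2
    cand = set()
    for u, v in tight:
        for w in adj[u] & adj[v]:  # the other two edges of each triangle on (u, v)
            cand.add((min(u, w), max(u, w)))
            cand.add((min(v, w), max(v, w)))
    return cand


# the 3-truss of a K4 on 1..4 with a triangle (4, 5, 6) attached at node 4
T3 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
print(sorted(lemma1_candidates(T3, 3)))  # [(4, 5), (4, 6), (5, 6)]
```

The K4 edges all have support 2 > $k-2$ and touch no tight edge, so, as Lemma 1 predicts, none of them can have followers for $k = 3$.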
Based on Lemma 2, we can skip the edges that are the followers of the explored ones.
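Lemma 2. Given two edges $e_1, e_2 \in E(T_k)$, if $e_2 \in F(e_1)$, then $F(e_2) \subseteq F(e_1)$.
Proof. $e_2 \in F(e_1)$ implies that $e_2$ will be deleted during the deletion of $e_1$. Therefore, each edge in $F(e_2)$ will also be deleted when $e_1$ is deleted. Consequently, $F(e_2) \subseteq F(e_1)$. ∎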
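Lemma 2 can be used while scanning the candidates of one iteration: once the followers of an edge are known, those followers can be skipped, because each of them has strictly fewer followers than the edge that removed it (assuming an edge is not its own follower). The Python sketch below keeps the follower computation abstract through a `followers_of` callback; both the function `best_single_edge` and the toy follower sets are hypothetical, chosen only to satisfy Lemma 2.

```python
def best_single_edge(candidates, followers_of):
    """Scan candidates, skipping any edge already known to be a follower (Lemma 2)."""
    best, best_count = None, -1
    skipped = set()
    evaluated = 0
    for e in candidates:
        if e in skipped:
            # e is in F(e') for an explored e', so F(e) is a strict subset of F(e')
            continue
        evaluated += 1
        F = followers_of(e)
        skipped |= F
        if len(F) > best_count:
            best, best_count = e, len(F)
    return best, best_count, evaluated


# toy follower sets that respect Lemma 2 (F("b") is a subset of F("a"))
F = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": {"e"}, "e": set()}
print(best_single_edge(["a", "b", "c", "d", "e"], F.__getitem__))  # ('a', 2, 2)
```

Only two of the five candidates need a follower computation here; the returned tuple is the best edge, its follower count and the number of evaluated candidates.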
To further reduce the search space, we introduce a pruning rule based on the k-support group.
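Definition 7 (k-support group).
Given a k-truss $T_k$, a subgraph $S$ is a k-support group if it satisfies: 1) $\forall e \in E(S)$, $sup(e, T_k) = k-2$; 2) $\forall e_1, e_2 \in E(S)$ with $e_1 \in \triangle_s$ and $e_2 \in \triangle_t$, there exists a sequence of $n$ triangles $\triangle_s = \triangle_1, \dots, \triangle_n = \triangle_t$ in $T_k$ ($n \ge 1$) such that, for $1 \le i < n$, $\triangle_i \cap \triangle_{i+1} = e$ and $sup(e, T_k) = k-2$; and 3) $S$ is maximal, i.e., any supergraph of $S$ cannot be a k-support group.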
Lemma 3 shows that the edges in the same k-support group are equivalent: the deletion of any edge in a k-support group leads to the deletion of the whole group.
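Lemma 3. Let $S$ be a k-support group of $T_k$. For any $e \in E(S)$, if we delete $e$, all the edges of $E(S)$ are deleted from $T_k$.
Proof. Since $S$ is a k-support group of $T_k$, for any $e_1, e_2 \in E(S)$ with $e_1 \in \triangle_s$ and $e_2 \in \triangle_t$, there exists a sequence of triangles $\triangle_s = \triangle_1, \dots, \triangle_n = \triangle_t$ such that, for $1 \le i < n$, $\triangle_i \cap \triangle_{i+1} = e$ and $sup(e, T_k) = k-2$. The deletion of any edge inside the group destroys the triangles containing it and decreases the support of its triangle adjacent edges by 1. Because the edges of the group have support exactly $k-2$, they then violate the truss constraint, which leads to a cascading deletion of the subsequent triangle edges in the group. Therefore, the lemma holds. ∎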
According to Lemma 3, we only need to add one edge of each k-support group to the candidate set, and the other edges in the group can be treated as followers of the selected edge. With the following lemma, we can further prune the edges that are triangle adjacent with multiple edges of a k-support group.
Lemma 4. Suppose that $e \in E(T_k)$ and $sup(e, T_k) > k-2$. For a k-support group $S$, if $e$ belongs to more than $sup(e, T_k) - (k-2)$ triangles, each of which contains at least one edge in $S$, then $e$ is a follower of $S$, i.e., of every edge in $S$.
Proof. According to Lemma 3, by removing any edge of $S$, we delete all of $E(S)$ from $T_k$. Since $e$ belongs to more than $sup(e, T_k) - (k-2)$ triangles, each of which contains at least one edge in $S$, the support of $e$ decreases by more than $sup(e, T_k) - (k-2)$ due to the deletion of $E(S)$. So its support becomes less than $k-2$, and $e$ is deleted due to the support constraint. Thus, $e$ is a follower of $S$. ∎
3.2 Group Based Algorithm
We improve the baseline algorithm by integrating all the pruning rules above; the details are shown in Algorithm 2. In each iteration, we first find the k-support groups of the current $T_k$ and compute the candidate set according to Lemma 3 (Line 4). This process, i.e., the FindGroup function, corresponds to Lines 12-19. It can be done by conducting a BFS search from the edges with support $k-2$. We use a hash table to maintain the group id (i.e., gID) of each edge, and the gID starts from 0 (Line 13). For each unvisited edge with support $k-2$, we conduct a BFS search from it by calling the function GroupExpansion (Lines 20-32). During the BFS search, we visit the edges that are triangle adjacent with the current edge, and push the edges with support $k-2$ into the queue if they are not visited (Lines 25 and 28). The edges visited in the same BFS round are marked with the current gID. For the visited edges with support larger than $k-2$, we use a hash table to record their coverage with the current k-support group, and update the candidate set based on Lemma 4 (Line 31). According to Lemma 2, we can further update the candidate set after computing the followers of edges (Line 7).
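The following Python sketch mirrors the FindGroup/GroupExpansion description above under our own simplifications: the input is assumed to be the edge list of the current $T_k$, the gID table and the coverage table are plain dictionaries, every k-support group (including single-edge ones) contributes one representative, and an edge with support above $k-2$ is dropped when some group covers more than $sup(e, T_k) - (k-2)$ of its triangles (Lemma 4). The names `find_groups`, `gid` and `cover` are hypothetical, and the line numbers of Algorithm 2 are not reproduced.

```python
from collections import defaultdict, deque


def norm(u, v):
    return (u, v) if u < v else (v, u)


def find_groups(truss_edges, k):
    """BFS over the k-support groups of T_k; returns (group id per edge, pruned candidates)."""
    adj = defaultdict(set)
    for u, v in truss_edges:
        adj[u].add(v)
        adj[v].add(u)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    gid = {}  # hash table: tight edge -> group id
    reps, touched, pruned = set(), set(), set()
    next_gid = 0
    for start in sorted(sup):
        if sup[start] != k - 2 or start in gid:
            continue
        gid[start] = next_gid
        reps.add(start)  # Lemma 3: one representative per group
        cover = defaultdict(int)  # non-tight edge -> #triangles shared with the group
        seen = set()
        queue = deque([start])
        while queue:
            u, v = queue.popleft()
            for w in adj[u] & adj[v]:
                tri = tuple(sorted((u, v, w)))
                if tri in seen:  # count every triangle once per group
                    continue
                seen.add(tri)
                for f in (norm(u, w), norm(v, w)):
                    if sup[f] == k - 2:
                        if f not in gid:
                            gid[f] = next_gid
                            queue.append(f)
                    else:
                        cover[f] += 1
        for f, c in cover.items():
            touched.add(f)
            if c > sup[f] - (k - 2):  # Lemma 4: f collapses together with the group
                pruned.add(f)
        next_gid += 1
    return gid, (reps | touched) - pruned


# K4 on 1..4 and two K4s {1,5,6,7} and {2,5,8,9}, glued by the triangle (1, 2, 5)
H = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
     (1, 5), (1, 6), (1, 7), (5, 6), (5, 7), (6, 7),
     (2, 5), (2, 8), (2, 9), (5, 8), (5, 9), (8, 9)]
gid, cand = find_groups(H, 4)
print(sorted(cand))  # [(1, 3), (1, 6), (2, 8)]
```

In this 4-truss the glue edges (1, 2), (1, 5) and (2, 5) have support 3, but each shares two triangles with one k-support group, so Lemma 4 prunes all three and only one representative per group remains.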
4 Upper Bound Based Solution
The group based algorithm reduces the size of the candidate set by excluding the other edges of the same k-support group and the followers of k-support groups, which greatly accelerates the baseline method. However, for each candidate edge, we still need a lot of computation to find its followers. Given an edge, if we can obtain an upper bound of its follower size, we can speed up the search by pruning unpromising candidates. In this section, we present a novel method to efficiently calculate the required upper bound.
4.1 Upper Bound Derivation
Before introducing the lemma, we first present some basic definitions. Recall that $t(e)$ denotes the trussness of $e$.
Definition 8 (k-triangle).
A triangle $\triangle$ is a k-triangle if the trussness of each of its edges is no less than $k$.
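Definition 9 (k-triangle connectivity).
Two triangles $\triangle_s$ and $\triangle_t$ are k-triangle connected, denoted as $\triangle_s \leftrightarrow_k \triangle_t$, if there exists a sequence of $n$ k-triangles $\triangle_s = \triangle_1, \dots, \triangle_n = \triangle_t$ with $n \ge 2$ such that, for $1 \le i < n$, $\triangle_i \cap \triangle_{i+1} = e$ and $t(e) = k$.
We say two edges $e_1, e_2$ are k-triangle connected, denoted as $e_1 \leftrightarrow_k e_2$, if and only if 1) $e_1$ and $e_2$ belong to the same k-triangle, or 2) $e_1 \in \triangle_s$ and $e_2 \in \triangle_t$ with $\triangle_s \leftrightarrow_k \triangle_t$.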
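Definition 10 (k-truss group).
Given a graph $G$ and an integer $k$, a subgraph $S$ is a k-truss group if it satisfies: 1) $\forall e \in E(S)$, $t(e) = k$; 2) $\forall e_1, e_2 \in E(S)$, $e_1 \leftrightarrow_k e_2$; and 3) $S$ is maximal, i.e., there is no supergraph of $S$ satisfying conditions 1 and 2.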
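Based on the definition of the k-truss group, Lemma 5 gives an upper bound of $|F(e)|$.
Lemma 5. If an edge $e \in E(T_k)$ is triangle adjacent with the k-truss groups $S_1, \dots, S_m$, then $|F(e)| \le \sum_{i=1}^{m} |S_i|$.
Proof. Suppose $e \in E(T_k)$; then $sup(e, T_k) \ge k-2$, so $e$ is contained in $sup(e, T_k)$ triangles and is triangle adjacent with $2 \cdot sup(e, T_k)$ edges. We divide the edges that are triangle adjacent with $e$ in $T_k$ into two parts. 1) Edges $e'$ with $t(e') > k$. Since the deletion of $e$ decreases $t(e')$ by at most 1 [Huang et al.2014, Akbas and Zhao2017], we have $t(e') \ge k$ after deleting $e$, which means $e'$ has no contribution to $F(e)$. 2) Edges $e'$ with $t(e') = k$. Suppose $e' \in S_i$. The deletion of $e$ decreases the trussness of each edge in $S_i$ by at most 1. Then $S_i$ contributes at most $|S_i|$ edges to $F(e)$. Thus, $\sum_{i=1}^{m} |S_i|$ is an upper bound of $|F(e)|$. ∎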
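To make Definitions 8-10 and Lemma 5 concrete, the Python sketch below computes trussness by peeling, groups the edges with $t(e) = k$ into k-truss groups, and evaluates the bound of Lemma 5. Instead of the BFS used by the authors, it uses union-find over k-triangles: merging the trussness-$k$ edges of every k-triangle yields the same equivalence classes as Definition 10, since consecutive triangles in Definition 9 share an edge of trussness $k$. All function names are hypothetical.

```python
from collections import defaultdict


def norm(u, v):
    return (u, v) if u < v else (v, u)


def build_adj(edges):
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def trussness(edges):
    """t(e) for every edge, by support peeling."""
    adj = build_adj(edges)
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    t, k = {}, 2
    while sup:
        stack = [e for e, s in sup.items() if s <= k - 2]
        while stack:
            e = stack.pop()
            if e not in sup:
                continue
            u, v = e
            for w in adj[u] & adj[v]:
                for f in (norm(u, w), norm(v, w)):
                    sup[f] -= 1
                    if sup[f] <= k - 2:
                        stack.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del sup[e]
            t[e] = k
        k += 1
    return t


def k_truss_groups(edges, t, k):
    """Union-find over edges with t(e) = k: the trussness-k edges of a k-triangle are merged."""
    parent = {e: e for e, te in t.items() if te == k}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    adj = build_adj(edges)
    for u, v in t:
        for w in adj[u] & adj[v]:
            tri = (norm(u, v), norm(u, w), norm(v, w))
            if min(t[f] for f in tri) < k:
                continue  # not a k-triangle
            exact = [f for f in tri if t[f] == k]
            for f in exact[1:]:
                parent[find(f)] = find(exact[0])
    groups = defaultdict(set)
    for e in parent:
        groups[find(e)].add(e)
    return list(groups.values())


def follower_upper_bound(e, edges, t, k, groups):
    """Lemma 5: sum of |S_i| over the k-truss groups S_i triangle adjacent with e in T_k."""
    adj = build_adj(edges)
    group_of = {f: i for i, g in enumerate(groups) for f in g}
    u, v = norm(*e)
    hit = set()
    for w in adj[u] & adj[v]:
        f1, f2 = norm(u, w), norm(v, w)
        if min(t[(u, v)], t[f1], t[f2]) < k:
            continue  # this triangle is not inside T_k
        for f in (f1, f2):
            if t[f] == k:  # edges with t(f) > k keep trussness >= k (case 1 of the proof)
                hit.add(group_of[f])
    return sum(len(groups[i]) for i in hit)


G = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
t = trussness(G)
groups = k_truss_groups(G, t, 3)
print([sorted(g) for g in groups])  # [[(4, 5), (4, 6), (5, 6)]]
print(follower_upper_bound((4, 5), G, t, 3, groups))  # 3
```

For $k = 3$ the bound of (4, 5) is 3 while its true follower count is 2, because the bound also counts the group that contains (4, 5) itself; the K4 edges have trussness 4 and get a bound of 0.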
4.2 Upper Bound Based Algorithm
Based on Lemma 5, we can skip the edges whose follower upper bound is smaller than the follower size of the best edge found so far in the current iteration. However, even given the trussness of each edge, it may still be prohibitive to find the k-truss group that contains an edge $e$, since in the worst case we need to explore all the triangles in the graph. To compute the upper bound efficiently, we construct an index that maintains the relationships between edges and their k-truss groups.
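Lemma 5 is most effective if the candidates are evaluated in decreasing order of their upper bounds, so that the scan can stop at the first candidate whose bound cannot exceed the best follower count found so far. The sketch below assumes this ordering (the text above only states the skipping rule) and takes the bound and the exact follower count as callbacks; the function name `best_edge_with_bounds` and the toy numbers are hypothetical.

```python
def best_edge_with_bounds(candidates, upper_bound, count_followers):
    """Evaluate candidates in decreasing order of their follower upper bound; stop early."""
    best, best_count, evaluated = None, -1, 0
    for e in sorted(candidates, key=upper_bound, reverse=True):
        if upper_bound(e) <= best_count:
            break  # bounds are sorted, so no remaining edge can beat the current best
        evaluated += 1
        c = count_followers(e)
        if c > best_count:
            best, best_count = e, c
    return best, best_count, evaluated


# toy numbers: ub[e] >= exact[e] must hold for every candidate
ub = {"a": 6, "b": 3, "c": 5, "d": 1}
exact = {"a": 4, "b": 3, "c": 5, "d": 0}
print(best_edge_with_bounds(ub, ub.get, exact.get))  # ('c', 5, 2)
```

Only two exact follower computations are needed here; candidates whose bound equals the current best are also skipped, since they cannot strictly improve it.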
To find the k-truss group of a given edge $e$, we extend the GroupExpansion function (Lines 20-32 of Algorithm 2). It also follows the BFS manner. The difference is that every explored adjacent triangle must be a k-triangle, and we only enqueue an edge whose trussness satisfies the k-triangle connectivity constraint, i.e., an edge $e'$ with $t(e') = k$. After the BFS search starting from $e$ finishes, the k-truss groups involving $e$ are found.
After deleting an edge in the current iteration, the constructed k-truss groups may change. Therefore, we need to update the k-truss groups for the next iteration. The update consists of two parts: updating the trussness, and updating the groups affected by the changed trussness. To update the edge trussness, we apply the algorithm in [Huang et al.2014], which efficiently updates the edge trussness after deleting an edge. Given the edges with changed trussness, we first find the subgraph induced by these edges. Then we reconstruct the k-truss groups for the induced subgraph and update the original ones. Based on the constructed k-truss groups, we can compute the upper bound of followers for edges efficiently. The final algorithm, named UP-Edge, integrates all the techniques proposed in Sections 3 and 4.
5.1 Experiment Setting
In the experiments, we implement and evaluate the following algorithms. 1) Exact: the naive algorithm that enumerates all the combinations. 2) Support: in each iteration, it selects an edge that is triangle adjacent with the edge of minimum support in the k-truss. 3) Baseline: the baseline algorithm in Section 2.2. 4) GP-Edge: the group based algorithm in Section 3. 5) UP-Edge: the upper bound based algorithm in Section 4.
We employ 9 real social networks (i.e., Bitcoin-alpha, Email-Eu-core, Facebook, Brightkite, Gowalla, DBLP, Youtube, Orkut and LiveJournal) to evaluate the performance of the proposed methods. The datasets are publicly available at https://snap.stanford.edu/data/ and https://dblp.org/xml/release/. Since the Exact algorithm is too slow, we only run it on the Email-Eu-core and Bitcoin-alpha datasets.
Since the properties of the datasets are quite different, we set the default $k$ to 10 for 4 datasets (Gowalla, Youtube, Brightkite and DBLP) and to 20 for 3 datasets (Facebook, LiveJournal and Orkut). We set the default $b$ to 5 for all datasets. All the programs are implemented in C++. All the experiments are performed on a machine with an Intel Xeon 2.20 GHz CPU and 128 GB memory running Linux.
5.2 Effectiveness Evaluation
To evaluate the effectiveness of the proposed methods, we report the number of followers obtained by deleting $b$ edges. Since UP-Edge only accelerates Baseline and GP-Edge, we only report the results of UP-Edge here. Due to the huge time cost of Exact, we show the comparison with it on 3 datasets, namely Bitcoin-alpha, Email-Eu-core and an artificial network (generated by GTGraph with 500 nodes and 5000 edges).
We use a fixed $k$ for Bitcoin-alpha and for the artificial network, respectively, and vary $b$ from 1 to 4. In Figure 3(a), there is only a slight drop when $b = 3$. In Figure 3(b), there is only a small drop when $b = 4$. In Figure 3(c), UP-Edge also shows results comparable to Exact, and both significantly outperform Support. Similar results can be observed in Figures 3(d)-3(f) over all the datasets and the selected datasets. Figures 3(e) and 3(f) show the results on LiveJournal by varying $b$ and $k$. As observed, the numbers of followers of the two algorithms are positively correlated with $b$, and $k$ has a great impact on the follower size.
Figure 4 shows a case study on DBLP. We can see that the edge between Lynn A. Volk and David W. Bates is the most pivotal relationship. This edge has 264 followers (grey edges in the figure). Interestingly, most of these followers are not incident to either of the two authors.
5.3 Efficiency Evaluation
To evaluate the efficiency, we compare the response time of UP-Edge and GP-Edge with that of Baseline. We first conduct the experiments on all the datasets with the default settings. Figure 5 shows the response time of the three algorithms. UP-Edge and GP-Edge significantly outperform Baseline on all the datasets because of the developed pruning techniques, and UP-Edge is faster than GP-Edge thanks to the derived upper bound. Figure 6 shows the results on LiveJournal by varying $b$ and $k$. When $b$ grows, the response time increases since more edges need to be selected. When $k$ grows, the response time decreases since the search space becomes smaller.
6 Related Work
Graph processing has recently been a hot topic in many areas, and it usually requires much more computation than traditional queries [Luo et al.2008, Wang et al.2010, Wang et al.2015]. Cohesive subgraph identification is of great importance to social network analysis. In the literature, different definitions of cohesive subgraphs have been proposed, such as the k-core [Seidman1983, Wen et al.2016], k-truss [Huang and Lakshmanan2017], clique [Tsourakakis et al.2013] and dense neighborhood graph [Kreutzer et al.2018]. Numerous studies investigate the k-truss decomposition problem under different settings, including in-memory algorithms [Cohen2008], external-memory algorithms [Wang and Cheng2012] and distributed algorithms [Chen et al.2014]. Some studies leverage the k-truss property to mine the required communities [Huang et al.2014, Huang and Lakshmanan2017]. Huang et al. [Huang et al.2016] investigate the truss decomposition problem in uncertain graphs. Recently, some research has focused on modifying the graph to maximize or minimize a corresponding metric [Bhawalkar et al.2015, Zhang et al.2017, Zhu et al.2018, Medya et al.2018]. Bhawalkar et al. [Bhawalkar et al.2015] propose the anchored k-core problem, which tries to maximize the k-core by anchoring nodes, while Zhang et al. [Zhang et al.2017] and Zhu et al. [Zhu et al.2018] investigate the problem of k-core minimization by deleting nodes and edges, respectively. In [Medya et al.2018], Medya et al. try to maximize the node centrality by adding edges to the graph. However, these techniques cannot be extended to our problem.
In this paper, we study the k-truss minimization problem. We first formally define the problem. Due to its hardness, a greedy baseline algorithm is proposed. To speed up the search, different pruning techniques are developed. In addition, an upper bound based strategy is presented by leveraging the k-truss group concept. Lastly, we conduct extensive experiments on real-world social networks to demonstrate the advantages of the proposed techniques.
- [Akbas and Zhao2017] Esra Akbas and Peixiang Zhao. Truss-based community search: a truss-equivalence based indexing approach. PVLDB, 10(11):1298–1309, 2017.
- [Bhawalkar et al.2015] Kshipra Bhawalkar, Jon Kleinberg, Kevin Lewi, Tim Roughgarden, and Aneesh Sharma. Preventing unraveling in social networks: the anchored k-core problem. SIAM Journal on Discrete Mathematics, 29(3):1452–1475, 2015.
- [Chen et al.2014] Pei-Ling Chen, Chung-Kuang Chou, and Ming-Syan Chen. Distributed algorithms for k-truss decomposition. In IEEE International Conference on Big Data, 2014.
- [Cohen2008] Jonathan Cohen. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report, 2008.
- [Cui et al.2018] Yi Cui, Di Xiao, and Dmitri Loguinov. On efficient external-memory triangle listing. TKDE, 2018.
- [Huang and Lakshmanan2017] Xin Huang and Laks V. S. Lakshmanan. Attribute-driven community search. PVLDB, 2017.
- [Huang et al.2014] Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, pages 1311–1322, 2014.
- [Huang et al.2016] Xin Huang, Wei Lu, and Laks V.S. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. In SIGMOD, 2016.
- [Karp1972] Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, pages 85–103, 1972.
- [Kreutzer et al.2018] Stephan Kreutzer, Roman Rabinovich, and Sebastian Siebertz. Polynomial kernels and wideness properties of nowhere dense graph classes. ACM Trans. Algorithms, 15(2), 2018.
- [Luo et al.2008] Yi Luo, Wei Wang, and Xuemin Lin. Spark: A keyword search engine on relational databases. In ICDE, 2008.
- [Medya et al.2018] Sourav Medya, Arlei Silva, Ambuj Singh, Prithwish Basu, and Ananthram Swami. Group centrality maximization via network design. In ICDM, 2018.
- [Seidman1983] Stephen B. Seidman. Network structure and minimum degree. Social Networks, 5(3):269–287, 1983.
- [Tsourakakis et al.2013] Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, 2013.
- [Wang and Cheng2012] Jia Wang and James Cheng. Truss decomposition in massive networks. Proc. VLDB Endow., 2012.
- [Wang et al.2010] Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, Haixun Wang, Hongsong Li, Wanpeng Tian, Jun Xu, and Rui Li. Mapdupreducer: detecting near duplicates over massive datasets. In SIGMOD, 2010.
- [Wang et al.2015] Xiang Wang, Ying Zhang, Wenjie Zhang, Xuemin Lin, and Wei Wang. Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In ICDE, 2015.
- [Wen et al.2016] Dong Wen, Lu Qin, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. I/O efficient core graph decomposition at web scale. In ICDE, 2016.
- [Xiao et al.2017] Di Xiao, Yi Cui, Daren BH Cline, and Dmitri Loguinov. On asymptotic cost of triangle listing in random graphs. In PODS, 2017.
- [Yu et al.2013] Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1), 2013.
- [Zhang et al.2017] Fan Zhang, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. Finding critical users for social network engagement: The collapsed k-core problem. In AAAI, 2017.
- [Zhu et al.2018] Weijie Zhu, Chen Chen, Xiaoyang Wang, and Xuemin Lin. K-core minimization: An edge manipulation approach. In CIKM, 2018.