1 Introduction
With the proliferation of graph applications, research efforts have been devoted to many fundamental problems in analyzing graph data [38, 55, 53, 16, 29, 49, 40, 37]. Clique is one of the most fundamental cohesive subgraph models in graph analysis, and it requires every pair of vertices to be connected by an edge. Due to this completeness requirement, the clique model has many desirable cohesiveness properties: any two vertices in a clique are at distance one, every single vertex of a clique forms a dominating set of the clique, and the diameter of a clique is one [39]. As a result, the clique model has wide application scenarios in social network mining, financial analysis and computational biology, and has been extensively investigated for decades. Existing studies on cliques mainly focus on unsigned networks, i.e., networks in which all the edges share the same property [4, 13, 14, 51]. However, the relationships between two entities in many real-world applications have completely opposite properties, such as friend-foe relationships between users in social networks [12, 23], support-dissent opinions in opinion networks [25], trust-distrust relationships in trust networks [26] and partnership-antagonism in protein-protein interaction networks [36]. Modelling these applications as signed networks with positive and negative edges allows them to capture more sophisticated semantics than unsigned networks [5, 34, 1, 26, 33, 11]. Consequently, existing clique models, which ignore the sign associated with each edge, may be inappropriate for characterizing cohesive subgraphs in signed networks, and there is an urgent need for a clique model tailored to signed networks.
For signed networks, the most fundamental and dominant theory explaining their dynamics and construction is structural balance theory [19, 18, 5, 34, 1, 12, 26, 33, 11]. The intuition underlying structural balance theory can be described by the aphorisms: “The friend (resp. enemy) of my friend (resp. enemy) is my friend, the friend (resp. enemy) of my enemy (resp. friend) is my enemy”. Specifically, a signed network is structurally balanced if it can be split into two subgraphs such that the edges within the same subgraph are positive and the edges between the subgraphs are negative [18]. In a signed network, an imbalanced substructure is unstable and tends to evolve into a balanced state. Consider the graph shown in Figure 1 (a). The negative edge between and makes imbalanced. and have a mutual “friend” and mutual “enemies” , and . This means that and share more common ground than differences. According to structural balance theory, and tend to become allies as time goes by. shown in Figure 1 (b) is the evolved balanced counterpart of . In , the sign of the edge between and becomes positive. and form two alliances: the edges within the same alliance are positive and the edges connecting different alliances are negative. As this example illustrates, structural balance reflects the key characteristics of signed networks.
According to the above analysis, the clique model is a fundamental cohesive subgraph model in graph analysis, but it has no appropriate counterpart in signed networks. Meanwhile, according to structural balance theory, the structure of signed networks is expected to be balanced. Motivated by this, we propose a maximal balanced clique model in this paper. Formally, given a signed network $G$, a maximal balanced clique $C$ is a maximal subgraph of $G$ such that (1) $C$ is complete, i.e., every pair of vertices in $C$ has an edge, and (2) $C$ is balanced, i.e., $C$ can be divided into two parts such that the edges within the same part are positive and the edges connecting the two parts are negative. This definition not only captures the essence of the clique model in unsigned networks but also guarantees that a detected clique is stable in signed networks. In this paper, we aim to devise efficient algorithms to enumerate all maximal balanced cliques in a given signed network.
Moreover, in real signed networks, the number of maximal balanced cliques can be extremely large. For instance, the “Douban” network, collected from a Chinese rating website, contains more than a million balanced cliques. However, in some applications, users prefer a single representative balanced clique of maximum size rather than all balanced cliques. Maximum clique search is a fundamental and active research topic in graph analysis, and numerous studies have been conducted in the literature, such as maximum clique search [6, 31], maximum quasi-clique search [7], maximum biclique search [32], k-partite clique with maximum edges [58], and clique with maximum edge/vertex weight on weighted graphs [45, 28]. Motivated by this, we aim to devise a maximum balanced clique search algorithm that finds the balanced clique with the maximum number of vertices and scales to large-scale real signed networks (with more than 100 million edges).
Applications. Balanced clique computation can be used in many applications, for example:
(1) Opinion leader detection in opinion networks. Opinion leaders are people who are active in a community and capture the most representative opinions in social networks [46]. In an opinion network, each vertex represents a user, and there is a positive/negative edge between two vertices if one user supports/dissents from the other. A maximal balanced clique in an opinion network represents a group of users who are actively involved in the opinion network and have clear standpoints. Hence, the users in maximal balanced cliques are good candidates for opinion leaders in the opinion network.
(2) Finding international alliance-rivalry groups. The international relationships between nations can be modeled as a signed network, where each vertex represents a nation, and positive and negative edges indicate alliances and rivalries, respectively. Computing the maximal balanced cliques in such networks reveals hostile groups of allied forces [12, 3]. Similarly, we can find alliance-rivalry commercial groups among business organizations, such as {Pepsi, KFC} vs {Coke, McDonald's} [21].
(3) Synonym and antonym group discovery. In a word network, each vertex represents a word, and there is a positive edge between two synonyms and a negative edge between two antonyms [35]. In such signed networks, our model can discover synonym groups that are antonymous with each other, such as {interior, internal, intimate} and {away, foreign, outer, outside, remote}. These discovered groups may be further used in applications such as automatic question generation [24] and semantic expansion [22].
Contributions. In this paper, we make the following contributions:
(1) The first work to study the maximal balanced clique model. We formalize the balanced clique model in signed networks based on structural balance theory. To the best of our knowledge, this is the first work that considers the structural balance of cliques in signed networks. We also prove the NP-hardness of the problem.
(2) A new framework tailored for maximal balanced clique enumeration in signed networks. After investigating the drawbacks of the straightforward approach, we propose a new framework for maximal balanced clique enumeration. Our new framework enumerates the maximal balanced cliques on the signed network directly, and its memory consumption is linear in the size of the input signed network.
(3) Two effective optimization strategies to further improve the enumeration performance. We explore two optimization strategies, in-enumeration optimization and pre-enumeration optimization, to further improve the enumeration performance. The in-enumeration optimization avoids exploring unpromising vertices during the enumeration, while the pre-enumeration techniques prune unpromising vertices and edges before the enumeration.
(4) An efficient maximum balanced clique search algorithm. To address the maximum balanced clique search problem, we first propose a baseline algorithm. To reduce the search space of the baseline, we propose a search space partition-based algorithm  that partitions the whole search space into multiple search regions. In each search region, two size thresholds and are used to search for results matching the size requirement specific to that region, so that the search space is confined to a small area. To further improve the efficiency of the  algorithm, we also explore three optimization strategies to prune invalid search branches and candidates during the search process.
(5) Extensive performance studies on real datasets. We first evaluate the performance of the enumeration algorithms through extensive experiments on real datasets. As shown in our experiments, the baseline approach only works on small datasets, while our approach completes the enumeration efficiently on both small and large datasets. Then, we evaluate the performance of our proposed maximum balanced clique search algorithm. The baseline algorithm cannot return the result within a reasonable time on large datasets, while our optimized algorithm shows high efficiency, effectiveness and scalability.
2 Related Work
Signed network analysis. Structural balance theory was originally introduced in [19] and generalized to graphs in [18, 5]. Since then, structural balance theory has been developed extensively [34, 1, 26, 33, 11]. Among these works, it is worth mentioning that the authors of [33] model the evolution of a signed network and theoretically prove that the network evolves into a balanced clique when the mean value of the initial friendliness among the vertices . [57] provides a comprehensive survey on structural balance theory.
Besides, a large body of literature on mining signed networks has emerged. Among them, the work most closely related to ours is [27], in which an clique model is proposed. Compared with our model, this clique model only considers the numbers of positive and negative edges in the clique and completely ignores the structural balance of the clique, which makes it essentially different from our model. In [17], a balanced trusted clique model is proposed. Although the balanced trusted clique model has a similar name to our model, it ignores the negative edges in the clique, which means the information carried by negative edges is completely lost.
Clique on unsigned networks. The clique model is one of the most fundamental cohesive subgraph models. [4] proposes an efficient algorithm for maximal clique enumeration based on backtracking search. [2] is the first to consider memory consumption during maximal clique enumeration. Based on [4], more efficient algorithms have been investigated [47, 13, 14]. [13] proposes a novel branch pruning strategy that efficiently reduces the search space by skipping the search from the neighbors of the pivot. [15] reviews recent advances in maximal clique enumeration. Beyond cliques, other cohesive subgraph models have also been studied, such as core [44], truss [10, 20], edge connected component [59, 52, 54], and nuclei [41, 42].
3 Problem Statement
In this paper, we consider an undirected and unweighted signed network $G = (V, E^+, E^-)$, where $V$ denotes the set of vertices, $E^+$ denotes the positive edges and $E^-$ denotes the negative edges connecting the vertices in $V$. We denote the number of vertices and the number of edges by $n = |V|$ and $m = |E^+| + |E^-|$, respectively. For each vertex $u \in V$, let $N^+_u(G) = \{v \mid (u, v) \in E^+\}$ denote the positive neighbors of $u$, and let $N^-_u(G) = \{v \mid (u, v) \in E^-\}$ denote the negative neighbors of $u$. We use $d^+_u(G) = |N^+_u(G)|$ and $d^-_u(G) = |N^-_u(G)|$ to denote the positive and negative degrees of $u$, respectively. We also use $N_u(G)$ and $d_u(G)$ to denote the neighbors and degree of $u$, i.e., $N_u(G) = N^+_u(G) \cup N^-_u(G)$ and $d_u(G) = |N_u(G)|$. For simplicity, we omit $G$ in the above notations if the context is self-evident.
Definition 3.1: (Balanced Network [18]) Given a signed network $G = (V, E^+, E^-)$, $G$ is balanced iff it can be split into two subgraphs $G_L$ and $G_R$ s.t. $\forall (u, v) \in E^+$: $u, v \in G_L$ or $u, v \in G_R$, and $\forall (u, v) \in E^-$: $u \in G_L, v \in G_R$ or $u \in G_R, v \in G_L$.
Definition 3.2: (Maximal Balanced Clique) Given a signed network $G$, a maximal balanced clique $C$ is a maximal subgraph of $G$ that satisfies the following constraints:

Complete: $C$ is complete, i.e., $\forall u, v \in C$ with $u \neq v$, $(u, v) \in E^+ \cup E^-$.

Balanced: $C$ is balanced, i.e., it can be split into two subcliques $C_L$ and $C_R$ (we write $C = (C_L, C_R)$) s.t. for every positive edge $(u, v)$ in $C$, $u, v \in C_L$ or $u, v \in C_R$, and for every negative edge $(u, v)$ in $C$, $u \in C_L, v \in C_R$ or $u \in C_R, v \in C_L$.
Definition 3.3: (Maximum Balanced Clique) Given a signed network $G$, a maximum balanced clique $C^*$ in $G$ is a balanced clique with the maximum number of vertices.
Since many real applications require that the numbers of vertices in $C_L$ and $C_R$ be no less than a fixed threshold, we add a size constraint on $C_L$ and $C_R$ s.t. $|C_L| \geq \tau$ and $|C_R| \geq \tau$. With the size constraint, users can control the size of the returned maximal balanced cliques based on their specific requirements. We formalize the studied problems as follows:
Problem Statement. Given a signed network $G$ and an integer $\tau$,

the maximal balanced clique enumeration (MBCE) problem aims to compute all the maximal balanced cliques $C = (C_L, C_R)$ in $G$ s.t. $|C_L| \geq \tau$ and $|C_R| \geq \tau$;

the maximum balanced clique search () problem aims to compute the balanced clique $C = (C_L, C_R)$ in $G$ s.t. $|C_L| \geq \tau$, $|C_R| \geq \tau$, and $|C|$ is maximum.
Example 3.1: Consider the signed network $G$ in Figure 2, in which positive/negative edges are denoted by solid/dashed lines. Assume $\tau = 2$. There are 4 maximal balanced cliques in $G$, namely, , , , , where the vertices in $C_L$ and $C_R$ are marked with different colors. Among them, is the maximum balanced clique.
Problem Hardness. The MBCE problem is NP-hard, which can be proved following the NP-hardness of the maximal clique enumeration problem [8, 43]. Given an unsigned network $G' = (V', E')$, we transform it into a signed network $G$ as follows: we first keep all the vertices of $G'$ in $G$ and all the edges of $G'$ as positive edges in $G$; then, we add a new vertex $v^*$ to $G$ and connect $v^*$ to all the vertices in $V'$ with negative edges. Assuming $\tau = 1$, each maximal clique $K$ in $G'$ corresponds to the maximal balanced clique $(K, \{v^*\})$ in $G$, which means that the maximal clique enumeration problem in $G'$ can be reduced to the MBCE problem in $G$. As the maximal clique enumeration problem is NP-hard [8, 43], our problem is also NP-hard.
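Definitions 3.1 and 3.2 and the size constraint can be checked mechanically on small inputs. The following minimal Python sketch is only an illustration: the function names `balanced_split` and `is_balanced_clique` are hypothetical, edges are assumed to be given as vertex pairs, and maximality is not checked. Balance is tested by 2-colouring the vertices so that positive edges keep both endpoints on the same side and negative edges put them on opposite sides.

```python
from collections import deque
from itertools import combinations


def balanced_split(vertices, pos, neg):
    """Definition 3.1: return a split (left, right) in which positive edges stay
    inside one side and negative edges cross the sides, or None if none exists."""
    adj = {v: [] for v in vertices}
    for edges, crosses in ((pos, 0), (neg, 1)):
        for u, v in edges:
            adj[u].append((v, crosses))
            adj[v].append((u, crosses))
    side = {}
    for s in vertices:                      # one BFS per connected component
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, crosses in adj[u]:
                want = side[u] ^ crosses    # same side for +, opposite side for -
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None             # cycle with an odd number of negative edges
    left = {v for v in vertices if side[v] == 0}
    return left, set(vertices) - left


def is_balanced_clique(CL, CR, pos, neg, tau):
    """Definition 3.2 for a given split (CL, CR) plus |CL|, |CR| >= tau.
    Maximality is not checked."""
    P = {frozenset(e) for e in pos}
    N = {frozenset(e) for e in neg}
    if CL & CR or len(CL) < tau or len(CR) < tau:
        return False
    for u, v in combinations(CL | CR, 2):
        same_side = (u in CL) == (v in CL)
        # completeness and balance at once: same side needs +, opposite sides need -
        if frozenset((u, v)) not in (P if same_side else N):
            return False
    return True
```

For instance, with `pos = [("a", "b")]` and `neg = [("a", "c"), ("b", "c")]`, `balanced_split(["a", "b", "c"], pos, neg)` returns `({"a", "b"}, {"c"})`, whereas a triangle of three negative edges yields `None`.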
4 A Baseline Algorithm for MBCE Problem
We first propose a baseline algorithm for the MBCE problem based on existing methods for maximal clique enumeration [14] and maximal biclique enumeration [56] in unsigned networks. A signed network can be treated as the combination of two unsigned networks and . For any maximal balanced clique in , it is clear that (resp. ) is a clique in and the subgraph induced by the vertices in and in is a biclique. Therefore, we can enumerate the maximal balanced cliques in in two steps: 1) compute all the maximal cliques in with [14]; 2) for each pair of the computed maximal cliques and in , compute the maximal bicliques in the bipartite subgraph induced by the vertices in and in with [56]. The returned maximal bicliques in are the maximal balanced cliques in .
Drawbacks of . Since does not consider the unique characteristics of signed networks and processes with techniques designed for unsigned networks, it has two drawbacks:

Memory consumption. has to store all the maximal cliques in in memory. The number of maximal cliques can be exponential in the number of vertices [13], which makes unable to handle large networks.

Efficiency. In , all the maximal cliques in are enumerated and every pair of maximal cliques is explored. The time complexity of is , where / represent the time complexity of maximal (bi)clique enumeration, and is the number of enumerated maximal cliques in . Since maximal (bi)clique enumeration is time-consuming and the number of maximal cliques can be very large, it is inefficient for the MBCE problem.
5 A New Enumeration Framework
Revisiting , the root cause of its drawbacks is that it treats the signed network as a specific combination of two unsigned networks and relies on existing techniques designed for unsigned networks. Therefore, to overcome these drawbacks and improve the efficiency of the enumeration, we need new techniques that exploit the unique characteristics of signed networks. In this section, we present a new enumeration framework that addresses the memory consumption problem. In the next section, we further optimize the framework to improve its efficiency.
Lemma 5.1: Given a signed network , for a balanced clique in , if there is a vertex in such that and , then is also a balanced clique in .
According to Lemma 5.1, if we maintain a balanced clique , let be the set of vertices that are positive neighbors of all the vertices in and negative neighbors of all the vertices in , and let be the set of vertices that are positive neighbors of all the vertices in and negative neighbors of all the vertices in ; then we can enlarge by adding vertices from and into and , respectively. Furthermore, if we update and based on the new and accordingly and repeat the above enlargement procedure, we obtain a maximal balanced clique when no more vertices can be added into or .
Algorithm of . Following the above idea, our algorithm for is shown in Algorithm 1. For each vertex in (line 2), we enumerate all the maximal balanced cliques containing (lines 3-8). Note that are in the degeneracy order [48] of . We use and to maintain the balanced clique, which are initialized with and , respectively (line 3). Similarly, we also initialize and as discussed above (lines 4-5). Moreover, we use and to record the vertices that have been processed, to avoid outputting duplicate maximal balanced cliques (lines 6-7). After initializing these six sets, we invoke procedure to enumerate all the maximal balanced cliques containing (line 8).
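Algorithm 1 processes the vertices in degeneracy order. As an illustration only, the sketch below computes such an order and the degeneracy number by repeatedly removing a vertex of minimum remaining degree. It assumes that the degeneracy of a signed network is taken on its underlying unsigned graph (edge signs ignored), and `degeneracy_order` is a hypothetical helper name. The linear scan for the minimum makes it quadratic; a bucket queue would make it linear.

```python
def degeneracy_order(vertices, pos, neg):
    """Return (order, degeneracy) of the underlying unsigned graph: repeatedly
    remove a vertex of minimum remaining degree (edge signs are ignored)."""
    adj = {v: set() for v in vertices}
    for u, v in list(pos) + list(neg):
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(adj[v]) for v in vertices}
    remaining = set(vertices)
    order, degeneracy = [], 0
    while remaining:
        u = min(remaining, key=lambda x: degree[x])  # O(n) scan per removal
        degeneracy = max(degeneracy, degree[u])
        order.append(u)
        remaining.remove(u)
        for w in adj[u]:
            if w in remaining:
                degree[w] -= 1               # u no longer counts towards w
    return order, degeneracy
```

For example, a triangle on vertices 1, 2, 3 (with any signs) plus a pendant vertex 4 attached to 3 has degeneracy 2, and vertex 4 is removed first.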
Procedure performs the maximal balanced clique enumeration based on the given six sets. If , , and are empty, the current balanced clique cannot be enlarged and is therefore a maximal balanced clique, so checks whether and satisfy the size constraint; if they do, it outputs the maximal balanced clique (lines 11-12). Otherwise, adds a vertex from to , updates the corresponding , , and , and recursively invokes itself to further enlarge the balanced clique (line 17). When is processed, is removed from and added to (line 18). Similar processing steps are applied to the vertices in (lines 19-21). Variable (line 1) controls the order of adding new vertices into or . With the switch operation in line 14, we guarantee that we add a vertex into , then into , recursively.
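The recursion can be sketched in Python as follows. This is a minimal, unoptimized sketch under stated assumptions, not the authors' implementation: the set names `CL`, `CR` (clique sides), `PL`, `PR` (candidates) and `XL`, `XR` (processed vertices) and the function name `enumerate_mbc` are chosen here for illustration, adjacency is kept in Python sets, and the candidates of `PL` and then `PR` are processed in two plain loops instead of with the switch variable of Algorithm 1. Each clique is reported once, with its earliest vertex (in the given order) on the left side.

```python
def enumerate_mbc(vertices, pos, neg, tau, order=None):
    """Enumerate maximal balanced cliques (CL, CR) with |CL|, |CR| >= tau.
    Each clique is reported once, with its earliest vertex (in `order`) in CL."""
    npos = {v: set() for v in vertices}
    nneg = {v: set() for v in vertices}
    for u, v in pos:
        npos[u].add(v)
        npos[v].add(u)
    for u, v in neg:
        nneg[u].add(v)
        nneg[v].add(u)
    order = list(vertices if order is None else order)
    rank = {v: i for i, v in enumerate(order)}
    results = []

    def expand(CL, CR, PL, PR, XL, XR):
        if not (PL or PR or XL or XR):          # C cannot be enlarged: it is maximal
            if len(CL) >= tau and len(CR) >= tau:
                results.append((frozenset(CL), frozenset(CR)))
            return
        for v in list(PL):                      # add v to the left side
            expand(CL | {v}, CR, PL & npos[v], PR & nneg[v],
                   XL & npos[v], XR & nneg[v])
            PL, XL = PL - {v}, XL | {v}         # v is processed: move it to XL
        for v in list(PR):                      # add v to the right side
            expand(CL, CR | {v}, PL & nneg[v], PR & npos[v],
                   XL & nneg[v], XR & npos[v])
            PR, XR = PR - {v}, XR | {v}         # v is processed: move it to XR

    for u in order:                             # cliques whose earliest vertex is u
        later = {w for w in order if rank[w] > rank[u]}
        expand({u}, set(), npos[u] & later, nneg[u] & later,
               npos[u] - later, nneg[u] - later)
    return results
```

Earlier vertices start in the excluded sets, so a clique is only output from its earliest vertex; the sketch keeps one copy of each set per recursion level, which matches the linear-space claim only up to the recursion depth.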
Example 5.1: The enumeration procedure of can be illustrated as a search tree. Figure 3 shows part of the search tree when we run on in Figure 2 through . represent different search states during the enumeration. At , we assume that we have a balanced clique with =, =, = and = at this state. We first grow a search branch by adding into . At , is empty, , hence we add into at and obtain . Now, the search branch from is finished, and we return to state. Since has been explored, it is moved to , and is empty now. Then, we add from to at . Since has already been found at , the current result at is not maximal and cannot be output. Next, we return to and find by adding into . At this point, the search on this search tree is finished. Other maximal balanced cliques can be found in a similar way.
Based on Algorithm 1, the memory consumption of our enumeration framework is clearly linear in the size of the input signed network. Therefore, the large memory consumption of is avoided.
6 Enumeration Optimization Strategies
Although Algorithm 1 addresses the memory consumption problem of , its efficiency is still unsatisfactory. In this section, we present two optimization strategies, namely in-enumeration optimization and pre-enumeration optimization, to further improve the efficiency of the enumeration.
6.1 In-Enumeration Optimization
Branch Pruning. Branch pruning aims to prune the unfruitful branches in the search tree of Algorithm 1 to improve the performance.
Pivot Choosing. Consider the maximal balanced clique search procedure of Algorithm 1. Assume that we currently have , , and , and we add a vertex from to in line 17. After finishing the search starting from , we do not need to further explore the positive neighbors of in the for loop of line 16 or the negative neighbors of in the for loop of line 19. The reason is as follows. W.l.o.g., let be a positive neighbor of . Although we skip the maximal balanced clique search starting from , every maximal balanced clique containing is explored either by the search branch starting from or by a search branch starting from a vertex that cannot be placed in the same balanced clique as ; otherwise, could be added to that clique, contradicting its maximality. Therefore, skipping the search starting from ’s neighbors does not affect the correctness of Algorithm 1.
In this paper, to maximize the benefit of the pivot technique, we define the local degree of a vertex as , and we choose the vertex that satisfies as the pivot, where .
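The effect of a pivot can be illustrated with the following Python sketch, which reuses the illustrative set names `PL`, `PR`, `XL`, `XR`. It rests on an assumption: we take the local degree of a candidate pivot to be the number of candidates it would let us skip, and `pivot_branches` is a hypothetical helper. A pivot on the left side (drawn from `PL | XL`) lets us skip its positive neighbours in `PL` and its negative neighbours in `PR`; a right-side pivot is the mirror case. Every maximal balanced clique in the current branch either contains the pivot or contains a candidate incompatible with it, so no result is lost.

```python
def pivot_branches(PL, PR, XL, XR, npos, nneg):
    """Choose a pivot p from PL|XL (left side) or PR|XR (right side) and return
    the left/right candidates that still need their own search branch."""
    def skippable(p, left):
        # A left-side pivot fits with its positive neighbours on the left and its
        # negative neighbours on the right; a right-side pivot is the mirror case.
        if left:
            return PL & npos[p], PR & nneg[p]
        return PL & nneg[p], PR & npos[p]

    best = None
    for pool, left in ((PL | XL, True), (PR | XR, False)):
        for p in pool:
            skip_l, skip_r = skippable(p, left)
            local_degree = len(skip_l) + len(skip_r)   # assumed definition
            if best is None or local_degree > best[0]:
                best = (local_degree, skip_l, skip_r)
    if best is None:                                   # no candidates at all
        return set(PL), set(PR)
    _, skip_l, skip_r = best
    return PL - skip_l, PR - skip_r
```

In the recursion, only the returned candidates would be branched on, while each processed candidate is still moved to the excluded sets as before.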
Candidate Selection. In the search procedure of Algorithm 1, heuristically, a search starting from a vertex with a small local degree has a short and narrow search branch, which means that the search starting from this vertex finishes quickly. Moreover, once the search from the vertex finishes, the vertex is added into the excluded set and can be used to further prune other search branches. Therefore, instead of adding vertices from and into and randomly in lines 16 and 19 of Algorithm 1, we add vertices in increasing order of their local degrees.
Early Termination. We consider different conditions under which we can terminate the search early in Algorithm 1. For a balanced clique , the maximum possible size of () for the final maximal balanced clique is (). Based on the size constraint of , we have the following rule:

ET Rule 1: If or , we can terminate current search directly.
In Algorithm 1, we use and to store the vertices for which the maximal balanced cliques containing them have already been enumerated. Therefore, during the enumeration, if there exists a vertex such that and , then we can conclude that the maximal balanced cliques in the current search branch have already been enumerated. Following this, we have our second rule:

ET Rule 2: If , s.t., and or , s.t., and , then we can terminate current search directly.
In a certain search of Algorithm 1, if all the vertices in () form a clique of positive edges and every vertex in () has negative edges to all the vertices in (), then and form a balanced clique. Then, based on Definition 3.2, and form a maximal balanced clique. Therefore, we have our third early termination rule:

ET Rule 3: If , s.t., and and , s.t., and , we can output and terminate current search directly.
Note that, in order to avoid outputting duplicate maximal balanced cliques, ET Rule 3 must be applied after ET Rule 2.
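The three early-termination rules can be collected into a single check performed before a node of the recursion is expanded. The Python sketch below is illustrative only: `early_termination` is a hypothetical helper, the set names `CL`, `CR`, `PL`, `PR`, `XL`, `XR` are used for illustration, and the concrete conditions (for example, that a vertex of `XL` must be a positive neighbour of every vertex in `PL` and a negative neighbour of every vertex in `PR`) are our reading of the rules stated in prose above. Rule 2 is evaluated before Rule 3, as required.

```python
from itertools import combinations


def early_termination(CL, CR, PL, PR, XL, XR, npos, nneg, tau):
    """Return ("prune", None), ("output", (left, right)) or (None, None)."""
    # ET Rule 1: even taking every candidate, one side stays below tau.
    if len(CL) + len(PL) < tau or len(CR) + len(PR) < tau:
        return "prune", None
    # ET Rule 2: an excluded vertex fits with every candidate, so no clique of
    # this branch is maximal (its extensions were found in an earlier branch).
    for x in XL:
        if PL <= npos[x] and PR <= nneg[x]:
            return "prune", None
    for x in XR:
        if PR <= npos[x] and PL <= nneg[x]:
            return "prune", None
    # ET Rule 3: the candidates already form a balanced clique together with C.
    left_ok = all(v in npos[u] for u, v in combinations(PL, 2))
    right_ok = all(v in npos[u] for u, v in combinations(PR, 2))
    cross_ok = all(r in nneg[l] for l in PL for r in PR)
    if left_ok and right_ok and cross_ok:
        return "output", (CL | PL, CR | PR)
    return None, None
```

When Rule 3 fires, Rule 1 has already guaranteed that both sides of the output reach `tau`, and Rule 2 has ruled out any excluded vertex that could extend it.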
Algorithm of . Utilizing the in-enumeration optimization strategies, we propose the optimized algorithm . The pseudocode is omitted here due to space constraints.
Theorem 6.1: Given a signed network , the time complexity of is , where is the degeneracy number of .
6.2 Pre-Enumeration Optimization
In pre-enumeration optimization, we aim to remove the unpromising vertices and edges that are not contained in any maximal balanced clique. We explore two optimization strategies based on the neighbors of a vertex and the common neighbors of an edge.
Vertex Reduction. To reduce the size of a signed network, we first consider the positive and negative neighbors of each vertex , i.e., and , to remove unpromising vertices. We first define:
Definition 6.1: (()-signed core) Given a signed network , two integers and , a ()-signed core is a maximal subgraph of , s.t., , .
Lemma 6.1: Given a signed network and threshold , a maximal balanced clique satisfying the size constraint with is contained in a signed core.
Therefore, to compute the maximal balanced cliques in a given signed network with integer , we only need to compute the maximal balanced cliques in the corresponding signed core of . The remaining problem is how to compute the signed core efficiently. We propose a linear-time algorithm to address this problem.
Algorithm of . Based on Definition 6.1, to compute the signed core of the signed network , we only need to identify the vertices with or and remove them from . Since removing such vertices may cause more vertices to violate the degree constraints, we repeatedly remove these vertices until no such vertex exists in .
Theorem 6.2: Given a signed network and an integer , the time complexity of is .
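A queue-based peeling procedure in the spirit of this linear algorithm can be sketched as follows. This is a minimal Python sketch under stated assumptions: `signed_core` is a hypothetical name, and `alpha` and `k` are the positive and negative degree thresholds of the signed core. After the initial degree count, each edge is visited at most twice, so the running time is linear in the size of the network. For the size constraint `tau`, a vertex of the left side has at least `tau - 1` positive and at least `tau` negative neighbours inside the clique (symmetrically for the right side), so we assume `alpha = tau - 1` and `k = tau`.

```python
from collections import deque


def signed_core(vertices, pos, neg, alpha, k):
    """Vertices of the largest subgraph in which every vertex keeps at least
    `alpha` positive and at least `k` negative neighbours (peeling)."""
    npos = {v: set() for v in vertices}
    nneg = {v: set() for v in vertices}
    for u, v in pos:
        npos[u].add(v)
        npos[v].add(u)
    for u, v in neg:
        nneg[u].add(v)
        nneg[v].add(u)
    dpos = {v: len(npos[v]) for v in vertices}
    dneg = {v: len(nneg[v]) for v in vertices}
    queue = deque(v for v in vertices if dpos[v] < alpha or dneg[v] < k)
    removed = set(queue)
    while queue:
        u = queue.popleft()
        # removing u lowers the matching degree of each surviving neighbour
        for nbrs, deg in ((npos[u], dpos), (nneg[u], dneg)):
            for w in nbrs:
                if w in removed:
                    continue
                deg[w] -= 1
                if dpos[w] < alpha or dneg[w] < k:
                    removed.add(w)
                    queue.append(w)
    return set(vertices) - removed
```

For instance, on the vertices a to e with positive edges ab, cd, ea, eb and negative edges ac, ad, bc, bd, ec, calling `signed_core` with `alpha = 1` and `k = 2` (that is, `tau = 2`) removes only e, which has a single negative neighbour.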
Edge Reduction. In this part, we explore opportunities to remove unpromising edges with respect to by considering the common neighbors of an edge, distinguished by the signs of the edges that connect them to the two endpoints. Specifically, for a positive/negative edge , we define the edge common neighbor numbers:
Definition 6.2: (Edge Common Neighbor Number) Given a signed network , for a positive edge , we define:
for a negative edge , we define:
Figure 4 shows the different types of common neighbors used in Definition 6.2. For a positive edge , Figure 4 (a) and (b) show the common neighbors used in and , respectively. For a negative edge , Figure 4 (c) and (d) show the common neighbors used in and , respectively. Note that is undirected and every edge is stored once in . Based on Definition 6.2, we have the following lemma:
Lemma 6.2: Given a signed network and an integer , let be the maximal subnetwork of s.t.,

;

;
then, every maximal balanced clique in satisfying the size constraint with is contained in .
Algorithm of . With Lemma 6.2, to enumerate the maximal balanced cliques in a given signed network with respect to , we only need to keep the edges in shown in Lemma 6.2; the positive/negative edges not in can be safely pruned. We first compute and for each positive edge of , and and for each negative edge of . Following Lemma 6.2, for each positive edge such that or , we remove . After that, for the edges incident to , we decrease the edge common neighbor numbers that have changed due to the removal of , based on Definition 6.2. Negative edges are handled similarly. The algorithm terminates when all the edges satisfy the conditions in Lemma 6.2.
Theorem 6.3: Given a signed network , an integer , the time complexity of is .
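The edge reduction can be prototyped with a simple fixpoint loop. The Python sketch below rests on assumptions: `edge_reduction` is a hypothetical name, and the four thresholds are derived here from the size constraint. If a positive edge (u, v) lies on the left side, the other left vertices are common positive neighbours (at least `tau - 2`) and all right vertices are common negative neighbours (at least `tau`). If a negative edge joins u on the left and v on the right, the other left vertices are positive neighbours of u and negative neighbours of v (at least `tau - 1`), and the other right vertices are negative neighbours of u and positive neighbours of v (at least `tau - 1`). Unlike the incremental algorithm described above, this version recomputes all counts in every round, which is simpler but slower.

```python
def edge_reduction(vertices, pos, neg, tau):
    """Repeatedly drop edges whose common-neighbour counts are too small to lie
    in a balanced clique with both sides of size >= tau (assumed thresholds)."""
    P = {frozenset(e) for e in pos}
    N = {frozenset(e) for e in neg}
    while True:
        npos = {v: set() for v in vertices}
        nneg = {v: set() for v in vertices}
        for edges, nbr in ((P, npos), (N, nneg)):
            for e in edges:
                u, v = tuple(e)
                nbr[u].add(v)
                nbr[v].add(u)

        def keep_pos(e):
            u, v = tuple(e)
            return (len(npos[u] & npos[v]) >= tau - 2      # same-side partners
                    and len(nneg[u] & nneg[v]) >= tau)     # opposite side

        def keep_neg(e):
            u, v = tuple(e)                                # thresholds are symmetric
            return (len(npos[u] & nneg[v]) >= tau - 1
                    and len(nneg[u] & npos[v]) >= tau - 1)

        new_P = {e for e in P if keep_pos(e)}
        new_N = {e for e in N if keep_neg(e)}
        if new_P == P and new_N == N:                      # fixpoint reached
            return P, N
        P, N = new_P, new_N
```

On the five-vertex example used for vertex reduction with `tau = 2`, the edges ea, eb and ec are removed and the balanced 4-clique on a, b, c, d is kept intact.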
7 Maximum Balanced Clique Search
As discussed in Section 1, maximum clique search is a fundamental research topic in graph analysis. In this section, we study the maximum balanced clique search problem.
7.1 A Baseline Approach
We first propose a baseline approach, namely , to compute the maximum balanced clique in the input graph. We enumerate the maximal balanced cliques in the input graph and maintain the maximum balanced clique found so far. For each search branch (), if , where , we can terminate the branch. When the enumeration finishes, is clearly the maximum balanced clique.
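The baseline can be sketched by adding two cuts to the enumeration recursion. In this illustrative Python sketch, `max_balanced_clique` is a hypothetical name, and the branch bound (terminate when the current clique plus all candidates cannot exceed the size of the best clique found so far) is our assumption about the termination test; ET Rule 1 is also applied. Pruning never discards the maximum balanced clique, because along its search path the bound is always at least its own size, which exceeds any smaller incumbent.

```python
def max_balanced_clique(vertices, pos, neg, tau):
    """Baseline maximum search: enumerate balanced cliques, keep the largest one
    with both sides >= tau, and cut branches that cannot beat it."""
    npos = {v: set() for v in vertices}
    nneg = {v: set() for v in vertices}
    for u, v in pos:
        npos[u].add(v)
        npos[v].add(u)
    for u, v in neg:
        nneg[u].add(v)
        nneg[v].add(u)
    rank = {v: i for i, v in enumerate(vertices)}
    best = {"clique": None, "size": 0}

    def expand(CL, CR, PL, PR, XL, XR):
        if len(CL) + len(PL) < tau or len(CR) + len(PR) < tau:
            return                      # size constraint can no longer be met
        if len(CL) + len(CR) + len(PL) + len(PR) <= best["size"]:
            return                      # upper bound cannot beat the incumbent
        if not (PL or PR or XL or XR):  # maximal and larger than the incumbent
            best["clique"] = (frozenset(CL), frozenset(CR))
            best["size"] = len(CL) + len(CR)
            return
        for v in list(PL):
            expand(CL | {v}, CR, PL & npos[v], PR & nneg[v],
                   XL & npos[v], XR & nneg[v])
            PL, XL = PL - {v}, XL | {v}
        for v in list(PR):
            expand(CL, CR | {v}, PL & nneg[v], PR & npos[v],
                   XL & nneg[v], XR & npos[v])
            PR, XR = PR - {v}, XR | {v}

    for u in vertices:
        later = {w for w in vertices if rank[w] > rank[u]}
        expand({u}, set(), npos[u] & later, nneg[u] & later,
               npos[u] - later, nneg[u] - later)
    return best["clique"]
```

As the drawbacks below point out, the incumbent size starts at zero, so the bound only becomes effective after a large clique has been found.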
Drawbacks of . Although the straightforward approach can find the maximum balanced clique, its worst-case complexity is the same as that of , and its search space is huge. In detail, the drawbacks of are twofold.

Lack of rigorous size constraints for and . Given a signed graph , during the search process, only holds the size constraint for each search branch. However, when is small, most search branches have larger than , so the size constraint fails to prune them. Unfortunately, since our algorithm keeps searching for results larger than the current one, the value of increases gradually from a small value, which forces to search for the result in a large search space.

Massive invalid search branches. Although the search branches meet the size constraint with , the structure between and may be sparse, which generates invalid search branches. Hence, more pruning techniques are needed during the search process. Moreover, the optimizations based on in , such as vertex reduction and edge reduction, are of limited use here, as usually has a size much larger than . Therefore, the graph remaining after reduction is still huge on large-scale signed networks.
Main idea. In the remainder of this section, we improve the efficiency of our algorithm as follows.

To address the first drawback, the lack of rigorous size constraints for and , we introduce and as lower bounds for and , respectively. Then, we can obtain the balanced cliques with and within a narrow search space (assume ). Under different values of and , the search space is split into multiple partitions. Moreover, by initializing or to a large value, we can prioritize the search for balanced cliques of large size. Under large and rigorous bounds and , the search space can be significantly reduced.

To address the second drawback, massive invalid search branches, we propose optimizations based on the bounds , and that estimate the size of the balanced clique obtainable in the current search branch, so as to avoid invalid search branches and remove redundant vertices from the candidates. Moreover, we extend the vertex reduction and edge reduction of with the new bounds to prune more useless vertices and edges.
7.2 Search Space Partitionbased Framework
To address the first drawback of , in this subsection we propose a new maximum balanced clique search framework  with two lower bounds and for and . Given certain values and , a search region is denoted as , and the maximum balanced clique found in it should satisfy and if (otherwise, swap and ). Under different values of and , the whole search space can be divided into several search regions. In each search region, we keep searching for results larger than the current one. When all search regions have been explored, the final result is found.
Following our main idea of prioritizing large results, for the first search region , is initialized as a large integer value. Since is large, the strict size constraint makes most search branches of ineligible, so the result in this search region can be found quickly. Besides, to respect the size threshold , we set . Then, to cover the whole search space, in later search regions we keep increasing and decreasing until . In other words, for , and .
To assign the largest possible value to , we use the following lemma:
Lemma 7.1: Given a signed network , for every balanced clique, we have , where is the degeneracy number of .
Proof.
Based on Lemma 7.1, we assign to the first search region . Then, in the later search regions, we continue to seek balanced cliques larger than the current one. However, not every search region contains a valid result. To skip invalid search regions, we have the following lemma:
Lemma 7.2: Given a signed network , the maximum balanced clique found in the th search region is denoted by . Then, for the next search region , we have .
Proof.
We prove it by contradiction. Suppose that, following the th search region, we obtain a balanced clique larger than in the next search region . Based on our search framework, we have ; otherwise, would have been found in the th search region rather than the th search region. Now, assume . Combined with , we have . This contradicts our premise that is larger than . Therefore, the assumption on does not hold, and we get , i.e., . ∎
Based on Lemma 7.2, after the th search region, the search regions with can be skipped directly.
Algorithm of . Following the above idea, the new maximum balanced clique search algorithm  is shown in Algorithm 2. Given a signed network and size threshold , we first compute the degeneracy number of (line 1). We initialize ,