1 Introduction
Consider an undirected graph with the set of vertices and the set of edges. Each edge is associated with a weight that indicates the similarity between its two end vertices–the lager the weight, the more “similar” the two vertices are. The hard graph clustering (HGC) problem is to create distinct partitions (clusters, or, communities) of the set of vertices according to their similarities, i.e., to form , where , and for all , . For a thorough literature review of the graph clustering problems, see, e.g., [Schaeffer2007], and for fast algorithms for largescale networks, see, e.g., [Girvan and Newman2002, Clauset et al.2004, Rosvall and Bergstrom2008]. For datasets, see, e.g., SNAP network datasets [sna] and SNAP biomedical datasets [Marinka Zitnik and Leskovec2018].
The soft graph clustering (SGC) problem, (also known as fuzzy graph clustering), on the other hand, allows clusters to have overlaps. A vertex may be a member of more than one cluster. There are numerous applications of SGC, such as: brain research, social network research, natural language processing, citation, and collaboration networks, and so on. A precise problem definition of the SGC varies and is dependent on the application, and sometimes it may not be possible to provide a precise problem definition.
The subject of study in this paper considers the combinatorial optimisation problem where we are required to determine: 1) the composition of each of the clusters; 2) for each vertex that belongs to more than one cluster, how the membership is distributed amongst the clusters (we denote this by , for Vertex in Cluster , hence for all ). We consider the case that an equal balance of the cluster total vertex memberships is desirable, and that not all vertices are required to be in a cluster. We consider two equally important objectives: 1) to minimize the sum of intercluster edge weights (cut across clusters); and 2) to maximize the sum of intracluster edge weights (cluster association).
1.1 Literature review
There are a number of existing soft clustering algorithms, each designed to suit different applications see, e.g., CFinder of [Palla et al.2005], (see also [Derényi et al.2005], [Adamcsek et al.2006]), the MaxMax Algorithm of [Hope and Keller2013], the WATSET methods of [Ustalov et al.2018] for NLP, the Chinese Whisper method of [Biemann2006], Betweennessbased method of [Pinney and Westhead2006], and the Purifying and Filtering the Coupling Matrix approach of [Liu and Foroushani2016]. Of these methods, [Biemann2006], [Pinney and Westhead2006], [Liu and Foroushani2016], and [Ustalov et al.2018] are designed for unweighted graphs only (i.e., graphs with unit edge weight).The MaxMax Algorithm is designed for weighted undirected graphs. For unweighted graphs, however, it will return a trivial solution–each connected component of the graph will be a cluster. The CFinder is based on the finding of clique neighbourhoods. The mixedinteger linear programming method we propose in this paper is able to accomodate both weighted and unweighted graphs, with a small modification required for the latter. In the preliminary experiments section, we will compare and contrast the different methods.
We are not aware of any mixedinteger linear programming (MILP) models for the SGC problems. There are, however, MILP models for other graph clustering, machine learning, and data classification problems. The article
[Bertsimas and Shioda2007] presents MILP formulations for classification and regression. The idea for classification, e.g., is to partition Class 1 points intodisjoint subsets by finding the hyperplanes that describe the partitioning polyhedrons such that no Class 0 points can be expressed as a convex combination of the Class 1 points in each partition. In general clustering problems,
[Sağlam et al.2006] proposes a MILP formulation where one wishes to partition a set of data set into(a predetermined number of) clusters. The objective is to minimize the maximum diameter of the generated clusters in order to obtain evenly compact clusters. Essentially the method is an IPbased heuristic method, with some variables fixed by the solution of maximal independent set of size
, where each member of this set is a seed member in the clusters. The IP model (which is in fact a bilinear model, but linearized using standard linearization strategies) is then solved to obtain an optimal solution to the general clustering problem. Other MILPbased work can be found in, e.g., [Gilpin et al.2013] and [Ye2007]for hierarchical clustering. The latter presents an application in recommendation systems. In Clique Covering Problem (CCP), an NPhard combinatorial optimisation problem where an undirected graph is to be partitioned to form complete subgraphs,
[Miyauchi et al.2018] proposes a compact ILP formulation for a relaxed problem, as well as a postoptimization repair procedure and a proof of optimality for the final solution to the original problem.1.2 Contribution of the paper
As far as we aware, this paper is the first to propose a methodology for SGC that i) deals with undirected graphs with general integer edge weights and truely takes the values of into the optimization process in the way that the larger the value is, the more favourably it will be considered, and at the same time, also deals with unweighted graphs (i.e., graphs with unit edge weight); ii) simultaneously allocates membership proportion (the values) for vertices that lie in multiple clusters; and iii) it enforces an equal balance of the clusters that the sum of vertex memberships over all clusters are roughly the same. We propose an approach that is based on a polynomialsize MILP model, beginning with a small value for –we enforce that the graph has at least clusters. One can apply an adaptive approach to find the best value of iteratively, but the focus of this research is to solve an instantaneous SGC problem with a given .
The method of [Palla et al.2005] requires obtaining , and the graph is subsequently clustered by finding clique neighbourhoods on an unweighted graph . Our method does not require the finding of cliques, and it takes the values of into account during optimization. Comparing with the method of [Hope and Keller2013] (MaxMax), for unweighted graphs, if the graph is connected, then MaxMax will produce only one cluster which is the entire graph. Our method, however, can deal with unweighted graphs by converting them into weighted ones via a simple transformation.
In Section 2.1, we present our basic MILP model by considering a number of standard requirements for the SGCP. In Section 2.2, we discuss our strategies for graph connectivity. In Section 2.3, we discuss two objectives: 1) minimizing the total intercluster cut, and 2) maximize the total intracluster association. In Section 3, we present preliminary numerical results. We then conclude our findings and discuss future research directions in Section 4.
2 A mixedinteger linear programming formulation
2.1 The basic model
We first introduce the notation used in this paper. Let:

be the adjacency matrix of ;

the maximum edge weight;

be the number of clusters;

be the set of clusters;

be a binary decision variable with indicating Vertex is a member of Cluster ;

be a continuous decision variable indicating the membership of Vertex in Cluster ;

be the cut between clusters and ;

a predetermined minimum membership if a vertex is a member of a cluster;

a predetermined tolerance equal balance of cluster membership;

a predetermined maximum overlap factor.
Now we introduce the constraints. First, we have a set of Membership Constraints. The membership of Vertex in Cluster can only be nonzero if it is a member of , and when it is, the membership must be no less than a predetermined value.
(1)  
(2) 
Let , for each
, be an auxiliary binary variable with
indicating is in at least one of the clusters and otherwise. We require that the sum of memberships for any vertex over all clusters is exactly 1 if the vertex is a member of at least one cluster and 0 otherwise.(3)  
(4)  
(5) 
We consider an Equal Balance Requirement where the sum of memberships in all clusters are “roughly” the same. As far as we aware, this is the first method that considers such a requirement. This requirement can be modelled as below.
(6) 
Next, we consider the Overlap Cardinality Constraints. Let be auxiliary binary decision variable such that if and only if is a member of both clusters , , i.e., . We have that:
(7)  
(8)  
(9) 
Given any pairs of clusters , with , the number of vertices that are in both clusters cannot be larger than a predetermined fraction, , of the cardinality of either of the clusters.
(10)  
(11) 
To calculate the Intercluster Cuts between and , we require auxiliary binary variables and nonlinear terms. First, let be a binary variable such that if both of are in the intersection of clusters and .
(12)  
(13)  
(14) 
We then use a binary variable to indicate the existence of an edge (cut) (i.e., ) with in and in , but not both in the intersection of and (otherwise the “cut” should not be counted). The constraints are as below. For each pair of distinct vertices and each pair of distinct clusters , we have that:
(15)  
(16)  
(17)  
(18)  
(19) 
Now, the cut between two distinct vertices across two distinct clusters , if edge exists, is defined by: , for
(20) 
The terms and are bilinear, and can be linearized by introducing auxiliary nonnegative continuous variables and and the following constraints. For each , we have that:
(21)  
(22)  
(23)  
(24)  
(25)  
(26) 
The cut is calculated by the following linear term.
(27) 
Now, we consider the Intracluster Association calculations. Let be an auxiliary binary variable with if are both in and that the edge exists in . The constraints are give by:
(28)  
(29)  
(30)  
(31) 
The intracluster association of is given by
We linearize the association using auxiliary continuous variables and to capture the memberships of two vertices in the same cluster should an edge exists between them. I.e., when , , otherwise, . (Similarly for ). The constraints are as below.
(32)  
(33)  
(34)  
(35)  
(36)  
(37) 
The association within a cluster , denoted by , is calculated as follows:
(38) 
2.2 Cluster connectivity
Clusters are required to be connected, it is likely that this can only be achieved with exponentially many variables and solved using a branchandprice approach, or with exponentially many constraints and solved using a branchandcut approach. Our approach here is not an exact one, as we wish to keep the formulation compact (i.e., polynomial in size). We derive a number of constraints that capture a few conditions for graph connectivity that are necessary but not sufficient. When there is a connectivity violation, violation elimination constraints can be added in a lazy fashion. (Lazy constraints is a technical term in integer programming–hard (and often exponentially many) constraints are relaxed, and are only added when the current integer optimal solution to the relaxed problem violates them–usually only a very small number of them are violated–the problem is then reoptimized, and the procedure recurs until there is no more violated hard constraints). In our preliminary test where we used randomly generated undirected graphs (see Preliminary Numerical Results section), most of the problem instances produced connected clusters.
First of all, the number of edges in a connected undirected graph cannot be smaller than the cardinality of the graph minus one. We define to be a span variable that can only be one when and are both in Cluster and that the edge exists in . We have that:
(39)  
(40)  
(41) 
and that
(42) 
We also require that all vertices in each cluster must be connected to at least one other vertex in the same cluster.
(43) 
However, Constraints (39)–(43) are not enough to eliminate multiple loops within a cluster. A cluster may contain vertices , but the constraints cannot eliminate the formation of subgraphs and within the cluster.
Therefore, we borrowed the time constraints idea for Asymmetric Travelling Salesman Problem (ATSP) subtour elimination ([Miller et al.1960]). Let , , be a decision variable indicating the “time of arrival” at Vertex . Suppose is a strict subset of a cluster , to prevent the relevant span variables from forming a loop, we require that if and only if , for each pair of distinct , and each . We have that:
(44)  
(45) 
E.g., consider the vertices a strict subset of , a loop with edges a violation of the time constraints, as cannot be equal to and simultaneously. Constraints (44) and (45) can eliminate certain types of loops formed by the span variables, but not all. In any case, adding them will give the span variables a better chance at making a full span in a cluster and thus the connectivity.
Notice that the time constraints do not cut off feasible solutions due to the fact that even if a variable is forced to be 0, there is no impact on the values of and , as (39)–(41) do not induce a biconditional relation. In our preliminary numerical experiments, the time constraints are expensive to implement, and often we do not have disconnected clusters with just (39)–(43) alone.
When cluster size is a hard constraint, and there exist a disconnected cluster, in the case of maximizing total association, e.g., one can consider adding the following cut in a lazy fashion and reoptimize. Let be the set of variables with a value of 1 in the optimal solution, we add:
(46) 
2.3 The objective function
Two commonly considered objective functions are: 1) minimize the sum of intercluster cuts and 2) maximize the sum of intracluster association given by (47) and (48) below respectively.
(47) 
(48) 
Notice that it is necessary to enforce a minimum cluster size, otherwise a trivial optimal solution to (47) is to assign no vertices to any clusters. Thus we have:
(49) 
for a predetermined value.
The two objectives should be considered simultaneously. Some may consider minimizing the sum of ratio of intercluster cuts and intracluster associations over all pairs of distinct clusters.
(50) 
Diagram (1) below shows how the sum of ratios of intercluster cuts and intracluster associations changes. The first data point on the left is obtained when we optimize (47) and set to be the value of in the optimal solution. We can see that the total intracluster association is also small, and so is the value of (50). The last data point on the right is obtained when we optimize (48) and we set to be the optimal objective value. We then minimize (47) with , for , .
One can, however, consider recent advances in Biobjective Integer Programming (see, e.g., [Dai and Charkhgard2018]). In fact, different applications of the SGC may have a different objective function that describes the SGC better. Besides, the structure of the graph must be taken into consideration in determining what the most appropriate objective function is. One may therefore consider applying machine learning to automate the finding of an appropriate objective function.
2.4 Edge weight transformation for unweighted graphs
For undirected graphs with , for all , the MILP does not work very well because it cannot distinguish between an edge in a sparse neighbourhood with one in a dense neighbourhood. To give favour to edges in a dense neighbourhood, we obtain new edge weights by calculating , i.e., the new edge weight of will be one plus the number of vertices that are connected to both of the end nodes of .
3 Preliminary numerical results
We compared our method with CFinder [Palla et al.2005, Derényi et al.2005, Adamcsek et al.2006] and MaxMax Algorithm [Hope and Keller2013]. We used a 21vertex instance of weighted graph. We can see that the clusters of CFinder are the clique neighbourhoods, and therefore some of the edges with heavy weight are not included, e.g., , , .
The MaxMax solution, however, have these heavyweight edges covered. However, for unweighted graphs, MaxMax will cluster the graph by connected components, so if the graph is connected, then there will be only one cluster.
The MILP solution produces not only the clusters, but also the membership proportion for each vertex that belongs to more than one cluster. None of MaxMax or CFinder provides this information. One can see clearly that the Min Cut solution can also produce a relatively balanced set of clusters in terms of total weighted memberships of the clusters when the clusters are connected. None of the existing SGC methods considered such a constraint.
In the Max Association solution below, unfortunately, because the MILP enforced three clusters, and the blue cluster is not connected, we have four clusters instead, so the equal balance cannot be guarantee in this case. In any case, the four clusters demonstrated the four strongest connections by . As for the membership proportions, take Vertex 9 as an example, since there are more heavily weighted links in the pink cluster rather than the green cluster, 0.9 of its membership is in the former, and 0.1 in the latter.
3.1 KKI instances
In this section, we applied our method on some KKI instances from https://github.com/shiruipan/graph_datasets. Since the KKI instances are unweighted, we perform the edge weight transformation described in Section 2.4.
In Table 1, column Opt presents the time taken (in seconds) for proving optimality, and the column Gap presents the gap between the MIP relative gap, (i.e., the gap between the best integer objective and the objective of the best active node on the branchandbound tree), for instances that are not solved to optimality within the given time limit (10 minutes). The column r is the ratio of total intercluster cut over total intracluster association. We used IBM ILOG CPLEX Optimization Studio v12.8 [Support2018] for solving the MILPs.
Min Cut  Max Association  

Data set  Opt  Gap  Opt  Gap  
101943635  3.66  0.00  0.00  48.57  0.00  0.02 
154181221  1.16  0.00  0.00  8.64  0.00  0.23 
157704249  39.28  0.00  0.00  118.26  0.00  0.03 
163833434  5.88  0.00  0.00  94.91  0.00  0.04 
173588135  1.68  0.00  0.00  37.56  0.00  0.13 
177992228  2.24  0.00  0.00  107.93  0.00  0.16 
237103246  34.49  0.00  0.00  600.07  0.03  0.16 
255899928  0.82  0.00  0.00  19.56  0.00  0.17 
260192529  3.62  0.00  0.00  62.78  0.00  0.14 
261892920  3.09  0.00  0.00  31.56  0.00  0.46 
276827323  10.57  0.00  0.09  30.90  0.00  0.09 
290399744  32.75  0.00  0.00  500.42  0.00  0.22 
291777744  22.55  0.00  0.00  110.44  0.00  0.01 
331032847  19.49  0.00  0.00  208.22  0.00  0.07 
361182739  75.61  0.00  0.00  600.04  0.03  0.20 
371323020  11.23  0.00  0.00  23.83  0.00  0.15 
390246945  34.21  0.00  0.00  114.55  0.00  0.13 
397247231  12.07  0.00  0.00  101.97  0.00  0.10 
410452325  6.11  0.00  0.00  30.87  0.00  0.18 
427507528  274.13  0.00  0.00  600.05  0.00  0.15 
436273038  28.72  0.00  0.00  64.47  0.00  0.04 
521690830  37.06  0.00  0.00  113.23  0.00  0.14 
634660527  600.05  0.93  0.00  600.05  0.00  0.32 
712925834  4.72  0.00  0.00  70.97  0.00  0.17 
741561728  5.49  0.00  0.00  106.67  0.00  0.30 
777430520  0.95  0.00  0.00  13.82  0.00  0.10 
826335142  63.80  0.00  0.00  208.89  0.00  0.11 
3.2 Instances from a random problem generator
We have generated 10 problem classes with 5 instances for each problem class. The problem instances were generated using a random problem generator where number of vertices on the undirected graph, (), density of graph , and an upper bound on the edge weight are taken as user input. The edge weights are generated by each given a number chosen uniformly randomly between . In Table 2, the names of the problem classes are in the form of: N followed by the value of , d followed by the density of , and then followed by the value of . The column CON
is the percentage of clusters that are connected in each problem class. In the columns where two numbers were given for each problem class, the first row presents the average value over 5 problem instances, and the second row presents the value of the standard deviation. The bracket underneath each problem class indicates the number of instances that are solved to optimality versus the number of instances that are not.
3.3 Preliminary testing
We experimented with instances with maximum edge weight and , clusters, and . From the results of Table 2, we can see that the computation time grows exponentially as the size of the problem grows, or the density of the graph grows. For problem instances that are not solved to optimality, the MIP gaps are large. However in all problem instances, feasible solutions are found reasonably quickly. We can see that the computation time for maximizing total intracluster association is substantially longer than minimizing total intercluster cuts. For problem instances that are not solved to optimality, however, the former has a smaller MIP gap whilst the latter can solve larger problem instances. With an objective of maximizing total association, we did not include Constraint (49) as the objective will drive as much edges used as possible. We cannot obtain a conclusive remark on the effects of in terms of computation time.
Min Cut  Max Association  
Data set  Opt  Gap  r  Con  Opt  Gap  r  Con  
N15d015M50  0.939  0  0.215  100  2.347  0  0.769  100  
(5/0)  0.269  0  0.195  (5/0)  0.987  0  0.155  
N15d025M50  2.012  0  0.219  100  14.60  0  0.784  93.33  
(5/0)  0.975  0  0.150  (5/0)  12.85  0  0.107  
N15d05M50  78.46  0  0.393  100  412.3  0.074  0.912  100  
(5/0)  43.20  0  0.317  (4/1)  85.48  0  0.198  
N20d015M50  2.612  0  0.053  100  54.90  0  0.789  100  
(5/0)  1.348  0  0.047  (5/0)  32.11  0  0.193  
N20d025M50  66.07  0  0.146  100  252.2  0.067  0.876  100  
(5/0)  51.50    0.110  (1/4)  0  0.017  0.177  
N20d05M50    0.686  0.484  100    0.229  1.069  100  
(0/5)    0.176  0.152  (0/5)    0.033  0.208  
N30d015M50  31.38  0.826  0.025  100    0.044  0.817  100  
(4/1)  11.07  0  0.048  (0/5)    0.021  0.073  
N30d025M50    0.969  0.224  100    0.146  0.862  100  
(0/5)    0.021  0.110  (0/5)    0.028  0.102  
N50d015M50    1  0.142  93.33    0.176  0.937  100  
(0/5)      0.095  (0/5)    0.032  0.066  
N50d025M50    1  0.683  93.33    0.462  1.078  100  
(0/5)    0  0.128  (0/5)    0.049  0.078  
N15d015M100  0.540  0  0.089  80.00  3.193  0  0.484  80.00  
(5/0)  0.492  0  0.123  (5/0)  5.069  0  0.476  
N15d025M100  3.226  0  0.218  100  44.01  0  0.886  100  
(5/0)  2.417  0  0.143  (5/0)  26.46  0  0.149  
N15d05M100  35.86  0  0.359  100  342.5  0.116  1.070  100  
(5/0)  20.11  0  0.202  (3/2)  95.53  0.021  0.184  
N20d015M100  4.897  0  0.096  86.66  41.86  0  0.789  73.33  
(5/0)  3.398  0  0.105  (5/0)  15.81  0  0.092  
N20d025M100  71.12  0  0.102  100  261.9  0.049  0.922  100  
(5/0)  62.69  0  0.076  (1/4)  0  0.018  0.146  
N20d05M100    0.709  0.494  100    0.210  1.052  100  
(0/5)    0.064  0.155  (0/5)    0.014  0.078  
N30d015M100  46.06  0  0.006  93.33  515.5  0.037  0.731  93.33  
(5/0)  70.07  0  0.007  (1/4)  0  0.024  0.123  
N30d025M100    0.953  0.168  100    0.155  0.934  100  
(0/5)    0.042  0.063  (0/5)    0.022  0.092  
N50d015M100    1  0.129  93.33    0.187  0.905  100  
(0/5)    0  0.089  (0/5)    0.099  0.095  
N50d025M100    1  0.592  93.33    0.448  1.312  100  
(0/5)    0  0.147  (0/5)    0.080  0.185 
Min Cut  Max Association  
Data set  Opt  Gap  r  Con  Opt  Gap  r  Con  
N15d015M50  0.196  0  0.061  100  0.203  0  0.246  100  
(5/0)  0.067  0  0.048  (5/0)  0.058  0  0.041  
N15d025M50  0.180  0  0.014  90.00  0.256  0  0.264  100  
(5/0)  0.038  0  0.027  (5/0)  0.066  0  0.079  
N15d05M50  2.218  0  0.090  100  1.660  0  0.313  100  
(5/0)  1.594  0  0.032  (5/0)  0.670  0  0.056  
N20d015M50  0.211  0  0.014  90.00  0.699  0  0.214  100  
(5/0)  0.067  0  0.032  (5/0)  0.288  0  0.051  
N20d025M50  1.181  0  0.011  100  2.537  0  0.283  100  
(5/0)  0.680  0  0.015  (5/0)  1.288  0  0.064  
N20d05M50  46.63  0  0.122  100  61.40  0  0.352  100  
(5/0)  29.13  0  0.064  (5/0)  26.19  0  0.037  
N30d015M50  1.548  0  0.000  80.00  12.10  0  0.248  100  
(5/0)  0.461  0  0.000  (5/0)  4.455  0  0.051  
N30d025M50  63.92  0  0.023  100  169.4  0.006  0.323  100  
(5/0)  71.98  0  0.015  (4/1)  185.0  0  0.050  
N50d015M50  0  0.887  0.023  100  0  0.082  0.287  100  
(0/5)  0  0.162  0.015  (0/5)  0  0.014  0.032  
N50d025M50  0  0.997  0.111  100  0  0.177  0.354  100  
(0/5)  0  0.002  0.028  (0/5)  0  0.014  0.055 
Min Cut  Max Association  
Data set  Opt  Gap  r  Con  Opt  Gap  r  Con  
N15d015M50  3.453  0  0.566  75.00  92.08  0  1.614  65.00  
(5/0)  2.826  0  0.343  (5/0)  68.72  0  0.260  
N15d025M50  75.79  0  0.252  75.00  300.5  0.053  1.553  75.00  
(5/0)  100.2  0  0.141  (2/3)  271.8  0.049  0.358  
N15d05M50  0  0.617  0.863  75.00  0  0.232  1.737  75.00  
(0/5)  0  0.090  0.294  (0/5)  0  0.015  0.236  
N20d015M50  83.63  0  0.104  75.00  41.54  0.035  1.287  65.00  
(5/0)  128.8  0  0.105  (1/4)  0  0.004  0.319  
N20d025M50  0  0.724  0.322  75.00  0  0.136  1.821  75.00  
(0/5)  0  0.252  0.169  (0/5)  0  0.020  0.143  
N20d05M50  0  0.988  1.112  75.00  0  0.346  2.201  75.00  
(0/5)  0  0.007  0.446  (0/5)  0  0.057  0.306  
N30d015M50  237.4  0.943  0.060  75.00  0  0.080  1.670  75.00  
(1/4)  0  0.085  0.070  (0/5)  0  0.029  0.180  
N30d025M50  0  1  0.365  75.00  0  0.248  1.988  75.00  
(0/5)  0  0  0.252  (0/5)  0  0.034  0.267  
N50d015M50  0  1  0.794  65.00  0  0.447  1.955  70.00  
(0/5)  0  0  0.468  (0/5)  0  0.102  0.162 
Min Cut  Max Association  
Data set  Opt  Gap  r  Con  Opt  Gap  r  Con  
N15d015M50  0.574  0  0.075  100  0.698  0  0.363  100  
(5/0)  0.305  0  0.100  (5/0)  0.206  0  0.334  
N15d025M50  0.482  0  0.000  100  1.403  0  0.140  100  
(5/0)  0.389  0  0.000  (5/0)  0.159  0  0.096  
N15d05M50  24.29  0  0.258  100  116.2  0  0.391  100  
(5/0)  12.43  0  0.174  (5/0)  33.96  0  0.116  
N20d015M50  1.019  0  0.000  100  5.951  0  0.034  100  
(5/0)  0.737  0  0.000  (5/0)  8.141  0  0.025  
N20d025M50  12.31  0  0.018  100  149.4  0  0.237  100  
(5/0)  10.26  0  0.016  (5/0)  199.2  0  0.083  
N20d05M50  55.45  0.439  0.277  100  0  0.713  0.620  100  
(1/4)  0  0.168  0.228  (0/5)  0  0.080  0.281  
N30d015M50  3.175  0  0.000  93.33  357.0  0.813  0.086  93.33  
(5/0)  2.334  0  0.000  (4/1)  196.5  0  0.067  
N30d025M50  47.12  0  0.001  100  0  0.959  0.265  100  
(5/0)  53.50  0  0.002  (0/5)  0  0.026  0.133  
N50d015M50  209.7  1  0.007  93.33  0  1  0.313  100  
(4/1)  212.0  0  0.016  (0/5)  0  0  0.137  
N50d025M50  0  1  0.219  100  0  1  0.903  93.33  
(0/5)  0  0  0.109  (0/5)  0  0  0.317 
4 Conclusions and future research directions
We proposed and developed a polynomialsize MILP model for the SGCP. As far as we aware, this is the first approach that simultaneously allocates membership proportion (the values) for vertices that lie in multiple clusters, and that enforces an equal balance of the clusters so that the sum of vertex memberships over all clusters are roughly the same. Compared to CFinder ([Palla et al.2005], [Derényi et al.2005], [Adamcsek et al.2006]), the clusters found in our method are not limited to clique neighbourhoods. Compared to MaxMax ([Hope and Keller2013]), our method can produce nontrivial clusters even for a connected unweighted graph. An obvious future research direction is to perform a thorough numerical experiment on all parameters used in the model. One can consider alternative formulations, e.g., a branchandcut approach, using constraints to cut off infeasible solutions (namely, solutions with disconnected clusters); or, a branchandprice approach, using exponentially many variables each representing a feasible cluster (in which case connectivity is guaranteed). Even though the cut separation and column generation subproblems themselves are expected to be NPhard, heuristic approaches can be derived for speedy execution. Further, one may consider an adaptive approach to find the optimal number of clusters, instead of iteratively optimising over different values of .
References
 [Adamcsek et al.2006] Balázs Adamcsek, Gergely Palla, Illés Farkas, Imre Derényi, and Tamás Vicsek. Cfinder: Locating cliques and overlapping modules in biological networks. Bioinformatics (Oxford, England), 22:1021–3, 05 2006.
 [Bertsimas and Shioda2007] Dimitris Bertsimas and Romy Shioda. Classification and regression via integer optimization. Oper. Res., 55(2):252–271, March 2007.
 [Biemann2006] Christian Biemann. Chinese whispers  an efficient graph clustering algorithm and its application to natural language processing problems. 2006.
 [Clauset et al.2004] Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding community structure in very large networks. Phys. Rev. E, 70:066111, Dec 2004.
 [Dai and Charkhgard2018] Rui Dai and Hadi Charkhgard. A twostage approach for biobjective integer linear programming. Oper. Res. Lett., 46:81–87, 2018.
 [Derényi et al.2005] Imre Derényi, Gergely Palla, and Tamás Vicsek. Clique percolation in random networks. Phys. Rev. Lett., 94:160202, Apr 2005.
 [Gilpin et al.2013] Sean Gilpin, Siegried Nijssen, and Ian Davidson. Formalizing hierarchical clustering as integer linear programming, 2013.
 [Girvan and Newman2002] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
 [Hope and Keller2013] David Hope and Bill Keller. Maxmax: A graphbased soft clustering algorithm applied to word sense induction. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, pages 368–381, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
 [Liu and Foroushani2016] Ying Liu and Amir Foroushani. Pfc: An efficient soft graph clustering method for ppi networks based on purifying and filtering the coupling matrix. In DeShuang Huang, Vitoantonio Bevilacqua, and Prashan Premaratne, editors, Intelligent Computing Theories and Application, pages 247–257, Cham, 2016. Springer International Publishing.
 [Marinka Zitnik and Leskovec2018] Sagar Maheshwari Marinka Zitnik, Rok Sosič and Jure Leskovec. BioSNAP Datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata, August 2018.
 [Miller et al.1960] C. E. Miller, A. W. Tucker, and R. A. Zemlin. Integer programming formulation of traveling salesman problems. Journal of the ACM, 7(4):326–329, October 1960.
 [Miyauchi et al.2018] Atsushi Miyauchi, Tomohiro Sonobe, and Noriyoshi Sukegawa. Exact clustering via integer programming and maximum satisfiability, 2018.
 [Palla et al.2005] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, June 2005.
 [Pinney and Westhead2006] John W. Pinney and David R. Westhead. Betweennessbased decomposition methods for social and biological networks. 2006.
 [Rosvall and Bergstrom2008] Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008.
 [Sağlam et al.2006] Burcu Sağlam, F. Sibel Salman, Serpil Sayın, and Metin Türkay. A mixedinteger programming approach to the clustering problem with an application in customer segmentation. European Journal of Operational Research, 173(3):866 – 879, 2006.
 [Schaeffer2007] Satu Elisa Schaeffer. Graph clustering. Computer Science Review, 1(1):27 – 64, 2007.
 [sna]
 [Support2018] IBM Support. Cplex optimization studio v12.8. http://www01.ibm.com/support/docview.wss?uid=swg27050618, 2018.
 [Ustalov et al.2018] Dmitry Ustalov, Alexander Panchenko, Chris Biemann, and Simone Paolo Ponzetto. Localglobal graph clustering with applications in sense and frame induction. CoRR, abs/1808.06696, 2018.
 [Ye2007] M. Ye. An integer programming clustering approach with application to recommendation systems. Master’s thesis, Iowa State University, 2007.