Maximum balanced biclique problem. Given a bipartite graph $G = (U, V, E)$, a biclique $C = (X, Y)$ is a subgraph of $G$ such that $X \subseteq U$, $Y \subseteq V$, and $(u, v) \in E$ for every pair $u \in X$, $v \in Y$. When $|X| = |Y|$, $C$ is a balanced biclique. One of the fundamental and significant biclique problems is the maximum balanced biclique (MBB) problem: given a bipartite graph $G$, find a balanced biclique with the maximum number of vertices.
Significance. The MBB problem is significant across various disciplines. It has extensive real applications in very-large-scale integration (VLSI), including programmable logic array folding [ravi1988complexity] and defect-tolerance chip design [al2007defect, tahoori2006application]. Furthermore, it plays a principal role in analyzing biological data, since an MBB is an important instance of a bicluster [cheng2000biclustering, yang2005improved]. Recently, it has also drawn significant attention for discovering interactions between proteins [bustamam2020application, mukhopadhyay2014incorporating, dey2019graph, kaloka2019pols].
Given the significance of the MBB problem, it has been studied extensively. Since the MBB problem has been proven to be NP-hard [garey1979computers] and NP-hard to approximate within a factor of $n^{1-\epsilon}$ for every $\epsilon > 0$ [manurangsi2018inapproximability], most existing algorithms [ZHOU201986, WU2015693, LI2020104922, wang2018new] for finding an MBB are heuristic, while very few works, other than [ZHOU2018834], were dedicated to finding an exact MBB.
Our approach. In this paper, we focus on finding an exact MBB. Surprisingly, we show that an exact MBB can be discovered extremely fast, despite the NP-hardness of the MBB problem, by exploiting the characteristics of the bipartite graphs arising in real applications. We first make a breakthrough in solving the MBB problem for dense bipartite graphs, and then devise an efficient algorithm for large sparse bipartite graphs by leveraging our algorithm for dense bipartite graphs. The principal motivations and intuitions of our proposed algorithms are introduced as follows.
Novel algorithm for dense bipartite graphs. We observe that bipartite graphs are quite dense in applications such as VLSI design. Finding exact results for these applications would significantly improve the robustness of the designed circuits. However, the existing exact MBB algorithm [ZHOU2018834] cannot find the exact result within a few hours even for dense bipartite graphs with just hundreds of vertices. This motivates us to devise novel techniques for dense bipartite graphs, leading to a novel algorithm denoted as denseMBB. denseMBB has an explicitly bounded exponential time complexity in the number of vertices of $G$. To the best of our knowledge, denseMBB is the first MBB algorithm for speeding up MBB search in dense bipartite graphs with an explicit time complexity. The intuitions of denseMBB are as follows. We propose a polynomial-time algorithm for finding an exact MBB when a bipartite graph is sufficiently dense. Then, a triviality-last branching strategy is devised to avoid enumeration on subgraphs where our proposed polynomial-time algorithm can apply. In fact, denseMBB can find an MBB in near polynomial time for bipartite graphs with the density that is quite typical for defect-tolerance chip design [al2007defect, tahoori2006application]. This is because, when bipartite graphs are dense, our proposed techniques make the search converge to polynomially solvable subgraphs in a near-constant number of steps.
Novel algorithm for large sparse bipartite graphs. Given the promising time complexity of denseMBB, it is natural to ask whether denseMBB can be applied to large sparse bipartite graphs, which are typical in applications such as analyzing biological data. Applying denseMBB to large sparse bipartite graphs directly is inefficient in practice, given that the number of vertices could be extremely large and the optimizations for dense bipartite graphs cannot significantly reduce the search space when bipartite graphs are sparse. We propose a novel algorithm sparseMBB for large sparse bipartite graphs whose time complexity is exponential only in a novel bipartite sparsity parameter proposed by us, which is only a few hundred for large sparse bipartite graphs with millions of vertices. The intuitions of sparseMBB are as follows. Using our proposed techniques, sparseMBB transforms a large bipartite graph into a limited number of small but dense subgraphs whose sizes are bounded in terms of this sparsity parameter. After that, our proposed denseMBB is applied to each small but dense subgraph, which makes the overall running time near polynomial in practice. Apart from being theoretically promising, sparseMBB is very fast in practice. In fact, sparseMBB can find an MBB within a few seconds for million-vertex bipartite graphs.
We highlight our principal contributions below.
Novel bipartite sparsity measurement: bipartite degeneracy is proposed for measuring the bipartite sparsity of a bipartite graph.
Algorithms with better time complexity: our proposed algorithms find an exact result with time complexity of for dense bipartite graphs and for large sparse bipartite graphs.
Practically fast algorithms: we conduct extensive experiments on synthetic and real datasets. Our algorithms are up to several orders of magnitude faster than the state-of-the-art algorithm and a number of non-trivial baselines.
Roadmap. The remainder of the paper is organized as follows. Section 2 formally defines the MBB problem. Section 3 discusses the state-of-the-art algorithm. Section 4 introduces our novel algorithm denseMBB for dense bipartite graphs. Section 5 introduces our novel algorithm sparseMBB for large sparse bipartite graphs. Section 6 experimentally evaluates the efficiency of our proposed algorithms. Section 7 discusses related work and Section 8 concludes the paper.
2 Preliminary and Problem formulation
Frequently used notations are summarized in Table 1.
Bipartite graph. A bipartite graph is a graph in which vertices can be partitioned into two sets $U$ and $V$ such that no edge joins two vertices in the same set. In this paper, we denote a bipartite graph as $G = (U, V, E)$. Given a vertex $u$, we use $N(u)$ to denote its neighbours and $d_{\max}$ to denote the maximum degree of $G$.
Core number [batagelj2003m]. The core number of a vertex $u$ in $G$, denoted by $c(u)$, is the largest integer $k$ such that there exists a subgraph of $G$ containing $u$ in which every vertex has degree at least $k$.
Degeneracy. The maximum core number over all vertices of $G$ is also called the degeneracy of $G$, denoted as $\delta(G)$.
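To make the core number and degeneracy definitions concrete, here is a minimal peeling sketch (a quadratic re-computation variant rather than the linear-time bucket method of [batagelj2003m]; the dict-of-sets adjacency format is our own assumption):

```python
def core_numbers(adj):
    """Peeling: repeatedly remove a minimum-degree vertex; the running
    maximum of removal degrees gives each vertex's core number.
    adj maps each vertex to its set of neighbours."""
    deg = {v: len(adj[v]) for v in adj}
    removed, core, k = set(), {}, 0
    while len(removed) < len(adj):
        v = min((u for u in adj if u not in removed), key=deg.get)
        k = max(k, deg[v])  # the core number never decreases along the peel
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return core

def degeneracy(adj):
    """The degeneracy is the maximum core number over all vertices."""
    return max(core_numbers(adj).values())
```

For example, every vertex of a triangle has core number 2, while a path has degeneracy 1.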
Biclique. Given $G = (U, V, E)$, a pair of vertex sets $C = (X, Y)$ with $X \subseteq U$ and $Y \subseteq V$ is a biclique if $(u, v) \in E$ for every $u \in X$ and $v \in Y$.
For instance, given the bipartite graph shown in Figure 1(b), , induced subgraphs are bicliques.
Balanced biclique. A biclique $C = (X, Y)$ is a balanced biclique if $|X| = |Y|$.
For instance, balanced bicliques in Figure 1(b) include , , , etc.
Maximum balanced biclique problem. Given a bipartite graph $G = (U, V, E)$, find a balanced biclique $C^* = (X^*, Y^*)$ such that there is no other balanced biclique $C = (X, Y)$ with $|X| > |X^*|$.
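As a sanity check of the definition, the brute-force sketch below tries all equal-size vertex pairs, largest first; it is exponential and intended only to illustrate the problem on toy inputs (the function names are ours):

```python
from itertools import combinations

def is_biclique(edges, X, Y):
    """Check that every vertex of X is adjacent to every vertex of Y."""
    return all((u, v) in edges for u in X for v in Y)

def max_balanced_biclique(U, V, E):
    """Exhaustive MBB search for tiny graphs: try all pairs (X, Y) with
    |X| = |Y| = k, for k from large to small; the first hit is maximum."""
    edges = set(E)
    for k in range(min(len(U), len(V)), 0, -1):
        for X in combinations(U, k):
            for Y in combinations(V, k):
                if is_biclique(edges, X, Y):
                    return set(X), set(Y)
    return set(), set()
```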
3 State of the Art
In this section, we revisit the state-of-the-art algorithm [ZHOU2018834], denoted by ExtBBClq, for solving the MBB problem exactly.
ExtBBClq is a branch-and-bound algorithm with upper-bound-based pruning. The branch-and-bound part is inspired by the enumeration proposed in [10.1007/978-3-319-07046-9_16], which is an adaptation of the maximal clique enumeration algorithm [10.1145/362342.362367]. The algorithm starts the branch-and-bound procedure by enumerating all bicliques, visiting vertices in non-increasing order of their global degrees. To efficiently compute an MBB, when branching at a vertex, an upper-bound estimation is applied to prune non-promising branches.
The upper bounds used in [ZHOU2018834] are summarized below. Given a vertex $u$, its upper bound is defined as the largest integer $k$ such that there are $k$ vertices on the same side as $u$, each of which has at least $k$ common neighbours with $u$. The upper bound for a vertex in $V$ is defined similarly. Then, given a vertex $u$, its tight upper bound is defined as the largest integer $k$ such that there exist $k$ vertices with upper bound at least $k$. The upper bound for every vertex is precomputed due to the high time complexity of its computation. When branching at $u$, if its tight upper bound is less than the size of the maximum balanced biclique found so far, the branch is pruned.
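The per-vertex upper bound above is an h-index-style quantity. The sketch below is a hedged reconstruction, since parts of the formula are elided in this excerpt; in particular, counting $u$ itself among the same-side vertices is our assumption (it ensures the bound is at least the side length of any balanced biclique containing $u$):

```python
def vertex_upper_bound(adj, u):
    """Largest k such that at least k same-side vertices (u included, an
    assumption) share at least k common neighbours with u."""
    # Same-side vertices reachable via a common neighbour of u.
    same_side = {u}
    for v in adj[u]:
        same_side |= adj[v]
    # Common-neighbour count of u with each same-side vertex, descending.
    common = sorted((len(adj[u] & adj[w]) for w in same_side), reverse=True)
    # h-index style sweep: largest k with common[k-1] >= k.
    k = 0
    while k < len(common) and common[k] >= k + 1:
        k += 1
    return k
```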
In this paper, we use ExtBBClq as one of our baselines. In fact, ExtBBClq essentially reduces the MBB problem to the maximal biclique enumeration (MBE) problem. We build several baselines using other state-of-the-art MBE algorithms with upper bound based prunings. Details are shown in the experimental studies.
ExtBBClq suffers from several shortcomings. For dense bipartite graphs, the upper-bound-based pruning is less effective because every vertex looks promising according to its tight upper bound. For instance, in the bipartite graph in Figure 1(a), every vertex has a loose upper bound that is much larger than the size of an exact MBB. For sparse bipartite graphs, the applied total search order has limited effectiveness in finding a large result at an early stage of the search, which limits the pruning effectiveness. Besides, the total search order cannot tightly bound the search space, which results in the high time complexity of ExtBBClq.
Bearing the above shortcomings in mind, we propose novel and efficient algorithms for dense and sparse bipartite graphs.
4 A Novel Algorithm for Dense Bipartite Graphs
Efficiently searching for an MBB in dense bipartite graphs is very important. There are two cases: 1) the input bipartite graph of an application is dense itself; 2) the original sparse bipartite graph of an application may be pruned so that the remaining subgraphs become dense. In both cases, a fast algorithm dedicated to dense bipartite graphs is the key to speeding up the search.
In this section, we propose a novel reduction, branch and bound algorithm, denoted by denseMBB, for bipartite graphs that are sufficiently dense. As discussed, for real applications such as VLSI design, bipartite graphs of such high density are very common.
Idea of our approach. We find that when a bipartite graph is sufficiently dense, the MBB problem can be solved in polynomial time. As such, we propose a branching strategy that aims to branch at vertices that make the remaining subgraphs denser and polynomially solvable as soon as possible. Given that the input graph is dense, the search approaches polynomially solvable subgraphs quickly under this branching strategy, which substantially improves the performance. Moreover, when an input graph is sufficiently dense, it is directly solvable in polynomial time.
4.1 Basic Enumerations
In this section, we show the enumeration scheme that we use. We explain it here since it differs from existing works and is fundamental to the correctness proof and the time complexity analysis of our advanced approach.
Algorithm 1 shows the enumeration scheme. It works on three pairs of sets, denoted as $(X, Y)$, $(C_X, C_Y)$, and $(X^*, Y^*)$. $(X, Y)$ stores the intermediate result of a balanced biclique. $(C_X, C_Y)$ contains candidate vertices for further expanding $(X, Y)$. $(X^*, Y^*)$ stores the MBB found so far. Initially, the sets in $(X, Y)$ and $(X^*, Y^*)$ are empty while $C_X = U$ and $C_Y = V$.
Algorithm 1 finds an MBB via a search space that is a binary tree. It is efficient for enumerating balanced bicliques for the following reasons. Firstly, it only considers vertices that can form bicliques with $(X, Y)$, which is ensured by set operations. Secondly, the bicliques enumerated by Algorithm 1 are nearly balanced, i.e., the difference between $|X|$ and $|Y|$ is no more than 1. This is because the recursive calls switch the inputs, ensuring $X$ and $Y$ are enlarged in turn. As such, Algorithm 1 avoids enumerating a large number of imbalanced bicliques while finding an MBB.
Algorithm 1 also applies an intuitive pruning. Given a recursion with $(X, Y)$, $(C_X, C_Y)$, and $(X^*, Y^*)$, the recursion can be terminated if the following bounding condition is satisfied: $\min(|X| + |C_X|, |Y| + |C_Y|) \le |X^*|$. The correctness of the bounding condition is obvious: it indicates that the remaining search space cannot hold any balanced biclique larger than $(X^*, Y^*)$.
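The bounding condition can be sketched as a one-line check (a hedged reconstruction, since the exact inequality is elided in this excerpt): even if all candidates join the result, the smaller side cannot exceed $\min(|X| + |C_X|, |Y| + |C_Y|)$, so the branch is hopeless once that cap falls to the best side size found so far.

```python
def can_prune(X, Y, CX, CY, best_side):
    """Return True if this recursion cannot yield a balanced biclique with
    side size greater than best_side (the side size of the best result)."""
    return min(len(X) + len(CX), len(Y) + len(CY)) <= best_side
```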
Next we will propose novel techniques that reduce to and devise prunings that make near constant for dense bipartite graphs.
4.2 Optimizations for Dense Bipartite Graphs
In this section, we introduce the findings that help us improve the basic enumeration, leading to a novel reduction, branch and bound algorithm.
Reduction. We start the optimization with two simple but effective reduction rules, which are applied in every recursion whenever possible.
All-connection reduction rule. Given a vertex $u \in C_X$ (resp. $v \in C_Y$), if $u$ (resp. $v$) connects to every vertex in $Y \cup C_Y$ (resp. $X \cup C_X$), move $u$ (resp. $v$) from $C_X$ (resp. $C_Y$) to $X$ (resp. $Y$).
Low-degree reduction rule. Given a vertex $u \in C_X$ (resp. $v \in C_Y$), if the number of neighbours of $u$ in $Y \cup C_Y$ (resp. of $v$ in $X \cup C_X$) is too small for $u$ (resp. $v$) to appear in a balanced biclique larger than $(X^*, Y^*)$, remove $u$ (resp. $v$) from $C_X$ (resp. $C_Y$).
The correctness of the above two reductions is clear. The reductions are applied until no vertex can be moved or removed.
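A fixpoint sketch of the two rules follows; the `need` threshold is a hypothetical parameter standing in for the elided degree bound. Because each rule can re-trigger the other, they are iterated until nothing changes.

```python
def reduce_subproblem(X, Y, CX, CY, adj, need):
    """Apply the all-connection and low-degree rules to a fixpoint.
    adj maps each vertex to its neighbour set; `need` is a hypothetical
    minimum number of live neighbours a candidate must keep."""
    changed = True
    while changed:
        changed = False
        for cand, res, o_res, o_cand in ((CX, X, Y, CY), (CY, Y, X, CX)):
            for v in list(cand):
                other = o_res | o_cand  # all vertices on the other side
                if other and other <= adj[v]:     # all-connection: promote
                    cand.discard(v)
                    res.add(v)
                    changed = True
                elif len(adj[v] & other) < need:  # low-degree: discard
                    cand.discard(v)
                    changed = True
    return X, Y, CX, CY
```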
Next we show the techniques that lead to an algorithm with better complexity. The intuition is that for any recursive call in Algorithm 1, if we can guarantee that the two branches created reduce the size of the candidate sets sufficiently, i.e., the worst branching factor [cormen2009introduction] is bounded, then the number of leaves of the recursion tree can be bounded accordingly. This can be achieved as follows: when a recursion would lead to a worse branching factor, we do not continue the recursion but instead solve the current sub-problem with a polynomial-time algorithm.
Polynomially solvable cases. We first introduce three definitions and then three important observations leading to polynomial time solvable cases.
Size-constraint biclique problem. The size-constraint biclique problem is defined as: given a bipartite graph $G = (U, V, E)$ and a pair of integers $(p, q)$, determine if there is a biclique $(X, Y)$ in $G$ such that $|X| = p$ and $|Y| = q$. We abbreviate the size-constraint biclique problem as the $(p, q)$-biclique problem.
Maximal instances of the biclique problem. We call an instance $(p, q)$ of the biclique problem maximal for a bipartite graph if there exists no $(p', q')$-biclique in the bipartite graph such that $p'$ and $q'$ satisfy one of the conditions: 1) $p' = p$ and $q' > q$, 2) $p' > p$ and $q' = q$, or 3) $p' > p$ and $q' > q$.
Bipartite complementary graph. Given a bipartite graph $G = (U, V, E)$, its bipartite complementary graph is defined as $\bar{G} = (U, V, \bar{E})$, where $\bar{E}$ is $(U \times V) \setminus E$.
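The bipartite complement only flips edges across the bipartition, never within a side; a one-line sketch:

```python
def bipartite_complement(U, V, edges):
    """(u, v) with u in U, v in V is an edge of the bipartite complement
    iff it is NOT an edge of G."""
    E = set(edges)
    return {(u, v) for u in U for v in V if (u, v) not in E}
```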
Given a bipartite graph , if , and , then the non-trivial parts of the bipartite complementary graph of would be a combination of even length paths, odd length paths and cycles.
Given a bipartite graph that is an even-length path, an odd-length path, or a cycle, the maximal instances of bicliques in its bipartite complementary graph are determined as shown below:
odd length path (): , , , , .
even length path(): , , , if and , , , , and if .
cycle: , and , , , for and , for .
Given a bipartite graph that is an even-length path, an odd-length path, or a cycle, any instance of the biclique problem in its bipartite complementary graph is polynomially solvable.
Example. In Figure 2, we show some examples for the above observations. Edges with dashed lines are the real edges in the bipartite graphs, whereas grey lines denote the edges in their bipartite complementary graphs.
Odd path. For Figure 2(a), its complementary bipartite graph contains an odd path (grey lines); all possible maximal biclique instances for the subgraph induced by the vertices of this path in Figure 2(a) can be read off from the observation above.
Even path. For Figure 2(b), its complementary bipartite graph forms an even path; all possible maximal biclique instances for the subgraph induced by the vertices of this path in Figure 2(b) are determined likewise.
We are ready to give the lemma below.
Given a subgraph , if , and , the MBB problem can be solved in polynomial time.
The intuitions of Algorithm 2 can be summarized below. Given a subgraph satisfying the conditions in Lemma 3, the non-trivial part of its bipartite complementary graph consists of a combination of odd paths, even paths, and cycles (Observation 2). Therefore, all possible maximal instances of bicliques can be built by checking the combinations of the trivial part and the different maximal instances of bicliques that exist in the bipartite complementary graphs of the odd paths, even paths, and cycles. After knowing all possible maximal instances of bicliques, the size of the MBB can be easily derived and an MBB can be found easily. We shall show that all possible maximal instances of bicliques can be built efficiently using dynamic programming.
To embed the algorithm within the basic enumeration and reduction rules, Algorithm 2 works on a partial result and a candidate set pair when the induced subgraph satisfies the conditions stated in Lemma 3 and is thus deemed polynomially solvable. Algorithm 2 first initializes a table that contains all the possible instances of bicliques that may be an MBB, where a non-zero value indicates that the maximal instance exists. Algorithm 2 then builds actual maximal biclique instances by combining the current known biclique with a new biclique implied by a path/cycle subgraph. The value of a cell denotes how many components have been absorbed to build the current biclique, each of which comes from a path or cycle in the complementary graph. Using these values significantly reduces the number of cells in the table to be evaluated in the next loop. After the loop, all possible maximal instances of bicliques that can be derived are marked with non-zero values. The largest instance of an MBB can then be easily derived, and a new MBB is recorded if it is larger than the best MBB found so far.
Algorithm 2 correctly finds an MBB based on the discussion above. Its time complexity is clearly bounded since, in the worst case, the main loop accesses all the cells of the table; in fact, it is much faster in practice. Due to limited space, the obvious prunings applied in Algorithm 2 are not shown.
Branching techniques. According to Lemma 3, branching at vertices that lead to polynomially solvable cases results in a fast search. Therefore, a simple branching strategy is to always branch at a vertex that misses more than 2 neighbours.
Discussion. It is natural to ask whether we can use the well-studied missing-neighbour reduction techniques [10.1145/3292500.3330986], typically used for finding a maximum clique in dense general graphs, for speeding up MBB search in dense bipartite graphs. Unfortunately, we could not transplant these techniques. The clique problem has a linear-size self-reducibility property, i.e., if there is a clique of size $k$ in $G$, then there must be a clique of size $k'$ for any $k' \le k$, and the total number of self-reducible sub-problems is linear. However, for the biclique problem, if there is a $(p, q)$-biclique in $G$, there can be many instances of the biclique problem that lead to a large balanced biclique. As such, missing-neighbour reduction techniques may not be able to simplify MBB search.
4.3 The Algorithm
Now we are ready to present the complete reduction, branch and bound algorithm.
The algorithm. The major steps are shown in Algorithm 3. Algorithm 3 incorporates all the discussed theoretical findings to speed up MBB search: one step optimizes branching, one step reduces the subgraph as much as possible, and the remaining steps process polynomially solvable cases whenever possible.
Time complexity. All polynomially solvable cases and the proposed branching strategy ensure that the worst branching factor is bounded, since each branching vertex invalidates several non-neighbours in addition to its own removal from the candidate set, resulting in a reduction of several vertices from the candidate sets. Therefore, the total number of recursions is bounded. For individual recursions, the time complexity is dominated by the polynomially solvable cases, i.e., Algorithm 2, which yields the overall time complexity of Algorithm 3.
We would like to highlight that for sufficiently dense graphs, Algorithm 3 most likely runs in polynomial time since it converges to polynomially solvable cases within a near-constant number of recursions. For example, the bipartite graph shown in Figure 1(a) can be solved in polynomial time directly since it satisfies the conditions in Lemma 3.
5 A Novel Algorithm for Large Sparse Bipartite Graphs
Solving the MBB problem for large sparse bipartite graphs is important for applications such as biological data analysis. Existing MBB algorithms reduce the MBB problem to the maximal biclique enumeration problem with various prunings. As such, their time complexities cannot be better than that of the maximal biclique enumeration problem. As far as we know, the state-of-the-art maximal biclique enumeration algorithm has a time complexity exponential in $d_{\max}$, the maximum degree. Given that $d_{\max}$ can be very large, applying our proposed Algorithm 3 directly to large sparse bipartite graphs would already lead to a better algorithm from a theoretical perspective. In fact, we can do much better.
In this section, we propose a novel MBB algorithm for large sparse bipartite graphs.
5.1 Overview of the Algorithm
Intuitions of our algorithm can be summarized below.
Firstly, we separate heuristics from the exhaustive search, which brings advantages. We can apply advanced heuristics that have a higher chance of finding a global MBB before the exhaustive search begins. This phenomenally increases the effectiveness of upper-bound-based pruning.
Secondly, to efficiently perform the exhaustive search, we propose a novel technique that transforms the parts of the bipartite graph that cannot be pruned into small but dense subgraphs. A tighter upper bound for each vertex can be derived within its local dense subgraph, which further reduces the search space. Impressively, experimental results demonstrate that for most of the real sparse datasets, an MBB can be derived without exhaustive search using the above techniques.
Last but not least, we apply the proposed Algorithm 3 to the small but dense subgraphs that cannot be pruned. Since the remaining subgraphs exhibit high density and small size, this step takes near polynomial time in practice.
To effectively apply the aforementioned ideas, we propose a search framework consisting of three major steps, shown in Algorithm 4. The first step finds a large result heuristically and reduces the graph as much as possible. The second step generates locally dense subgraphs; this step also refines the found result and further prunes the bipartite graph if possible. The third step verifies the maximality of the found result.
Next, we describe each step in detail.
5.2 Heuristic and Reduction
In this section, we propose a fast heuristic MBB search algorithm, denoted by hMBB, for effectively pruning sparse bipartite graphs. Different from existing heuristic MBB algorithms, which aim to discover a large balanced biclique within reasonable time, hMBB serves as a subroutine of exact MBB search, so we have the following expectations for it. Firstly, hMBB should be extremely fast, i.e., near linear time w.r.t. the size of the bipartite graph. Secondly, hMBB should reduce the graph size as much as possible.
Now, we introduce the hMBB algorithm.
Heuristics-and-reduction-based approach. hMBB follows a heuristics-and-reduction pattern. Let $(X^*, Y^*)$ denote the maximum balanced biclique found so far, e.g., by a greedy algorithm. We apply the reductions used in [ZHOU201986, wang2018new] below.
Given $(X^*, Y^*)$, a vertex not in the $(|X^*| + 1)$-core subgraph cannot be part of a balanced biclique of size greater than $|X^*| + |Y^*|$.
The hMBB algorithm. hMBB is shown in Algorithm 5. In the first step, Algorithm 5 endeavours to find a large balanced biclique using a maximum-degree-based greedy rule and then applies a reduction based on Lemma 4. Due to the simplicity of the greedy algorithm, we omit its details. After that, a maximum-core-number-based greedy rule is used to find a large balanced biclique, and the Lemma 4 based reduction is applied again if a larger balanced biclique is found.
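Since the greedy rule is described only at a high level, the following is one reasonable interpretation (our own sketch, not the paper's exact Algorithm 5): seed with a maximum-degree vertex and alternately grow the smaller side while maintaining the biclique invariant that every candidate is adjacent to the whole opposite side.

```python
def greedy_mbb(U, V, adj):
    """Max-degree greedy heuristic sketch for a balanced biclique."""
    u0 = max(U, key=lambda u: len(adj[u]))
    X, Y = {u0}, set()
    cand_Y = set(adj[u0])       # adjacent to all of X by construction
    cand_X = set(U) - {u0}      # adjacent to all of Y (vacuously, Y empty)
    while cand_X or cand_Y:
        if len(Y) <= len(X) and cand_Y:
            v = max(cand_Y, key=lambda w: len(adj[w] & cand_X))
            Y.add(v)
            cand_Y.discard(v)
            cand_X &= adj[v]    # keep only common neighbours of Y
        elif cand_X:
            u = max(cand_X, key=lambda w: len(adj[w] & cand_Y))
            X.add(u)
            cand_X.discard(u)
            cand_Y &= adj[u]    # keep only common neighbours of X
        else:
            break
    k = min(len(X), len(Y))     # trim to a balanced biclique
    return set(list(X)[:k]), set(list(Y)[:k])
```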
Early termination. We propose an early termination condition to avoid unnecessary further vertex deletions once an MBB is found.
Let $(X^*, Y^*)$ denote the maximum balanced biclique found so far in $G$; if $|X^*| + |Y^*|$ equals twice the maximum core number of $G$, then we can terminate the algorithm.
Time complexity. The time complexity of hMBB is dominated by the computation of the core decomposition. We may apply hMBB to the top-$k$ maximum-degree (core-number) vertices to see whether we can obtain a larger balanced biclique and further reduce the bipartite graph accordingly.
Example. Assume the graph shown in Figure 1(b) is the input for Algorithm 5. Using the degree-based heuristic, it finds a balanced biclique. Using the core-based heuristic, where the core number of each vertex is shown in Table 2, it finds a larger balanced biclique. With this result, Algorithm 5 detects that the condition in Lemma 5 is satisfied. Therefore, the found balanced biclique is the optimum result.
5.3 Bridging to Maximality
In this section, we propose techniques for preparing the maximality verification. We propose an approach that effectively transforms the residual subgraphs output by step 1 into small but dense subgraphs without losing any globally optimum result. After the transformation, a tighter upper bound for each vertex can be derived, which further prunes fruitless subgraphs.
We first show theoretical findings that help us explain the above techniques.
5.3.1 Measuring Bisparsity
For general graphs, the sparsity measurement known as degeneracy is derived based on the 1-hop neighbours of every vertex. Degeneracy lays the foundation for designing efficient algorithms for the maximum clique problem in general graphs [10.1145/3292500.3330986]. For a general graph, the degeneracy bounds the enumeration depth for the clique problem, which is significantly less than the number of vertices when the graph is sparse.
Different from general graphs, biclique problems in a bipartite graph have to consider both the $U$ and $V$ sides. To effectively capture the bipartite sparsity of a bipartite graph, we have to consider both sides. Therefore, we propose bisparsity, which is measured by bidegeneracy (bipartite degeneracy), derived from both the 1-hop neighbours and the 2-hop neighbours of every vertex in a bipartite graph. We will show the effectiveness of bidegeneracy in reducing the search space for solving the MBB problem in large sparse bipartite graphs later.
We first formally define 2-hop neighbours.
2-hop neighbours. Given a vertex $u$ in $G$, we use $N_2(u)$ to denote the set of vertices whose shortest-path distance to $u$ is exactly 2.
For instance, for vertex in Figure 1(b), the -hop neighbours of are .
We define $N(u) \cup N_2(u)$ as the combined 1-hop and 2-hop neighbourhood of $u$ in $G$.
For instance, for vertex in Figure 1(b), includes .
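In a bipartite graph, the vertices at distance exactly 2 from $u$ are simply the neighbours of $u$'s neighbours minus $u$ itself (they lie on $u$'s own side of the bipartition), so $N_2(u)$ can be computed directly:

```python
def two_hop_neighbours(adj, u):
    """N2(u) for a bipartite graph: neighbours of neighbours, excluding u."""
    n2 = set()
    for v in adj[u]:
        n2 |= adj[v]
    n2.discard(u)
    return n2
```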
Below, we propose novel definitions, bicore and bidegeneracy, which are fundamental for devising our proposed algorithm for large sparse bipartite graphs.
Bicore number. Given a bipartite graph $G$, the bicore number of a vertex $u$, denoted by $bc(u)$, is the largest integer $k$ such that there exists a subgraph containing $u$ in which $|N(v) \cup N_2(v)| \ge k$ for every vertex $v$.
Bidegeneracy. The maximum bicore number over all vertices of $G$ is defined as the bidegeneracy of $G$.
Using bidegeneracy, a bidegeneracy order can be defined accordingly.
Bidegeneracy order. A permutation of the vertices of $G$ is a bidegeneracy order if every vertex has the smallest $|N(\cdot) \cup N_2(\cdot)|$ in the subgraph of $G$ induced by itself and the vertices that follow it in the permutation.
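A bidegeneracy order can be produced by the same peeling pattern as an ordinary degeneracy order, just keyed on $|N(v) \cup N_2(v)|$. The quadratic sketch below recomputes the key in each round; the bucket structures of Algorithm 7 avoid this re-computation.

```python
def bidegeneracy_order(adj):
    """Peeling sketch: repeatedly remove a vertex whose combined 1-hop and
    2-hop neighbourhood is minimum in the remaining subgraph. Returns the
    removal order and the bidegeneracy (max removal key seen)."""
    def nn2(v, alive):
        # neighbours plus 2-hop neighbours of v inside the remaining graph
        n1 = adj[v] & alive
        n2 = set().union(*(adj[w] & alive for w in n1)) if n1 else set()
        return (n1 | n2) - {v}

    alive, order, bideg = set(adj), [], 0
    while alive:
        v = min(alive, key=lambda x: len(nn2(x, alive)))
        bideg = max(bideg, len(nn2(v, alive)))
        order.append(v)
        alive.discard(v)
    return order, bideg
```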
5.3.2 From Sparse to Dense
We are ready to show an effective method for transforming residual subgraphs, output by step 1, into small dense subgraphs.
We first introduce two observations below.
Biclique search scope for a vertex. Given a vertex $u$ of a bipartite graph $G$, all the bicliques in which $u$ is involved are restricted to the subgraph induced by $\{u\} \cup N(u) \cup N_2(u)$.
For instance, for two vertices in Figure 1(b), their respective induced subgraphs are shown in Figures 3(a) and 3(b). Please ignore the colour differences in this example. Clearly, all maximal bicliques involving each vertex are contained in Figures 3(a) and 3(b), respectively.
Total search order. In general, given a bipartite graph, an exhaustive search visiting all maximal balanced bicliques follows a certain total search order of the vertices. Following the total search order, when processing a vertex, the exhaustive search only considers bicliques containing that vertex within the subgraph induced by the vertices not yet processed, thereby avoiding duplicate combinations.
Based on the above two observations, we can transform a bipartite graph into at most $|U| + |V|$ subgraphs, where each subgraph is defined below.
Vertex-centred subgraph. Given a total search order for a graph and a vertex $u$, the $u$-centred subgraph is defined as the subgraph induced by $u$ together with the vertices of $N(u) \cup N_2(u)$ that come after $u$ in the order.
A search order for tightening the search space. We want to find an order that tightly bounds the total size of the vertex-centred subgraphs. We show our findings below.
Using the non-increasing degree order, the total size of the vertex-centred subgraphs of $G$ is bounded in terms of $d_{\max}$, the maximum degree of $G$.
Using the degeneracy order, the total size of the vertex-centred subgraphs of $G$ is bounded in terms of the degeneracy of $G$.
Using the bidegeneracy order, the total size of the vertex-centred subgraphs of $G$ is bounded in terms of the bidegeneracy of $G$.
In real graphs, the bidegeneracy is significantly smaller than $d_{\max}$. Therefore, using the bidegeneracy order gives a much tighter bound. We will show how dense a vertex-centred subgraph is in the experimental studies.
5.3.3 The Algorithm
Based on the discussed theoretical findings, we propose Algorithm 6 for step 2 of Algorithm 4. It first computes the bidegeneracy order for the pruned graph. Then it generates vertex-centred subgraphs using the bidegeneracy order.
To prune the vertex-centred subgraphs as much as possible, for each subgraph, Algorithm 6 applies prunings according to the size of the subgraph, its degeneracy, and the maximum balanced biclique found so far. The local upper bound for each subgraph is significantly improved, so the pruning effectiveness is phenomenal. For a subgraph that cannot be pruned, Algorithm 6 applies a maximum-degeneracy-based greedy algorithm, which attempts to find a larger balanced biclique to maximize the pruning effects.
Algorithm 6 returns the maximum balanced biclique found so far and vertex centred subgraphs that cannot be pruned.
Given a bipartite graph , there exists an algorithm that can perform bicore decomposition for with time complexity of .
It is non-trivial to design an efficient peeling algorithm for bicore decomposition. This is because when removing a vertex $u$, for every $v \in N_2(u)$, $|N(v) \cup N_2(v)|$ may reduce by more than 1. If the peeling order is not chosen carefully, we may need extra computation to check how many combined neighbours each remaining vertex loses after removing $u$. To avoid such a pessimistic situation, we show our theoretical finding below.
When peeling, if every removed vertex $u$ satisfies two conditions: 1) $u$ has the minimum $|N(u) \cup N_2(u)|$, and 2) $u$ has the minimum $|N(u)|$ among all vertices satisfying condition 1), then for every vertex $v$ in the remaining graph, $|N(v) \cup N_2(v)|$ shall reduce by no more than 1.
The above lemma can be proved easily via contradiction.
Based on Lemma 10, a bicore decomposition algorithm is shown in Algorithm 7. To achieve the stated time complexity, we adapt the bucket-sort-based approach [batagelj2003m]. We need a bucket structure for maintaining an order according to the size of the union of the neighbours and 2-hop neighbours of each vertex, which speeds up the checking of condition 1) in Lemma 10. We also need a set of bucket structures for maintaining orders according to the numbers of neighbours of vertices whose unions of neighbours and 2-hop neighbours have the same size, which speeds up the checking of condition 2) in Lemma 10. As such, lines 3 and 4 can be done within the stated bound and line 10 can be done in constant time. The dominating parts are line 2 and lines 5 to 10.
5.4 Maximality Verification
In this section, we show how to efficiently verify the maximality of the result found so far on the set of vertex-centred subgraphs that cannot be pruned.
Maximality verification algorithm. Algorithm 8 shows how to verify the maximality of the result found so far. It checks all vertex-centred subgraphs that cannot be pruned by the techniques discussed. For each vertex-centred subgraph, Algorithm 8 first further reduces the subgraph according to Lemma 5 if a larger balanced biclique has been found. After that, Algorithm 8 calls Algorithm 3 to check whether the remaining subgraph contains a balanced biclique larger than the best one found so far; if so, the best result is updated. After checking all vertex-centred subgraphs, Algorithm 8 returns the optimum result.
In this section, we show the correctness of Algorithm 4 with all the proposed techniques embedded and analyze its time complexity.
Correctness. The correctness of Algorithm 4 can be derived as follows. Firstly, all the prunings are correct. Secondly, the vertex centred subgraphs are generated based on a total search order; therefore, the subgraphs cover all promising bicliques. Thirdly, every promising vertex centred subgraph is processed by the exhaustive search (Algorithm 3), which is proven correct. Therefore, Algorithm 4 is correct.
Time complexity. Algorithm 4 with all the proposed techniques embedded finds an MBB in . The breakdown of the analysis is given below. For step (Algorithm 5), the dominating computation is core decomposition, which has time complexity . For step (Algorithms 6 and 7), the dominating computation is bicore decomposition, which can be bounded by . For step (Algorithm 8), the time complexity is , since up to vertex centred subgraphs are evaluated by Algorithm 8. As such, Algorithm 4 is dominated by step , which is . In practice, Algorithm 4 runs much faster than this worst-case bound.
6 Experimental studies
We conduct extensive experiments to verify the effectiveness and efficiency of the proposed techniques and algorithms.
Implemented algorithms. We first introduce implemented algorithms evaluated throughout the experimental studies.
Algorithms for dense bipartite graphs. We implement Algorithm 3, denoted as denseMBB. We also implement the state-of-the-art MBB algorithm extBBCL [ZHOU2018834] as a baseline for comparison.
Algorithms for sparse bipartite graphs. We implement our proposed Algorithm 4 including all the proposed techniques, denoted as hbvMBB. Besides extBBCL, we use the combinations of existing heuristic MBB algorithms, MBE algorithms and our proposed framework for sparse bipartite graphs to build a number of non-trivial baselines. Before showing these baselines, we discuss the implemented state-of-the-art heuristic MBB and MBE algorithms first.
Existing heuristic MBB algorithms. We consider the state-of-the-art heuristic MBB algorithms POLS [wang2018new] and SBMNAS [LI2020104922] for designing non-trivial baselines. The parameter settings are the same as in the original papers.
Existing MBE algorithms. We adapt existing MBE algorithms by removing maximality and duplication checking. Instead, our proposed upper bound and the largest MBB found so far are used to terminate unpromising branches, avoiding the costly computations caused by maximality and duplication checking. The implemented MBE algorithms include iMBEA [zhang2014finding] and FMBE.
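To illustrate this bound-based termination, the following is our own minimal exact search (not the adapted iMBEA/FMBE code) that branches on left-side vertices and cuts a branch as soon as its balanced-size upper bound cannot beat the incumbent; the function name `mbb_size` and the specific bound are our assumptions for illustration:

```python
def mbb_size(adj):
    """Return the balanced side size of a maximum balanced biclique in a
    bipartite graph given as {left_vertex: set of right neighbours}.
    Branches track (chosen, common): chosen left vertices and the right
    vertices adjacent to all of them."""
    left = list(adj)
    best = [0]

    def rec(i, chosen, common):
        # Optimistic bound: even taking all remaining left vertices, the
        # balanced size cannot exceed either side's maximum.
        ub = min(len(chosen) + (len(left) - i), len(common))
        if ub <= best[0]:
            return  # unpromising branch: terminate early
        best[0] = max(best[0], min(len(chosen), len(common)))
        for j in range(i, len(left)):
            rec(j + 1, chosen | {left[j]}, common & adj[left[j]])

    all_right = set().union(*adj.values()) if adj else set()
    rec(0, set(), all_right)
    return best[0]
```

Replacing maximality and duplication checks with this kind of pruning keeps only branches that can still improve the incumbent, which is what makes the adapted MBE baselines competitive.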
Adapted non-trivial baselines. We use POLS or SBMNAS to replace the heuristic algorithm used in step 1 of Algorithm 4 and use the adapted iMBEA or FMBE to replace our proposed steps 2 and 3 of Algorithm 4. As such, algorithms adp1 to adp4 are derived as our baselines, shown in Table 3. Please note that the heuristic algorithms we use are for pruning purposes only, as discussed in Section 5.2.
Variants of our algorithms. We also implement different variants of our proposed Algorithm 4. They are for breaking down evaluations of the proposed techniques, denoted as bd1 to bd5. Their detailed configurations are shown in Table 3.
Other algorithms. We also implement the degeneracy algorithm denoted as degOrder to compare with our proposed bidegeneracy algorithm.
Measures. We measure the running time of the algorithms. The reported running time is the total CPU time (in seconds), excluding the I/O cost of loading graphs and indices from disk to main memory, and a timeout of hours is set, denoted by ‘-’. All algorithms are implemented in C++. All experiments are conducted on a PC with an AMD 3900x CPU (12 cores, 24 threads), 64GB of DDR4 3600MHz memory, and Windows 10 (build 1803). Each experiment is run at least 100 times if its running time is less than 1 hour (10 times otherwise), and the average results are reported.
6.1 Evaluations on Dense Graphs
Datasets. We generate dense bipartite graphs that simulate real application scenarios using a random bipartite graph generation algorithm similar to [tahoori2006application]. The edge density () in our evaluation ranges from to .
For each edge density and size, instances of bipartite graphs are generated, and the average running time is reported for each density. The largest synthetic bipartite graph has vertices on each side. Please note that the largest dense bipartite graph used to evaluate the exact MBB algorithm in [ZHOU2018834] contains only vertices.
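A generator of this kind can be sketched as follows. This is our stand-in for the procedure of [tahoori2006application], whose exact construction may differ: each of the possible edges appears independently with probability equal to the target edge density:

```python
import random

def random_bipartite(n_left, n_right, density, seed=0):
    """Generate a random bipartite graph as {left_vertex: set of right
    neighbours}, with each of the n_left * n_right possible edges
    present independently with probability `density`."""
    rng = random.Random(seed)  # fixed seed for reproducible instances
    adj = {u: set() for u in range(n_left)}
    for u in range(n_left):
        for v in range(n_right):
            if rng.random() < density:
                adj[u].add(v)
    return adj
```

Generating several instances per density and averaging the running time, as done above, smooths out the variance of individual random instances.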
Table 4 shows the running time. We only compare with extBBCL since it is the only other exact algorithm that can finish within hours for some of the tested datasets. The results clearly demonstrate that denseMBB efficiently handles dense bipartite graphs. The results also show that denseMBB runs in near-quadratic time as the data becomes dense. Furthermore, the scalability of denseMBB is much better than that of extBBCL: denseMBB can find an MBB within minutes for bipartite graphs with vertices on each side.
6.2 Evaluations on Sparse Graphs
Datasets. We use real datasets from the Koblenz Network Collection (KONECT). 30 instances are used to evaluate the algorithms discussed above; these 30 instances were also used in [ZHOU2018834].
We report the running time of adp1 to adp4, extBBCL and hbvMBB in columns 6 to 11 of Table 5, respectively. Noticeably, hbvMBB outperforms all the other algorithms on all datasets. For large datasets, such as actor-movie, hbvMBB runs over 200 times faster than extBBCL. On average, hbvMBB consistently runs orders of magnitude faster than all the other algorithms. For most of the datasets, adp3 is the runner-up, which justifies the power of our proposed search framework; note that adp3 uses the best reported heuristic algorithm for MBB and the best reported MBE algorithm for maximal biclique enumeration. We also highlight that our proposed algorithm finishes within minutes for all the datasets, whereas extBBCL cannot finish within hours for of the datasets. Although adp3 is the runner-up for most of the datasets, for datasets such as discogs-affilliation and pics-ut it still needs up to 16 minutes to finish. This further demonstrates the superiority of our proposed algorithm.
6.3 Breaking down Evaluations
In this section, we show the performance of different combinations of our proposed techniques.
Power of heuristics, reduction and early termination. In column 11 of Table 5, we show at which step hbvMBB terminates; S1, S2 and S3 denote steps to , respectively. For out of the datasets, our proposed algorithm, hbvMBB, terminates at S2, for two reasons. Firstly, our proposed heuristics can find the globally maximum balanced biclique. Secondly, in step , the original graph has been split into vertex centred subgraphs, and the upper bounds for these subgraphs are significantly tighter. As such, our proposed early termination conditions have a high chance of being satisfied. This justifies the importance of separating the heuristic from the exhaustive search, i.e., it greatly speeds up the search in practice and allows an MBB to be found extremely fast on real datasets. Interestingly, for datasets, our algorithm terminates at step , which means it solves the MBB problem in near-linear time for these datasets.
The above cases show that some datasets are easy to process, which makes them less useful for evaluating our proposed techniques comprehensively. Therefore, in the following we focus on the datasets that hbvMBB cannot finish within seconds.
Effectiveness of different search orders. We demonstrate how different search orders, i.e., degree-based order, degeneracy order and bidegeneracy order, affect the search performance by reporting the running time of variants of hbvMBB using the three orders. The results for the degree-based order and the degeneracy order are shown in columns and of Table 6, respectively. As we can see, bd4 and bd5 are up to 6 times slower than hbvMBB (which uses the bidegeneracy order). Two major reasons cause such dramatic differences. Firstly, the bidegeneracy order makes each subgraph on which the exhaustive search is performed smaller than the other two orders do. Secondly, the vertex centred subgraphs induced by the bidegeneracy order have much smaller upper bounds than those induced by the other two orders, i.e., the upper bounds are tighter, which results in better pruning effectiveness. In addition, the result that bd5 outperforms bd4 confirms that the degeneracy order is better than the degree order.
Overhead vs. benefit for core and bicore. We report the overhead of computing the core (degOrder) and bicore (bdegOrder) for each dataset in columns and of Table 6, respectively. As we can see, the running time of degOrder is trivial and that of bdegOrder is only slightly higher for all the datasets. Note that, during the search, degOrder and bdegOrder are performed on much smaller pieces of data. The running time of bd2 (without any core- or bicore-based optimizations) is shown in column 6 of Table 6. hbvMBB is several times faster than bd2, which justifies that core- and bicore-based optimizations bring dramatic benefits.
Overhead vs. benefit for heuristics. We report the running time of the heuristic algorithm (hMBB) and the running time of hbvMBB without hMBB (bd1) on different datasets in columns and of Table 6. As expected, the running time of hMBB is close to that of degOrder, since it is dominated by degOrder. bd1 takes considerably more time to find an MBB than hbvMBB. This is because hMBB not only finds a large balanced biclique but also uses the found balanced biclique to prune the graph as much as possible. From the above analysis, the benefit of using hMBB is significant.
Effectiveness of heuristics. We report the size gap between the maximum balanced biclique found by our proposed heuristic algorithms and the optimum maximum balanced biclique. We have two heuristic parts, i.e., hMBB and the heuristic used in Algorithm 6, denoted heuGlobal and heuLocal respectively. The results are shown in Figure 4, where D1 to D12 denote the datasets in Table 6 in top-down order. As shown, with heuLocal, the global maximum balanced biclique is found for out of datasets. This demonstrates that the heuristic in Step can significantly improve the quality of the candidate maximum balanced biclique, which in turn reduces the cost of Step .
Evaluation on search depth. We report the average search depth of hbvMBB using the different search orders discussed in Lemmas 6, 7 and 8, denoted maxDeg, degeneracy and bidegeneracy. The results are shown in Figure 5. We use of each dataset as a reference and report the ratio of the average search depth to for each order. Overall, the average depth of bidegeneracy is significantly smaller than those of the other two. This justifies the size-bounding effectiveness of our proposed vertex centred subgraphs. Noticeably, for all datasets, the ratio of the average search depth of bidegeneracy to is significantly less than , e.g., only 0.12 for D3 and D11. This justifies our proposed reduction and branching techniques and explains why hbvMBB is significantly faster than the other algorithms.
Evaluation on density of vertex centred subgraphs. We report the average density of the vertex centred subgraphs generated by the different orders, denoted maxDeg, degeneracy and bidegeneracy, for each dataset. The results are shown in Figure 6. Firstly, bidegeneracy is much more effective at generating high-density vertex centred subgraphs: for all datasets, the average density of the subgraphs generated using bidegeneracy is an order of magnitude higher than those of maxDeg and degeneracy. For datasets such as D11, the average density of the vertex centred subgraphs is quite high, i.e., close to a biclique. This indicates that finding an MBB in such bipartite graphs is hard when using existing techniques that do not optimize for dense subgraphs.