Phase Transition in Matched Formulas and a Heuristic for Biclique Satisfiability

08/06/2018 ∙ by Miloš Chromý, et al. ∙ Charles University in Prague

A matched formula is a CNF formula whose incidence graph admits a matching which matches a distinct variable to every clause. We study phase transition in the context of matched formulas and their generalization, biclique satisfiable formulas. We have performed experiments to find a phase transition of the property "being matched" with respect to the ratio m/n, where m is the number of clauses and n is the number of variables of the input formula φ. We compare the results of the experiments to a theoretical lower bound which was shown by Franco and Van Gelder (2003). Any matched formula is satisfiable; moreover, it remains satisfiable even if we change polarities of any literal occurrences. Szeider (2005) generalized matched formulas into two classes having the same property: var-satisfiable and biclique satisfiable formulas. A formula is biclique satisfiable if its incidence graph admits a covering by pairwise disjoint bounded bicliques. Recognizing if a formula is biclique satisfiable is NP-complete. In this paper we describe a heuristic algorithm for recognizing whether a formula is biclique satisfiable and we evaluate it by experiments on random formulas. We also describe an encoding of the problem of checking whether a formula is biclique satisfiable into SAT and we use it to evaluate the performance of our heuristic.

1 Introduction

In this paper we are interested in the problem of satisfiability (SAT), which is central to many areas of theoretical computer science. In this problem we are given a formula $\varphi$ in propositional logic and we ask whether $\varphi$ is satisfiable, i.e. whether there is an assignment of values to variables which satisfies $\varphi$. This is one of the best known NP-complete problems [7]. In this paper we study special classes of formulas whose definition is based on the notion of incidence graph.

Given a formula $\varphi$ in conjunctive normal form (CNF) we consider its incidence graph $I(\varphi)$ defined as follows. $I(\varphi)$ is a bipartite graph with one part consisting of the variables of $\varphi$ and the other part consisting of the clauses of $\varphi$. An edge $\{x, C\}$ for a variable $x$ and a clause $C$ is in $I(\varphi)$ if $x$ or $\neg x$ appears in $C$. It was observed by the authors of [2] and [17] that if $I(\varphi)$ admits a matching of size $m$ (where $m$ is the number of clauses in $\varphi$), then $\varphi$ is satisfiable. Later the formulas satisfying this condition were called matched formulas in [10]. Since a matching of maximum size in a bipartite graph can be found in polynomial time (see e.g. [13, 14]), one can check efficiently whether a given formula is matched.

It is clear that if $\varphi$ is a formula on $n$ variables and $m$ clauses then $\varphi$ can be matched only if $m \le n$. The authors of [10] asked an interesting question: What is the probability that a formula $\varphi$ is matched depending on the ratio $m/n$? We can moreover ask if the property “being matched” exhibits a phase transition.

A phase transition was studied in the context of satisfiability [5, 11, 15, 8, 6]. The so-called satisfiability threshold for a given $k$ is a value $r_k$ satisfying the following property: A random formula $\varphi$ in $k$-CNF on $n$ variables and $m$ clauses is almost surely satisfiable if $m/n < r_k$ and it is almost surely unsatisfiable if $m/n > r_k$. For instance the value $r_3$ is approximately $4.27$ [8, 6].

In the same sense we can study a threshold for the property “being matched”. It was shown in [10] that a $3$-CNF $\varphi$ on $n$ variables and $m$ clauses is almost surely matched if $m/n < 0.64$. This is merely a theoretical lower bound, and in this paper we check this value experimentally. It turns out that the experimentally observed threshold is much higher than the theoretical lower bound. Moreover we observe that the property “being matched” has a sharp threshold or phase transition as a function of the ratio $m/n$.

Matched formulas have an interesting property: If a formula $\varphi$ is matched and we pick any occurrence of any literal and switch its polarity (i.e. change a positive literal into a negative literal or vice versa), then the formula produced by this operation is matched and thus satisfiable as well. This is because the definition of incidence graph completely ignores the polarities of variables. The formulas with this property were called var-satisfiable in [16] and they form a much bigger class than matched formulas. Unfortunately, it was shown in [16] that the problem of checking whether a given formula is var-satisfiable is complete for the second level of the polynomial hierarchy.

Szeider in [16] defined a subclass of var-satisfiable formulas called biclique satisfiable formulas which extends matched formulas. It was shown in [16] that checking if $\varphi$ is biclique satisfiable is an NP-complete problem. In this paper we describe a heuristic algorithm to test whether a formula is biclique satisfiable. Our heuristic algorithm is based on a heuristic for covering a bipartite graph with bicliques described in [12]. We test our heuristic algorithm experimentally on random formulas. Our heuristic algorithm is incomplete: whenever it finds that a formula is biclique satisfiable, then it is so, but it may happen that a formula is biclique satisfiable even though our algorithm is unable to detect it. In order to check the quality of our heuristic, we propose a SAT based approach to checking biclique satisfiability of a formula. We compare both approaches on random formulas.

In Section 2 we recall some basic definitions and related results used in the rest of the paper. In Section 3 we give the results of experiments on matched formulas. In Section 4 we describe our heuristic algorithm for determining if a formula is biclique satisfiable and we give the results of its experimental evaluation. In Section 5 we describe a SAT based approach to checking biclique satisfiability and compare it experimentally with the heuristic approach. We close the paper with concluding remarks in Section 6 and give directions of further research in Section 7.

2 Definitions and Related Results

In this section we shall introduce necessary notions and results used in the paper.

2.1 Graph Theory

We use the standard graph terminology (see e.g. [4]). A bipartite graph is a triple $G = (V_1, V_2, E)$ with vertices split into two parts $V_1$ and $V_2$ and the set of edges $E$ satisfying $E \subseteq V_1 \times V_2$. Given a bipartite graph $G$ we shall also use the notation $V_1(G)$ and $V_2(G)$ to denote the vertices in the first and in the second part respectively. For two natural numbers $n, m$ we denote by $K_{n,m}$ the complete bipartite graph (or a biclique), that is the graph $K_{n,m} = (V_1, V_2, E)$ with $|V_1| = n$, $|V_2| = m$, and $E = V_1 \times V_2$.

Given a bipartite graph $G = (V_1, V_2, E)$ the degree of a vertex $v \in V_1 \cup V_2$ is the number of incident edges. A subset of edges $M \subseteq E$ is called a matching of $G$ if every vertex in $V_1 \cup V_2$ is incident to at most one edge in $M$. A vertex $v$ is matched by matching $M$ if $v$ is incident to some edge from $M$. $M$ is a maximum matching if for every other matching $M'$ of $G$ we have that $|M| \ge |M'|$. There is a polynomial algorithm for finding a maximum matching of a bipartite graph which runs in $O(|E| \sqrt{|V_1| + |V_2|})$ [13, 14].

2.2 Boolean Formulas

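A literal is a variable $x$ or its negation $\neg x$. A clause is a finite disjunction of distinct literals $C = (l_1 \vee l_2 \vee \dots \vee l_k)$, where $k$ is the width of clause $C$. A formula in conjunctive normal form (CNF) is a finite conjunction of clauses $\varphi = C_1 \wedge C_2 \wedge \dots \wedge C_m$. Formula $\varphi$ is in $k$-CNF if all clauses in $\varphi$ have width at most $k$. We shall also often write ($k$-)CNF $\varphi$ instead of $\varphi$ in ($k$-)CNF.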

Let us now recall the definition of probability space from [10].

Definition 1 (Franco and Van Gelder [10])

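Let $V_n = \{x_1, \dots, x_n\}$ be a set of Boolean variables and let $L_n = \{x_1, \neg x_1, \dots, x_n, \neg x_n\}$ be the set of literals over variables in $V_n$. Let $\mathcal{C}^k_n$ be the set of all clauses with exactly $k$ variable-distinct literals from $L_n$. A random formula in probability space $\mathcal{M}^k_{m,n}$ is a sequence of $m$ clauses from $\mathcal{C}^k_n$ selected uniformly, independently, and with replacement.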

2.3 Matched Formulas

Let $\varphi = C_1 \wedge \dots \wedge C_m$ be a CNF formula on $n$ variables $X = \{x_1, \dots, x_n\}$. We associate a bipartite graph $I(\varphi) = (V_1, V_2, E)$ with $\varphi$ (also called the incidence graph of $\varphi$), where the vertices in $V_1$ correspond to the variables in $X$ and the vertices in $V_2$ to the clauses in $\varphi$. A variable $x_i$ is connected to a clause $C_j$ (i.e. $\{x_i, C_j\} \in E$) if $C_j$ contains $x_i$ or $\neg x_i$. A CNF formula $\varphi$ is matched if $I(\varphi)$ has a matching of size $m$, i.e. if there is a matching which pairs each clause with a unique variable. It was observed in [2, 17] that a matched CNF is always satisfiable since each clause can be satisfied by the variable matched to the given clause. A variable which is matched to some clause in a matching $M$ is called matched in $M$.

We can see that checking if a formula $\varphi$ is matched amounts to checking if the size of a maximum matching of $I(\varphi)$ is $m$. This can be done in time $O(\ell \sqrt{n + m})$ where $m$ denotes the number of clauses in $\varphi$, $n$ denotes the number of variables in $\varphi$, and $\ell$ denotes the total length of formula $\varphi$, that is the sum of the widths of the clauses in $\varphi$.
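The matching test and the resulting satisfying assignment can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it uses the simple augmenting-path method (often called Kuhn's algorithm) rather than the faster algorithm of [13, 14], and the DIMACS-style clause representation and all function names are assumptions made for this example.

```python
def maximum_matching(clauses):
    """Augmenting-path matching on the incidence graph.

    clauses: list of clauses, each a list of nonzero ints
    (i stands for x_i, -i for the negation of x_i).
    Returns a dict mapping clause index -> matched variable.
    """
    var_to_clause = {}  # variable -> index of the clause it is matched to

    def try_augment(c, seen):
        for lit in clauses[c]:
            v = abs(lit)
            if v in seen:
                continue
            seen.add(v)
            # v is free, or the clause holding v can be re-matched elsewhere
            if v not in var_to_clause or try_augment(var_to_clause[v], seen):
                var_to_clause[v] = c
                return True
        return False

    for c in range(len(clauses)):
        try_augment(c, set())
    return {c: v for v, c in var_to_clause.items()}


def is_matched(clauses):
    return len(maximum_matching(clauses)) == len(clauses)


def assignment_from_matching(clauses, n_vars):
    """If the formula is matched, satisfy each clause via its matched variable."""
    matching = maximum_matching(clauses)
    if len(matching) != len(clauses):
        return None
    assignment = {v: True for v in range(1, n_vars + 1)}  # unmatched variables: arbitrary
    for c, v in matching.items():
        lit = next(l for l in clauses[c] if abs(l) == v)
        assignment[v] = lit > 0
    return assignment
```

Because the matching never looks at the signs of the literals, flipping any polarity leaves `is_matched` unchanged; only the assignment built in the last step depends on them.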

The following result on density of matched formulas in the probability space $\mathcal{M}^3_{m,n}$ was shown in [10].

Theorem 2.1 (Franco and Van Gelder [10])

Under $\mathcal{M}^3_{m,n}$, the probability that a random formula is matched tends to $1$ if $m/n < 0.64$ as $n \to \infty$.

One of the goals of this paper is to check experimentally how good an estimate of the real threshold the theoretical value $0.64$ is.

2.4 Biclique Satisfiable Formulas

One of the biggest limitations of matched formulas is that if $\varphi$ is a matched formula on $n$ variables and $m$ clauses, then $m \le n$. To overcome this limitation while keeping many nice properties of matched formulas, Stefan Szeider introduced biclique satisfiable formulas in [16].

We say that a biclique $K_{n,m}$ is bounded if $m < 2^n$. Let $\varphi$ be a CNF on $n$ variables and $m$ clauses and let us assume that $I(\varphi) = K_{n,m}$ where $m < 2^n$. Then $\varphi$ is satisfiable [16]. This is because we have $m$ clauses each of which contains all $n$ variables. Each of these clauses determines one unsatisfying assignment of $\varphi$, but there are $2^n$ assignments in total. Thus at least one of these must be satisfying.
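The counting argument can be made concrete with a small brute-force sketch. The representation of a full-width clause as a tuple of signs and the function name are assumptions made for this illustration only.

```python
from itertools import product


def satisfying_assignment_complete(n, clauses):
    """clauses: list of tuples of n booleans; clauses[j][i] is True when x_i
    occurs positively in clause j (every clause contains all n variables).
    Returns a satisfying assignment as a tuple of booleans, or None.
    """
    # A full-width clause is falsified by exactly one assignment:
    # the one that makes each of its literals false.
    falsifying = {tuple(not sign for sign in clause) for clause in clauses}
    for assignment in product([False, True], repeat=n):
        if assignment not in falsifying:
            return assignment
    return None
```

With fewer than $2^n$ clauses the set `falsifying` cannot contain every assignment, so the loop always returns one; with all $2^n$ distinct full-width clauses it returns `None`.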

Based on this observation we can define biclique satisfiable formulas [16]. We say that a bipartite graph $G = (V_1, V_2, E)$ has a bounded biclique cover if there exists a set of bounded bicliques $\mathcal{B} = \{B_1, \dots, B_t\}$ satisfying the following conditions.

  • every $B_i$ is a subgraph of $G$,

  • for any pair of indices $i \neq j$ we have that $V_1(B_i) \cap V_1(B_j) = \emptyset$, and

  • for every $v \in V_2$ there is a biclique $B_i$ such that $v \in V_2(B_i)$.

If every biclique $B_i$ in the cover satisfies that $|V_1(B_i)| \le k$, then we say the graph has a bounded $k$-biclique cover. A formula $\varphi$ is ($k$-)biclique satisfiable if its incidence graph has a bounded ($k$-)biclique cover.
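The three conditions, together with boundedness and the optional restriction on the first part, can be checked directly for a candidate cover. The sketch below assumes a graph given as a set of (variable, clause) pairs and a cover given as a list of pairs of sets; these representations and the function name are hypothetical.

```python
def is_bounded_biclique_cover(edges, clause_vertices, cover, k=None):
    """edges: set of (variable, clause) pairs of the bipartite graph.
    clause_vertices: iterable of all vertices of the second part.
    cover: list of (vars, clauses) pairs, each a pair of sets.
    k: optional bound on the size of the first part of every biclique.
    """
    used_vars = set()
    covered = set()
    for vs, cs in cover:
        if not vs or not cs:
            return False
        if len(cs) >= 2 ** len(vs):  # biclique is not bounded
            return False
        if k is not None and len(vs) > k:
            return False
        if any((v, c) not in edges for v in vs for c in cs):  # not a biclique of G
            return False
        if used_vars & vs:  # first parts must be pairwise disjoint
            return False
        used_vars |= vs
        covered |= cs
    return covered >= set(clause_vertices)  # every clause vertex is covered
```

Verifying a given cover is easy in this way; the hardness result quoted from [16] concerns finding one.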

It can be easily shown that any biclique satisfiable formula is indeed satisfiable; however, it is an NP-complete problem to decide if a formula is biclique satisfiable even if we only restrict to $2$-biclique satisfiable formulas. For proofs of both results see [16]. On the other hand it is immediate that $1$-biclique satisfiable formulas are matched formulas, because a single edge is a bounded biclique.

2.5 Generating experimental data

Whether a formula $\varphi$ in CNF is matched or not depends only on its incidence graph $I(\varphi)$. Instead of random formulas from probabilistic space $\mathcal{M}^k_{m,n}$ we thus consider random bipartite graphs from probabilistic space $\mathcal{G}^k_{m,n}$.

Definition 2

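Probability space $\mathcal{G}^k_{m,n}$ is defined as follows. A random bipartite graph $G \in \mathcal{G}^k_{m,n}$ is a bipartite graph with parts $V_1, V_2$ where $|V_1| = n$, $|V_2| = m$. Each vertex $v \in V_2$ has $k$ randomly uniformly selected neighbours from $V_1$.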

In our experiments we generated bipartite graphs $G \in \mathcal{G}^k_{m,n}$. Since we consider choosing clauses in formula $\varphi \in \mathcal{M}^k_{m,n}$ with replacement, we can have several copies of the same clause in $\varphi$. It follows that given a bipartite graph $G \in \mathcal{G}^k_{m,n}$, we have exactly $2^{km}$ formulas $\varphi$ which have $I(\varphi) = G$: each vertex $v \in V_2$ can be replaced with $2^k$ different clauses by setting polarities to the $k$ variables adjacent to $v$ in $G$. In particular, the probability that a random formula $\varphi \in \mathcal{M}^k_{m,n}$ is matched is the same as the probability that a random bipartite graph $G \in \mathcal{G}^k_{m,n}$ admits a matching of size $m$. The same holds for the biclique satisfiability.

3 Phase Transition on Matched Formulas

In this section we shall describe the results of experiments we have performed on matched formulas. In particular we were interested in the phase transition of $k$-CNF formulas with respect to the property “being matched” depending on the ratio $m/n$ of the number of clauses to the number of variables. We will also compare the results with the theoretical bound proved in [10] (see Theorem 2.1).

Note that the graphs in $\mathcal{G}^k_{m,n}$ correspond to incidence graphs of $k$-CNFs on $n$ variables and $m$ clauses. In particular, the probability that a random formula $\varphi \in \mathcal{M}^k_{m,n}$ is matched is the same as the probability that a random bipartite graph $G \in \mathcal{G}^k_{m,n}$ admits a matching of size $m$. In the experiments we were working with random bipartite graphs and we identified them with random formulas. The difference between a random formula and a random bipartite graph is in polarities of variables, which have no influence on whether the formula is matched or not.

Figure 1: Results of experiments on random graph with and . The horizontal axis represents the ratio . The vertical axis represents the percentage of graphs which admit matching of size . For each and we have generated a random graphs from .
Columns: 3-CNF, 4-CNF, 5-CNF, 6-CNF (low and high each); 7-CNF, 8-CNF, 9-CNF, 10-CNF (low only). Rows: $n$ = 100, 200, 500, 1000, 2000, 4000.
Table 1: Phase transition intervals of matched formulas as two values high and low. We provide only the low value for $k \ge 7$, because the high value was in this case $1$ for all configurations.

In our experiments we considered values of number of variables $n = 100$, 200, 500, 1000, 2000, 4000 and $k = 3, \dots, 10$. For each such pair $(n, k)$, we have generated random graphs for ratio . Figure 1 shows the graph with the results of experiments for value $k = 3$. The graph contains a different line for each value of $n$ which shows the percentage of graphs which admit a matching of size $m$ among the generated random graphs depending on ratio $m/n$. The complete results of the experiments are shown in Table 1. For each value of $n$ we distinguish two values high and low where only of the graphs generated in $\mathcal{G}^k_{m,n}$ with $m/n = \mathit{high}$ admit a matching of size $m$, and on the other hand of the graphs generated in $\mathcal{G}^k_{m,n}$ with $m/n = \mathit{low}$ admit a matching of size $m$.

We can see that for higher values of $n$ the interval gets narrower and we can thus claim that the property “being matched” indeed exhibits a phase transition phenomenon. Moreover we can say that the average of the values low and high tends to the threshold of this phase transition. We can see that the threshold ratio for $k = 3$ is around $0.92$, which is much higher than the theoretical bound $0.64$ from [10] (see Theorem 2.1). In all configurations with $k \ge 7$ the high value was $1$ while the low value was close to $1$ as well. Thus in the experiments we made with $k \ge 7$, even in the case $m = n$ almost all of the randomly generated graphs admitted a matching of size $m$.
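A scaled-down version of such an experiment can be sketched in a few lines. This is only an illustration under assumptions: the function names, the sample counts and the use of the simple augmenting-path matching are choices made here, not the setup behind Table 1.

```python
import random


def random_incidence_graph(n, m, k, rng):
    """Each of the m clause vertices gets k distinct neighbours among n variables."""
    return [rng.sample(range(n), k) for _ in range(m)]


def has_full_clause_matching(neighbours, n):
    """True if every clause vertex can be matched to a distinct variable."""
    match = [-1] * n  # match[v] = clause currently matched to variable v

    def augment(c, seen):
        for v in neighbours[c]:
            if not seen[v]:
                seen[v] = True
                if match[v] == -1 or augment(match[v], seen):
                    match[v] = c
                    return True
        return False

    # If some clause cannot be augmented, no matching of size m exists.
    return all(augment(c, [False] * n) for c in range(len(neighbours)))


def matched_fraction(n, ratio, k, samples, seed=0):
    """Fraction of sampled graphs with m = round(ratio * n) that are matched."""
    rng = random.Random(seed)
    m = round(ratio * n)
    hits = sum(
        has_full_clause_matching(random_incidence_graph(n, m, k, rng), n)
        for _ in range(samples)
    )
    return hits / samples
```

Calling `matched_fraction(n, r, 3, samples)` for a range of ratios `r` and plotting the result against `r` gives a curve of the kind described for Figure 1; the recursion in `augment` limits this sketch to moderate values of `n`.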

4 Bounded Biclique Cover Heuristic

The class of biclique satisfiable formulas forms a natural extension of the class of matched formulas. This class was introduced by Szeider [16], where the author showed that it is NP-complete to decide whether a given formula is biclique satisfiable. Recall that this decision is equivalent to checking if the incidence graph $I(\varphi)$ has a bounded biclique cover. In this section we shall describe a heuristic algorithm for finding a bounded biclique cover. The algorithm we introduce is incomplete, which means that it does not necessarily find a bounded biclique cover if it exists; on the other hand the algorithm runs in polynomial time.

4.1 Description of Heuristic Algorithm

Our heuristic approach is described in Algorithm 1. It is based on a heuristic algorithm for finding a smallest biclique cover of a bipartite graph described in [12]. The algorithm expects three parameters. The first two parameters are a bipartite graph $G$ and an integer $k$ which restricts the size of the first part of bounded bicliques used in the cover; in other words only bicliques $B$ satisfying $|V_1(B)| \le k$ are included in the cover which is output by the algorithm. The last parameter used in the algorithm is the strategy for selecting a seed.

Let be a bipartite graph . A seed in is a biclique which is a nonempty subgraph of with and . We say that is a maximal seed if there is no seed so that and .

After initializing an empty cover $\mathcal{C}$, the algorithm starts with a pruning step (unitGPropagation) which is used also in the main cycle. In this step a simple reduction rule is repeatedly applied to the graph $G$: If a vertex $v \in V_2$ is present in a single edge $\{u, v\}$, then this edge has to be added into the cover $\mathcal{C}$ as a biclique in order to cover $v$. In this case vertices $u$ and $v$ with all edges incident to $u$ are removed from graph $G$. If a vertex $v \in V_2$ which is not incident to any edge in $G$ is encountered during this process, the heuristic algorithm fails and returns an empty cover.
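The pruning step can be sketched as follows, assuming the graph is stored as two adjacency dictionaries. The data layout and the function name `unit_g_propagation` are assumptions for this illustration and need not match the implementation behind Algorithm 1.

```python
def unit_g_propagation(var_adj, clause_adj, cover):
    """var_adj: dict variable -> set of adjacent clause vertices.
    clause_adj: dict clause vertex -> set of adjacent variables.
    Mutates both dicts and appends forced single-edge bicliques to cover.
    Returns False if some clause vertex has no incident edge.
    """
    queue = list(clause_adj)
    while queue:
        c = queue.pop()
        if c not in clause_adj:  # already removed
            continue
        if not clause_adj[c]:  # isolated clause vertex: no cover exists
            return False
        if len(clause_adj[c]) == 1:
            (v,) = clause_adj[c]
            cover.append(({v}, {c}))
            # remove v and c; every other clause adjacent to v loses an edge
            for other in var_adj.pop(v):
                if other != c:
                    clause_adj[other].discard(v)
                    queue.append(other)
            del clause_adj[c]
    return True
```

Clause vertices that lose an edge are re-queued, so the rule is applied until no vertex of the second part has degree one.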

The algorithm continues with generating a list $S$ of all maximal seeds induced by all pairs $\{u, v\} \subseteq V_1$. The input graph $G$ is modified during the algorithm by removing edges and vertices. In the following description $G$ always denotes the current version of the graph.

The main cycle of the algorithm repeats while there are some seeds available in $S$ and $G$ does not admit a matching of size $|V_2|$. This is checked by calling function testMatched which also adds the matching to $\mathcal{C}$ if it is found.

The body of the main cycle starts with selecting a seed $B$ by function chooseSeed. This choice is based on a given strategy. We consider three strategies for selecting a seed: the first chooses a seed with the smallest second part, the second chooses a seed with the largest second part, and the third chooses a random seed. Seed $B$ is then expanded by repeatedly calling expandSeed. This function selects a vertex $v \in V_1 \setminus V_1(B)$ which maximizes the size of the second part of the biclique induced in $G$ with left part being $V_1(B) \cup \{v\}$ (the second part is induced to be all the vertices incident to all vertices in $V_1(B) \cup \{v\}$). The expansion process continues while the size of the first part satisfies the restriction imposed by parameter $k$ and while $B$ is not a bounded biclique (that is while $|V_2(B)| \ge 2^{|V_1(B)|}$).

If the expansion process ends due to the restriction on the size given by $k$, $B$ is not necessarily a bounded biclique. In this case we use a function restrictSeed which simply removes randomly chosen vertices from $V_2(B)$ so that $B$ becomes a bounded biclique.
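The expansion and restriction of a seed can be sketched as below. The helper names `expand_seed` and `grow_to_bounded`, the adjacency-dictionary layout and the tie-breaking among equally good vertices are assumptions of this sketch; it also assumes the seed passed in is maximal, i.e. its second part already contains all common neighbours of its first part.

```python
import random


def expand_seed(var_adj, seed_vars, seed_clauses):
    """One expansion step: add the variable vertex that keeps the most
    common clause neighbours. Returns None if no vertex shares a neighbour."""
    best_v, best_common = None, set()
    for v, clauses in var_adj.items():
        if v in seed_vars:
            continue
        common = seed_clauses & clauses
        if len(common) > len(best_common):
            best_v, best_common = v, common
    if best_v is None:
        return None
    return seed_vars | {best_v}, best_common


def grow_to_bounded(var_adj, seed_vars, seed_clauses, k, rng=random):
    """Expand while the first part is smaller than k and the biclique is not
    bounded; then drop random clause vertices if it is still not bounded."""
    vars_, clauses = set(seed_vars), set(seed_clauses)
    while len(vars_) < k and len(clauses) >= 2 ** len(vars_):
        step = expand_seed(var_adj, vars_, clauses)
        if step is None:
            break
        vars_, clauses = step
    limit = 2 ** len(vars_) - 1  # largest second part of a bounded biclique
    if len(clauses) > limit:  # the restrictSeed step
        clauses = set(rng.sample(sorted(clauses), limit))
    return vars_, clauses
```

With `k = 2` and a seed on two variables the loop never runs, so the only effect is the random restriction to at most three clause vertices.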

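Once a bounded biclique $B$ is found, it is removed from the graph $G$ and it is added to the cover $\mathcal{C}$. This is realized by a function removeBiclique which simply sets $V_1 \leftarrow V_1 \setminus V_1(B)$, $V_2 \leftarrow V_2 \setminus V_2(B)$, and $\mathcal{C} \leftarrow \mathcal{C} \cup \{B\}$. Then we call unitGPropagation to prune the graph. After that function removeInvalidSeeds removes from $S$ all seeds $B'$ with $V_1(B') \not\subseteq V_1$. For remaining seeds $B'$ the function sets $V_2(B') \leftarrow V_2(B') \cap V_2$.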

After the cycle finishes the current cover is returned.

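Data: Bipartite graph $G = (V_1, V_2, E)$, $k$: maximal size of $V_1(B)$ for a biclique $B$ which we put into the cover, and a seed selection strategy st.
Result: biclique cover $\mathcal{C}$ of graph $G$ if the heuristic found one, otherwise $\emptyset$
$\mathcal{C} \leftarrow \emptyset$
if unitGPropagation fails then return $\emptyset$
$S \leftarrow$ all maximal seeds of $G$
while $S \neq \emptyset$ and not testMatched do
       $B \leftarrow$ chooseSeed($S$, st)
       while $|V_1(B)| < k$ and $|V_2(B)| \ge 2^{|V_1(B)|}$ do
               $B \leftarrow$ expandSeed($B$)
       end while
       if $|V_2(B)| \ge 2^{|V_1(B)|}$ then $B \leftarrow$ restrictSeed($B$)
       removeBiclique($B$)
       if unitGPropagation fails then return $\emptyset$
       removeInvalidSeeds($S$)
end while
return $\mathcal{C}$
Algorithm 1 A heuristic for checking if there is a bounded biclique cover of a bipartite graph $G$. The complexity of the individual steps is estimated in the text, where we consider $n = |V_1|$, $m = |V_2|$, and $\ell = |E|$.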

Let us estimate the running time of our heuristic Algorithm 1. Let us denote $n = |V_1|$, $m = |V_2|$, and $\ell = |E|$ ($\ell$ also corresponds to the length of a formula). Generating all seeds requires time . The main cycle will repeat at most times, because we cannot have more bounded bicliques than the number of vertices in . In case that the second part is bigger than the first one, graph $G$ cannot be an incidence graph of a matched formula, so checking if a graph admits a matching of size $m$ has constant time complexity if $m > n$. In case that $m \le n$ function testMatched will run in $O(\ell \sqrt{n + m})$ [13, 14]. All other steps within the main cycle (including the pruning step) can be performed in time and thus the complexity of our heuristic is .

If a nonempty set of bicliques is returned by the algorithm, then it is a bounded biclique cover of $G$. It should be noted that the opposite implication does not necessarily hold: if the seeds are chosen badly then the algorithm may fail even if there is some bounded biclique cover in $G$. In the next section we aim to evaluate our heuristic algorithm experimentally.

4.2 Experimental Evaluation of Heuristic Algorithm

In this section we shall describe the experiments performed with our heuristic Algorithm 1 described in Section 4.1.

Algorithm 1 works with bipartite graphs. We have tested the proposed heuristic on bipartite graphs from the probabilistic space $\mathcal{G}^k_{m,n}$ with and with the degrees of vertices in the second part being . This corresponds to formulas in $k$-CNF for these values. We have considered different sizes $m$ of the second part given by ratios . The upper bound $1.5$ was chosen because we were mainly interested in bounded 2-biclique cover: a bounded biclique with two vertices in its first part contains at most three vertices in its second part, so for graphs with $m/n > 1.5$ there is no bounded 2-biclique cover. For comparison, we have also performed experiments with unrestricted sizes of bounded bicliques and we have tried the three strategies for selecting a seed. In the experiments we checked whether Algorithm 1 found a bounded biclique cover of a given random graph generated according to the above mentioned parameters.

Due to the time complexity of Algorithm 1 we have only generated a hundred random graphs in $\mathcal{G}^k_{m,n}$ for each configuration (given by a strategy, bound on the size of $V_1(B)$ of each biclique $B$, and ratio $m/n$).

Columns: ratios $m/n$ = 1, 1.1, 1.2, 1.3, 1.4, 1.5 (low and high each).
Table 2: Results of experiments with our heuristic algorithm on graphs with size of second part. Each pair of columns low and high represents a phase transition interval. Each row corresponds to one strategy. A more detailed explanation can be found in the main text.

Table 2 summarizes the results of our experiments. Each row corresponds to a combination of a strategy for selecting a seed and a bound imposed on the size of biclique (superscript for bounded 2-biclique cover, for general bounded biclique cover). Each column corresponds to a ratio , we have included only ratios , , , , , and in the table. For each configuration we have two bounds low and high on degree of vertices in the second part of graph . Our heuristic algorithm succeeded only on of graphs with degree and on the other hand it succeeded on of graphs with degree .

We can see that for a bounded 2-biclique cover , the strategies and are never worse than and that they even get better for higher ratios. This makes the best strategy for seed size restriction — it is easiest to implement and randomness means that repeated calls of our heuristic algorithm may eventually lead to finding a biclique cover. As we can expect, heuristic performs quite well on lower values of ratio and it gets worse on higher values of this ratio. For general bounded biclique cover the heuristics and behave very similarly while is better in most cases.

Figure 2: Results of experiments with our heuristic algorithm with strategy and . The horizontal axis represents the ratio $m/n$. The vertical axis represents the degree $k$ of vertices in the second part. The whiter a pixel is, the more random graphs were covered by a bounded 2-biclique cover by the algorithm.
Figure 3: Results of experiments with our heuristic algorithm with strategy and . The horizontal axis represents the ratio $m/n$. The vertical axis represents the degree $k$ of vertices in the second part. The whiter a pixel is, the more random graphs were covered by a general bounded biclique cover by the algorithm.

We can observe a phase transition behaviour in the results of experiments on both strategies and . As we can see on Figure 2 and Figure 3 there is a phase transition for a fixed ratio . Most of random graphs with have a biclique cover and our heuristic algorithm will find it. However, since our heuristic is incomplete, it is not clear how many random graphs with have biclique cover.

In case of strategies with the most interesting case is when . As the ratio gets close to we can expect smaller percentage of graphs having a bounded 2-biclique cover, hence our heuristic algorithm fails to find one in most cases.

Strategy behaves very similarly to but it doesn’t have an upper limit to phase transition. As we can see, there is an interesting phenomenon on Figure 3 between and . The strange shift is caused by using bigger bicliques by the algorithm.

As we can see from Table 2, for ratios smaller than it is better to use the algorithm with a heuristic for finding a bounded -biclique cover. For bigger ratios it is better to use a heuristic for general bounded biclique cover. It would be also interesting to perform more experiments with bounded 3-biclique covers and observe if a similar phenomenon will occur on strategy .

1 1.1 1.2 1.3 1.4 1.5
Table 3: Average running time (in ) of experiments with our heuristic algorithm. Each column represents

The average runtime of experiments on our heuristic can be seen in Table 3. For 3-CNF it has the same runtime for both strategies and all ratios of . This is because quite often an isolated vertex was created during the work of Algorithm 1, which means the algorithm failed quickly in many cases. The runtime of strategy which uses bounded bicliques in the cover is much worse than that of the unbounded strategy . For -CNF as grows, the difference gets bigger. This is because with the unbounded strategy we admit bigger bicliques in the cover and hence our heuristic Algorithm 1 will run fewer iterations of the main cycle and succeeds or fails faster than .

5 Bounded Biclique SAT Encoding

We shall first describe the encoding of the problem of checking if a bipartite graph has a bounded biclique cover into SAT, then we will describe and evaluate the experiments we have performed to compare this approach with Algorithm 1. We will also describe the environment we have used to run the experiments.

5.1 Description of SAT Encoding

A valid biclique of a bipartite graph is a complete bipartite subgraph of the bipartite graph which follows the restriction on the size of the second partition. In particular, we require $|V_2(B)| < 2^{|V_1(B)|}$. Let us consider a bipartite graph $G$ and $k \ge 1$, let us define

$\mathcal{B}_k = \{B \mid B \text{ is a bounded biclique in } G \text{ and } |V_1(B)| \le k\}.$

We also denote by $\mathcal{B}_\infty$ the set of all bounded bicliques within the bipartite graph $G$ without restriction on the size of $V_1(B)$. We use the set of bicliques $\mathcal{B}_k$ to check existence of a bounded $k$-biclique cover and $\mathcal{B}_\infty$ to check existence of a general bounded biclique cover. We will encode the problem of bounded ($k$-)biclique cover on a bipartite graph $G$. Let us fix $\mathcal{B} = \mathcal{B}_k$ where $k$ is either a natural number, or $\infty$, and let us describe formula $\psi$ for a given graph $G$. With each biclique $B \in \mathcal{B}$ we associate a new variable $x_B$. Every assignment of boolean values to variables $x_B$, $B \in \mathcal{B}$, then specifies a set of bicliques. We want to encode the fact that the satisfying assignments of $\psi$ exactly correspond to bounded biclique covers of $G$. To this end we use the following constraints:

  • For each vertex $v \in V_1$ we add to $\psi$ an at-most-one constraint on variables $x_B$, $v \in V_1(B)$. This encodes the fact that the first partitions of bicliques in the cover have to be pairwise disjoint. We use a straightforward representation of the at-most-one constraint with a quadratic number of negative clauses of size $2$.

  • For each vertex $c \in V_2$ we add to $\psi$ a clause representing an at-least-one constraint on variables $x_B$, $c \in V_2(B)$. This encodes the fact that each vertex of the second partition belongs to a biclique in the cover.
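The two groups of constraints can be generated mechanically once the bounded bicliques are enumerated. The sketch below is an illustration under assumptions: the adjacency-dictionary input, the function names and the DIMACS-style integer literals are choices made here, and the enumeration is only practical for small $k$.

```python
from itertools import combinations


def bounded_bicliques(var_adj, k):
    """All bicliques (U, W) with 1 <= |U| <= k and 1 <= |W| < 2**|U|.
    var_adj: dict variable vertex -> set of adjacent clause vertices."""
    result = []
    for r in range(1, k + 1):
        for U in combinations(sorted(var_adj), r):
            common = set.intersection(*(var_adj[v] for v in U))
            for s in range(1, min(len(common), 2 ** r - 1) + 1):
                for W in combinations(sorted(common), s):
                    result.append((frozenset(U), frozenset(W)))
    return result


def encode_cover(var_adj, clause_vertices, k):
    """Returns (bicliques, cnf); SAT variable i + 1 stands for bicliques[i]."""
    bicliques = bounded_bicliques(var_adj, k)
    cnf = []
    for v in var_adj:  # at-most-one, pairwise encoding with binary clauses
        ids = [i + 1 for i, (U, _) in enumerate(bicliques) if v in U]
        cnf.extend([-a, -b] for a, b in combinations(ids, 2))
    for c in clause_vertices:  # at-least-one per clause vertex
        cnf.append([i + 1 for i, (_, W) in enumerate(bicliques) if c in W])
    return bicliques, cnf
```

The resulting list of integer clauses can be written out in DIMACS format and passed to any SAT solver; a clause vertex contained in no bounded biclique yields an empty clause, which makes the encoding unsatisfiable as expected.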

5.2 Experimental Evaluation of Heuristic Algorithm

We can see that the number of variables in our encoding is equal to the number of all valid bicliques within the bipartite graph $G$. If we consider bicliques in $\mathcal{B}_k$ for a fixed $k$, then the number of valid bicliques is polynomial in the size of $G$ but it can be exponential in $k$ (for $\mathcal{B}_\infty$ the number can be exponential in the size of $G$ as well). For this reason we tested the encoding only with bicliques in $\mathcal{B}_2$, thus checking bounded 2-biclique cover. For bigger bicliques the running times of experiments increased so much that we would not be able to repeat the tests enough times for a reasonable number of variables. We used the encoding described in Section 5.1 to check the success rate of our heuristic algorithm on random bipartite graphs and to check the phase transition for the existence of a bounded biclique cover. We ran the experiments on random bipartite graphs $G \in \mathcal{G}^k_{m,n}$ with for combinations of $k = 3, \dots, 8$ and the size $m$ of the second part for $m/n$ from $1$ to $1.25$ with step $0.05$. For each $k$ we tested random graphs only for the ratios around the expected phase transition as observed in Table 2.

1 1.05 1.1 1.15 1.2 1.25
3 18/18/100 0/0/100
4 95/95/100 50/56/100 7/10/100 0/1/100 0/0/100
5 100/100/100 100/100/100 90/98/98 52/87/87 10/34/34 1/1/1
6 100/100/100 100/100/100 100/100/100 85/99/99 42/53/53
7 100/100/100 99/100/100 89/90/90
8 99/100/100
Table 4: Number of bipartite graphs with bounded biclique cover (found by Algorithm 1/ SAT finished with true/ SAT finished within time limit). If the three numbers are missing in a cell, no experiments were run with the corresponding configuration. In case of black cells we expect values and in case of white empty cells we expect . See also the description within text.
1 1.05 1.1 1.15 1.2 1.25
3 0.005/0.014 0.005/0.01
4 0.005/1.7 0.006/25 0.006/21 0.005/13 0.006/7
5 0.005/0.2 0.005/0.98 0.006/11 0.006/216 0.006/993 0.005/4589
6 0.006/0.4 0.006/2.4 0.006/50 0.006/838 0.006/3646
7 0.006/73 0.006/166 0.007/1666
8 0.006/272
Table 5: Average runtime of Algorithm 1/average runtime of SAT on encoding in seconds.
1 1.05 1.1 1.15 1.2 1.25
3 0.52/2.16 0.58/2.39
4 0.13/1.05 0.02/0.40 0.002/0.06 0.001/0.005 0.001/0.009
5 0.25/1.13 0.06/1.16 0.007/0.12 0.002/0.096 /0.0008 /
6 0.12/0.88 0.023/0.30 0.003/0.095 0.0002/0.007 /0.0007
7 0.006/0.086 0.001/0.039 0.00018/0.0015
8 0.00026/0.0028
Table 6: Average/maximum ratio between running time of Algorithm 1 and running time of SAT on encoding in seconds.

The results of experiments are contained in Tables 4 to 6. All these tables have a similar structure. Each cell represents a single configuration (row corresponding to a value of $k$ and column corresponding to the ratio $m/n$ where $m$ denotes the number of clauses and $n$ denotes the number of variables). In Table 4 each cell contains three numbers separated with slashes. The first is the number of instances (out of $100$) on which Algorithm 1 successfully found a bounded biclique cover. The second is the number of instances on which the SAT solver successfully solved the encoding and answered positively. The third is the number of instances on which the SAT solver finished within the time limit, which was set to 4 hours for each instance. In some cells the values are missing; for these configurations we did not run any experiments, because they are far from the observed phase transition (see Table 2). In case of black colored cells we assume that the results would be , in case of white colored cells we assume that the results would be . The gray colored cells mark the borders of observed phase transition intervals of existence of bounded biclique cover; light gray corresponds to the results given by the SAT solver, dark gray to the results given by the heuristic which form an upper bound on the correct values. We can see that in most cases the number of positive answers given by Algorithm 1 is close to the number of positive answers given by the SAT solver. However, there are some cases where one of the approaches was more successful, namely in cases of $k = 5$, $m/n = 1.15$ and $k = 5$, $m/n = 1.2$. In the first case the SAT solver answered positively on $87$ instances while Algorithm 1 answered positively only on $52$ instances, and in the second one the SAT solver answered positively on $34$ instances and Algorithm 1 answered positively only on $10$ instances. However, in the second case we can see that the SAT solver ran over the time limit (4 hours) in $66$ cases.

We also compared the runtime of our heuristic algorithm and the SAT solver. As we can see in Table 5, our heuristic algorithm is much faster in the average case. Standard deviation of runtime of our heuristic algorithm is around and the standard deviation of runtime of SAT solver is up to (where we have evaluated the average value and the standard deviation only on instances in which the SAT solver finished within the time limit of 4 hours). These values are quite high compared to the running times. One of the reasons is perhaps the fact that the experiments were not run on a single computer, but on several comparable computers (see Section 5.3 for more details). However, on each single instance the SAT solver and Algorithm 1 were run on the same computer, and it thus makes sense to look at the ratio between the running times of these two. These are contained in Table 6. We can see that our heuristic is in most cases faster than the SAT solver using the encoding described in Section 5.1. Only for $k \le 5$ we can see that the maximum ratio is bigger than one. It means that in some cases the SAT solver was faster than Algorithm 1, although not on average.

We can see from the results that on random -CNF formulas Algorithm 1 has a success rate close to that of the SAT-based approach. A big advantage of Algorithm 1 is that it is much faster.

5.3 Experimental environment

Let us say more about the environment in which the experiments were run. We used the Glucose parallel SAT solver [3, 9]. Our experiments were executed on the grid computing service MetaCentrum NGI [1]. All experiments were run on single-processor machines (Intel Xeon, AMD Opteron) with 4 cores and frequencies of 2.20GHz-3.30GHz. On each random bipartite graph , Algorithm 1 and the SAT solver were always run on the same computer. However, for the same configuration and different formulas, the experiments may have run on different computers. As we have noted in Section 5.2, this could be a reason for the high standard deviations of the runtimes. The fact that the computer speed varied while the time limit for the SAT solver stayed the same (4 hours) could have led to situations where the SAT solver did not finish because it was run on a slower computer, although it could have finished had it been run on a faster one. We can see in Table 7 that most of the 2200 instances (1912) finished within an hour, while only 26 finished between 1 and 2 hours, only 15 between 2 and 3 hours, and only 9 between 3 and 4 hours. We can thus expect that the number of such border cases is similarly small, and we conclude that the variance in computer speeds had only a minor influence on the number of SAT calls which finished within the time limit.

<1h     1h-2h   2h-3h   3h-4h   >4h     total
1912    26      15      9       238     2200
Table 7: Number of test cases finished in the given time interval.
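
A tally of this kind can be produced with a few lines of code. The sketch below is an illustration only: the function name `bucket_runtimes` and the convention that `None` marks a run stopped at the time limit are assumptions, and the first and last columns are read as "under one hour" and "over the 4-hour limit". The final lines check that the counts reported in Table 7 add up to the stated total.

```python
def bucket_runtimes(runtimes_seconds, time_limit_hours=4):
    """Count runs per one-hour interval; None marks a run that timed out."""
    labels = (
        ["<1h"]
        + [f"{h}h-{h + 1}h" for h in range(1, time_limit_hours)]
        + [f">{time_limit_hours}h"]
    )
    counts = dict.fromkeys(labels, 0)
    for t in runtimes_seconds:
        if t is None or t >= time_limit_hours * 3600:
            counts[labels[-1]] += 1                # exceeded the time limit
        else:
            counts[labels[int(t // 3600)]] += 1    # index = completed full hours
    counts["total"] = len(runtimes_seconds)
    return counts

# Consistency check on the counts reported in Table 7.
table7 = {"<1h": 1912, "1h-2h": 26, "2h-3h": 15, "3h-4h": 9, ">4h": 238}
assert sum(table7.values()) == 2200
```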

6 Conclusion

The first result of our paper is that the experimental threshold of the phase transition of the property “being matched” for 3-CNFs is around 0.92, which is much higher than the theoretical lower bound of 0.64 proved for 3-CNF by J. Franco and A. Van Gelder [10]. This can be seen in Figure 1. Moreover, our experiments suggest that for almost all formulas in -CNF are matched (if they have at most as many clauses as variables).
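
To make the property “being matched” concrete, the following sketch tests it with a standard augmenting-path bipartite matching and estimates the fraction of matched formulas at a given ratio m/n. It is a minimal illustration and not the code used in the experiments: the names `is_matched`, `random_k_cnf` and `matched_fraction` are hypothetical, and the random model (each clause over k distinct variables chosen uniformly, with random polarities) is an assumption.

```python
import random

def is_matched(clauses):
    """Return True if every clause can be matched to a distinct variable.

    `clauses` is a list of clauses, each an iterable of nonzero integers
    (DIMACS-style literals); polarity is irrelevant for the matching.
    """
    clause_vars = [sorted({abs(lit) for lit in clause}) for clause in clauses]
    owner = {}  # variable -> index of the clause currently matched to it

    def augment(c, seen):
        # Search for an augmenting path starting at clause c
        # (recursive, so intended for small instances only).
        for v in clause_vars[c]:
            if v in seen:
                continue
            seen.add(v)
            if v not in owner or augment(owner[v], seen):
                owner[v] = c
                return True
        return False

    # If some clause cannot be added, no matching covers all clauses.
    return all(augment(c, set()) for c in range(len(clauses)))

def random_k_cnf(n, m, k, rng):
    """m clauses, each over k distinct variables from 1..n (assumed model)."""
    return [
        [v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), k)]
        for _ in range(m)
    ]

def matched_fraction(n, ratio, k, trials=100, seed=0):
    """Fraction of random k-CNFs with int(ratio * n) clauses that are matched."""
    rng = random.Random(seed)
    m = int(ratio * n)
    hits = sum(is_matched(random_k_cnf(n, m, k, rng)) for _ in range(trials))
    return hits / trials
```

Calling `matched_fraction` for a range of ratios and plotting the result is one way to observe a transition of the kind discussed here; the exact numbers depend on n, the number of trials and the random model.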

We have also proposed a heuristic algorithm for finding a bounded (-)biclique cover of the incidence graph of a given formula φ. In other words, the algorithm tries to decide whether φ is biclique satisfiable. We suggested three different strategies for selecting a seed in our heuristic and compared them. We can deduce from figures 2 and 3 that the success rate of our heuristic algorithm exhibits a phase transition phenomenon similar to the case of matched formulas. The exact values are shown in Table 2. Our results suggest that it is better to use Algorithm 1 to find a 2-biclique cover using strategy for ratios where m denotes the number of clauses and n denotes the number of variables in a given formula. For higher ratios it is better not to restrict the size of the first part of the bicliques in the cover and to use strategy .

        matched         heuristic
        low     high    low     high
3-CNF
5-CNF
Table 8: Comparison of the phase transitions of the property “being matched” and of the success rate of Algorithm 1. Columns low and high have the same meaning as in tables 1 and 2.

Table 8 presents a comparison of the results of experiments on matched formulas with the results of experiments with Algorithm 1. As we can see, the success rate of Algorithm 1 exhibits a phase transition very similar to that of matched formulas. We can see that the lower bound of the phase transition interval in case of matched formulas in -CNF is . In case of our algorithm the lower bound of the phase transition interval is . A formula can be matched only if the ratio of the number of clauses to the number of variables is at most 1. According to the results of our experiments, a random -CNF with is matched with high probability even in case the ratio is . However, for -CNFs the low value of the phase transition of our algorithm equals and for it is even more than , which means that if φ is a formula in -CNF with n variables and at most clauses, Algorithm 1 will most likely find a bounded biclique cover of the incidence graph of φ. These results are summarized in Table 2.

Our heuristic algorithm is not complete; in particular, it can happen that a formula is biclique satisfiable but Algorithm 1 is unable to detect it. This means that we can only trust a positive answer of the algorithm. We have compared our heuristic with a SAT-based approach, which can also determine that a formula is not biclique satisfiable. We can see in Table 4 that the formulas on which Algorithm 1 fails to answer correctly are concentrated around the observed phase transition, and that the algorithm answers correctly in most cases for other configurations. We can say that the success rate of Algorithm 1 is not far from that of the complete SAT-based method. Moreover, as we can see in tables 5 and 6, our heuristic is significantly faster than a SAT solver on the encoding we have described.
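
A positive answer can be trusted because a returned cover is easy to check independently. The sketch below shows such a check under assumptions that are not restated in this section: a biclique is given as a pair (set of variables, set of clause indices), it counts as bounded when it has fewer clauses than 2 to the power of its number of variables (the definition we recall from Szeider's work), and the bicliques must be pairwise disjoint and cover every clause. The function name `is_bounded_biclique_cover` and the input format are hypothetical.

```python
def is_bounded_biclique_cover(clauses, bicliques):
    """Check a claimed cover; `bicliques` is a list of (variables, clause_indices).

    `clauses` is a list of clauses given as iterables of nonzero integers.
    """
    used_vars, covered = set(), set()
    for variables, clause_ids in bicliques:
        variables, clause_ids = set(variables), set(clause_ids)
        if not variables or not clause_ids:
            return False
        if any(not 0 <= c < len(clauses) for c in clause_ids):
            return False
        # Bounded (assumed definition): fewer clauses than 2^(number of variables).
        if len(clause_ids) >= 2 ** len(variables):
            return False
        # The bicliques must be pairwise disjoint.
        if variables & used_vars or clause_ids & covered:
            return False
        # Complete bipartite: each variable occurs in each clause of the biclique.
        for c in clause_ids:
            if not variables <= {abs(lit) for lit in clauses[c]}:
                return False
        used_vars |= variables
        covered |= clause_ids
    # Every clause has to be covered.
    return covered == set(range(len(clauses)))
```

A matching is the special case in which every biclique consists of one variable and one clause. The check is deliberately strict, so passing it is sufficient evidence for a positive answer, while failing it says nothing about whether some other cover exists.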

7 Future work

There is still some room to improve our results.

We can try to develop better heuristics for selecting a seed and for other steps of our algorithm. For example, we can use other sizes of bicliques than and unbounded ones. It would also be interesting to test our heuristic on -biclique cover for . Additionally, a deterministic heuristic for selecting vertices in function restrictSeed could improve the success rate of Algorithm 1.

It would also be interesting to find a better SAT encoding of the problem, which would allow us to run experiments on bigger input formulas.

The last question is whether our heuristic algorithm and the SAT encoding can be generalized to var-satisfiability.

References