# Generating clause sequences of a CNF formula

Given a CNF formula Φ with clauses C_1,...,C_m and variables V={x_1,...,x_n}, a truth assignment a:V→{0,1} of Φ leads to a clause sequence σ_Φ(a)=(C_1(a),...,C_m(a))∈{0,1}^m where C_i(a) = 1 if clause C_i evaluates to 1 under assignment a, otherwise C_i(a) = 0. The set of all possible clause sequences carries a lot of information on the formula, e.g. SAT, MAX-SAT and MIN-SAT can be encoded in terms of finding a clause sequence with extremal properties. We consider a problem posed at Dagstuhl Seminar 19211 "Enumeration in Data Management" (2019) about the generation of all possible clause sequences of a given CNF with bounded dimension. We prove that the problem can be solved in incremental polynomial time. We further give an algorithm with polynomial delay for the class of tractable CNF formulas. We also consider the generation of maximal and minimal clause sequences, and show that generating maximal clause sequences is NP-hard, while minimal clause sequences can be generated with polynomial delay.

## 1 Introduction

The concept of well-designed pattern trees was introduced by Letelier et al. [9] as a convenient graphic representation of conjunctive queries extended by the optional operator. The nodes of such a tree correspond to the queries, while the tree itself represents the optional extensions. Well-designed pattern trees have been studied from a complexity point of view in several aspects. One of the most interesting problems in the context of query languages is the generation problem, that is, generating the solutions one after the other without repetition.

#### Previous work

The generation problem was studied for First-Order and Conjunctive Queries [3, 5, 7, 12] and for well-designed pattern trees [9]. Recently, Kröll et al. [8] initiated a systematic study of the complexity of the generation problem of well-designed pattern trees. They identified several tractable and intractable cases of the problem both from a classical and from a parameterized complexity point of view. One class of pattern trees however remained unclassified. For a class $\mathcal{C}$ of conjunctive queries, a well-designed pattern tree $T$ is globally in $\mathcal{C}$ if for every subtree $T'$ of $T$ the corresponding conjunctive query is also in $\mathcal{C}$. The treewidth of a conjunctive query is the treewidth of its Gaifman-graph [6]. In [8], the complexity of the generation problem for the class of well-designed pattern trees falling globally in the class of queries of treewidth at most $k$ and having $c$-semi-bounded interface was left open (see [8, Table 1 on page 16]).

At the Dagstuhl Seminar 19211 “Enumeration in Data Management”, Kröll proposed an open problem on the generation of clause sequences of CNF formulas [2, Problem 4.7]. The problem is motivated by the fact that it can be reduced to the above mentioned unsolved case of pattern trees, thus any bound on the generation complexity would be helpful in understanding the general problem. A generation algorithm outputs the objects in question one by one without repetition. We call it a polynomial delay procedure if the computing time between any two consecutive outputs is bounded by a polynomial of the input size. We call it incrementally polynomial if for any $k$ the first $k$ objects can be generated in time polynomial in the input size and $k$. Finally, it is called total polynomial if all objects are generated in time polynomial in the input size and the number of objects.

The problem studied in this paper can be formalized as follows. Let $V=\{x_1,\dots,x_n\}$ be a set of Boolean variables and $\Phi$ be a CNF in these variables with clauses $C_1,\dots,C_m$. For an assignment $a\colon V\to\{0,1\}$, the corresponding binary sequence $\sigma_\Phi(a)=(C_1(a),\dots,C_m(a))\in\{0,1\}^m$ is called a signature of $\Phi$, that is, $C_i(a)=1$ if clause $C_i$ evaluates to $1$ under assignment $a$, and $C_i(a)=0$ otherwise. (We prefer the term signature over the term clause sequence proposed by Kröll, since it is a binary string, not a sequence of clauses. Therefore we use the term signature in the rest of the paper.) In particular, this means that $\Phi$ is satisfiable if and only if there exists some assignment $a$ with $\sigma_\Phi(a)=(1,\dots,1)$. Moreover, MAX-SAT and MIN-SAT can be encoded by asking for the signature with the largest and smallest sum of elements, respectively.

As an example, consider the CNF formula , where , , and . Then assignment leads to signature , while assignment leads to signature . It is easy to see that has six different signatures. In general, if the number of signatures is , then generating them in total polynomial time is not difficult. However, their number may be , presenting a potential challenge for generation.
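The definition of a signature can be made concrete with a small brute-force sketch. The clause encoding (lists of nonzero integers, `i` for $x_i$ and `-i` for its negation), the function names, and the three-clause formula below are illustrative assumptions and are not the example from the text. The enumeration over all $2^n$ assignments is only meant for tiny inputs.

```python
from itertools import product

def signature(clauses, assignment):
    """Return the signature (C_1(a), ..., C_m(a)) as a tuple of 0/1 values.

    clauses: list of clauses, each a list of nonzero ints
             (i stands for x_i, -i for the negation of x_i).
    assignment: dict mapping a variable index to 0 or 1.
    """
    def lit_value(lit):
        v = assignment[abs(lit)]
        return v if lit > 0 else 1 - v
    return tuple(int(any(lit_value(l) for l in c)) for c in clauses)

def all_signatures(clauses, n):
    """Collect all signatures by trying all 2^n assignments (tiny inputs only)."""
    sigs = set()
    for bits in product((0, 1), repeat=n):
        a = {i + 1: b for i, b in enumerate(bits)}
        sigs.add(signature(clauses, a))
    return sigs

# Hypothetical formula: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
sigs = all_signatures(clauses, 3)
is_satisfiable = (1, 1, 1) in sigs          # SAT: is the all-one vector a signature?
max_sat_value = max(sum(s) for s in sigs)   # MAX-SAT: largest sum of entries
```

For this hypothetical formula the all-one vector is among the signatures, which is exactly the satisfiability criterion stated above.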

Given a CNF $\Phi$, its dimension $\dim(\Phi)$ is the maximum number of literals in a clause of $\Phi$, and $\Phi$ is called a $d$-CNF if $\dim(\Phi)\le d$. The number of clauses and the number of literals appearing in $\Phi$ are denoted by and , respectively. Vectors are written using bold fonts throughout, e.g. . The problem asked in [2] is for $d$-CNF formulas where $d$ is a fixed positive integer, but we also consider the same problem for general CNFs.

Generation of signatures

Input: A CNF $\Phi$.

Output: All possible signatures of $\Phi$.

Motivated by MAX-SAT and MIN-SAT, we also consider maximal and minimal signatures. A signature of a CNF is called maximal (resp. minimal) if an inclusionwise maximal (resp. minimal) subset of the clauses takes value 1.

Generation of maximal signatures

Input: A CNF $\Phi$.

Output: All possible maximal signatures of $\Phi$.

Generation of minimal signatures

Input: A CNF $\Phi$.

Output: All possible minimal signatures of $\Phi$.

#### Our results

We show that the signature generation problem can be solved in incremental polynomial time for formulas with a bounded dimension, thus answering the open problem posed by Kröll, and with polynomial delay for the class of tractable CNF formulas. For the class of formulas with bounded dimension and co-occurrence, we derive a faster incremental polynomial algorithm. We also show that generating maximal signatures is NP-hard, while minimal signatures can be generated with polynomial delay.

#### Organization

Our algorithm with polynomial delay for the class of tractable CNF formulas is given in Section 2. Section 3 discusses CNFs with bounded dimension: an incremental polynomial algorithm is presented in Section 3.1 for CNFs with bounded dimension and co-occurrence, while our main result answering the question of Kröll is presented in Section 3.2. The generation of maximal and minimal signatures is considered in Section 4. Finally, we conclude the paper in Section 5, where a ‘reversed’ variant of the problem is proposed as an open question.

## 2 Tractable CNFs

Given a CNF $\Phi$, a CNF $\Phi'$ is called a sub-CNF of $\Phi$ if every clause of $\Phi'$ is also a clause of $\Phi$, and this is denoted by $\Phi'\subseteq\Phi$. We call a family of CNFs tractable if for any CNF $\Phi$ in this family the satisfiability of any sub-CNF of $\Phi$ can be decided in polynomial time even after fixing any subset of the variables at arbitrary values. For example, the classes of 2-CNFs or Horn CNFs are tractable.

###### Theorem 1.

If $\Phi$ belongs to a tractable family and has $m$ clauses, then its signatures can be generated with a delay of $O(m)$ SAT-calls.

###### Proof.

The idea is to apply the so-called ‘flashlight’ approach in the signature space, using SAT as a ‘flashlight’ [1]. Let $\Phi=C_1\wedge\dots\wedge C_m$. We are going to build a binary tree in which the paths from the root to the vertices of the tree correspond to binary values of initial segments of the set of clauses, that is, of $C_1,\dots,C_j$ for some $j\le m$. There exists a signature with this prefix if and only if the CNF formed by the clauses set to value one in this sequence is satisfiable even after all the forced fixing of variables that appear in clauses whose value is zero (note that a clause has value $0$ if and only if all the literals in it are $0$). If such a CNF is not satisfiable, we backtrack and do not explore the subtree rooted at this vertex as there exists no signature with this prefix. If the CNF is satisfiable, we continue building the corresponding subtree, which in this case is guaranteed to contain at least one signature. The algorithm will not backtrack above this vertex before outputting all (at least one) signatures in this subtree. It is not difficult to verify that after $O(m)$ calls to SAT we can output a new signature not generated before. After outputting the last signature, the procedure terminates after $O(m)$ SAT calls. ∎
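The flashlight search described in the proof can be sketched as follows. This is a minimal illustration under stated assumptions: clauses are encoded as lists of nonzero integers (`i` for $x_i$, `-i` for its negation), the names `extendable` and `flashlight_signatures` are hypothetical, and the satisfiability test is a brute-force stand-in for the polynomial-time SAT oracle that a tractable family would provide.

```python
from itertools import product

def extendable(clauses, n, prefix):
    """Flashlight test: does some assignment have a signature starting with prefix?"""
    forced = {}
    for clause, bit in zip(clauses, prefix):
        if bit == 0:
            # A clause has value 0 only if every literal in it is 0.
            for lit in clause:
                val = 0 if lit > 0 else 1
                if forced.setdefault(abs(lit), val) != val:
                    return False  # two zero-clauses force opposite values
    ones = [c for c, bit in zip(clauses, prefix) if bit == 1]
    free = [v for v in range(1, n + 1) if v not in forced]
    # Brute-force stand-in for the SAT oracle of a tractable family.
    for bits in product((0, 1), repeat=len(free)):
        a = dict(forced)
        a.update(zip(free, bits))
        if all(any((a[abs(l)] if l > 0 else 1 - a[abs(l)]) for l in c) for c in ones):
            return True
    return False

def flashlight_signatures(clauses, n):
    """Yield every signature exactly once via depth-first search over prefixes."""
    m = len(clauses)

    def explore(prefix):
        if len(prefix) == m:
            yield tuple(prefix)
            return
        for bit in (0, 1):
            # Descend only into subtrees that contain at least one signature.
            if extendable(clauses, n, prefix + [bit]):
                yield from explore(prefix + [bit])

    yield from explore([])

# Hypothetical formula: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
found = list(flashlight_signatures([[1, 2], [-1, 3], [-2, -3]], 3))
```

Because a subtree is entered only after the oracle confirms that it contains a signature, at most two oracle calls are spent per tree level between consecutive outputs in this sketch.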

###### Remark 2.

Let us remark that the family of monotone CNFs is tractable, but for this case there is a more efficient polynomial delay generation of the signatures. Indeed, in this case we can view a clause as a subset of the variables. Consequently, the set of zeros in a signature corresponds to a union of clauses. We claim that all such unions can be generated with delay, where is the number of clauses, implying that all signatures of can be generated with polynomial delay.

To see this claim, we represent unions as leaves of a binary tree of depth $n$ (nodes correspond to variables), where we construct only the vertices that are on paths to the leaves. Besides the binary tree, we keep the leaves in a last-in-first-out queue. (The size of the queue can be exponential in $n$ as it contains the leaves of the binary tree that is being built.) Initially, leaves correspond to individual clauses of $\Phi$. Each time before outputting the first union $U$ in the queue, we check for all clauses $C$ if $U\cup C$ is a new union or not by using our binary tree. This takes $O(n)$ time for one clause, and $O(mn)$ time for all the clauses of $\Phi$. Whenever a new union is found, it is added to the tree and the queue as a last element. After this, we output $U$ and remove it from the queue. It is not difficult to verify that this gives us an $O(mn)$ delay generation of all unions. Note that in this case Theorem 1 guarantees only an delay, because every SAT call requires time.
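The union generation of Remark 2 can be sketched as below. Several choices here are assumptions made for illustration: the CNF is taken to be positive monotone with clauses given as frozensets of variable indices, a hash set replaces the binary tree as the membership structure, the queue is processed first-in-first-out, and the search starts from the empty union so that the all-one signature is also produced.

```python
from collections import deque

def unions(clauses):
    """Yield every union of a subfamily of clauses (frozensets), each exactly once."""
    start = frozenset()          # empty union; an assumption of this sketch
    seen = {start}               # hash set standing in for the binary tree
    queue = deque([start])
    while queue:
        u = queue.popleft()
        # Before outputting u, try to extend it by every clause.
        for c in clauses:
            w = u | c
            if w not in seen:
                seen.add(w)
                queue.append(w)
        yield u

def monotone_signatures(clauses):
    """Signatures of a positive monotone CNF: clause C is 0 iff C lies inside the zero set."""
    for u in unions(clauses):
        yield tuple(0 if c <= u else 1 for c in clauses)

# Hypothetical monotone formula: (x1 or x2) and (x2 or x3) and (x3)
example = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3})]
result = set(monotone_signatures(example))
```

Each dequeued union costs one pass over the clauses before it is output, which mirrors the per-output work described in the remark.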

## 3 CNFs with bounded dimension

### 3.1 Bounded co-occurrence

Given a CNF $\Phi$, we consider the conflict graph of $\Phi$. The vertices of this graph are the clauses of $\Phi$ and edges are exactly the conflicting pairs of clauses, i.e., pairs $(C_i,C_j)$ for which there exists a literal $u$ such that $u\in C_i$ and $\bar{u}\in C_j$.

Let be a maximal independent set of , and let denote the set of literals appearing in the clauses of . We define a partial assignment by setting all literals of to zero (and hence the complementary literals are set to ). The signature associated to is then defined as . The coordinates of are well-defined as if and only if for . We will dismiss the subscript whenever the CNF in question is clear from the context. Note that for different maximal independent sets of we have . It is worth mentioning that all maximal independent sets of can be generated with polynomial delay [13, 10], which is hence a good start for CNF signature generation.

Assume that has bounded dimension, i.e., for a constant we have for all . Let us define . We say that is of -bounded co-occurrence if for and is a fixed constant.

###### Theorem 3.

If has bounded dimension and co-occurrence, then its signatures can be generated in incremental polynomial time.

###### Proof.

Let us construct greedily a maximal induced matching in . Note that has at least maximal independent sets (and hence at least this many signatures can be generated with polynomial delay, as explained above). We denote by the set of clauses that have edges in connecting them to some of the clauses covered by , and set . Note that is an independent set in .

Assume that , for all , and for all . According to our assumptions, and are fixed constants. Observe that with these notations we have . We denote by the number of variables involved in clauses of . Note that we have .

We denote by the (possibly empty) set of variables that are monotone in and appear only in clauses of (some variables appear only positively while some others appear only negatively). Let us first set all literals in to , and consider the resulting CNF in variables. We generate with polynomial delay the maximal independent sets , of , and the corresponding signatures , . Now we have , and thus we can try all binary assignments to the variables in time, and see if we get some more signatures.

Assume we get distinct signatures. By switching the literals in , we may get new signatures, resulting from changing some of the zeros in a signature to one. For any partial assignment to the variables, this is a set-union generation problem that can be solved with polynomial delay, see Remark 2. We may get in this way the same signature multiple times, but no more than times, and thus at this stage the additional signatures are also generated in incremental polynomial time. ∎

### 3.2 Unbounded co-occurrence

In the previous section, we considered CNFs with bounded dimension and co-occurrence. The running time of the algorithm provided by Theorem 3 depends exponentially on , hence it is not suitable for handling the general case. In the present section, a more general procedure is given based on a different approach.

For a CNF , we denote by the so called dual graph of [11]. The vertices of are the clauses of and edges are exactly the pairs of clauses for which there exists a variable that occurs in both and (complemented or not). If is an independent set of , then the clauses of have pairwise disjoint sets of variables involved.

###### Theorem 4.

There exists an algorithm that generates the signatures of a CNF consisting of clauses in binary variables in total time, where and is the number of signatures.

###### Proof.

We prove the claim by induction on . For the claim follows by Theorem 1.

Assume now that we already proved the claim for all , and let us consider a CNF with . Let us associate to its dual graph as defined above. Let be a maximal independent set of . Such a set can be obtained by a simple greedy procedure in polynomial time in the size of . Note that clauses in involve pairwise disjoint sets of variables, due to the fact that is an independent set of . Thus, we can choose a literal for each clause , set all other literals in to zero, set all other variables not occurring in clauses of to zero, and make all possible truth assignment to the literals , . This way we obtain different binary signatures of . Note that we can output these signatures with polynomial delay.
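The first phase of this construction can be sketched as follows. The encoding and names are assumptions for illustration: clauses are lists of nonzero integers (`i` for $x_i$, `-i` for its negation), every clause is assumed non-empty and free of repeated or complementary variables, and the chosen literal of each selected clause is simply its first literal.

```python
from itertools import product

def signature(clauses, a):
    """Signature of assignment a (dict: variable index -> 0/1)."""
    return tuple(
        int(any((a[abs(l)] if l > 0 else 1 - a[abs(l)]) for l in c))
        for c in clauses)

def greedy_independent_clauses(clauses):
    """Greedy maximal independent set of the dual graph:
    indices of clauses whose variable sets are pairwise disjoint."""
    used_vars, chosen = set(), []
    for i, c in enumerate(clauses):
        vs = {abs(l) for l in c}
        if vs.isdisjoint(used_vars):
            chosen.append(i)
            used_vars |= vs
    return chosen

def initial_signatures(clauses, n):
    """Yield 2^|S| signatures that pairwise differ on the clauses of S."""
    S = greedy_independent_clauses(clauses)
    base = {v: 0 for v in range(1, n + 1)}      # variables outside S are set to 0
    for i in S:
        for lit in clauses[i]:
            base[abs(lit)] = 0 if lit > 0 else 1  # every literal of the clause is 0
    for bits in product((0, 1), repeat=len(S)):
        a = dict(base)
        for i, b in zip(S, bits):
            lit = clauses[i][0]                 # chosen literal (first one, by assumption)
            a[abs(lit)] = b if lit > 0 else 1 - b
        yield signature(clauses, a)

# Hypothetical formula: (x1 or x2) and (x3 or not x4) and (x2 or x3)
clauses = [[1, 2], [3, -4], [2, 3]]
S = greedy_independent_clauses(clauses)
first_phase = list(initial_signatures(clauses, 4))
```

Since the clauses of `S` share no variables, the value of each of them equals the value given to its chosen literal, so the produced signatures are pairwise distinct.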

The total number of variables involved in clauses of is . Hence we can assign in all possible ways values to these variables, and produce subproblems , in the remaining variables in time which is polynomial in the input size and , since is a fixed constant. Each of these residual problems is of dimension at most . Indeed, each of the clauses not in shares at least one variable with the clauses of , since is a maximal independent set of , and now that shared variable is fixed at a binary value.

We apply algorithm to each of the residual sub-CNFs , , one by one. This way we produce signatures that extend the pattern on defined by , for all one by one. We may produce the same signature in this way again and again, but no more than times. Since , we can show that this procedure works in total polynomial time.

To see this let us introduce some additional notation. We denote by , the nonempty sets of (partial) assignments that produce the same signature on the clauses of . For , let us denote by the residual CNF, and by the number of signatures of . We denote by the running time of the above described recursive algorithm on CNF and let be the maxima of over all CNFs with at most clauses on variables having and having at most signatures.

The total computational time in the first phase of the above procedure that ends with producing a list of residual CNFs, each of is bounded by

$$O(m^2 n) + O(m n k_0) + O(m n k_0^d) \le K m^2 n k_0^d$$

for a suitable constant that does not depend on , , and . The first term on the left hand side is the time to build and to find a maximal independent set . The second term is the time we need to generate the initial signatures. The third term is the time to generate the subproblems.

For and with the CNFs and cannot share signatures, since those must already differ on by the definition of the sets for . However, for CNFs and may share (many) signatures. Discounting the one signature we already produced with a given trace on , we can still expect

$$k_j \ge \max_{x\in X_j} k(x) - 1$$

different signatures produced by algorithm when we use it for CNFs , . Thus, in total we get

$$k = k_0 + k_1 + \dots + k_{2^{|S|}}$$

different signatures for . The total running time on CNFs , can be bounded by

$$\sum_{x\in X_j} g(\Phi(x)) \le |X_j|\, G(m,n,d-1,k_j).$$

Thus, for the total running time of algorithm on we get

$$g(\Phi) \le G(m,n,d,k) \le K m^2 n k_0^d + \sum_{j=1}^{k_0} |X_j|\, G(m,n,d-1,k_j) \le K m^2 n k_0^d + k_0^d\, G(m,n,d-1,k),$$

where for the last inequality we used for all , implying , which allows this quantity to be factored out of the sum, that can be then upper bounded by . Using this we can show by induction on that

$$G(m,n,d,k) \le L\, d\, m^2 n\, k^{\binom{d}{2}}$$

for some constant (we will choose ) which will complete the proof of our claim. Now

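$$
\begin{aligned}
G(m,n,d,k) &\le K m^2 n k_0^d + k_0^d\, G(m,n,d-1,k) \\
&\le K m^2 n k_0^d + k_0^d\, L (d-1) m^2 n\, k^{\binom{d-1}{2}} \\
&\le L m^2 n k^d + k^d\, L (d-1) m^2 n\, k^{\binom{d-1}{2}} \\
&\le L m^2 n k^d + L (d-1) m^2 n\, k^{\binom{d-1}{2}+d} \le L\, d\, m^2 n\, k^{\binom{d}{2}}.
\end{aligned}
$$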

###### Corollary 5.

The algorithm constructed in the above proof in fact works in incremental polynomial time.

###### Proof.

Using the above theorem, we can prove this claim by induction on the dimension . When , the claim is trivially true.

Consider now the general case, as in the proof of the above theorem. As we remarked there, producing the first signatures in fact can be done with polynomial delay. After this we start processing the CNFs for , . Note that the signatures produced from , and , are all different if . Note also that for all , , and thus we can assume by induction that their signatures can be produced in incremental polynomial time in the size of , which is bounded by the size of . Thus, if , then we can produce new signatures in incremental polynomial time, in fact regardless of how many we produced previously (including the we have from the first phase). Let us denote by the polynomial bounding the total time processing . If , then maybe the first signatures produced from coincide with the ones we already generated from , but still after at most time we get a new signature. In the worst case, we have for all , , in which case processing , may not produce any new signatures. Since , this means that the largest gap between the output of the last signature of and the next new signature is not more than , at a moment when we have already produced signatures. Thus this largest time gap between two outputs is still bounded by a polynomial of the input size and the number of signatures produced so far. ∎

## 4 Generating maximal and minimal signatures

Generation of maximal signatures is difficult as it includes SAT as a special case.

###### Theorem 6.

Generating all maximal signatures is NP-hard.

###### Proof.

Let us consider a CNF $\Phi$, and observe that its unique maximal signature is the all-one vector if and only if $\Phi$ is satisfiable. Hence any total polynomial time algorithm generating the maximal signatures would detect satisfiability of $\Phi$. As SAT is NP-complete [4], the theorem follows. ∎
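The observation used in this proof can be checked on small instances with a brute-force sketch. The clause encoding (lists of nonzero integers, `i` for $x_i$, `-i` for its negation), the function names, and the two formulas are illustrative assumptions, and the exponential enumeration is in no way an efficient procedure.

```python
from itertools import product

def all_signatures(clauses, n):
    """All signatures, found by trying all 2^n assignments (tiny inputs only)."""
    sigs = set()
    for bits in product((0, 1), repeat=n):
        sigs.add(tuple(
            int(any((bits[abs(l) - 1] if l > 0 else 1 - bits[abs(l) - 1]) for l in c))
            for c in clauses))
    return sigs

def maximal_signatures(clauses, n):
    """Signatures whose set of ones is inclusionwise maximal."""
    sigs = all_signatures(clauses, n)

    def dominated(s):
        # s is dominated if another signature is at least as large in every coordinate.
        return any(t != s and all(a <= b for a, b in zip(s, t)) for t in sigs)

    return {s for s in sigs if not dominated(s)}

# Hypothetical satisfiable formula: the unique maximal signature is all-one.
max_sat_case = maximal_signatures([[1, 2], [-1, 3], [-2, -3]], 3)
# Hypothetical unsatisfiable formula (x1) and (not x1): no all-one signature.
max_unsat_case = maximal_signatures([[1], [-1]], 1)
```

In the satisfiable case the all-one vector dominates every other signature, whereas in the unsatisfiable case it does not appear at all.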

It turns out that minimal signatures can be generated efficiently.

###### Theorem 7.

Minimal signatures can be generated with polynomial delay.

###### Proof.

We claim that there is a one-to-one correspondence between the minimal signatures of a CNF $\Phi$ and the maximal independent sets of its conflict graph. Since the conflict graph can be built in polynomial time from $\Phi$ and maximal independent sets of a graph can be generated with polynomial delay [13, 10], this would prove the theorem.

To see the above claim, assume first that $\sigma$ is a minimal signature of $\Phi$. Note that the set $I$ of clauses with value zero in $\sigma$ is an independent set in the conflict graph. For any clause $C$ with value one in $\sigma$ there must exist a conflict between $C$ and some clause of $I$, since otherwise we could set $C$ to zero without forcing any of the clauses in $I$ to change their values, contradicting the minimality of $\sigma$. Thus $I$ must be a maximal independent set.

The other direction follows from the fact that if $I$ is a maximal independent set of the conflict graph and we set all the clauses in $I$ to zero, then all other clauses of $\Phi$ are forced to take value one due to the conflicts between $I$ and the other vertices of the graph. ∎
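The correspondence in this proof can be illustrated on a small instance. The sketch below makes several assumptions: clauses are lists of nonzero integers (`i` for $x_i$, `-i` for its negation) with no clause containing a variable together with its negation, all names are hypothetical, and maximal independent sets are found by exhaustive search over subsets, which is only a stand-in for the polynomial-delay algorithms of [13, 10].

```python
from itertools import combinations, product

def conflict(c1, c2):
    """Two clauses conflict if one contains a literal whose negation is in the other."""
    return any(-l in c2 for l in c1)

def minimal_signatures_via_mis(clauses):
    """Map each maximal independent set of the conflict graph to a signature."""
    m = len(clauses)
    adj = [[conflict(clauses[i], clauses[j]) for j in range(m)] for i in range(m)]
    result = set()
    for bits in product((0, 1), repeat=m):      # exhaustive search, tiny inputs only
        chosen = [i for i in range(m) if bits[i]]
        independent = all(not adj[i][j] for i, j in combinations(chosen, 2))
        maximal = all(any(adj[i][j] for i in chosen)
                      for j in range(m) if not bits[j])
        if independent and maximal:
            # Clauses in the set take value 0, all other clauses are forced to 1.
            result.add(tuple(0 if bits[i] else 1 for i in range(m)))
    return result

def minimal_signatures_brute_force(clauses, n):
    """Reference computation: enumerate all signatures and keep the minimal ones."""
    sigs = set()
    for a in product((0, 1), repeat=n):
        sigs.add(tuple(
            int(any((a[abs(l) - 1] if l > 0 else 1 - a[abs(l) - 1]) for l in c))
            for c in clauses))
    return {s for s in sigs
            if not any(t != s and all(x <= y for x, y in zip(t, s)) for t in sigs)}

# Hypothetical formula: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
via_mis = minimal_signatures_via_mis(clauses)
via_brute_force = minimal_signatures_brute_force(clauses, 3)
```

On this hypothetical formula the conflict graph is a triangle, so each single clause forms a maximal independent set and both computations return the same three signatures.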

## 5 Conclusions

In this paper we show that all signatures of a given CNF with a bounded dimension can be generated in incremental polynomial time, answering an open problem posed by Kröll [2, Problem 4.7]. A faster incremental polynomial algorithm is provided for the class of formulas where both the dimension and the co-occurrence are bounded. Moreover, it is also shown that the same task can be done with polynomial delay if the input CNF is from a tractable class (in this case no bound on dimension or co-occurrence is necessary). Finally, it is proved that generating maximal signatures is NP-hard, while minimal signatures can be generated with polynomial delay.

In this context it is interesting to note that given a 3-CNF $\Phi$ with $m$ clauses and the all-one vector $(1,\dots,1)\in\{0,1\}^m$, it is NP-hard to test whether this vector is a signature of $\Phi$ or not (it is a signature if and only if $\Phi$ is satisfiable). On the other hand, our results show that generating all signatures of $\Phi$ can be done in incremental polynomial time. This is a rather unusual behavior for a generation problem. Typically, if all solutions of a given problem can be generated in incremental polynomial time, checking if a given candidate is a solution or not is computationally easy.

An additional problem connected to CNF signatures was stated at the Dagstuhl Seminar 19211 by Gy. Turán. Given a set $\mathcal{S}\subseteq\{0,1\}^m$, does there exist a CNF $\Phi$ with $m$ clauses such that $\mathcal{S}$ is exactly its set of all signatures? If yes, can such a CNF be computed efficiently? This ‘reverse’ problem (get the signatures, output clauses) to the problem presented in this paper (get the clauses, output signatures) is to the best of our knowledge completely open.

#### Acknowledgements

Kristóf Bérczi was supported by the János Bolyai Research Fellowship of the Hungarian Academy of Sciences and by the ÚNKP-19-4 New National Excellence Program of the Ministry for Innovation and Technology. Ondřej Čepek and Petr Kučera gratefully acknowledge support by the Czech Science Foundation (Grant 19-19463S). Projects no. NKFI-128673 and no. ED_18-1-2019-0030 (Application-specific highly reliable IT solutions) have been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the FK_18 and the Thematic Excellence Programme funding schemes, respectively. This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

## References

• [1] E. Boros, K. Elbassioni, and V. Gurvich. Algorithms for generating minimal blockers of perfect matchings in bipartite graphs and related problems. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3221:122–133, 2004.
• [2] E. Boros, B. Kimelfeld, R. Pichler, and N. Schweikardt. Enumeration in Data Management (Dagstuhl Seminar 19211). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
• [3] A. A. Bulatov, V. Dalmau, M. Grohe, and D. Marx. Enumerating homomorphisms. Journal of Computer and System Sciences, 78(2):638–650, 2012.
• [4] S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151–158, 1971.
• [5] A. Durand, N. Schweikardt, and L. Segoufin. Enumerating answers to first-order queries over databases of low degree. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 121–131. ACM, 2014.
• [6] J.-L. Guigues and V. Duquenne. Familles minimales d’implications informatives résultant d’un tableau de données binaires. Mathématiques et Sciences humaines, 95:5–18, 1986.
• [7] W. Kazana and L. Segoufin. Enumeration of first-order queries on classes of structures with bounded expansion. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems, pages 297–308. ACM, 2013.
• [8] M. Kröll, R. Pichler, and S. Skritek. On the complexity of enumerating the answers to well-designed pattern trees. In 19th International Conference on Database Theory (ICDT 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
• [9] A. Letelier, J. Pérez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic web queries. ACM Transactions on Database Systems (TODS), 38(4):25, 2013.
• [10] K. Makino and T. Uno. New algorithms for enumerating all maximal cliques. In Scandinavian workshop on algorithm theory, pages 260–272. Springer, 2004.
• [11] M. Samer and S. Szeider. Algorithms for propositional model counting. Journal of Discrete Algorithms, 8(1):50 – 64, 2010.
• [12] L. Segoufin. Enumerating with constant delay the answers to a query. In Proceedings of the 16th International Conference on Database Theory, pages 10–20. ACM, 2013.
• [13] S. Tsukiyama, M. Ide, H. Ariyoshi, and I. Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505–517, 1977.