Knowledge compilation languages as proof systems

03/10/2019 ∙ by Florent Capelli, et al. ∙ University of Lille

In this paper, we study proof systems in the sense of Cook-Reckhow for problems that are higher in the polynomial hierarchy than coNP, in particular, #SAT and maxSAT. We start by explaining how the notion of Cook-Reckhow proof systems can be applied to these problems and show how one can twist existing languages in knowledge compilation such as decision DNNF so that they can be seen as proof systems for problems such as #SAT and maxSAT.


1 Introduction

Proof complexity studies the hardness of finding a certificate that a CNF formula is not satisfiable. A minimal requirement for such a certificate is that it should be checkable in polynomial time in its size, so that it is easier for an independent checker to assess the correctness of the proof than to redo the computation made by a solver. While proof systems have been implicitly used for a long time, starting with resolution [DavisP60, DavisPLL62], their systematic study was initiated by Cook and Reckhow [cook1979relative], who showed that unless NP = coNP, one cannot design a proof system where all unsatisfiable CNF have short certificates. Nevertheless, many unsatisfiable CNF may have short certificates if the proof system is powerful enough, motivating the study of how such systems, such as resolution [DavisPLL62] or polynomial calculus [Clegg96], compare in terms of succinctness (see [nordstrom2013pebble] for a survey). More recently, proof systems have found practical applications: since 2013, SAT solvers are expected to output a proof of unsatisfiability in SAT competitions to guard against implementation bugs.

While the proof system implicitly defined by the execution trace of modern CDCL SAT solvers is fairly well understood [pipatsrisawat2011power], this is not the case for tools solving harder problems on CNF formulas such as #SAT and maxSAT. For maxSAT, a resolution-like system has been proposed by Bonet et al. [BonetLM07], and a compressed version of it has been used in a solver by Bacchus and Narodytska [narodytska2014], but it is to the best of our knowledge the only such proof system. To the best of our knowledge, no proof system has been proposed for #SAT.

In this short paper, we introduce new proof systems for #SAT and maxSAT. Contrary to the majority of proof systems for unsatisfiability, our proof systems are not based on the iterative application of inference rules on the original CNF formula. In our proof systems, certificates are restricted Boolean circuits representing the Boolean function computed by the input CNF formula. These restricted circuits originate from the field of knowledge compilation [DarwicheM2002], whose primary focus is to study the succinctness and tractability of representations such as Read Once Branching Programs [Wegener00] or deterministic DNNF [Darwiche01MC] and how CNF formulas can be transformed into such representations. To use them as certificates for #SAT, we first have to add some extra information in the circuit so that one can check in polynomial time that they are equivalent to the original CNF. The syntactic properties of the input circuits then allow one to efficiently count the number of satisfying assignments, resulting in the desired proof system. Moreover, we observe that most tools doing exact model counting are already implicitly generating such proofs. Our result generalizes known connections between regular resolution and Read Once Branching Programs (see [Jukna12, Section 18.2]).

The paper is organized as follows. Section 2 introduces all the notions that will be used in the paper. Section 3 contains the definition of certified decision DNNF, which allows us to define our proof systems for #SAT and maxSAT.

2 Preliminaries

Assignments and Boolean functions.

Let $X$ be a finite set of variables and $D$ a finite domain. We denote the set of functions from $X$ to $D$ as $D^X$. An assignment on variables $X$ is an element of $\{0,1\}^X$. A Boolean function $f$ on variables $X$ is an element of $\{0,1\}^{\{0,1\}^X}$, that is, a function that maps an assignment to a value in $\{0,1\}$. An assignment $\tau \in \{0,1\}^X$ such that $f(\tau) = 1$ is called a satisfying assignment of $f$, denoted by $\tau \models f$. We denote by $\bot_X$ the Boolean function on variables $X$ whose value is always $0$. Given two Boolean functions $f$ and $g$ on variables $X$, we write $f \Rightarrow g$ if for every $\tau \models f$, $\tau \models g$.

CNF.

Let $X$ be a set of variables. A literal on variables $X$ is either a variable $x \in X$ or the negation $\neg x$ of a variable $x \in X$. A clause is a disjunction of literals. A conjunctive normal form formula, CNF for short, is a conjunction of clauses. A CNF naturally defines a Boolean function on variables $X$: a satisfying assignment for a CNF $F$ on variables $X$ is an assignment $\tau \in \{0,1\}^X$ such that for every clause $C$ of $F$, there exists a literal $\ell$ of $C$ such that $\tau(\ell) = 1$ (where we define $\tau(\neg x) := 1 - \tau(x)$). We often identify a CNF with the Boolean function it defines.

The problem SAT is the problem of deciding, given a CNF formula $F$, whether $F$ has a satisfying assignment. It is the generic NP-complete problem [cook1971complexity]. The problem UNSAT is the problem of deciding, given a CNF formula $F$, whether $F$ does not have a satisfying assignment. It is the generic coNP-complete problem.

Given a CNF $F$, we denote by $\#F$ the number of solutions of $F$ and by $M(F)$ the maximum number of clauses of $F$ that can be simultaneously satisfied. The problem #SAT is the problem of computing $\#F$ given a CNF $F$ as input and the problem maxSAT is the problem of computing $M(F)$ given a CNF $F$ as input.
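To make the two quantities concrete, here is a minimal brute-force sketch in Python. It is only an illustration: the DIMACS-style encoding of literals as signed integers and the function names `count_models` and `max_sat` are assumptions of this sketch, and the enumeration is exponential in the number of variables.

```python
from itertools import product

def clause_true(clause, tau):
    """A literal v > 0 stands for x_v and -v for NOT x_v (DIMACS-style, assumed here)."""
    return any(tau[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)

def assignments(variables):
    """Enumerate every assignment {variable: 0 or 1} on the given variables."""
    for bits in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, bits))

def count_models(cnf, variables):
    """Number of assignments that satisfy every clause of the CNF."""
    return sum(all(clause_true(c, tau) for c in cnf) for tau in assignments(variables))

def max_sat(cnf, variables):
    """Maximum number of clauses satisfied by a single assignment."""
    return max(sum(clause_true(c, tau) for c in cnf) for tau in assignments(variables))

F = [[1, 2], [-1, 3]]      # (x1 OR x2) AND (NOT x1 OR x3)
G = [[1], [-1], [1, 2]]    # unsatisfiable: x1 AND NOT x1 AND (x1 OR x2)
print(count_models(F, [1, 2, 3]), max_sat(F, [1, 2, 3]))
print(count_models(G, [1, 2]), max_sat(G, [1, 2]))
```

On an unsatisfiable formula such as `G`, the count is zero while the maxSAT value is still informative, which is why the two problems call for different certificates.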

Cook-Reckhow proof systems.

Let $\Sigma, \Sigma'$ be finite alphabets. A (Cook-Reckhow) proof system [cook1979relative] for a language $L \subseteq \Sigma^*$ is a surjective polynomial time computable function $f : \Sigma'^* \to L$. Given $a \in L$, there exists, by definition, $b \in \Sigma'^*$ such that $f(b) = a$. We will refer to $b$ as being a certificate of $a$.

In this paper, we will mainly be interested in proof systems for the problems #SAT and maxSAT, that is, we would like to design polynomial time verifiable proofs that a CNF formula has $k$ solutions or that at most $k$ clauses in the formula can be simultaneously satisfied. In terms of the Cook-Reckhow definition, this translates to finding a proof system for the languages $\{(F, \#F) \mid F \text{ is a CNF}\}$ and $\{(F, M(F)) \mid F \text{ is a CNF}\}$.

For example, a naive proof system for #SAT could be the following: a certificate that $F$ has $k$ solutions would be the list $\tau_1, \dots, \tau_k$ of the solutions together with a resolution proof that $F \wedge \bigwedge_{i=1}^{k} C_{\tau_i}$ is not satisfiable, where $C_{\tau_i}$ is the clause whose only non-satisfying assignment is $\tau_i$. One could then check in polynomial time that each of the $k$ assignments satisfies $F$ and that $F \wedge \bigwedge_{i=1}^{k} C_{\tau_i}$ is indeed unsatisfiable and then output $(F, k)$. This proof system is however not very interesting as one can construct very simple CNF with exponentially many solutions: for example the empty CNF on $n$ variables has $2^n$ solutions and will thus have a certificate of size at least $2^n$.

Decision DNNF.

A decision Decomposable Negation Normal Form circuit $D$ on variables $X$, decision DNNF for short, is a directed acyclic graph (DAG) having exactly one node of indegree $0$ called the source. Nodes of outdegree $0$ are called the sinks and are labeled by $0$ or $1$. The other nodes have outdegree $2$ and can be of two types:

  • The decision nodes are labeled with a variable $x \in X$. One outgoing edge is labeled with $1$ and the other with $0$, represented respectively as a solid and a dashed edge in our figures.

  • The $\wedge$-nodes are labeled with $\wedge$.

Moreover, we have two other syntactic properties. We introduce a few notations before explaining them. If there is a decision node in $D$ labeled with variable $x$, we say that $x$ is tested in $D$. We denote by $\mathsf{var}(D)$ the set of variables tested in $D$. Given a node $\alpha$ of $D$, we denote by $D(\alpha)$ the decision DNNF whose source is $\alpha$ and whose nodes are the nodes that can be reached in $D$ starting from $\alpha$. We also assume the following:

  • Every $x \in X$ is tested at most once on every source-sink path of $D$.

  • Every $\wedge$-gate of $D$ is decomposable, that is, for every $\wedge$-node $\alpha$ with successors $\beta, \gamma$ in $D$, it holds that $\mathsf{var}(D(\beta)) \cap \mathsf{var}(D(\gamma)) = \emptyset$.

Figure 1: An example decision DNNF.

Let $\tau \in \{0,1\}^X$. A source-sink path $P$ in $D$ is compatible with $\tau$ if and only if when $x$ is tested on $P$, the outgoing edge labeled with $\tau(x)$ is in $P$. We say that $\tau$ satisfies $D$ if only $1$-sinks are reached by paths compatible with $\tau$. Figure 1 depicts a decision DNNF together with, in bold red, the paths compatible with one particular assignment. Observe that a $0$-sink is reached, so this assignment does not satisfy $D$. We will often identify a decision DNNF with the Boolean function it computes.

Observation 1

Given a decision DNNF $D$ on variables $X$ and a source-sink path $P$ in $D$, there exists $\tau \in \{0,1\}^X$ such that $P$ is compatible with $\tau$. Indeed, by definition, every variable $x \in X$ is tested at most once in $P$, thus, if $x$ is tested on $P$ in a decision node $\alpha$ and $P$ contains the outgoing edge of $\alpha$ labeled with $b \in \{0,1\}$, we can choose $\tau(x) := b$. The value of $\tau$ for a variable not tested on $P$ can be chosen arbitrarily.

The size of a decision DNNF $D$, denoted by $|D|$, is the number of edges of the underlying graph of $D$.

Tractable queries.

The main advantage of representing a Boolean function with a decision DNNF is that it makes the analysis of the function easier. Given a decision DNNF, one can easily find a satisfying assignment by only following paths backward from $1$-sinks. Similarly, one can also count the number of satisfying assignments or find one satisfying assignment with the least number of variables set to $1$, etc. The relation between the queries that can be solved efficiently and the representation of the Boolean function has been one focus of Knowledge Compilation. See [DarwicheM2002] for an exhaustive study of tractable queries depending on the representation. Let $f$ be a Boolean function on variables $X$. In this paper, we will mainly be interested in solving the following problems:

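  • Model Counting Problem (MC): return the number of satisfying assignments of $f$.

  • Clause entailment (CE): given a clause $C$ on variables $X$, does $f \Rightarrow C$?

  • Maximal Hamming Weight (HW): given $Y \subseteq X$, compute $\max_{\tau \models f} |\{y \in Y \mid \tau(y) = 1\}|$.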

All these problems are tractable when the Boolean function is given as a decision DNNF:

Theorem 2.1 ([Darwiche01MC, KoricheBLM16])

Given a decision DNNF $D$, one can solve problems MC, CE and HW on the Boolean function represented by $D$ in linear time in $|D|$.
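As an illustration of why model counting is easy on this representation, the following Python sketch counts satisfying assignments bottom-up. The tuple encoding of nodes (`("sink", b)`, `("dec", x, lo, hi)`, `("and", left, right)`) and the function name are assumptions of the sketch. It relies on both syntactic properties: decomposability justifies multiplying at AND-nodes, and the read-once property guarantees that the tested variable does not reappear below a decision node. Variable sets are carried explicitly for readability, so this simple version is polynomial rather than strictly linear.

```python
def count_models(nodes, source, variables):
    """Number of assignments on `variables` satisfying the circuit rooted at `source`."""
    memo = {}
    def visit(n):
        # Returns (count over the variables tested below n, set of those variables).
        if n not in memo:
            node = nodes[n]
            if node[0] == "sink":
                memo[n] = (node[1], frozenset())
            elif node[0] == "dec":
                _, x, lo, hi = node
                (c0, v0), (c1, v1) = visit(lo), visit(hi)
                tested = v0 | v1 | {x}
                free = len(tested) - 1  # variables other than x at this node
                # Variables tested on one branch only are unconstrained on the other.
                total = c0 * 2 ** (free - len(v0)) + c1 * 2 ** (free - len(v1))
                memo[n] = (total, tested)
            else:  # decomposable AND: disjoint variables, so counts multiply
                (cl, vl), (cr, vr) = visit(node[1]), visit(node[2])
                memo[n] = (cl * cr, vl | vr)
        return memo[n]
    count, tested = visit(source)
    # Variables never tested in the circuit are free.
    return count * 2 ** (len(variables) - len(tested))

# Hypothetical example circuit computing (x1 OR x2) AND x3.
D = {
    "one":  ("sink", 1),
    "zero": ("sink", 0),
    "d2":   ("dec", 2, "zero", "one"),
    "d1":   ("dec", 1, "d2", "one"),
    "d3":   ("dec", 3, "zero", "one"),
    "root": ("and", "d1", "d3"),
}
print(count_models(D, "root", [1, 2, 3]))
```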

The tractability of CE on decision DNNF has the following useful consequence:

Corollary 1

Given a decision DNNF $D$ and a CNF formula $F$, one can check in time $O(|F| \cdot |D|)$ whether $D \Rightarrow F$.

Proof

One simply has to check that for every clause $C$ of $F$, $D \Rightarrow C$, which can be done in polynomial time by Theorem 2.1.
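The clause-by-clause check in this proof can be sketched in Python as follows. The sketch uses one standard way of deciding clause entailment, which the text does not spell out: a circuit entails a clause exactly when no satisfying assignment of the circuit falsifies every literal of the clause, so it fixes those literals to false and tests satisfiability of what remains. The node encoding and function names are assumptions of the sketch.

```python
def entails_clause(nodes, source, clause):
    """True iff every satisfying assignment of the circuit satisfies the clause."""
    if any(-lit in clause for lit in clause):
        return True  # a clause containing x and NOT x is always satisfied
    # Values that falsify every literal of the clause (literal v > 0 is x_v).
    forced = {abs(lit): (0 if lit > 0 else 1) for lit in clause}
    memo = {}
    def sat(n):
        if n not in memo:
            node = nodes[n]
            if node[0] == "sink":
                memo[n] = node[1] == 1
            elif node[0] == "dec":
                _, x, lo, hi = node
                if x in forced:
                    memo[n] = sat(hi if forced[x] == 1 else lo)
                else:
                    memo[n] = sat(lo) or sat(hi)
            else:  # decomposable AND: the two sides can be satisfied independently
                memo[n] = sat(node[1]) and sat(node[2])
        return memo[n]
    return not sat(source)

def entails_cnf(nodes, source, cnf):
    """Check that the circuit entails every clause of the CNF."""
    return all(entails_clause(nodes, source, clause) for clause in cnf)

# Hypothetical example circuit computing (x1 OR x2) AND x3.
D = {
    "one":  ("sink", 1),
    "zero": ("sink", 0),
    "d2":   ("dec", 2, "zero", "one"),
    "d1":   ("dec", 1, "d2", "one"),
    "d3":   ("dec", 3, "zero", "one"),
    "root": ("and", "d1", "d3"),
}
print(entails_cnf(D, "root", [[1, 2], [3]]))
```

Each clause costs one traversal of the circuit, which matches the product of the two sizes appearing in the bound.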

3 Knowledge compilation based proof systems

Theorem 2.1 suggests that given a CNF $F$, one could use a decision DNNF $D$ computing $F$ as a certificate for #SAT. The proof system could then check the certificate as follows:

  1. Compute the number $k$ of satisfying assignments of $D$.

  2. Check whether $F$ is equivalent to $D$.

  3. If so, return $(F, k)$.

While Step 1 can be done in polynomial time by Theorem 2.1, it turns out that Step 2 is not tractable:

Theorem 3.1

The problem of checking, given a CNF $F$ and a decision DNNF $D$ as input, whether $F \Rightarrow D$ is coNP-complete.

Proof

The problem is clearly in coNP. For completeness, there is a straightforward reduction from UNSAT. Indeed, observe that a CNF $F$ on variables $X$ is not satisfiable if and only if $F \Rightarrow \bot_X$. Moreover, $\bot_X$ is easily represented as a decision DNNF having only one node: a $0$-labeled sink.

3.1 Certified decision DNNF

The reduction used in the proof of Theorem 3.1 suggests that the coNP-completeness of checking whether $F \Rightarrow D$ comes from the fact that decision DNNF can succinctly represent $\bot_X$. In this section, we introduce restrictions of decision DNNF called certified decision DNNF for which one can check whether a CNF formula entails the certified decision DNNF. The idea is to add information on the $0$-sinks to explain which clause would be violated by an assignment leading to this sink.

Our inspiration comes from a known connection between regular resolution and read once branching programs (i.e. a decision DNNF without $\wedge$-gates [BeameLRS13]). This connection appears to be folklore, and we refer the reader to the book by Jukna [Jukna12, Section 18.2] for a thorough presentation. It turns out that a regular resolution proof (that is, a resolution proof where, on each path, a variable is resolved at most once) of unsatisfiability of a CNF $F$ can be represented by a read once branching program $D$ whose sinks are labeled with clauses of $F$. Moreover, for every $\tau \in \{0,1\}^X$, if a sink labeled by a clause $C$ is reached by a path compatible with $\tau$, then $\tau \not\models C$. We generalize this idea so that the function represented by a decision DNNF is not only an unsatisfiable CNF:

Definition 1

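A certified decision DNNF $D$ on variables $X$ is a decision DNNF on variables $X$ such that every $0$-sink $\alpha$ of $D$ is labeled with a clause $C_\alpha$. $D$ is said to be correct if for every $\tau \in \{0,1\}^X$ such that there is a path from the source of $D$ to a $0$-sink $\alpha$ compatible with $\tau$, $\tau \not\models C_\alpha$.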

Given a certified decision DNNF $D$, we denote by $Z(D)$ the set of $0$-sinks of $D$ and by $F(D) := \bigwedge_{\alpha \in Z(D)} C_\alpha$.

Intuitively, the clause labeling a $0$-sink explains why an assignment reaching this sink does not satisfy the circuit. The degenerate case where there are only $0$-sinks and no $\wedge$-gates corresponds to the characterization of regular resolution.

A crucial property of certified decision DNNF is that their correctness can be tested in polynomial time:

Theorem 3.2

Given a certified decision DNNF $D$, one can check in polynomial time whether $D$ is correct.

Proof

By definition, $D$ is not correct if and only if there exists a $0$-sink $\alpha$, a literal $\ell$ in $C_\alpha$, an assignment $\tau$ such that $\tau(\ell) = 1$ and a path in $D$ from the source to $\alpha$ compatible with $\tau$. By Observation 1, it is equivalent to the fact that there exists a path from the source to $\alpha$ that either does not test the underlying variable of $\ell$ or contains the outgoing edge that sets $\ell$ to $1$ when the underlying variable of $\ell$ is tested.

In other words, $D$ is correct if and only if for every $0$-sink $\alpha$ and for every literal $\ell$ of $C_\alpha$ with variable $x$, every path from the source to $\alpha$ tests variable $x$ and contains the outgoing edge corresponding to an assignment $\tau$ such that $\tau(\ell) = 0$.

This can be checked in polynomial time. Indeed, fix a $0$-sink $\alpha$ and a literal $\ell$ of $C_\alpha$. For simplicity, we assume that $\ell = x$ (the case $\ell = \neg x$ is completely symmetric). We have to check that every path from the source to $\alpha$ contains a decision node on variable $x$ and contains the outgoing edge of this node labeled with $0$. To check this, it is sufficient to remove all the edges labeled with $0$ going out of a decision node on variable $x$ and test that $\alpha$ is no longer reachable from the source in $D$, which can be done in polynomial time by a graph traversal. Running this for every $0$-sink $\alpha$ and every literal of $C_\alpha$ gives the expected algorithm.
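The edge-removal argument of this proof can be sketched in Python as follows. The encoding is an assumption of the sketch: nodes are tuples `("sink", 1)`, `("sink", 0, clause)`, `("dec", x, lo, hi)` or `("and", left, right)`, and clauses are lists of signed integers. For each literal of each sink label, the sketch deletes the decision edges that falsify the literal and checks that the sink can no longer be reached from the source.

```python
def reachable_without(nodes, source, target, var, removed_label):
    """Is target reachable from source once the edges labeled `removed_label`
    leaving decision nodes on `var` have been removed?"""
    stack, seen = [source], {source}
    while stack:
        node = nodes[stack.pop()]
        if node[0] == "dec":
            _, x, lo, hi = node
            children = [c for b, c in ((0, lo), (1, hi))
                        if not (x == var and b == removed_label)]
        elif node[0] == "and":
            children = [node[1], node[2]]
        else:
            children = []
        for child in children:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return target in seen

def is_correct(nodes, source):
    """Check correctness of a certified circuit, one 0-sink and one literal at a time."""
    for name, node in nodes.items():
        if node[0] != "sink" or node[1] != 0:
            continue
        for lit in node[2]:
            falsifying = 0 if lit > 0 else 1  # value of the variable making lit false
            if reachable_without(nodes, source, name, abs(lit), falsifying):
                return False  # some path reaches the sink without falsifying lit
    return True

# Hypothetical certified circuit for (x1 OR x2) AND x3.
D = {
    "one":  ("sink", 1),
    "z12":  ("sink", 0, [1, 2]),
    "z3":   ("sink", 0, [3]),
    "d2":   ("dec", 2, "z12", "one"),
    "d1":   ("dec", 1, "d2", "one"),
    "d3":   ("dec", 3, "z3", "one"),
    "root": ("and", "d1", "d3"),
}
print(is_correct(D, "root"))
```

Relabeling the sink `z12` with the clause `[1, 2, 3]` makes the circuit incorrect, since the path through `d1` and `d2` never tests variable 3.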

The clauses labeling the $0$-sinks of a correct certified decision DNNF $D$ naturally connect to the function computed by $D$:

Theorem 3.3

Let $D$ be a correct certified decision DNNF on variables $X$. We have $F(D) \Rightarrow D$.

Proof

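Observe that $F(D) \Rightarrow D$ if and only if for every $\tau \in \{0,1\}^X$, if $\tau$ does not satisfy $D$ then $\tau$ does not satisfy $F(D)$. Now let $\tau$ be an assignment that does not satisfy $D$. By definition, there exists a path compatible with $\tau$ from the source of $D$ to a $0$-sink $\alpha$ of $D$. Since $D$ is correct, $\tau \not\models C_\alpha$. Thus, $\tau$ does not satisfy $F(D)$ as $C_\alpha$ is by definition a clause of $F(D)$.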

Corollary 2

Let $F$ be a CNF formula and $D$ be a correct certified decision DNNF such that every clause of $F(D)$ is also in $F$. Then $F \Rightarrow D$.

3.2 Proof systems

Proof system for #SAT.

One can use certified decision DNNF to define a proof system for #SAT. The Knowledge Compilation based Proof System for #SAT, kcps(#SAT) for short, is defined as follows: given a CNF $F$, a certificate that $F$ has $k$ satisfying assignments is a correct certified decision DNNF $D$ such that:

  • every clause of $F(D)$ is a clause of $F$,

  • $D$ computes $F$ and has $k$ satisfying assignments.

To check a certificate $D$, one has to check that $D$ is equivalent to $F$ and has indeed $k$ satisfying assignments, which can be done in polynomial time as follows:

  • Check that $D$ is correct, which is tractable by Theorem 3.2.

  • Check that $D \Rightarrow F$, which is tractable by Corollary 1, and that every clause of $F(D)$ is a clause of $F$. By Corollary 2, it means that $F \Rightarrow D$, so $D$ and $F$ are equivalent.

  • Compute the number $k$ of solutions of $D$, which is tractable by Theorem 2.1.

  • Return $(F, k)$.
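The checking steps listed above can be put together in a compact Python sketch. Everything in it is an assumption made for illustration: the tuple encoding of nodes, the signed-integer clauses, the function names, and the choice to return `None` on rejection. It also assumes that the circuit already satisfies the read-once and decomposability conditions, which a complete checker would verify first.

```python
def successors(node):
    """(lo, hi) for a decision node, (left, right) for an AND-node, () for a sink."""
    return () if node[0] == "sink" else node[-2:]

def is_correct(nodes, source):
    for target, sink in nodes.items():
        if sink[0] != "sink" or sink[1] != 0:
            continue
        for lit in sink[2]:
            falsifying = 0 if lit > 0 else 1
            stack, seen = [source], {source}
            while stack:  # traversal that skips the edges falsifying lit
                node = nodes[stack.pop()]
                succ = list(successors(node))
                if node[0] == "dec" and node[1] == abs(lit):
                    del succ[falsifying]
                for child in succ:
                    if child not in seen:
                        seen.add(child)
                        stack.append(child)
            if target in seen:
                return False
    return True

def has_model_falsifying(nodes, source, clause):
    forced = {abs(lit): (0 if lit > 0 else 1) for lit in clause}
    memo = {}
    def sat(n):
        if n not in memo:
            node = nodes[n]
            if node[0] == "sink":
                memo[n] = node[1] == 1
            elif node[0] == "dec":
                _, x, lo, hi = node
                memo[n] = sat((lo, hi)[forced[x]]) if x in forced else sat(lo) or sat(hi)
            else:
                memo[n] = sat(node[1]) and sat(node[2])
        return memo[n]
    return sat(source)

def count_models(nodes, source, variables):
    memo = {}
    def visit(n):
        if n not in memo:
            node = nodes[n]
            if node[0] == "sink":
                memo[n] = (node[1], frozenset())
            elif node[0] == "dec":
                _, x, lo, hi = node
                (c0, v0), (c1, v1) = visit(lo), visit(hi)
                tested = v0 | v1 | {x}
                free = len(tested) - 1
                memo[n] = (c0 * 2 ** (free - len(v0)) + c1 * 2 ** (free - len(v1)), tested)
            else:
                (cl, vl), (cr, vr) = visit(node[1]), visit(node[2])
                memo[n] = (cl * cr, vl | vr)
        return memo[n]
    count, tested = visit(source)
    return count * 2 ** (len(variables) - len(tested))

def verify(cnf, variables, nodes, source):
    """Return the certified model count, or None if the certificate is rejected."""
    clauses = {frozenset(c) for c in cnf}
    labels = [frozenset(n[2]) for n in nodes.values() if n[0] == "sink" and n[1] == 0]
    if not is_correct(nodes, source) or any(label not in clauses for label in labels):
        return None
    for clause in cnf:  # the circuit must entail every non-tautological clause
        tautology = any(-lit in clause for lit in clause)
        if not tautology and has_model_falsifying(nodes, source, clause):
            return None
    return count_models(nodes, source, variables)

# Hypothetical certificate for F = (x1 OR x2) AND x3.
F = [[1, 2], [3]]
D = {
    "one": ("sink", 1), "z12": ("sink", 0, [1, 2]), "z3": ("sink", 0, [3]),
    "d2": ("dec", 2, "z12", "one"), "d1": ("dec", 1, "d2", "one"),
    "d3": ("dec", 3, "z3", "one"), "root": ("and", "d1", "d3"),
}
print(verify(F, [1, 2, 3], D, "root"))
```

A certificate is rejected as soon as one check fails, for instance when the sub-circuit rooted at `d3` is submitted on its own: its sink label belongs to `F`, but it does not entail the first clause.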

This proof system for #SAT is particularly well-suited for the existing tools solving #SAT in practice. Many of them, such as sharpSAT [thurley06] or cachet [sang04], are based on a generalization of DPLL for counting which is sometimes referred to as exhaustive DPLL in the literature. It has been observed by Huang and Darwiche [HuangD05] that these tools implicitly construct a decision DNNF equivalent to the input formula. Tools such as c2d [OztokD15], D4 [LagniezM17] or DMC [lagniez2018dmc] already exploit this connection and have the option to directly output an equivalent decision DNNF. These solvers explore the set of satisfying assignments by branching on variables of the formula, which corresponds to a decision node, and, when two variable-independent components of the formula are detected, compute the number of satisfying assignments of both components and take the product, which corresponds to a decomposable $\wedge$-gate. When a satisfying assignment is reached, it corresponds to a $1$-sink. If a clause is violated by the current assignment, then it corresponds to a $0$-sink. At this point, the solvers could also label the $0$-sink by the violated clause, which would give a correct certified decision DNNF.

Proof system for maxSAT.

As for #SAT, one can exploit the tractability of many problems on decision DNNF to define a proof system for maxSAT. Given a CNF formula $F$, let $\tilde{F}$ be the formula where each clause $C$ of $F$ is augmented with a fresh selector variable $s_C$, that is, $C$ is replaced by $C \vee \neg s_C$. Let $S = \{s_C \mid C \in F\}$. Observe that the maximal Hamming weight of $\tilde{F}$ on $S$ is exactly the maxSAT value of $F$, since if $\tau \models \tilde{F}$ and $\tau(s_C) = 1$, then $\tau \models C$. By Theorem 2.1, if $\tilde{F}$ is represented by a decision DNNF $D$, then one can solve this problem in polynomial time in $|D|$. The proof system kcps(maxSAT) is defined as follows: given a CNF $F$, a certificate is a correct certified decision DNNF $D$ with clauses in $\tilde{F}$ that computes $\tilde{F}$. The proof may be checked as before by checking both the correctness of $D$ and the fact that $D \Rightarrow \tilde{F}$. However, we are not aware of any tool solving maxSAT based on this technique and thus the implementation of such a proof system in existing tools may not be realistic. It will still be worth comparing this proof system with the resolution for maxSAT [BonetLM07].
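The selector construction can be sanity-checked by brute force. The Python sketch below is an illustration under stated assumptions: clauses are lists of signed integers, each clause receives a fresh selector that appears negated in the augmented clause (one natural reading of the construction, chosen so that setting a selector to 1 forces its clause to hold), and both quantities are computed by exhaustive enumeration rather than on a circuit.

```python
from itertools import product

def clause_true(clause, tau):
    """A literal v > 0 stands for x_v and -v for NOT x_v."""
    return any(tau[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)

def max_sat(cnf, variables):
    """Direct definition: best number of clauses satisfied by one assignment."""
    best = 0
    for bits in product((0, 1), repeat=len(variables)):
        tau = dict(zip(variables, bits))
        best = max(best, sum(clause_true(c, tau) for c in cnf))
    return best

def max_sat_via_selectors(cnf, variables):
    """Maximal number of selectors set to 1 among the models of the augmented formula."""
    first_fresh = max(variables, default=0) + 1
    selectors = [first_fresh + i for i in range(len(cnf))]       # one per clause
    augmented = [list(c) + [-s] for c, s in zip(cnf, selectors)]  # C OR NOT s_C
    all_vars = list(variables) + selectors
    best = 0
    for bits in product((0, 1), repeat=len(all_vars)):
        tau = dict(zip(all_vars, bits))
        if all(clause_true(c, tau) for c in augmented):
            best = max(best, sum(tau[s] for s in selectors))
    return best

G = [[1], [-1], [1, 2]]   # unsatisfiable, but two clauses can hold together
print(max_sat(G, [1, 2]), max_sat_via_selectors(G, [1, 2]))
```

The augmented formula is always satisfiable (set every selector to 0), so the maximum is well defined even when the input formula is unsatisfiable.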

In general, we observe that we can use this idea to build a proof system for any tractable problem on decision DNNF. This could for example be applied to weighted versions of #SAT and maxSAT.

Combining proof systems.

An interesting feature of kcps-like proof systems is that they can be combined with other proof systems for UNSAT to be made more powerful. Indeed, one could label a $0$-sink of the decision DNNF with a clause $C$ that is not originally in the initial CNF $F$ but that is entailed by $F$, that is, $F \Rightarrow C$. In this case, Corollary 2 would still hold. The only thing that is needed to obtain a real proof system is that a proof that $F \Rightarrow C$ has to be given along with the correct certified decision DNNF, that is, a proof of unsatisfiability of $F \wedge \neg C$. Any proof system for UNSAT may be used here.

Lower bounds.

Lower bounds on the size of decision DNNF representing CNF formulas may be directly lifted to lower bounds for kcps(#SAT) or kcps(maxSAT). There exist families of monotone -CNF that cannot be represented as polynomial size decision DNNF [BeameLRS13, BovaCMS16, CapelliPhd2016]. It directly gives the following corollary:

Corollary 3

There exists a family of monotone -CNF such that is of size and any proof for in kcps(#SAT) and kcps(maxSAT) is of size at least .

An interesting open question is to find CNF formulas having polynomial size decision DNNF but no small proof in kcps(#SAT).

4 Future work

In this paper, we have developed techniques based on circuits used in knowledge compilation to extend existing proof systems for tautology to harder problems. It seems possible to implement these systems in existing tools for #SAT based on exhaustive DPLL, which would allow these tools to provide an independently checkable certificate that their output is correct, the same way SAT solvers return a proof on unsatisfiable instances. It would be interesting to see how adding the computation of this certificate to existing solvers impacts their performance. Another interesting direction would be to compare the power of kcps(maxSAT) with the resolution for maxSAT of Bonet et al. [BonetLM07] and to see how such proof systems could be implemented in existing tools for maxSAT. Finally, we think that a systematic study of other languages used in knowledge compilation such as deterministic DNNF should be done to see if they can be used as proof systems, by trying to add explanations on why an assignment does not satisfy the circuit.

References