Towards a Better Understanding of (Partial Weighted) MaxSAT Proof Systems

03/05/2020 ∙ by Javier Larrosa, et al. ∙ Universitat Politècnica de Catalunya

MaxSAT is a very popular language for discrete optimization with many domains of application. While there has been a lot of progress in MaxSAT solvers during the last decade, the theoretical analysis of MaxSAT inference has not kept pace. Aiming to redress that imbalance, in this paper we take a proof complexity approach to MaxSAT resolution-based proof systems. First, we give some basic definitions on completeness and show that refutational completeness makes completeness redundant, as happens in SAT. Then we take three inference rules such that adding them sequentially allows us to navigate from the weakest to the strongest resolution-based MaxSAT system available (i.e., from standalone MaxSAT resolution to the recently proposed ResE), each rule making the system stronger. Finally, we show that the strongest system captures the recently proposed concept of Circular Proof while being conceptually simpler, since weights, which are intrinsic in MaxSAT, naturally guarantee the flow condition required for the SAT case.

1 Introduction

Proof complexity is the field aiming to understand the computational cost required to prove or refute statements. Different proof systems may provide different proofs for the same formula and some proof systems are provably more efficient than others. When that happens, proof complexity cares about which elements of the more powerful proof system really make the difference.

In propositional logic, proof systems that work with CNF formulas have attracted the interest of researchers for several decades. One of the reasons is that CNF is the working language of the extremely successful SAT solvers, and the most essential ingredients of these algorithms (e.g., conflict analysis) can be understood and analyzed as proofs [5].

(Partial Weighted) MaxSAT is the optimization version of SAT. Although discrete optimization problems can be modeled and solved with SAT solvers, many of these problems are more naturally treated as MaxSAT. For this reason the design of MaxSAT solvers has attracted the interest of researchers in the last decade. Interestingly, while some of the first efficient MaxSAT solvers were strongly influenced by MaxSAT inference [9], this influence has diminished over time. The currently most efficient algorithms solve MaxSAT by sophisticated sequences of calls to SAT solvers [11, 1, 4].

We think it is important to understand this scientific trend with a more formal approach, and such understanding must go through an analysis of the possibilities and limitations of MaxSAT proof systems. The purpose of this paper is to contribute in that direction. We extend some classic proof complexity concepts (e.g., entailment and completeness) to MaxSAT and analyze three proof systems of increasing complexity: from stand-alone MaxSAT resolution (Res) [9] to the recently proposed resolution with extension (ResE) [10]. For the sake of clarity, we break the extension rule of ResE into two atomic rules, split and virtual, and analyze their incremental power. Our results show that each add-on makes a provably stronger system. More precisely, we have observed that Res is sound and refutationally complete. Adding the split rule (ResS) we get completeness and (unlike what happens in SAT) some exponential speed-up in certain refutations. Further adding the virtual rule (ResSV), which allows negative weights to be kept during proofs, we get further speed-up by capturing the concept of circular proofs [3]. We also give the interesting and somewhat unexpected result that in some cases rephrasing a MaxSAT refutation as a MaxSAT entailment may transform the problem from exponentially hard to polynomial when using ResSV.

The structure of the paper is as follows: in Sections 2 and 3 we provide preliminaries on SAT and MaxSAT, respectively. In Section 4 we define some variations of the Pigeon Hole Problem that we need for the proofs of the theorems. In Section 5 we provide basic definitions and properties of MaxSAT proof systems and introduce and analyze the different systems addressed in the paper. In Section 6 we show how the strongest proof system, ResSV, captures the notion of Circular Proof. Finally, in Section 7, we give some conclusions.

2 SAT Preliminaries

A boolean variable $x$ takes values on the set $\{0, 1\}$. A literal is a variable $x$ (positive literal) or its negation $\bar{x}$ (negative literal). A clause is a disjunction of literals. A clause $C$ is satisfied by a truth assignment $X$ if $X$ contains at least one of the literals in $C$. The empty clause is denoted $\Box$ and cannot be satisfied. The negation of a clause $C = l_1 \vee l_2 \vee \dots \vee l_r$ is satisfied if all its literals are falsified, and this can be trivially expressed in CNF as the set of unit clauses $\{\bar{l}_1, \bar{l}_2, \dots, \bar{l}_r\}$.

A CNF formula $F$ is a set of clauses. A truth assignment satisfies a formula if it satisfies all its clauses. If such an assignment exists, we say that the assignment is a model and the formula is satisfiable, noted $SAT(F)$. Determining whether a formula is satisfiable constitutes the well-known SAT Problem.

We say that formula $F$ entails formula $G$, noted $F \models G$, if every model of $F$ is also a model of $G$. Two formulas $F$ and $G$ are equivalent, noted $F \equiv G$, if they entail each other.

An inference rule is given by a set of antecedent clauses and a set of consequent clauses. In SAT, the intended meaning of an inference rule is that if some clauses of a formula match the antecedents, the consequents can be added. The rule is sound if every truth assignment that satisfies the antecedents also satisfies the consequents. The process of applying an inference rule to a formula $F$ is noted $F \vdash F'$.

Consider the following two rules (resolution and split) [12] [3],

$$\frac{x \vee A \qquad \bar{x} \vee B}{A \vee B} \qquad\qquad \frac{A}{A \vee x \qquad A \vee \bar{x}}$$

where $A$ and $B$ are arbitrary (possibly empty) disjunctions of literals and $x$ is an arbitrary variable. In propositional logic it is customary to define rules with just one consequent because one rule with $s$ consequents can be obtained from $s$ one-consequent rules. As we will see, this is not the case in MaxSAT. For this reason, here we prefer to introduce the two-consequent split rule instead of the equivalent weakening rule [3] to make the parallelism with MaxSAT more evident.
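The two SAT rules can be tried out on small clauses. The following is a minimal sketch, not taken from the paper: clauses are represented as frozensets of non-zero integers (`v` for a variable, `-v` for its negation), and the helper names `resolve`, `split` and `is_sound` are illustrative assumptions. Soundness is checked by brute force over all assignments.

```python
from itertools import product

def resolve(c1, c2, x):
    """Resolution: from (x or A) and (not x or B) derive (A or B)."""
    assert x in c1 and -x in c2
    return (c1 - {x}) | (c2 - {-x})

def split(c, x):
    """Split: from A derive the two clauses (A or x) and (A or not x)."""
    return c | {x}, c | {-x}

def satisfies(assignment, clause):
    """True if the assignment makes at least one literal of the clause true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def is_sound(antecedents, consequents, variables):
    """Brute force: every model of the antecedents satisfies the consequents."""
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if all(satisfies(a, c) for c in antecedents):
            if not all(satisfies(a, c) for c in consequents):
                return False
    return True

c1 = frozenset({1, 2})    # x1 or x2
c2 = frozenset({-1, 3})   # not x1 or x3
resolvent = resolve(c1, c2, 1)
```

Resolving the unit clauses `{1}` and `{-1}` with this sketch yields the empty frozenset, which plays the role of the empty clause.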

A proof system $\Delta$ is a set of inference rules. A proof of length $e$ under a proof system $\Delta$ is a finite sequence $F_0 \vdash F_1 \vdash \dots \vdash F_e$ where $F_0$ is the original formula and each $F_i$ is obtained by applying an inference rule from $\Delta$. We will use $\vdash^*$ to denote an arbitrary number of inference steps. A short proof is a proof whose length can be bounded by a polynomial on $|F_0|$. A refutation is a proof such that $\Box \in F_e$. Refutations are important because they prove unsatisfiability.

A proof system is sound if all its rules are sound. All the SAT rules and proof systems considered in this paper are sound. A proof system is complete if for every $F$, $G$ such that $F \models G$, there is a proof $F \vdash^* H$ with $G \subseteq H$. Although completeness is a natural and elegant property, it has limited practical interest. For that reason a weaker version of completeness has been defined. A proof system is refutationally complete if for every unsatisfiable formula $F$ there is a refutation starting in $F$ (i.e., completeness is required only for refutations). It is usually believed that refutational completeness is enough for practical purposes. The reason is that $F \models G$ if and only if $F \wedge \neg G$ is unsatisfiable, so any refutationally complete proof system can prove the entailment by deriving a refutation from a CNF formula equivalent to $F \wedge \neg G$.

It is well-known that the proof system made exclusively of resolution is refutationally complete and that adding the split rule makes the system complete. The following property states that adding the split rule does not give any advantage to resolution in terms of refutational power,

Property 1

(see Lemma 7 in [2]) A proof system with resolution and split as inference rules cannot make shorter refutations than a proof system with only resolution.

3 MaxSAT Preliminaries

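A weight $w$ is a positive number or $\infty$ (i.e., $w \in \mathbb{R}^+ \cup \{\infty\}$). We extend sum and subtraction to weights defining $\infty + w = \infty$ and $\infty - w = \infty$ for all $w$. Note that $v - w$ is only defined when $w \le v$.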

A weighted clause is a pair $(C, w)$ where $C$ is a clause and $w$ is a weight associated to its falsification. If $w = \infty$ we say that the clause is hard, else it is soft. A weighted MaxSAT CNF formula is a set of weighted clauses $F = \{(C_1, w_1), (C_2, w_2), \dots, (C_p, w_p)\}$. If all the clauses are hard, we say that the formula is hard. We say that $G \subseteq F$ if for all $(C, w) \in G$ there is a $(C, w') \in F$ with $w \le w'$.

Given a formula $F$, we define the cost of a truth assignment $X$, noted $F(X)$, as the sum of weights over the clauses that are falsified by $X$. The MaxSAT problem is the minimum cost over the set of all truth assignments,

$$MaxSAT(F) = \min_X F(X)$$

This definition of MaxSAT including weights and hard clauses is sometimes referred to as Partial Weighted MaxSAT [11]. Note that any clause $(C, w)$ can be broken into two clauses $(C, u)$, $(C, v)$ as long as $u + v = w$. In the following we will assume that clauses are separated and merged as needed.
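The cost of an assignment and the MaxSAT optimum can be computed directly from these definitions on tiny formulas. The sketch below is an exponential brute force meant only as an illustration. The representation (a list of `(clause, weight)` pairs, with clauses as tuples of non-zero integers and `math.inf` for hard clauses) and the example formula are assumptions of this sketch, not part of the paper.

```python
from itertools import product
from math import inf

def cost(formula, assignment):
    """Sum of the weights of the clauses falsified by `assignment`.

    formula: list of (clause, weight); a clause is a tuple of non-zero ints,
    where v stands for variable v and -v for its negation.
    assignment: dict mapping each variable to True or False.
    """
    total = 0
    for clause, weight in formula:
        satisfied = any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        if not satisfied:
            total += weight
    return total

def maxsat(formula):
    """Minimum cost over all truth assignments (exponential brute force)."""
    variables = sorted({abs(lit) for clause, _ in formula for lit in clause})
    best = inf
    for values in product([False, True], repeat=len(variables)):
        best = min(best, cost(formula, dict(zip(variables, values))))
    return best

# Hypothetical example: hard clause (x1 or x2), soft units (not x1, 3), (not x2, 5).
example = [((1, 2), inf), ((-1,), 3), ((-2,), 5)]

# Breaking (not x1, 3) into (not x1, 1) and (not x1, 2) leaves every cost unchanged.
broken = [((1, 2), inf), ((-1,), 1), ((-1,), 2), ((-2,), 5)]
```

Here the cheapest assignment sets only `x1` to true, so it pays the weight of the soft clause on `x1` and nothing else.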

We say that formula $F$ entails formula $G$, noted $F \models G$, if $G$ is a lower bound of $F$. That is, if for all $X$, $G(X) \le F(X)$. We say that two formulas $F$ and $G$ are equivalent, noted $F \equiv G$, if they entail each other. That is, if for all $X$, $F(X) = G(X)$.

In the following sections we will find it useful to deal with negated clauses. Hence, we give the corresponding definitions and useful properties. Let $A$ and $B$ be arbitrary disjunctions of literals. Let $(A \vee \neg B, w)$ mean that falsifying $A \vee \neg B$ incurs a cost of $w$. Although $A \vee \neg B$ is not a clause, the following property shows that it can be efficiently transformed into a CNF equivalent,

Property 2

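$(A \vee \neg(l_1 \vee l_2 \vee \dots \vee l_r), w) \equiv \{(A \vee \bar{l}_1, w), (A \vee l_1 \vee \bar{l}_2, w), \dots, (A \vee l_1 \vee \dots \vee l_{r-1} \vee \bar{l}_r, w)\}$.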
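As an illustration of this property, the sketch below assumes the usual transformation of $(A \vee \neg(l_1 \vee \dots \vee l_r), w)$ into the clauses $(A \vee \bar{l}_1, w), (A \vee l_1 \vee \bar{l}_2, w), \dots, (A \vee l_1 \vee \dots \vee l_{r-1} \vee \bar{l}_r, w)$, and checks by brute force that both sides have the same cost on every assignment. The function names and the integer encoding of literals are assumptions of this sketch.

```python
from itertools import product

def negated_to_cnf(a, b, w):
    """Weighted CNF clauses equivalent to (A or not(l1 or ... or lr), w)."""
    clauses = []
    for i, lit in enumerate(b):
        # Clause number i is: A or l1 or ... or l_(i-1) or not(l_i).
        clauses.append((tuple(a) + tuple(b[:i]) + (-lit,), w))
    return clauses

def holds(clause, assignment):
    """True if some literal of the clause is true under the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def cost(formula, assignment):
    """Sum of weights of falsified clauses."""
    return sum(w for clause, w in formula if not holds(clause, assignment))

def check_equivalence(a, b, w):
    """Compare both costs on every assignment of the variables involved."""
    variables = sorted({abs(lit) for lit in tuple(a) + tuple(b)})
    cnf = negated_to_cnf(a, b, w)
    for values in product([False, True], repeat=len(variables)):
        x = dict(zip(variables, values))
        # (A or not B) is falsified exactly when A is false and B is true.
        direct = w if (not holds(a, x) and holds(b, x)) else 0
        if cost(cnf, x) != direct:
            return False
    return True
```

At most one of the generated clauses is falsified by any assignment, which is why the weight $w$ is never counted twice.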

Observe that if we restrict the MaxSAT language to hard formulas we have standard CNF formulas where $F(X) = \infty$ corresponds to falsification and $F(X) = 0$ corresponds to satisfaction. Note that all the previous definitions naturally instantiate to their SAT analogues.

4 Pigeon Hole Problem and Variations

We define the well-known Pigeon Hole Problem $PHP$ and three MaxSAT versions $SPHP$, $SPHP^0$ and $SPHP^1$, that we will be using in the proof of our results.

In the Pigeon Hole Problem the goal is to assign $m+1$ pigeons to $m$ holes without any pair of pigeons sharing their hole. In the usual SAT encoding there is a boolean variable $x_{ij}$ (with $1 \le i \le m+1$ and $1 \le j \le m$) associated to pigeon $i$ being in hole $j$. There are two groups of clauses. For each pigeon $i$, we have the clause,

$$P_i = \{x_{i1} \vee x_{i2} \vee \dots \vee x_{im}\}$$

indicating that the pigeon must be assigned to a hole. For each hole $j$ we have the set of clauses,

$$H_j = \{\bar{x}_{ij} \vee \bar{x}_{i'j} \mid 1 \le i < i' \le m+1\}$$

indicating that the hole is occupied by at most one pigeon. Let $K$ be the union of all these sets of clauses. It is obvious that $K$ is an unsatisfiable CNF formula. In MaxSAT notation the pigeon hole problem is,

$$PHP = \{(C, \infty) \mid C \in K\}$$

and clearly $MaxSAT(PHP) = \infty$.

In the soft Pigeon Hole Problem the goal is to find the assignment that falsifies the minimum number of clauses. In MaxSAT language it is encoded as,

$$SPHP = \{(C, 1) \mid C \in K\}$$

and it is obvious that $MaxSAT(SPHP) = 1$.

The $SPHP^0$ problem is like the soft pigeon hole problem but augmented with one more clause $(\Box, m^2 + m)$ where $m$ is the number of holes. Note that $MaxSAT(SPHP^0) = m^2 + m + 1$.

Finally, the $SPHP^1$ problem is like the soft pigeon hole problem but augmented with a set of unit clauses $\{(x_{ij}, 1), (\bar{x}_{ij}, 1) \mid 1 \le i \le m+1, 1 \le j \le m\}$. Note that $MaxSAT(SPHP^1) = m^2 + m + 1$.
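The four encodings can be generated and checked by brute force for very small $m$. The sketch below is an illustration under stated assumptions: $m+1$ pigeons and $m$ holes, variable $x_{ij}$ numbered `(i - 1) * m + j`, an empty clause of weight $m^2 + m$ in the third variant, and a pair of unit clauses of weight 1 per variable in the fourth. The function names (`php`, `sphp`, `sphp0`, `sphp1`) are hypothetical labels for those variants.

```python
from itertools import combinations, product
from math import inf

def var(i, j, m):
    """Index of x_ij (pigeon i in hole j), both 1-based."""
    return (i - 1) * m + j

def php_clauses(m):
    """Plain clauses: one per pigeon, plus one per hole and pair of pigeons."""
    pigeons = [tuple(var(i, j, m) for j in range(1, m + 1))
               for i in range(1, m + 2)]
    holes = [(-var(i, j, m), -var(k, j, m))
             for j in range(1, m + 1)
             for i, k in combinations(range(1, m + 2), 2)]
    return pigeons + holes

def php(m):
    return [(c, inf) for c in php_clauses(m)]

def sphp(m):
    return [(c, 1) for c in php_clauses(m)]

def sphp0(m):
    # Soft version plus an empty clause of weight m^2 + m (assumed weight).
    return sphp(m) + [((), m * m + m)]

def sphp1(m):
    # Soft version plus both unit clauses, weight 1, for every variable.
    n = m * m + m
    return sphp(m) + [((s * v,), 1) for v in range(1, n + 1) for s in (1, -1)]

def maxsat(formula, n):
    """Brute-force optimum over all assignments of variables 1..n."""
    best = inf
    for values in product([False, True], repeat=n):
        total = sum(w for c, w in formula
                    if not any(values[abs(l) - 1] == (l > 0) for l in c))
        best = min(best, total)
    return best
```

With $m = 2$ there are only six variables, so the optima stated in the text can be confirmed exhaustively.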

5 MaxSAT Proof Systems

A MaxSAT inference rule is given by a set of antecedent clauses and a set of consequent clauses. In MaxSAT, the application of an inference rule is to replace the antecedents by the consequents. The process of applying an inference rule to a formula $F$ is also noted $F \vdash F'$. The rule is sound if it preserves the equivalence of the formula.

As in the SAT case, given a proof system $\Delta$ (namely, a set of rules) a proof of length $e$ is a sequence $F_0 \vdash F_1 \vdash \dots \vdash F_e$ where $F_0 = F$ is the original formula and each $F_i$ is obtained by applying an inference rule from $\Delta$. If $G \subseteq F_e$, we say that the proof is a proof of $G$ from $F$.

A proof system is sound if all its rules are sound. In this paper all MaxSAT rules and proof systems are sound. A proof system is complete if for every $F$, $G$ such that $F \models G$, there is a proof of $G$ from $F$. A refutation of $F$ is a proof of $(\Box, k)$ from $F$ with $k = MaxSAT(F)$. A proof system is refutationally complete if it can derive a refutation of every formula $F$.

Next we show that, similarly to what happens in SAT, refutational completeness is sufficient for practical purposes. The reason is that it can also be used to prove or disprove general entailment, making completeness somewhat redundant. We need first to define the maximum soft cost of a formula as and the negation of a MaxSAT formula as the negation of all its clauses . The following property states the effect of negating a formula without hard clauses,

Property 3

If is a CNF MaxSAT formula without hard clauses, then

Proof

Let be a truth assignment, be the set of clauses satisfied by and be the set of clauses falsified by . It is clear that while . Since and , then . Therefore, and, as a consequence, .
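The counting argument in this proof can be checked numerically. The sketch below assumes that the maximum soft cost is the sum of all (soft) weights and that a negated clause $(\neg C, w)$ costs $w$ exactly when $C$ is satisfied; under those assumptions, the cost of a formula and the cost of its negation add up to the maximum soft cost on every assignment. The negation is expanded into CNF clause by clause, using the transformation for negated clauses with an empty $A$. All names are illustrative.

```python
from itertools import product

def falsified(clause, x):
    """True if no literal of the clause is true under assignment x."""
    return not any(x[abs(lit)] == (lit > 0) for lit in clause)

def cost(formula, x):
    """Sum of weights of the clauses falsified by x."""
    return sum(w for clause, w in formula if falsified(clause, x))

def negate(formula):
    """CNF clauses for the negation of every soft clause (C, w)."""
    out = []
    for clause, w in formula:
        for i, lit in enumerate(clause):
            # l1 or ... or l_(i-1) or not(l_i): falsified iff l_i is the
            # first true literal of C, so at most one clause per C is falsified.
            out.append((tuple(clause[:i]) + (-lit,), w))
    return out

def costs_add_up(formula):
    """Check cost(F, X) + cost(not F, X) == total soft weight for all X."""
    variables = sorted({abs(lit) for clause, _ in formula for lit in clause})
    top = sum(w for _, w in formula)
    neg = negate(formula)
    for values in product([False, True], repeat=len(variables)):
        x = dict(zip(variables, values))
        if cost(formula, x) + cost(neg, x) != top:
            return False
    return True

# Hypothetical soft formula: (x1 or x2, 3), (not x1, 2), (x2 or not x3, 4).
soft = [((1, 2), 3), ((-1,), 2), ((2, -3), 4)]
```

The check only covers formulas without hard clauses, matching the hypothesis of the property.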

We can now show that an entailment can be rephrased as a MaxSAT problem,

Theorem 5.1

Let and be two MaxSAT formulas, possibly with hard clauses. Then,

where is a softened version of in which infinity weights are replaced by .

Proof

Let us prove the if direction. means that . Also, by construction . Therefore, . Because does not contain hard clauses, , which means that, Adding to both sides of the inequality we get, . By Property 3, we have, which clearly means that, .

Let us prove the other direction. implies that . Moreover, since does not have hard clauses, from Property 3 we know that, so we have, and we need to have, . We reason by cases on the truth assignment :

  1. If , by definition of , . Therefore, , which proves this case.

  2. If , by definition of , . We show that in this case, .

    • if then . We show that implies that . We proceed by contradiction. Let us suppose that and . The latter means that satisfies all hard clauses. As a consequence, , which contradicts the hypothesis.

    • if , then there are no such that . By definition of , for all , . Therefore, either satisfies all hard clauses in and then or falsifies at least one hard clause in and then .

which proves the theorem.

The application of the previous theorem to single clause entailment yields the following corollary,

Corollary 1

Let be a formula and be a weighted clause. Then,

where

A useful application of this corollary will be shown in Section 5.3.

In the rest of the section we introduce and analyze the incremental impact of the three inference rules.

5.1 Resolution

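The MaxSAT resolution rule [8] is,

$$\frac{(x \vee A, v) \qquad (\bar{x} \vee B, w)}{(A \vee B, m) \quad (x \vee A, v - m) \quad (\bar{x} \vee B, w - m) \quad (x \vee A \vee \neg B, m) \quad (\bar{x} \vee \neg A \vee B, m)}$$

where $A$ and $B$ are arbitrary (possibly empty) disjunctions of literals and $m = \min\{v, w\}$. When $A$ (resp. $B$) is empty, $\neg A$ (resp. $\neg B$) is constant true, so $\bar{x} \vee \neg A \vee B$ (resp. $x \vee A \vee \neg B$) is tautological. Note that MaxSAT resolution, when applied to two hard clauses, corresponds to SAT resolution.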
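The effect of the rule can be checked on small instances. The sketch below assumes the standard form of MaxSAT resolution: from $(x \vee A, v)$ and $(\bar{x} \vee B, w)$, with $m = \min\{v, w\}$, derive $(A \vee B, m)$, the residuals $(x \vee A, v - m)$ and $(\bar{x} \vee B, w - m)$, and the compensation clauses $(x \vee A \vee \neg B, m)$ and $(\bar{x} \vee \neg A \vee B, m)$. It is restricted to finite weights, and the function names are illustrative. The compensation clauses are expanded into CNF with the transformation for negated clauses.

```python
from itertools import product

def negated_to_cnf(a, b, w):
    """CNF for (A or not(l1 or ... or lr), w); empty list when B is empty."""
    return [(tuple(a) + tuple(b[:i]) + (-lit,), w) for i, lit in enumerate(b)]

def maxsat_resolve(x, a, v, b, w):
    """Consequents of resolving (x or A, v) with (not x or B, w), finite weights."""
    m = min(v, w)
    out = [(tuple(a) + tuple(b), m)]
    if v > m:                                   # residual of the first antecedent
        out.append(((x,) + tuple(a), v - m))
    if w > m:                                   # residual of the second antecedent
        out.append(((-x,) + tuple(b), w - m))
    out += negated_to_cnf((x,) + tuple(a), b, m)    # x or A or not B
    out += negated_to_cnf((-x,) + tuple(b), a, m)   # not x or not A or B
    return out

def cost(formula, assignment):
    """Sum of weights of falsified clauses."""
    return sum(w for c, w in formula
               if not any(assignment[abs(l)] == (l > 0) for l in c))

def preserves_cost(x, a, v, b, w):
    """Brute-force check that antecedents and consequents are equivalent."""
    antecedents = [((x,) + tuple(a), v), ((-x,) + tuple(b), w)]
    consequents = maxsat_resolve(x, a, v, b, w)
    variables = sorted({abs(l) for c, _ in antecedents for l in c})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if cost(antecedents, assignment) != cost(consequents, assignment):
            return False
    return True
```

Because the antecedents are replaced rather than kept, the check compares total costs instead of testing logical implication.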

It is known that the proof system Res made exclusively of the resolution rule is refutationally complete,

Theorem 5.2

[6, 9] Res is refutationally complete.

However, as we show next, it is not complete.

Theorem 5.3

Res is not complete.

Proof

Consider formula . It is clear that which cannot be derived with Res.

It is well-known that Res cannot compute short refutations for PHP [12] or SPHP [6]. However, it can efficiently refute $SPHP^1$. We state this as a property and sketch the proof (which is a direct adaptation of what was proved in [7] and [10]) because it will be instrumental in the proof of several results in the rest of this section,

Property 4

There is a short Res refutation of $SPHP^1$.

Proof

The proof is based on the fact that for each one of the pigeons there is a short refutation

and for each one of the holes there is a short refutation

Because each derivation is independent of the other we can concatenate them into,

which is a refutation of .

5.2 Split

The split rule,

$$\frac{(A, w)}{(A \vee x, w) \qquad (A \vee \bar{x}, w)}$$

is the natural extension of its SAT counterpart. Consider the proof system ResS, made of resolution and split. We show that, as happens in the SAT case, the split rule brings completeness,

Theorem 5.4

ResS is complete.

Proof

The proof is based on the following facts:

  1. For every formula there is a proof where is made exclusively of splits in which all the clauses of contain all the variables in the formula and there are no repeated clauses. Each clause can be expanded to a new variable not in using the split rule. This process can be repeated until all clauses in the current formula contain all the variables in the formula. Note that all clauses , can be merged and, as a result, does not contain repeated clauses.

  2. If then . Let be the proof from to . Then, is done resolving the pairs of clauses in that were split in the step.

  3. If then there exists a unique clause which is falsified by .

By fact (1), and . Because of soundness, , and . Since , . Therefore, which, by fact (3), means that for each there exists a unique and which is falsified by . Separating all into , we have . Therefore, . By fact (2), .

However, unlike what happens in the SAT case (see Property 1), ResS is stronger than Res,

Theorem 5.5

ResS is stronger than Res.

Proof

On the one hand, it is clear that ResS can simulate any proof of Res since it contains a superset of Res rules. On the other hand, unlike Res, ResS can produce short refutations for $SPHP^0$, as shown below.

First, let us prove that Res cannot produce short refutations for $SPHP^0$. Since the resolution rule does not apply to the empty clause $\Box$, if Res could refute $SPHP^0$ in polynomial time it would also refute $SPHP$ in polynomial time, which is impossible [6].

ResS can produce short refutations for $SPHP^0$ because it can transform $SPHP^0$ into $SPHP^1$ and then apply Property 4. The transformation is done by a sequence of splits,

$$(\Box, 1) \vdash (x_{ij}, 1), (\bar{x}_{ij}, 1)$$

that move one unit of weight from the empty clause to every variable in the formula and its negation.
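The weight-moving step can be sketched as follows, assuming the MaxSAT split rule replaces $(A, w)$ by $(A \vee x, w)$ and $(A \vee \bar{x}, w)$. Formulas are dictionaries from sorted clause tuples to weights so that identical clauses merge; `split` and `spread_empty_clause` are hypothetical helper names, and the starting formula is a made-up toy instance rather than a pigeon hole encoding.

```python
from itertools import product

def split(formula, clause, x, w):
    """Move weight w from `clause` to (clause or x) and (clause or not x)."""
    clause = tuple(sorted(clause))
    assert 0 < w <= formula.get(clause, 0), "not enough weight to split"
    out = dict(formula)
    out[clause] -= w
    if out[clause] == 0:
        del out[clause]
    for lit in (x, -x):
        new = tuple(sorted(clause + (lit,)))
        out[new] = out.get(new, 0) + w
    return out

def spread_empty_clause(formula, variables):
    """Split one unit of the empty clause on each variable in turn."""
    for v in variables:
        formula = split(formula, (), v, 1)
    return formula

def cost(formula, assignment):
    """Sum of weights of falsified clauses."""
    return sum(w for c, w in formula.items()
               if not any(assignment[abs(l)] == (l > 0) for l in c))

# Toy formula: an empty clause of weight 3 and one soft clause over x1, x2.
start = {(): 3, (1, 2): 1}
end = spread_empty_clause(start, [1, 2, 3])
```

Each split is sound on its own, so the final formula has the same cost as the initial one on every assignment, while all the weight of the empty clause now sits on unit clauses that resolution can use.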

5.3 Virtual

In a recent paper [10] we proposed a proof system in which clauses with negative weights can appear during the proof. This is equivalent to adding to ResS the virtual rule,

$$\frac{}{(A, w) \qquad (A, -w)}$$

which allows a fresh clause $(A, w)$ to be introduced into the formula. To preserve soundness (i.e., cancel out the effect of the addition) it also adds $(A, -w)$.
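A minimal sketch of why the virtual rule is cost-preserving, assuming it adds a clause with weight $w$ together with the same clause with weight $-w$. The two copies are kept as separate entries of a list so that the positive one can later be consumed by resolution or split while the negative one waits to be cancelled. The names and the example formula are illustrative.

```python
from itertools import product

def virtual(formula, clause, w):
    """Add (clause, w) and its compensating copy (clause, -w)."""
    return formula + [(clause, w), (clause, -w)]

def cost(formula, assignment):
    """Sum of weights (possibly negative) of the falsified clauses."""
    return sum(w for c, w in formula
               if not any(assignment[abs(l)] == (l > 0) for l in c))

def same_cost(f, g, variables):
    """Brute-force comparison of two formulas on all assignments."""
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if cost(f, a) != cost(g, a):
            return False
    return True

# Hypothetical example: one soft clause, then a virtual unit clause on x1.
before = [((1, 2), 1)]
after = virtual(before, (1,), 2)
```

The intermediate formula contains a negative weight, so it is only meaningful as a step of a proof whose final formula has no negative weights left.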

Let ResSV be the proof system made of resolution, split and virtual (note that resolution and split are only defined for antecedents with positive weights). It has been shown that if is a ResSV proof and does not contain any negative weight, then for every we have that .

The following theorem shows that the virtual rule adds further strength to the proof system,

Theorem 5.6

ResSV is stronger than ResS.

Proof

On the one hand, it is clear that ResSV can simulate any proof of ResS since it contains a superset of the ResS rules. On the other hand, ResSV can produce a short refutation of and ResS cannot.

The short refutation of ResSV, as shown in [10], is obtained by first virtually transforming into . Then, it uses Property 4 to derive . Finally, it splits one unit of the empty clause cost to each pair to cancel out negative weights. At the end of the process all clauses have positive weight while still having .

It is clear that ResS cannot polynomially refute because otherwise a SAT proof system with resolution and split rules would produce shorter refutations than a SAT proof system with only resolution, which contradicts Property 1.

We will finish this section showing that Theorem 5.1 has an unexpected application in the context of ResSV. Consider the problem of proving . This can be done with a refutation of . Namely or using Corollary 1, which states that if and only if . The following two theorems show that ResSV cannot carry out the first approach efficiently, but can carry out the second efficiently.

Theorem 5.7

There is no short ResSV refutation of .

Proof

The virtual rule cannot introduce hard clauses, and the resolution and split rules only produce a hard consequent if they have hard antecedents. As a consequence, can only be obtained by resolving or splitting hard clauses in . If ResSV produced a short refutation for , then ResS and, as a consequence, Res would also produce the same short refutation for , which contradicts Property 1.

Theorem 5.8

There is a polynomial ResSV proof of from .

Proof

We only need to apply the virtual rule,

and then split,

for each . The resulting problem is similar to but with hard clauses. At this point, adapting the proof of Property 4, we can derive and cancel out the negative weight while still retaining .

6 MaxSAT Circular Proofs

In this section we study the relation between ResSV and the recently proposed concept of circular proofs [3]. Circular proofs allow the addition of an arbitrary set of clauses to the original formula. It can be seen that conclusions are sound as long as the added clauses are re-derived as many times as they are used. In the original paper this condition is characterized as the existence of a flow in a graphical representation of the proof. Here we show that the ResSV proof system naturally captures the same idea and extends it from SAT to MaxSAT with an arguably simpler notation. In particular, the virtual rule guarantees the existence of the flow.

6.1 SAT Circular Proofs

We start reviewing the SAT case, as defined in [3]. Given a CNF formula a circular pre-proof of from is a sequence,

such that , is an arbitrary set of clauses and each () is obtained from previous clauses by applying an inference rule in the proof system. Note that the same clause can be both derived and used several times during the proof.

A circular pre-proof can be associated with a directed bipartite graph such that there is one node in for each element of the sequence (called clause nodes) and one node in for each inference step (called inference nodes). There is an arc from to if is an antecedent clause in the inference step of . There is an arc from to if is a consequent clause in the inference step of . The graph is compacted by merging nodes whose associated clause is identical to one in . Note that before the compaction the graph is acyclic, but the compaction may introduce cycles. The set of in-neighbors and out-neighbors of node are denoted and , respectively.

A flow assignment for a circular pre-proof is an assignment of positive reals to inference nodes. The balance of node is the inflow minus the outflow,

Definition 1

A SAT circular proof of clause from CNF formula is a pre-proof whose proof-graph admits a flow in which all clauses not in have non-negative balance and has a strictly positive balance.
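The balance condition in this definition can be sketched on an abstract proof graph. In the code below, clause nodes are opaque string labels, each inference node maps to its lists of antecedent and consequent clause labels, and the flow assigns a positive number to each inference node. The balance of a clause is the flow of the inferences that derive it minus the flow of the inferences that use it. The sketch does not check that each inference is a valid rule application, and all names and examples are assumptions made for illustration.

```python
def balances(inferences, flow):
    """Balance (inflow minus outflow) of every clause node in the graph.

    inferences: dict name -> (antecedent labels, consequent labels)
    flow: dict name -> positive real
    """
    balance = {}
    for name, (antecedents, consequents) in inferences.items():
        for c in consequents:      # arc inference -> clause: inflow
            balance[c] = balance.get(c, 0) + flow[name]
        for c in antecedents:      # arc clause -> inference: outflow
            balance[c] = balance.get(c, 0) - flow[name]
    return balance

def satisfies_flow_condition(inferences, flow, formula, target):
    """Clauses outside `formula` need balance >= 0; `target` needs balance > 0."""
    b = balances(inferences, flow)
    others_ok = all(v >= 0 for c, v in b.items() if c not in formula)
    return others_ok and b.get(target, 0) > 0

# Toy graph: one symmetric resolution of "x" and "-x" deriving the empty clause "[]".
formula = {"x", "-x"}
inferences = {"r1": (["x", "-x"], ["[]"])}
flow = {"r1": 1}

# A guessed clause "g" that is used but never re-derived violates the condition.
bad_inferences = {"r1": (["x", "g"], ["[]"])}
```

Clauses of the original formula may have negative balance, which is why they are excluded from the non-negativity test.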

Theorem 6.1 (soundness)

Assuming a sound SAT proof system, if there is a SAT circular proof of from formula then .

Property 5

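Using the proof system with the following two rules (symmetric resolution and split),

$$\frac{x \vee A \qquad \bar{x} \vee A}{A} \qquad\qquad \frac{A}{A \vee x \qquad A \vee \bar{x}}$$

there is a short circular refutation of $PHP$.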

6.2 ResSV and MaxSAT circular proofs

Now we show that the MaxSAT ResSV proof system is a true extension of circular proofs from SAT to MaxSAT. The following two theorems show that, when restricted to hard formulas, ResSV and SAT circular proofs can simulate each other. Recall that specializing Corollary 1 to hard formulas and is equivalent. Therefore, one can show with a proof .

Theorem 6.2

Let be a SAT circular proof of clause from formula using the proof system symmetric resolution and split. There is a MaxSAT ResSV proof of from . The length of the proof is .

Proof

Let be the proof graph and and the flow of . By definition of SAT circular proof, and .

The ResSV proof starts with and consists in 3 phases. In the first phase, the virtual rule is applied for each not in , introducing with . In the second phase, there is an inference step for each . If is a SAT split, the inference step is a MaxSAT split generating two clauses with weight . If is a SAT symmetric resolution, the inference step is a MaxSAT resolution generating one clause with weight . Note that this phase never creates new clauses because all of them have been virtually added at the first phase. It only moves weights around the existing ones. Note as well that we guarantee by construction that at each step of the proof the antecedents are available no matter in which order the proof is done because the first phase has given enough weight to each added clause to guarantee it and original clauses are hard, so their weight never decreases. At the end of the second phase we have with with being the balance of . Therefore is in . The third phase is a final sequence of steps in which is derived from which completes the proof. Note that the size of the proof is .

Theorem 6.3

Consider a hard formula and a ResSV proof with inference steps. There is a SAT circular proof of from with proof system symmetric resolution and split. Besides,

Proof

We need to build a graph with and , and a flow that satisfies the balance conditions and with which has strictly positive balance.

Because the virtual rule does not have antecedents, all its applications can be done at the beginning of the proof and the cancellation of all the virtual clauses can be done at the end. Therefore, we can omit all those inference steps and assume without loss of generality that the proof is a ResS proof (that is, without virtual),

where is the set of virtually added clauses. Note as well that any application of MaxSAT resolution between and can be simulated by a short sequence of splits to both clauses until their scope is the same and then one resolution step between and . So, again without loss of generality we can assume that the proof only contains splits and symmetric resolutions.

Our proof contains three phases. First, we are going to build an acyclic graph which is an unfolded version of and a flow function that may have infinite flows. Second, we will compute traversing the graph bottom-up and replacing any infinite flow in by a finite one that still guarantees the flow condition. In the third and final phase, we will compact the graph, which will constitute the circular proof.

Phase 1:

We build by following the proof step by step. Let be the graph associated to proof step . We define the front of as the set of clause nodes in with strictly positive balance. By construction of we will guarantee a connection between the current formula and the front of the current graph

where we define .

contains one clause node for each clause in , and , respectively. For each clause node there is one dummy inference node pointing to it. The flow of the inference node is the weight of the clause it points to. This set of dummy inference nodes will be removed in Phase 3. Then we proceed through the proof. At inference step , we add a new inference node to . Its in-neighbors will be nodes from the front (that must exist because of the invariant) and its out-neighbors will be newly added clause nodes. Its flow is the weight moved by the inference rule (which may be infinite). If the inference rule is split we add two clause nodes, one for each consequent, and add the corresponding arcs. If the inference rule is a resolution we add one clause node for its consequent and add the corresponding arcs. Note that the out-neighbors of node have a positive balance and the in-neighbors of have their balance decreased, but it cannot turn negative. Finally, we merge any pair of nodes in the front of whose associated clause is the same (which preserves the property of balances being non-negative). Graph is obtained after processing the last inference step. Note that the invariant guarantees that is in and its balance is .

Phase 2:

Now we traverse the inference nodes of in the reverse order of how they were added transforming infinite flows into finite. When considering node , because of the traversing order, we know that every has finite out-going flow. We compute flow value as follows: if is finite, then , else is the minimum value that guarantees that the balance of every is non-negative.

Phase 3:

We obtain by doing some final arrangements to . First, we remove dummy inference nodes pointing to clauses in , and added in Phase 1. As a result, the balance of these nodes is negative. In particular, the balance of nodes representing and is its negative weight.

Since , we know that all nodes representing are included in the front of with balance greater than or equal to its weight. We compact these nodes with the ones in and, as a result, its balance is positive.

Finally, we add some split nodes with flow from node (recall that ) in order to generate and , and we compact the latter ones with the ones in . As a result, the balance of is and the balance of nodes is positive.

7 Conclusions

This paper constitutes a first attempt towards MaxSAT resolution-based proof complexity analysis. We have provided some basic definitions and results emphasizing the similarities and differences with respect to SAT. In particular, we have shown that MaxSAT entailment can be rephrased as a MaxSAT refutation problem and, as a consequence, refutational completeness is sufficient for practical purposes. Interestingly, when such rephrasing is applied to hard formulas it transforms a SAT query into a MaxSAT one, and such transformation turns out to be relevant in our analysis of SAT circular proofs.

We have also provided three basic MaxSAT inference rules used in resolution-based proof systems (namely, resolution, split and virtual) and have analysed their incremental effect in terms of refutation power. Finally, we have related ResSV, the strongest of the proof systems considered, with the recently proposed concept of circular proofs. We have shown that ResSV generalizes SAT circular proofs as defined in [3].

An additional contribution of the paper is to put together under a formal framework and common notation some ideas spread around in different recent papers such as [7, 10, 3].

References