# The Opacity of Backbones and Backdoors Under a Weak Assumption

Backdoors and backbones of Boolean formulas are hidden structural properties. A natural goal, already in part realized, is that solver algorithms seek to obtain substantially better performance by exploiting these structures. However, the present paper is not intended to improve the performance of SAT solvers, but rather is a cautionary paper. In particular, the theme of this paper is that there is a potential chasm between the existence of such structures in the Boolean formula and being able to effectively exploit them. This does not mean that these structures are not useful to solvers. It does mean that one must be very careful not to assume that it is computationally easy to go from the existence of a structure to being able to get one's hands on it and/or being able to exploit the structure. For example, in this paper we show that, under the assumption that P ≠ NP, there are easily recognizable sets of Boolean formulas for which it is hard to determine whether they have a large backbone. We also show that, also under the assumption P ≠ NP, there are easily recognizable families of Boolean formulas with strong backdoors that are easy to find, yet for which it is hard to determine whether they are satisfiable.


## 1 Introduction

Many algorithms for the Boolean satisfiability problem exploit hidden structural properties of formulas in order to find a satisfying assignment or prove that no such assignment exists. These structural properties are called hidden because they are not explicit in the input formula. A natural question, then, is what computational complexity is associated with these hidden structures. In this paper we focus on two hidden structures: backbones and strong backdoors [WGS03].

The complexity of decision problems associated with backdoors and backbones has been studied by Kilby, Slaney, Thiébaux, and Walsh [KSTW05] and Dilkina, Gomes, and Sabharwal [DGS14], among others.

In the present paper, we show that, under the assumption that P ≠ NP, there are easily recognizable families of formulas with strong backdoors that are easy to find, yet the problem of determining whether these formulas are satisfiable remains hard (in fact, NP-complete).

Hemaspaandra and Narváez [HN17] showed, under the (rather strong) assumption that P ≠ NP ∩ coNP, a separation between the complexity of finding backbones and that of finding the values to which the backbone variables must be set. In the present paper, we also add to that line of research by showing that, under the assumption that P ≠ NP, there are families of formulas that are easy to recognize (i.e., they can be recognized by polynomial-time algorithms) yet no polynomial-time algorithm can, given a formula from the family, decide whether the formula has a large backbone (doing so is NP-complete).

Far from being a paper that is intended to speed up SAT solvers, this is a paper trying to get a better sense of the (potential lack of) connection between properties existing and being able to get one’s hands on the variables or variable settings that express the property’s existence. That is, the paper’s point is that there is a potential gap between, on one hand, the existence of small backdoors and large backbones and, on the other hand, using those to find satisfying assignments. Indeed, the paper establishes not just that (if P ≠ NP) such gaps exist, but even rigorously proves that if any NP set exists that is frequently hard (with respect to polynomial-time heuristics), then sets of our sort exist that are essentially just as frequently hard; we in effect prove an inheritance-of-frequency-of-hardness result, under which our sets are guaranteed to be essentially as frequently hard as any set in NP is.

Our results admittedly are theoretical, but they speak both to the importance of not viewing backdoors or backbones as magically transparent (we prove that they are in some cases rather opaque) and to the fact that the behavior we mention likely happens on quite dense sets. Further, since we tie this to whether any set is densely hard, the SAT-solver issues raised by this paper are now inextricably linked to the extremely important, long-open question of how resistant to polynomial-time heuristics the hardest sets in NP can be.¹ We are claiming that these important hidden properties, backdoors and backbones, have some rather challenging behaviors that one must at least be aware of. Indeed, what is most interesting about this paper is likely not the theoretical constructions themselves, but rather the behaviors that those constructions prove must exist unless P = NP. We feel that knowing that those behaviors cannot be avoided unless P = NP is of potential interest to both AI and theory. Additionally, the behavior in one of our results is closely connected to the deterministic time complexity of SAT: in our result (Theorem 3.5) about easy-to-find, hard-to-assign-values-to backdoors, we show that the backdoor size bound in our theorem cannot be improved even slightly unless NP is contained in subexponential time.

¹ We mention in passing that there are relativized worlds (aka black-box models) in which NP sets exist for which all polynomial-time heuristics are asymptotically wrong half the time [HZ96]; heuristics basically do no better than one would do by flipping a coin to give one’s answer. Indeed, that is known to hold with probability one relative to a random oracle, i.e., it holds in all but a measure zero set of possible worlds [HZ96]. Although many suspect that the same holds in the real world, proving that would separate P from NP in an extraordinarily strong way, and currently even proving that P and NP differ is viewed as likely being decades (or worse) away [Gas12].

The rest of this paper is organized as follows. Section 2 defines the notation we will use throughout this paper. Sections 3 and 4 contain our results related to backdoors and backbones, respectively. Finally, Section 5 adds some concluding remarks.

## 2 Definitions and Notations

For a Boolean formula $F$, we denote by $V(F)$ the set of variables appearing in $F$.

Adopting the notations of Williams, Gomes, and Selman [WGS03], we use the following. A partial assignment of $F$ is a function $a_S : S \to \{\mathrm{True}, \mathrm{False}\}$ that assigns Boolean values to the variables in a set $S \subseteq V(F)$. For a Boolean value $v$ and a variable $x$, the notation $F[x/v]$ denotes the formula $F$ after replacing every occurrence of $x$ by $v$ and simplifying. This extends to partial assignments, e.g., to $F[a_S]$, in the natural way.

For a finite set $S$, $\|S\|$ denotes $S$’s cardinality. For any string $x$, $|x|$ denotes the length of (number of characters of) $x$.

For each set $L$ and each natural number $n$, $L^{\le n}$ denotes the set of all strings in $L$ whose length is less than or equal to $n$. In particular, $\Sigma^{\le n}$ denotes the strings of length at most $n$ over the alphabet $\Sigma$.

## 3 Results on Backdoors to CNF Formulas

In this section we focus on Boolean formulas in conjunctive normal form, or CNF. A CNF formula is a conjunction of disjunctions, and the disjunctions are called the clauses of the formula. Following Dilkina, Gomes, and Sabharwal [DGS14], we define satisfiability of CNF formulas using the language of set theory. This is done by formalizing the intuition that, in order for an assignment to satisfy a CNF formula, it must set at least one literal in every clause to True. One can then define a CNF formula to be a collection of clauses, each clause being a set of literals, and say that $F \in \mathrm{SAT}$ if and only if there exists an assignment $a$ such that for every clause $C \in F$ there exists a literal $\ell \in C$ that $a$ sets to True. Under this formalization, to be in harmony with the standard conventions that the truth value of the empty conjunctive (resp., disjunctive) formula is True (resp., False), $F$ must be taken to be in SAT if $F$ is empty, and must be taken to be in $\overline{\mathrm{SAT}}$ if $\emptyset \in F$ (since a CNF formula containing the empty clause must be taken to be False, as a consequence of the fact that the empty disjunctive formula is taken to be False). These two cases are called, respectively, being trivially True and being trivially False (as the conventions just mentioned not only put these cases in SAT and $\overline{\mathrm{SAT}}$, respectively, but also fix the truth values of the represented formulas to be True and False). We can also formalize simplification using this notation: after assigning a variable $x$ to True (resp., False), the formula is simplified by removing all clauses that contain the literal $x$ (resp., $\overline{x}$) and removing the literal $\overline{x}$ (resp., $x$) from the remaining clauses. This formalization extends to simplification of a formula over a partial assignment in the natural way.
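A minimal Python sketch of this set-theoretic view may help. It assumes an integer encoding of literals that is our own illustrative choice, not the paper's: the positive integer `v` stands for the literal $x_v$ and `-v` for its negation. A clause is then a frozenset of literals, a CNF formula is a frozenset of clauses, and `simplify` carries out the clause-removal and literal-removal steps just described.

```python
from typing import Dict, FrozenSet, Iterable

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Build a CNF formula as a set of clauses, each clause a set of literals.

    Assumed encoding (illustrative only): v means x_v, -v means not x_v.
    """
    return frozenset(frozenset(clause) for clause in clauses)


def simplify(f: CNF, assignment: Dict[int, bool]) -> CNF:
    """Simplify f under a partial assignment (variable -> truth value).

    Clauses containing a literal made True are removed; literals made False
    are deleted from the remaining clauses.
    """
    result = set()
    for clause in f:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause is satisfied: drop it
        result.add(frozenset(lit for lit in clause if abs(lit) not in assignment))
    return frozenset(result)


def trivially_true(f: CNF) -> bool:
    return len(f) == 0  # the empty conjunction is True


def trivially_false(f: CNF) -> bool:
    return frozenset() in f  # contains the empty (False) disjunction


# (x1 or not x2) and (x2) and (not x1 or x3)
F = make_cnf([[1, -2], [2], [-1, 3]])
print(sorted(sorted(c) for c in simplify(F, {2: True})))  # [[-1, 3], [1]]
```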

###### Example 3.1.

Consider the CNF formula . We can express this formula in our set-theoretic notation as . If we assign to False and to True, we obtain , which is unsatisfiable because it contains the empty set (i.e., the empty clause).

Since CNFSAT (the satisfiability problem restricted to CNF formulas) is well known to be NP-complete, a polynomial-time algorithm to determine the satisfiability of CNF formulas is unlikely to exist. Nevertheless, there are several restrictions of CNF formulas for which satisfiability can be decided in polynomial time. When a formula does not belong to any of these restrictions, it may still have a set of variables such that, once the formula is simplified under a partial assignment of these variables, the resulting formula belongs to one of these tractable restrictions. A formalization of this idea is the concept of backdoors.

###### Definition 3.2 (Subsolver [WGS03]).

A polynomial-time algorithm $A$ is a subsolver if, for each input formula $F$, $A$ satisfies the following conditions.

1. $A$ either rejects the input $F$ (this indicates that it declines to make a statement as to whether $F$ is satisfiable) or determines $F$ (i.e., returns a satisfying assignment if $F$ is satisfiable and proclaims $F$’s unsatisfiability if $F$ is unsatisfiable).

2. If $F$ is trivially True, $A$ determines $F$, and if $F$ is trivially False, $A$ determines $F$.

3. If $A$ determines $F$, then for each variable $x$ and each value $v \in \{\mathrm{True}, \mathrm{False}\}$, $A$ determines $F[x/v]$.

###### Definition 3.3 (Strong Backdoor [WGS03]).

For a Boolean formula $F$, a nonempty subset $S$ of its variables is a strong backdoor for a subsolver $A$ if, for all partial assignments $a_S : S \to \{\mathrm{True}, \mathrm{False}\}$, $A$ determines $F[a_S]$ (i.e., if $F[a_S]$ is satisfiable, $A$ returns a satisfying assignment, and if $F[a_S]$ is unsatisfiable, $A$ proclaims its unsatisfiability).
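Definition 3.3 quantifies over all $2^{\|S\|}$ partial assignments to $S$, so for small formulas it can be checked by brute force. The following sketch is illustrative only: it uses an integer encoding of literals that is our own assumption (`v` for $x_v$, `-v` for its negation), takes the subsolver as a parameter, and trusts the subsolver to be correct whenever it does not reject. The hypothetical `trivial_subsolver` determines only trivially True and trivially False formulas; it is easy to check that it meets the three conditions of Definition 3.2.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Union

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]
# A subsolver returns a satisfying (partial) assignment, the string "UNSAT",
# or None when it rejects, i.e., declines to decide.
Verdict = Optional[Union[Dict[int, bool], str]]


def simplify(f: CNF, assignment: Dict[int, bool]) -> CNF:
    """Drop clauses made True; delete literals made False."""
    return frozenset(
        frozenset(lit for lit in clause if abs(lit) not in assignment)
        for clause in f
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
    )


def trivial_subsolver(f: CNF) -> Verdict:
    """Hypothetical subsolver: determines only trivially True/False formulas."""
    if not f:
        return {}  # empty formula: the empty assignment satisfies it
    if frozenset() in f:
        return "UNSAT"  # contains the empty clause
    return None


def is_strong_backdoor(
    f: CNF, s: Iterable[int], subsolver: Callable[[CNF], Verdict]
) -> bool:
    """Check Definition 3.3 by trying every partial assignment to s."""
    chosen = sorted(set(s))
    if not chosen:
        return False  # strong backdoors are nonempty by definition
    for values in product([False, True], repeat=len(chosen)):
        a_s = dict(zip(chosen, values))
        if subsolver(simplify(f, a_s)) is None:
            return False  # the subsolver declined on some F[a_S]
    return True


F = frozenset({frozenset({1, -2}), frozenset({2, 3}), frozenset({-1, -3})})
print(is_strong_backdoor(F, {1, 2, 3}, trivial_subsolver))  # True
print(is_strong_backdoor(F, {1}, trivial_subsolver))  # False
```

With this weak subsolver, $V(F)$ itself is always a strong backdoor, since fully assigning a CNF formula leaves either no clauses or an empty clause; stronger subsolvers, such as the unit propagation subsolver discussed next, can admit much smaller backdoors.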

Many examples of subsolvers can be found in the literature (for instance, in Table 1 of [DGS14]). The subsolver of particular relevance to this paper is the unit propagation subsolver, which focuses on unit clauses. Unit clauses are clauses with just one literal. They play an important role in the process of finding models (i.e., satisfying assignments) because the literal in such a clause must be set to True in order to find a satisfying assignment. The process of finding a model by searching for a unit clause, fixing the value of the variable in the unit clause, and simplifying the formula resulting from that assignment is known in the satisfiability literature as unit propagation. (For specificity and to ensure that it runs in polynomial time, let us say that our unit propagation subsolver always focuses on the unit clause in the current formula whose encoding is the lexicographically least among the encodings of all unit clauses in the current formula.) Unit propagation is an important building block in the seminal DPLL algorithm for SAT [DP60, DLL62]. Notice that the CNF formulas whose satisfiability can be decided by just applying unit propagation iteratively constitute a tractable restriction of CNFSAT. The unit propagation subsolver attempts to decide the satisfiability of an input formula by using only unit propagation and empty clause detection. If satisfiability cannot be decided this way, the subsolver rejects the input formula. Szeider [Sze05] has classified the parameterized complexity of finding backdoors with respect to the unit propagation subsolver.
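The unit propagation subsolver can be sketched in Python as follows, using an integer encoding of literals that is our own assumption (`v` for $x_v$, `-v` for its negation). The paper fixes the choice of unit clause via the lexicographically least encoding; as a hypothetical stand-in, this sketch picks the unit literal with the smallest (variable index, literal) pair, which is likewise a fixed deterministic rule. It returns a satisfying partial assignment, the string `"UNSAT"`, or `None` when it rejects.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Union

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]
# Result: a satisfying partial assignment, "UNSAT", or None (reject).
Verdict = Optional[Union[Dict[int, bool], str]]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Integer literals (assumed encoding): v is x_v, -v is its negation."""
    return frozenset(frozenset(clause) for clause in clauses)


def simplify(f: CNF, assignment: Dict[int, bool]) -> CNF:
    """Drop clauses made True; delete literals made False."""
    return frozenset(
        frozenset(lit for lit in clause if abs(lit) not in assignment)
        for clause in f
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
    )


def unit_propagation_subsolver(f: CNF) -> Verdict:
    """Decide f using only unit propagation and empty-clause detection."""
    assignment: Dict[int, bool] = {}
    while True:
        if not f:
            # Trivially True: all clauses satisfied; unassigned variables are free.
            return assignment
        if frozenset() in f:
            # Only forced values were set, so an empty clause means f is unsatisfiable.
            return "UNSAT"
        units = [next(iter(clause)) for clause in f if len(clause) == 1]
        if not units:
            return None  # no unit clause to propagate: decline to decide
        # Hypothetical stand-in for "lexicographically least encoding".
        lit = min(units, key=lambda u: (abs(u), u))
        assignment[abs(lit)] = lit > 0
        f = simplify(f, {abs(lit): lit > 0})


print(unit_propagation_subsolver(make_cnf([[1], [-1, 2], [-2, 3]])))  # {1: True, 2: True, 3: True}
print(unit_propagation_subsolver(make_cnf([[1], [-1]])))  # UNSAT
print(unit_propagation_subsolver(make_cnf([[1, 2], [-1, -2]])))  # None
```

Each iteration eliminates one variable, so the loop runs at most $\|V(F)\|$ times, which keeps the subsolver polynomial-time.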

###### Example 3.4.

Consider the formula from Example 3.1. We will show that is a strong backdoor of with respect to the unit propagation subsolver by analyzing the possible assignments of these variables. Suppose is assigned to True and notice . From there it is easy to see that if is set to True, the resulting formula after simplification is trivially satisfiable. If is set to False, assigning to True yields the formula after simplification and the satisfiability of this formula can be determined by the unit propagation subsolver. Assigning to False yields a formula with two unit clauses, . The unit propagation subsolver will pick the unit clause ,² assign the truth value of and simplify, and will then pick the (sole) remaining unit clause, , and assign the truth value of and simplify to obtain a trivially satisfiable formula. Now suppose is assigned to False and notice . If we now assign to True, notice . If we assign to True simplifies to a trivially satisfiable formula. If we assign to False, the formula simplifies to . The unit propagation subsolver will pick the unit clause , assign the truth value of , and the resulting formula after simplification will be whose satisfiability can be determined by the unit propagation subsolver. If we assign to False, notice . If we now assign to True and simplify, the resulting formula would be whose satisfiability can be determined by the unit propagation subsolver. If we assign to False and simplify, the resulting formula would contain the unit clause . The unit propagation subsolver would then set the value of to False and simplify, yielding the formula , whose satisfiability can also be determined by the unit propagation subsolver.

² Here we assume that a clause precedes a clause in lexicographical order if precedes in lexicographical order.

It should be clear from the case analysis above that just setting the values of and is not enough for the unit propagation subsolver to always be able to determine the satisfiability of the resulting formula. In fact, a similar analysis done on every 2-element subset and every 3-element subset of —which we do not write out here—shows that is actually the smallest strong backdoor of with respect to the unit propagation subsolver.

We are now ready to prove our main result about backdoors: under the assumption that P ≠ NP, there are families of Boolean formulas that are easy to recognize and have strong unit propagation backdoors that are easy to find, yet deciding whether the formulas in these families are satisfiable remains NP-complete.

###### Theorem 3.5.

If P ≠ NP, for each $k \in \{1, 2, 3, \ldots\}$ there is a set $A$ of Boolean formulas such that all the following hold.

1. $A \in \mathrm{P}$ and $A \cap \mathrm{SAT}$ is NP-complete.

2. Each formula $F \in A$ has a strong backdoor $S$ with respect to the unit propagation subsolver, with $\|S\| \le \|V(F)\|^{1/k}$.

3. There is a polynomial-time algorithm that, given $F \in A$, finds a strong backdoor having the property stated in item 2 of this theorem.

###### Proof.

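For $k = 1$ the theorem is trivial, so we henceforward consider just the case where $k \ge 2$. Consider (since $F$ in the following set definition is specified as being in CNF, we can safely start the following with “$F \wedge$” rather than, for example, “$(F) \wedge$”)

$$A = \{F \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{\|V(F)\|^k - \|V(F)\|}) \mid F \text{ is a CNF formula}\},$$

where $\mathit{new}_i$ is the $i$th (in lexicographical order) legal variable name that does not appear in $F$. For instance, if contains literals , , , and , and if our legal variable universe is , then would be . The backdoor is the set $V(F)$ of variables of $F$, which can be found in polynomial time by parsing; since the padded formula has exactly $\|V(F)\|^k$ variables, the size of this backdoor is exactly the $k$th root of the padded formula’s number of variables. It is clear that the formula resulting from simplification after assigning values to all the variables of $F$ only has unit clauses and potentially an empty clause, so satisfiability for this formula can be decided by the unit propagation subsolver. Finally, since $k$ is fixed, the padding adds only polynomially many symbols, so $A \in \mathrm{P}$ and the map from $F$ to its padded version many-one reduces CNFSAT to $A \cap \mathrm{SAT}$. Hence $A \cap \mathrm{SAT}$ is NP-complete and, under the assumption that P ≠ NP, deciding satisfiability for the formulas in $A$ is hard. ∎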
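The construction in the proof can be illustrated with a small Python sketch. The function names are ours, the integer encoding of literals (`v` for $x_v$, `-v` for its negation) is an assumption, and choosing the smallest unused positive integers as the new variables is our stand-in for the paper's lexicographic choice of new variable names. The sketch pads $F$, returns $V(F)$ as the backdoor, and checks by brute force that every assignment to $V(F)$ leaves only unit clauses (plus possibly an empty clause), which unit propagation always decides.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Tuple

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Integer literals (assumed encoding): v is x_v, -v is its negation."""
    return frozenset(frozenset(clause) for clause in clauses)


def variables(f: CNF) -> FrozenSet[int]:
    return frozenset(abs(lit) for clause in f for lit in clause)


def simplify(f: CNF, assignment: Dict[int, bool]) -> CNF:
    """Drop clauses made True; delete literals made False."""
    return frozenset(
        frozenset(lit for lit in clause if abs(lit) not in assignment)
        for clause in f
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
    )


def unit_propagation_decides(f: CNF) -> bool:
    """True iff unit propagation alone reaches an empty formula or an empty clause."""
    while f and frozenset() not in f:
        units = [next(iter(clause)) for clause in f if len(clause) == 1]
        if not units:
            return False  # the subsolver would reject
        lit = min(units, key=lambda u: (abs(u), u))  # fixed deterministic choice
        f = simplify(f, {abs(lit): lit > 0})
    return True


def pad(f: CNF, k: int) -> Tuple[CNF, FrozenSet[int]]:
    """Return (F and new_1 and ... and new_m, V(F)) with m = ||V(F)||^k - ||V(F)||.

    Fresh variables are the smallest positive integers not in V(F), a stand-in
    (our assumption) for the paper's lexicographic choice of new variable names.
    """
    vs = variables(f)
    needed = len(vs) ** k - len(vs)
    fresh, candidate = [], 1
    while len(fresh) < needed:
        if candidate not in vs:
            fresh.append(candidate)
        candidate += 1
    return f | frozenset(frozenset({v}) for v in fresh), vs


F = make_cnf([[1, -2], [2, 3], [-1, -3]])
padded, backdoor = pad(F, k=2)
order = sorted(backdoor)
# Check Definition 3.3 by brute force over all 2^||S|| assignments to S = V(F).
ok = all(
    unit_propagation_decides(simplify(padded, dict(zip(order, values))))
    for values in product([False, True], repeat=len(order))
)
print(len(variables(padded)), order, ok)  # 9 [1, 2, 3] True
```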

Let us address two natural worries the reader might have regarding Theorem 3.5. First, the reader might worry that the hardness spoken of in the theorem occurs very infrequently (e.g., perhaps except for just one string at every double-exponentially spaced length everything is easy). That is, are we giving a worst-case result that deceptively hides a low typical-case complexity? We are not (unless all of NP has easy typical-case complexity): we show that if any set in NP is frequently hard with respect to polynomial-time heuristics, then a set of our sort is almost as frequently hard with respect to polynomial-time heuristics. We will show this as Theorem 3.7.

But first let us address a different worry. Perhaps some readers will feel that it is a weakness that Theorem 3.5 speaks of backdoors whose size is bounded by a fixed $k$th root of the number of variables, and that it is disappointing that the theorem does not establish the same result for a stronger bound such as “constant-sized backdoors,” or, failing that, polylogarithmic-sized backdoors, or, failing that, at least a single construction/set achieving a growth rate that is asymptotically less than every root (rather than handling each fixed root in a separate construction/set). Those are all fair and natural to wonder about. However, we claim that not one of those improvements of Theorem 3.5 can be proven without revolutionizing the deterministic speed of SAT. In particular, the following result holds, showing that those three cases would respectively put NP into P, quasipolynomial time, and subexponential time.

###### Theorem 3.6.

1. [Constant case] Suppose there is a constant $c$ and a set $A$ of Boolean formulas such that all the following hold: (a) $A \in \mathrm{P}$ and $A \cap \mathrm{SAT}$ is NP-complete; (b) each formula $F \in A$ has a strong backdoor $S$ with respect to the unit propagation subsolver, with $\|S\| \le c$; and (c) there is a polynomial-time algorithm that, given $F \in A$, finds a strong backdoor having the property stated in item (b) of this theorem. Then

   $$\mathrm{P} = \mathrm{NP}.$$

2. [Polylogarithmic case] Suppose there is a function $g$, with $g(n) = (\log n)^{O(1)}$, and a set $A$ of Boolean formulas such that all the following hold: (a) $A \in \mathrm{P}$ and $A \cap \mathrm{SAT}$ is NP-complete; (b) each formula $F \in A$ has a strong backdoor $S$ with respect to the unit propagation subsolver, with $\|S\| \le g(\|V(F)\|)$; and (c) there is a polynomial-time algorithm that, given $F \in A$, finds a strong backdoor having the property stated in item (b) of this theorem. Then NP is in quasipolynomial time, i.e.,

   $$\mathrm{NP} \subseteq \bigcup_{c > 0} \mathrm{DTIME}[2^{(\log n)^c}].$$

3. [Subpolynomial case] Suppose there is a polynomial-time computable function $g$ and a set $A$ of Boolean formulas such that all the following hold: (a) for each $\epsilon > 0$, $g(n) = O(n^{\epsilon})$; (b) $A \in \mathrm{P}$ and $A \cap \mathrm{SAT}$ is NP-complete; (c) each formula $F \in A$ has a strong backdoor $S$ with respect to the unit propagation subsolver, with $\|S\| \le g(\|V(F)\|)$; and (d) there is a polynomial-time algorithm that, given $F \in A$, finds a strong backdoor having the property stated in item (c) of this theorem. Then NP is in subexponential time, i.e.,

   $$\mathrm{NP} \subseteq \bigcap_{\epsilon > 0} \mathrm{DTIME}[2^{n^{\epsilon}}].$$

We can see this as follows. Consider the “Constant case” (the first part) of the above theorem. Let $c$ be the constant of that part. Then there are at most $n^c$ ways of choosing $c$ of the variables of a given Boolean formula of $n$ bits. And for each of those ways, we can try all $2^c$ possible ways of setting those variables. This is $n^c 2^c$ items to test, a polynomial number of items. If the formula is satisfiable, then via unit propagation one of these must yield a satisfying assignment (in polynomial time), and if it is unsatisfiable, none can. Yet the set $A \cap \mathrm{SAT}$ was NP-complete by the first condition of the theorem. So we have that P = NP, since we just gave a polynomial-time algorithm for $A \cap \mathrm{SAT}$. The other two cases are analogous (except that in the final case we needed to put into the theorem the indicated polynomial-time computability constraint on the bounding function $g$, since otherwise $g$ could be badly behaved; that issue does not affect the second part of the theorem, since even a badly behaved $g$ of the second part is bounded above by a simple-to-compute function satisfying the same polylogarithmic bound, and we can use that function in place of $g$ in the proof).
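The counting argument for the constant case amounts to the brute-force procedure below, sketched in Python under an assumed integer encoding of literals (`v` for $x_v$, `-v` for its negation) and a fixed deterministic choice of unit literal standing in for the paper's lexicographic rule. The name `sat_with_small_backdoor` and the promise parameter `c` are ours. The procedure accepts only after finding an assignment that actually satisfies the formula, so it never wrongly accepts; it is guaranteed to accept satisfiable inputs only under the promise that the input has a strong unit-propagation backdoor of size at most `c`.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterable, Optional

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Integer literals (assumed encoding): v is x_v, -v is its negation."""
    return frozenset(frozenset(clause) for clause in clauses)


def variables(f: CNF) -> FrozenSet[int]:
    return frozenset(abs(lit) for clause in f for lit in clause)


def simplify(f: CNF, assignment: Dict[int, bool]) -> CNF:
    """Drop clauses made True; delete literals made False."""
    return frozenset(
        frozenset(lit for lit in clause if abs(lit) not in assignment)
        for clause in f
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
    )


def unit_propagate(f: CNF) -> Optional[Dict[int, bool]]:
    """Return a satisfying partial assignment if unit propagation finds one, else None."""
    assignment: Dict[int, bool] = {}
    while f and frozenset() not in f:
        units = [next(iter(clause)) for clause in f if len(clause) == 1]
        if not units:
            return None  # stuck: the subsolver would reject
        lit = min(units, key=lambda u: (abs(u), u))  # fixed deterministic choice
        assignment[abs(lit)] = lit > 0
        f = simplify(f, {abs(lit): lit > 0})
    return assignment if not f else None  # None if an empty clause appeared


def sat_with_small_backdoor(f: CNF, c: int) -> bool:
    """Try all O(n^c * 2^c) assignments to at most c variables, then unit-propagate."""
    vs = sorted(variables(f))
    for size in range(min(c, len(vs)) + 1):
        for subset in combinations(vs, size):
            for values in product([False, True], repeat=size):
                if unit_propagate(simplify(f, dict(zip(subset, values)))) is not None:
                    return True  # the two partial assignments together satisfy f
    return False


SAT_F = make_cnf([[1, 2], [-1, 2], [1, -2]])
UNSAT_F = make_cnf([[1, 2], [-1, 2], [1, -2], [-1, -2]])
print(sat_with_small_backdoor(SAT_F, 1))  # True
print(sat_with_small_backdoor(UNSAT_F, 1))  # False
print(sat_with_small_backdoor(SAT_F, 0))  # False: promise violated
```

The last call illustrates the promise: `SAT_F` is satisfiable, but it has no strong backdoor of size 0, so the procedure gives up. Since condition 3 of Definition 3.2 makes every superset of a strong backdoor again a strong backdoor, one could also try only subsets of size exactly `c` when the formula has at least `c` variables.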

Even the final part of the above theorem, which is the part that has the weakest hypothesis, implies that NP is in subexponential time. However, it is widely suspected that the NP-complete sets lack subexponential-time algorithms. And so we have established that the $k$th-root growth, which we do prove in Theorem 3.5, is the smallest bound in part 2 of that result for which one can hope to prove Theorem 3.5 without, as a side effect, putting NP into a deterministic time class so small that we would have a revolutionarily fast deterministic algorithm for SAT.

Moving on, we now, as promised above, address the frequency of hardness of the sets we define in Theorem 3.5, and show that if any set in NP is frequently hard then a set of our type is almost as frequently hard. (Recall that, when $n$’s universe is the naturals as it is in the following theorem, “for almost every $n$” means “for all but at most a finite number of natural numbers $n$.”) We will say that a (decision) algorithm errs with respect to a set $B$ on an input $x$ if the algorithm disagrees with $B$ on $x$, i.e., if the algorithm accepts $x$ yet $x \notin B$, or the algorithm rejects $x$ yet $x \in B$.
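To make “errs on at least so many of the inputs up to that length” concrete, the Python sketch below counts the strings in $\Sigma^{\le n}$ (over a binary alphabet) on which a decision procedure disagrees with a set $B$. The set and the “heuristic” here are toy stand-ins of our own choosing; the toy $B$ is easy to decide and is used only to illustrate the counting.

```python
from itertools import product
from typing import Callable, Iterator


def strings_up_to(n: int, alphabet: str = "01") -> Iterator[str]:
    """Enumerate Sigma^{<=n}: every string over `alphabet` of length at most n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def count_errors(
    algorithm: Callable[[str], bool], in_b: Callable[[str], bool], n: int
) -> int:
    """Count inputs x with |x| <= n on which `algorithm` errs with respect to B."""
    return sum(1 for x in strings_up_to(n) if algorithm(x) != in_b(x))


def in_toy_b(x: str) -> bool:
    """Toy stand-in for B (not NP-hard): strings with an even number of 1s."""
    return x.count("1") % 2 == 0


def always_accept(x: str) -> bool:
    """A trivial polynomial-time 'heuristic' that accepts every input."""
    return True


print([count_errors(always_accept, in_toy_b, n) for n in range(4)])  # [0, 1, 3, 7]
```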

###### Theorem 3.7.

If is any nondecreasing function and for some set it holds that each polynomial-time algorithm errs with respect to , at infinitely many lengths (resp., for almost every length ), on at least of the inputs up to that length, then there will exist an and a set of Boolean formulas satisfying the conditions of Theorem 3.5, yet being such that each polynomial-time algorithm , at infinitely many lengths (resp., for almost every length ), will fail to determine membership in for at least inputs of length at most .

Before getting to the proof of this theorem, let us give concrete examples that convey what the theorem says about density transference. It follows from Theorem 3.7 that if there exists even one NP set such that each polynomial-time heuristic algorithm asymptotically errs exponentially often up to each length (i.e., has errors), then there are sets of our form that in the same sense fool each polynomial-time heuristic algorithm exponentially often. As a second example, it follows from Theorem 3.7 that if there exists even one NP set such that each polynomial-time heuristic algorithm asymptotically errs quasipolynomially often up to each length (i.e., has errors), then there are sets of our form that in the same sense fool each polynomial-time heuristic algorithm quasipolynomially often. Since almost everyone suspects that some NP sets are quasipolynomially and indeed even exponentially densely hard, one must with equal strength of belief suspect that there are sets of our form that are exponentially densely hard.

###### Proof of Theorem 3.7.

For conciseness and to avoid repetition, we build this proof on top of a proof (namely, of Theorem 4.5) that we will give later in the paper. That later proof does not rely directly or indirectly on the present theorem/proof, so there is no circularity at issue here. However, readers wishing to read the present proof should probably delay doing that until after they have first read that later proof.

We define $r_B$ as in the proof of Theorem 4.5 (the $r_B$ given there draws on a construction from Appendix A of [HN16] and, due to that construction’s properties, outputs only conjunctive normal form formulas). For a given $k$, we define

$$A = \{r_B(x) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{\|V(r_B(x))\|^k - \|V(r_B(x))\|}) \mid x \in \Sigma^*\},$$

and since and , we can now proceed as in the proof of Theorem 4.5, since here too the tail’s length is polynomially bounded. ∎

## 4 Results on Backbones

For the sake of completeness, we start this section by restating the definition of backbones as presented by Williams, Gomes, and Selman [WGS03]. We restrict ourselves to the Boolean domain, since we only deal with Boolean formulas in this paper.

###### Definition 4.1 (Backbone [WGS03]).

For a Boolean formula $F$, a subset $S$ of its variables is a backbone if there is a unique partial assignment $a_S : S \to \{\mathrm{True}, \mathrm{False}\}$ such that $F[a_S]$ is satisfiable.

The size of a backbone $S$ is the number of variables in $S$. One can readily see from Definition 4.1 that all satisfiable formulas have at least one backbone, namely, the empty set. This backbone is called the trivial backbone, while backbones of size at least one are called nontrivial backbones. It follows from Definition 4.1 that unsatisfiable formulas do not have backbones. Note also that some satisfiable formulas have no nontrivial backbones, e.g., is satisfiable but has no nontrivial backbone.
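A set $S$ is a backbone exactly when all satisfying assignments agree on $S$, so the largest backbone of a satisfiable formula is the set of variables that take the same value in every model, and every subset of it is also a backbone. The exponential-time Python sketch below is illustrative only; it uses an integer encoding of literals that is our own assumption (`v` for $x_v$, `-v` for its negation), computes that set by enumerating models, and returns `None` for unsatisfiable formulas, which have no backbone.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Iterator, Optional

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Integer literals (assumed encoding): v is x_v, -v is its negation."""
    return frozenset(frozenset(clause) for clause in clauses)


def variables(f: CNF) -> FrozenSet[int]:
    return frozenset(abs(lit) for clause in f for lit in clause)


def models(f: CNF) -> Iterator[Dict[int, bool]]:
    """Enumerate all satisfying total assignments to V(f) (exponential time)."""
    vs = sorted(variables(f))
    for values in product([False, True], repeat=len(vs)):
        a = dict(zip(vs, values))
        if all(any(a[abs(lit)] == (lit > 0) for lit in clause) for clause in f):
            yield a


def largest_backbone(f: CNF) -> Optional[FrozenSet[int]]:
    """Variables fixed to the same value in every model; None if f is unsatisfiable."""
    all_models = list(models(f))
    if not all_models:
        return None  # unsatisfiable formulas have no backbone at all
    first = all_models[0]
    return frozenset(v for v in first if all(m[v] == first[v] for m in all_models))


print(largest_backbone(make_cnf([[1], [-1, 2], [3, 4]])))  # frozenset({1, 2})
print(largest_backbone(make_cnf([[1, 2]])))  # frozenset()
print(largest_backbone(make_cnf([[1], [-1]])))  # None
```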

###### Example 4.2.

Consider the formula . Any satisfying assignment of must have set to True, which in turn constrains and . Then is a backbone of , as is any subset of this backbone. It is also easy to see that is the largest backbone of this formula since the truth values of and are not entirely constrained in (since in effect is—once one applies the just-mentioned forced assignments—).

Our first result states that if P ≠ NP then there are families of Boolean formulas that are easy to recognize, with the property that deciding whether a formula in these families has a large backbone is NP-complete (and so is hard). As a corollary to its proof, we have that if P ≠ NP then there are families of Boolean formulas that are easy to recognize, with the property that deciding whether a formula in these families has a nontrivial backbone is NP-complete (and so is hard).³

³ We have not been able to find Corollary (to the Proof) 4.4 in the literature, though it would not be shocking if it either was there or was folklore. Certainly, two things that on their surface might seem to be the claim we are making in Corollary (to the Proof) 4.4 are either trivially true or are in the literature. However, upon closer inspection they turn out to be quite different from our claim. In particular, if one removes the word “nontrivial” from Corollary (to the Proof) 4.4’s statement, and one is in the model in which every satisfiable formula is considered to have the empty collection of variables as a backbone and every unsatisfiable formula is considered to have no backbones, then the thus-altered version of Corollary (to the Proof) 4.4 is clearly true, since if one with those changes takes $A$ to be the set of all Boolean formulas, then the theorem degenerates to the statement that if P ≠ NP, then SAT is (NP-complete, and) not in P. Also, it is stated in Kilby et al. [KSTW05] that finding a backbone of CNF formulas is NP-hard. However, though this might seem to be our result, their claim and model differ from ours in many ways, making this a quite different issue. First, their hardness refers to Turing reductions (in contrast, our paper is about many-one reductions and many-one completeness). Second, they are not even speaking of NP-Turing-hardness, much less NP-Turing-completeness, in the standard sense, since their model assumes a function reply from the oracle rather than having a set as the oracle. Third, even their notion of backbones is quite different, as it (unlike the influential Williams, Gomes, and Selman 2003 paper [WGS03] and our paper) in effect requires that the function oracle give back both a variable and its setting. Fourth, our claim is about nontrivial backbones.

###### Theorem 4.3.

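If P ≠ NP then for any real number $\beta$ with $0 < \beta < 1$, there is a set $A$ of Boolean formulas such that the language

$$L_A = \{F \mid F \in A \text{ and } F \text{ has a backbone } S \text{ with } \|S\| \ge \beta\|V(F)\|\}$$

is NP-complete (and so, since P ≠ NP is our hypothesis, $L_A$ is not in P).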

###### Corollary (to the Proof) 4.4.

If P ≠ NP then there is a set $A$ of Boolean formulas such that the language

$$L_A = \{F \mid F \in A \text{ and } F \text{ has a nontrivial backbone } S\}$$

is NP-complete (and so, since P ≠ NP is our hypothesis, $L_A$ is not in P).

###### Proof of Theorem 4.3 and Corollary 4.4.

We will first prove Theorem 4.3, and then will note that Corollary 4.4 follows easily as a corollary to the proof/construction.

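So fix a $\beta$ from Theorem 4.3’s statement. For each Boolean formula $G$, let

$$q(G) = \left\lceil \frac{\beta\|V(G)\|}{1 - \beta} \right\rceil.$$

Define

$$A = \{(G) \wedge (\mathit{new}_1 \wedge \mathit{new}_2 \wedge \cdots \wedge \mathit{new}_{q(G)}) \mid G \text{ is a Boolean formula having at least one variable}\},$$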

where, as in the proof of Theorem 3.5, we define $\mathit{new}_i$ as the $i$th variable that does not appear in $G$. Note that $\{\mathit{new}_1, \ldots, \mathit{new}_{q(G)}\}$ is a backbone of $(G) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{q(G)})$ if and only if $G \in \mathrm{SAT}$. Thus, under the assumption that P ≠ NP and keeping in mind that for zero-variable formulas satisfiability is easy to decide, it follows that no polynomial-time algorithm can decide $L_A$, since the size of this backbone is $q(G)$, which by our definition of $q$ will satisfy the condition $\|S\| \ge \beta\|V(F)\|$. Why does it satisfy that condition? $\|S\|$ here is $q(G)$. And here, since $F$ is the formula $(G) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{q(G)})$, $\|V(F)\|$ equals $\|V(G)\| + q(G)$. So the condition is claiming that $q(G) \ge \beta(\|V(G)\| + q(G))$, i.e., that $(1 - \beta)\,q(G) \ge \beta\|V(G)\|$, which indeed holds in light of the definition of $q$. And why do we claim that no polynomial-time algorithm can decide $L_A$? Well, note that SAT many-one polynomial-time reduces to $L_A$ via the reduction $f$ such that $f(H)$ equals some fixed string in $L_A$ if $H$ is in SAT and has zero variables, $f(H)$ equals some fixed string in $\overline{L_A}$ if $H$ is not in SAT and has zero variables (these two cases are included merely to handle degenerate formulas that can occur if we allow True and False as atoms in our propositional formulas), and $f(H)$ equals

$$(H) \wedge (\mathit{new}_1 \wedge \mathit{new}_2 \wedge \cdots \wedge \mathit{new}_{q(H)})$$

otherwise (the above formula is conjoined with a large number of new variables). Since $L_A$ is in NP,⁴ we have that it is NP-complete, and since P ≠ NP was part of the theorem’s hypothesis, $L_A$ cannot be in P.

⁴ If one just looks at the definition of $L_A$, one might worry that $L_A$ might have only $\Sigma_2^p$ as an obvious upper bound. However, as noted above, our particular choice of $A$ ensures that $\{\mathit{new}_1, \ldots, \mathit{new}_{q(G)}\}$ is a backbone of $(G) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{q(G)})$ if and only if $G \in \mathrm{SAT}$; and that makes clear that our set $L_A$ is indeed in NP.

The above proof establishes Theorem 4.3. Corollary 4.4 follows immediately from the proof/construction of Theorem 4.3. Why? The set $A$ from the proof of Theorem 4.3 is constructed in such a way that each of its potential members $(G) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{q(G)})$ (where $G$ is a Boolean formula having at least one variable) either has no nontrivial backbone (indeed, no backbone) or has a backbone of size at least $q(G)$. Thus the issue of backbones that are nontrivial but smaller than $\beta\|V(F)\|$, where $F$ is $(G) \wedge (\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_{q(G)})$, does not cause a problem under the construction. That is, our $A$ (which itself is dependent on the value of $\beta$ one is interested in) is such that we have ensured that $\{F \mid F \in A \text{ and } F \text{ has a nontrivial backbone}\} = \{F \mid F \in A \text{ and } F \text{ has a backbone } S \text{ with } \|S\| \ge \beta\|V(F)\|\}$. ∎
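The arithmetic behind $q(G)$ can be checked on tiny instances. In this Python sketch (illustrative only, with an assumed integer encoding of literals and our own choice of fresh variables as the smallest unused positive integers), $\beta$ is a `fractions.Fraction` so that the ceiling is computed exactly, and the largest backbone is found by brute force. `G_SAT` by itself has only the trivial backbone, yet its padded version has a backbone of the required size, while the padded unsatisfiable formula has none.

```python
import math
from fractions import Fraction
from itertools import product
from typing import FrozenSet, Iterable, Optional

Clause = FrozenSet[int]
CNF = FrozenSet[Clause]


def make_cnf(clauses: Iterable[Iterable[int]]) -> CNF:
    """Integer literals (assumed encoding): v is x_v, -v is its negation."""
    return frozenset(frozenset(clause) for clause in clauses)


def variables(f: CNF) -> FrozenSet[int]:
    return frozenset(abs(lit) for clause in f for lit in clause)


def q(num_vars: int, beta: Fraction) -> int:
    """q(G) = ceil(beta * ||V(G)|| / (1 - beta)), computed exactly."""
    return math.ceil(beta * num_vars / (1 - beta))


def pad(g: CNF, beta: Fraction) -> CNF:
    """(G) and new_1 and ... and new_q(G); fresh variables are the smallest unused
    positive integers (our stand-in for the paper's lexicographic choice)."""
    vs = variables(g)
    fresh, candidate = [], 1
    while len(fresh) < q(len(vs), beta):
        if candidate not in vs:
            fresh.append(candidate)
        candidate += 1
    return g | frozenset(frozenset({v}) for v in fresh)


def largest_backbone(f: CNF) -> Optional[FrozenSet[int]]:
    """Variables with the same value in every model; None if f is unsatisfiable."""
    vs = sorted(variables(f))
    found = []
    for values in product([False, True], repeat=len(vs)):
        a = dict(zip(vs, values))
        if all(any(a[abs(lit)] == (lit > 0) for lit in clause) for clause in f):
            found.append(a)
    if not found:
        return None
    return frozenset(v for v in vs if all(m[v] == found[0][v] for m in found))


def has_large_backbone(f: CNF, beta: Fraction) -> bool:
    """Does f have a backbone S with ||S|| >= beta * ||V(f)||?"""
    backbone = largest_backbone(f)
    return backbone is not None and len(backbone) >= beta * len(variables(f))


G_SAT = make_cnf([[1, 2], [-1, -2]])  # satisfiable, only the trivial backbone
G_UNSAT = make_cnf([[1], [-1]])
for beta in (Fraction(1, 2), Fraction(2, 3)):
    print(beta, has_large_backbone(pad(G_SAT, beta), beta),
          has_large_backbone(pad(G_UNSAT, beta), beta))
# 1/2 True False
# 2/3 True False
```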

We now address the potential concern that the hard instances for the decision problems we just introduced may be so infrequent that the relevance of Theorem 4.3 and Corollary 4.4 is undercut. The following theorem argues against that possibility by proving that, unless not a single NP set is frequently hard (in the sense made rigorous in the theorem’s statement), there exist sets of our form that are frequently hard. (This result makes for backbones a point analogous to the one our Theorem 3.7 makes for backdoors. Hemaspaandra and Narváez [HN17] look at frequency-of-hardness results for backbones, but with results focused on NP ∩ coNP rather than NP.)

###### Theorem 4.5.

If is any nondecreasing function and for some set it holds that each polynomial-time algorithm errs with respect to , at infinitely many lengths (resp., for almost every length ), on at least of the inputs up to that length, then there will exist an and a set of Boolean formulas satisfying the conditions of Theorem 4.3, yet being such that each polynomial-time algorithm , at infinitely many lengths (resp., for almost every length ), will fail to correctly determine membership in for at least inputs of length at most .

The same claim also holds for Corollary 4.4.
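To make the phrase "errs on at least a given fraction of the inputs up to that length" concrete, the following brute-force sketch computes, for a candidate heuristic and a membership predicate, the fraction of strings of length at most $n$ on which they disagree. It assumes the binary alphabet $\{0,1\}$, and both predicates are hypothetical stand-ins supplied by the caller; it only illustrates the counting, not any hardness claim.

```python
from itertools import product
from typing import Callable, Iterator


def strings_up_to(n: int) -> Iterator[str]:
    """Yield every binary string of length 0..n (alphabet {0,1} is assumed)."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def error_fraction(heuristic: Callable[[str], bool],
                   member: Callable[[str], bool], n: int) -> float:
    """Fraction of inputs of length at most n where heuristic disagrees with member."""
    total = errors = 0
    for s in strings_up_to(n):
        total += 1
        if heuristic(s) != member(s):
            errors += 1
    return errors / total


def odd_parity(s: str) -> bool:
    """Toy membership predicate: strings with an odd number of 1s."""
    return s.count("1") % 2 == 1


# Always answering "no" is wrong on "1", "01", "10" among the 7 strings of length <= 2.
print(error_fraction(lambda s: False, odd_parity, 2))
```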

###### Proof.

We will prove the theorem’s statement regarding Theorem 4.3. It is not hard to also then see that the analogous claim holds regarding Corollary 4.4.

Recall that $B \in \mathrm{NP}$ and SAT is NP-complete. So let $r_B$ be a polynomial-time function, transforming strings into Boolean formulas, such that (a) for every $x$, $x \in B \iff r_B(x) \in \mathrm{SAT}$, and (b) $r_B$ is one-to-one. (A construction of such a function is given in Appendix A of [HN16], and we assume that construction is used here.) As in the proof of Theorem 4.3, if $F$ is a Boolean formula we define .

Without loss of generality, we assume that $r_B$ outputs only formulas having at least one variable. Note that throughout this proof, $q$ is called only on outputs of $r_B$. Thus we have ensured that none of the logarithms in this proof has zero as its argument.

Set

$$A = \{\, (r_B(x)) \wedge (\mathit{new}_1 \wedge \mathit{new}_2 \wedge \cdots \wedge \mathit{new}_{q(r_B(x))}) \mid x \in \Sigma^* \,\}.$$
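The padding that defines $A$ is mechanically simple. As a minimal sketch, assuming $r_B(x)$ is a CNF formula given DIMACS-style as a list of integer clauses, conjoining $\mathit{new}_1 \wedge \cdots \wedge \mathit{new}_q$ amounts to appending $q$ unit clauses over fresh variables. The value $q$ is passed in as a parameter because its definition comes from the proof of Theorem 4.3; the function names below are hypothetical. The padding preserves satisfiability and can be undone by dropping the tail, which, together with $r_B$ being one-to-one, is what lets distinct strings correspond to distinct members of $A$.

```python
Clause = list[int]  # DIMACS-style literals: v is x_v, -v is its negation


def pad_with_new_variables(clauses: list[Clause], num_vars: int,
                           q: int) -> tuple[list[Clause], int]:
    """Return a CNF for F ∧ new_1 ∧ ... ∧ new_q, where new_i is the fresh
    variable num_vars + i, together with the new variable count."""
    if q < 0:
        raise ValueError("q must be nonnegative")
    tail = [[num_vars + i] for i in range(1, q + 1)]  # one unit clause per new_i
    return [list(c) for c in clauses] + tail, num_vars + q


def strip_padding(padded: list[Clause], padded_num_vars: int,
                  q: int) -> tuple[list[Clause], int]:
    """Undo pad_with_new_variables by dropping the q unit clauses of the tail."""
    return padded[:len(padded) - q], padded_num_vars - q


F = [[1, -2], [2]]
padded, n = pad_with_new_variables(F, 2, 3)
print(padded, n)  # [[1, -2], [2], [3], [4], [5]] 5
print(strip_padding(padded, n, 3) == (F, 2))  # True
```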

Because is computable in polynomial time, there is a polynomial such that for every input of length at most , the length of is at most . Fix some such polynomial , and let denote its degree. In order to find a bound for the length of the added “tail” in terms of , notice that the length of the tail is less than some constant (that holds over all and , ) times . Since and the length of is at least a constant times the number of its variables, our assumption that implies the existence of a constant such that, for all and , , we have . Taken together, the two previous sentences imply the existence of a constant such that, for all and , , we have that the length of is at most , and so certainly is less than . Let be a natural number such that, for all and all , implies that ; by the previous sentence and the fact that is of degree , such an will exist. Let be a polynomial-time heuristic for . Notice that —i.e., —is a polynomial-time heuristic for , since and . Let be such that there is a set of strings , , having the property that for all , fails to correctly determine the membership of in . Consequently, there is a set of strings , , such that for all , fails to correctly determine the membership of in ; in particular the set

$$T_n^B = \{\, (r_B(x)) \wedge (\mathit{new}_1 \wedge \mathit{new}_2 \wedge \cdots \wedge \mathit{new}_{q(r_B(x))}) \mid x \in S_n^B \,\}$$

has this property.

Using the variable renaming , it is now easy to see that we have proven that every length at which (viewed as a heuristic for ) errs on at least inputs of length up to has a corresponding length at which (viewed as a heuristic for ) errs on at least inputs of length up to . Our hypothesis guarantees the existence of infinitely many such (resp., almost all can take the role of ), each with a corresponding . Setting

$$\epsilon = \frac{1}{2k+1},$$

our theorem is now proven. ∎

## 5 Conclusions

We constructed easily recognizable families of Boolean formulas that provide hard instances for decision problems related to backdoors and backbones under the assumption that P ≠ NP. In particular, we have shown that under that assumption there exist easily recognizable families of Boolean formulas with easy-to-find strong backdoors, yet for which it is hard to determine whether the formulas are satisfiable. Under the same assumption, we have shown that there exist easily recognizable collections of Boolean formulas for which it is hard (in fact, NP-complete) to determine whether they have a backbone, and that there exist easily recognizable collections of Boolean formulas for which it is hard (in fact, NP-complete) to determine whether they have a large backbone. (These results can be taken as indicating that, under the very plausible assumption that P ≠ NP, search and decision shear apart in complexity for backdoors and backbones. That makes it particularly unfortunate that their definitions in the literature are framed in terms of decision rather than search, especially since when one tries to put these to work in SAT solvers, it is the search case that one typically tries to use and leverage.)

For both our backdoor and backbone results, we have shown that if any problem in NP is frequently hard, then there exist families of Boolean formulas of the sort we describe that are hard almost as frequently as that problem.

## References

• [DGS14] B. Dilkina, C. Gomes, and A. Sabharwal. Tradeoffs in the complexity of backdoors to satisfiability: dynamic sub-solvers and learning during search. Annals of Mathematics and Artificial Intelligence, 70(4):399–431, 2014.
• [DLL62] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, July 1962.
• [DP60] M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of the ACM, 7(3):201–215, 1960.
• [Gas12] W. Gasarch. The second P=?NP poll. SIGACT News, 43(2):53–77, 2012.
• [HN16] L. Hemaspaandra and D. Narváez. The opacity of backbones. Technical Report arXiv:1606.03634 [cs.AI], Computing Research Repository, arXiv.org/corr/, June 2016. Revised, January 2017.
• [HN17] L. Hemaspaandra and D. Narváez. The opacity of backbones. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 3900–3906. AAAI Press, February 2017.
• [HZ96] L. Hemaspaandra and M. Zimand. Strong self-reducibility precludes strong immunity. Mathematical Systems Theory, 29(5):535–548, 1996.
• [KSTW05] P. Kilby, J. Slaney, S. Thiébaux, and T. Walsh. Backbones and backdoors in satisfiability. In Proceedings of the 20th National Conference on Artificial Intelligence, pages 1368–1373. AAAI Press, 2005.
• [Sze05] S. Szeider. Backdoor sets for DLL subsolvers. Journal of Automated Reasoning, 35(1–3):73–88, 2005.
• [WGS03] R. Williams, C. Gomes, and B. Selman. Backdoors to typical case complexity. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 1173–1178. Morgan Kaufmann, August 2003.