 # Compilability of Abduction

Abduction is one of the most important forms of reasoning; it has been successfully applied to several practical problems such as diagnosis. In this paper we investigate whether the computational complexity of abduction can be reduced by an appropriate use of preprocessing. This is motivated by the fact that part of the data of the problem (namely, the set of all possible assumptions and the theory relating assumptions and manifestations) is often known before the rest of the problem. We show some complexity results about abduction when compilation is allowed.


## 1 Introduction

Deduction, induction, and abduction [Pei55] are the three basic reasoning mechanisms. Deduction allows drawing conclusions from known facts using some piece of knowledge, so that “battery is down” allows concluding “car will not start” thanks to the knowledge of the rule “if the battery is down, the car will not start”. Induction derives rules from the facts: from the fact that the battery is down and that the car is not starting up, we may conclude the rule relating these two facts. Abduction is the inverse of deduction (to some extent [MF96]): from the fact that the car is not starting up, we conclude that the battery is down. Clearly, this is not the only possible explanation of a car not starting up. Therefore, we may get more than one explanation. This is an important difference between abduction and deduction, making the former, in general, more complex.

While deduction formalizes the process of drawing conclusions, abduction formalizes the diagnostic process, which attempts to invert the cause-effect relation by inferring causes from their observable effects. The example of the car shows such an application: complete knowledge about cars would allow finding (i.e., abducing) the possible reasons why the car is not starting up. The following example shows how abduction can be applied to formalize a diagnostic scenario.

###### Example 1

While writing a paper with some authors located in another country, you get a set of macros that are used in a nice figure they drew. However, when compiling the .tex file, an incomprehensible error message results. Four explanations are possible:

$a$: the macro has been used with the wrong arguments;

$p$: the package is required;

$t$: the macro is incompatible with package ;

$v$: the wrong version of TeX has been used.

This scenario can be formalized in logical terms by introducing a variable $f$ to denote the presence of compile errors: since each of the facts above causes $f$, we know $a \rightarrow f$, $p \rightarrow f$, etc. Moreover, we know that a package cannot at the same time be required and incompatible with the macros. The following theory formalizes our knowledge.

$$T = \{a \rightarrow f,\ p \rightarrow f,\ t \rightarrow f,\ v \rightarrow f,\ \neg(p \wedge t)\}$$

This theory relates the observed effect (the compile error) with its possible causes (we used the wrong version of TeX, etc.). Therefore, it can be used to find the possible causes: namely, an explanation is a set of facts that logically imply the observed effect. Formally, an explanation is a set of variables that allows deriving the observed effects from the theory $T$. However, to make sense an explanation has to be consistent with our knowledge, that is, with the theory $T$.

This example shows that a given problem of abduction may have one, none, or even many possible solutions (explanations). Moreover, a consistency check and an implication check are required just to verify an explanation. These facts intuitively explain why abduction is expected to be harder than deduction. This observation has indeed been confirmed by theoretical results. Selman and Levesque [SL90] and Bylander et al. [BATJ89] proved the first results about fragments of abductive reasoning, Eiter and Gottlob [EG95] presented an extensive analysis, and Eiter and Makino have shown the complexity of computing all abductive explanations [EM02]. All these results proved that abduction is, in general, harder than deduction. The analysis has also shown that several problems are of interest in abduction. Not only is the problem of finding an explanation relevant, but so are the problems of checking an explanation and of deciding whether a fact is in all, or some, of the explanations.

A common fact about deduction and abduction is that the knowledge relating facts may be known in advance, while the particular observation may change from time to time. In the example of the car, the fact that a dead battery prevents the car from starting is always known, while the fact that the battery is dead may or may not be true. The possible causes of TeX errors are known before a specific error message comes out, etc.

We can therefore assign two different statuses to the knowledge base and to the single facts: while the knowledge base is fixed, the single facts are varying. In the example above, $T$ will always reflect the state of the world, while $f$ is only true when TeX complains about something.

This difference has computational consequences. While the example we have shown here does not present any problem of efficiency, larger and more complex abduction problems result from the formalization of real-world domains. The difference of status of $T$ and the observations can then be exploited. Indeed, since $T$ is always the same, we can perform a preprocessing step on it alone, even before the status of the observations is known. Clearly, we cannot explain an observation we do not know. However, this preprocessing step can be used to perform some computation that would otherwise be done on $T$ alone. As a result, finding a solution might take less time when the observation finally becomes known.

The idea of using a preprocessing step for speeding-up the solving of abduction problems is not new. For instance, Console, Portinale, and Duprè [CPT96] have shown how compiled knowledge can be used in the process of abductive diagnosis.

Preprocessing part of the input data has also been used in many other areas of computer science, as there are many problems with a similar fixed-varying part pattern. However, the first formalization of intractability with preprocessing is relatively recent [CDLS02]. In this paper, we characterize the complexity of the problems about abductions from this point of view.

## 2 Preliminaries

The problem of abduction is formalized by a knowledge base, a set of observations, and a set of possible facts that can explain the observations. In this paper, we are only concerned with propositional logic. Therefore, the knowledge is formalized by a propositional theory $T$. We usually denote by $M$ the set of observations.

The theory $T$ must necessarily contain all variables of $M$, otherwise there would be no way of explaining the observations. In general, the theory contains other variables as well, describing facts whose truth we do not know. Some of these facts can be taken as part of a possible explanation, while others cannot. Intuitively, when we are trying to establish the causes of an observation, we want the first cause, and not something that is only a consequence of it. In the example of the car, the fact that there is no voltage in the starting engine explains the fact that the car is not starting up, but it is not an acceptable explanation, as it does not tell where the real problem is (the battery). Therefore, the abduction problem is not defined only in terms of the theory and the observation, but also of the set of possible facts (variables) we would accept as first causes of the observation.

Formally, an instance of abduction is a triple $\langle H,M,T\rangle$. The observations are formalized as $M$, which is a set of variables. $T$ is a propositional theory formalizing our knowledge of the domain. Finally, $H$ is a set of variables; these variables are the ones formalizing facts that we regard as possible first causes.

Abduction is the process of explaining the observation. Its outcome will therefore be a set of facts from which all observations can be inferred. Since we can only use variables of $H$ to form explanations, these will be subsets $H' \subseteq H$. Moreover, an explanation can only be accepted if it is consistent with our knowledge. This leads to the following definition of the possible solutions (explanations) of a given abduction problem $\langle H,M,T\rangle$.

$$SOL(H,M,T) = \{H' \subseteq H \mid H' \cup T \text{ is consistent and } H' \cup T \models M\}$$

We apply this definition to the running example of the TeX file.

###### Example 2

The propositional theory of the example shown in the introduction is $T = \{a \rightarrow f, p \rightarrow f, t \rightarrow f, v \rightarrow f, \neg(p \wedge t)\}$. The observation is the variable formalizing the presence of compiler errors, that is, $M = \{f\}$. Of the variables of $T$, all but $f$ can be taken as possible first causes of the problem, that is, $H = \{a,p,t,v\}$.

Abduction amounts to finding a set of literals $H'$ that explain the observation $f$. Formally, this is captured by the constraint $H' \cup T \models f$. Note that $H' = \{f\}$ satisfies this formula; this is not an acceptable explanation: “the reason why the file does not compile is that it does not compile” is a tautology, not an explanation. This problem is avoided by enforcing $H' \subseteq H$.

All non-empty subsets of $H$ imply, together with $T$, the observation $f$. However, the subsets containing both $p$ and $t$ are inconsistent with $T$. Therefore, the set of solutions of the problem is given by:

$$SOL(H,M,T) = \{H' \subseteq H \mid H' \neq \emptyset,\ \{t,p\} \not\subseteq H'\}$$

This is simply the formal result of our current definition. However, some explanations in this set are not really reasonable: for example, the explanation $\{a,p,v\}$ seems overly pessimistic: the macro has been called in the wrong way and a package is required and we used the wrong compiler version.
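The set of solutions of the running example can be checked mechanically. The following is a minimal brute-force sketch, not a method from the paper: the clause encoding and the function names (`models`, `is_explanation`, `solutions`) are assumptions made for illustration, and the enumeration is exponential in the number of variables.

```python
from itertools import combinations, product

# Clauses are lists of (variable, polarity) literals; polarity False means negated.
T = [
    [("a", False), ("f", True)],   # a -> f
    [("p", False), ("f", True)],   # p -> f
    [("t", False), ("f", True)],   # t -> f
    [("v", False), ("f", True)],   # v -> f
    [("p", False), ("t", False)],  # not (p and t)
]
H = ["a", "p", "t", "v"]
M = ["f"]
VARIABLES = ["a", "p", "t", "v", "f"]


def models(theory, variables):
    """Yield every assignment over `variables` that satisfies all clauses."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[var] == pol for var, pol in clause) for clause in theory):
            yield assignment


def is_explanation(candidate, theory, manifestations, variables):
    """Check that candidate + T is consistent and entails every manifestation."""
    extended = theory + [[(h, True)] for h in candidate]
    found = list(models(extended, variables))
    return bool(found) and all(m[x] for m in found for x in manifestations)


def solutions(hypotheses, manifestations, theory, variables):
    """Enumerate SOL(H, M, T) by trying every subset of H."""
    return [
        set(subset)
        for size in range(len(hypotheses) + 1)
        for subset in combinations(hypotheses, size)
        if is_explanation(subset, theory, manifestations, variables)
    ]


sol = solutions(H, M, T, VARIABLES)
print(len(sol))  # 11: the non-empty subsets of H that do not contain both p and t
```

The empty set is rejected because $T$ alone does not entail $f$, and every set containing both $p$ and $t$ is rejected by the consistency check.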

The set $SOL(H,M,T)$ contains all explanations we consider possible. However, some explanations may be more likely than others. For example, explanations requiring a large number of assumptions are often less likely than explanations with fewer assumptions.

Likeliness of explanations is formalized by an ordering $\preceq$ over the subsets of $H$. Given a specific $\preceq$, the set of minimal solutions is defined as follows.

$$SOL_{\preceq}(H,M,T) = \min(SOL(H,M,T), \preceq)$$

The ordering is used to formalize the relative plausibility of explanations: $H' \preceq H''$ means that $H'$ is considered at least as likely to be the “real” cause of the manifestations as $H''$. The ordering represents the concept of “at least as likely as”, thus $H' \preceq H''$ also holds if $H'$ and $H''$ are equally likely. The definition of $SOL_{\preceq}$ formalizes the principle of choosing only the explanations we consider more likely.

An implicit assumption of these definitions is that the ordering $\preceq$ does not depend on the set of manifestations. We also assume that $\preceq$ is a “well-founded” ordering, that is, any non-empty set of explanations has at least one $\preceq$-minimal element. Therefore, if the set $SOL(H,M,T)$ is not empty, then $SOL_{\preceq}(H,M,T)$ is not empty as well.

In this paper we take into account several plausibility orderings. The absence of a preference among the explanations can be formalized as the ordering that is equal to the universal relation, that is, $H' \preceq H''$ for any pair of sets of variables $H'$ and $H''$.

Besides this no-information ordering, the two simplest and most natural orderings are ⊆-preference, where an explanation $H'$ is more likely than $H''$ if $H' \subset H''$, and ≤-preference, where $H'$ is preferred to $H''$ if it contains fewer hypotheses, that is, $|H'| < |H''|$.

Both these orderings are based on the principle of making as few hypotheses as possible, and on the assumption that all hypotheses are equally likely. Two other orderings follow from assuming that the hypotheses are not equally likely: the ⊆-prioritization and the ≤-prioritization.

In particular, we assume that the hypotheses are partitioned into equivalence classes of equal likeliness. Let $\langle H_1,\ldots,H_m\rangle$ be such a partition. By definition, it holds $\bigcup_i H_i = H$ and $H_i \cap H_j = \emptyset$ for each $i \neq j$. The instances of the problem of abduction can thus be written as $\langle \langle H_1,\ldots,H_m\rangle, M, T\rangle$. The set of all assumptions $H$ is implicitly defined as the union of the classes $H_i$. We assume that the hypotheses in $H_1$ are the most likely, while those in $H_m$ are the least likely.

The ⊆-prioritization and ≤-prioritization compare explanations on the basis of their relative plausibility. Namely, the explanations that use hypotheses in lower classes are more likely than explanations using hypotheses in higher classes. This idea, when combined with subset containment, defines the ⊆-prioritization. When it is combined with the cardinality-based ordering, it defines the ≤-prioritization. The formal definitions are given below.

Penalization is the last form of preference we consider. The idea is to assign weights to assumptions to formalize their likeliness. Explanations with the least total weight are preferred. Weights encode the likeliness of assumptions: the higher the weight of an assumption, the less likely it is to be true. To use penalization, the instance of the problem must include, besides $H$, $M$, and $T$, an $n$-tuple of weights $W = \langle w_1,\ldots,w_n\rangle$, where each $w_i$ is an integer (the weight) associated with a variable $h_i \in H$. The instance can thus be written $\langle H,M,T,W\rangle$.

The considered orderings are formally defined as follows:

⊆-preference

$H' \preceq H''$ if and only if $H' \subseteq H''$;

≤-preference

$H' \preceq H''$ if and only if $|H'| \leq |H''|$;

⊆-prioritization

if and only if or there exists such that , , , ;

≤-prioritization

if and only if either for each , or there exists such that , , , ;

penalization

$H' \preceq H''$ if and only if $\sum_{h_i \in H'} w_i \leq \sum_{h_i \in H''} w_i$.

Let us consider the use of these orderings on the running example.

###### Example 3

The use of ⊆-preference or ≤-preference reduces the set of possible explanations of the example of the TeX file. Namely, ≤-preference lets only minimal-size explanations be solutions of the problem. The only such explanations are $\{a\}$, $\{p\}$, $\{t\}$, and $\{v\}$. The explanation $\{a,p,v\}$, not being minimal, is no longer a solution of the problem. The use of preference therefore avoids having as solutions sets that contain too many hypotheses. Since ⊆-preference only selects explanations that do not properly contain other ones, the only solutions it produces are $\{a\}$, $\{p\}$, $\{t\}$, and $\{v\}$. In this case, the two kinds of preference generate the same solutions, but this is not always the case.

Prioritization allows for a further refinement of the set of solutions by exploiting the plausibility ordering over the hypotheses. For example, we may assume that the fact that package is required and that we used the wrong version of the compiler are the two most likely hypotheses. Formally, they will be part of the first set of assumptions $H_1 = \{p,v\}$, while the other assumptions go in $H_2 = \{a,t\}$. Formally, the problem instance is now $\langle \langle H_1,H_2\rangle, M, T\rangle$. Both ⊆-prioritization and ≤-prioritization produce and as the only minimal explanations. This is because all other explanations either have a bigger intersection with , or an equal intersection with but a bigger intersection with .

Finally, penalization requires a weight (an integer number) for each hypothesis. Let us for example use the set of weights associated with the set of hypotheses . Since larger weights correspond to less likely hypotheses, we are assuming that our first and third hypotheses ( and ) are the least likely, while is more likely and is the most likely. From definition, the explanation is the one having the least weight, and is therefore the only solution of the problem.
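The effect of ⊆-preference, ≤-preference, and penalization on the running example can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, prioritization is left out, and the weight table `WEIGHTS` is hypothetical (it is not the one used in the example above); it merely respects the convention that larger weights mean less likely hypotheses.

```python
from itertools import combinations

H = ["a", "p", "t", "v"]

# SOL(H, M, T) for the running example: every non-empty subset of H
# that does not contain both p and t.
SOL = [
    frozenset(subset)
    for size in range(1, len(H) + 1)
    for subset in combinations(H, size)
    if not {"p", "t"} <= set(subset)
]


def subset_leq(first, second):
    """Subset-preference: first precedes second when it is contained in it."""
    return first <= second


def cardinality_leq(first, second):
    """Cardinality-preference: fewer hypotheses are preferred."""
    return len(first) <= len(second)


def penalization_leq(weights):
    """Build the penalization ordering for a given weight table."""
    def leq(first, second):
        return sum(weights[h] for h in first) <= sum(weights[h] for h in second)
    return leq


def minimal(solutions, leq):
    """Solutions to which no other solution is strictly preferred."""
    return [
        s for s in solutions
        if not any(leq(other, s) and not leq(s, other) for other in solutions)
    ]


# Hypothetical weights (an assumption, not taken from the text).
WEIGHTS = {"a": 3, "p": 2, "t": 3, "v": 1}

minimal_subset = minimal(SOL, subset_leq)
minimal_card = minimal(SOL, cardinality_leq)
minimal_penalty = minimal(SOL, penalization_leq(WEIGHTS))
print(sorted(sorted(s) for s in minimal_subset))   # [['a'], ['p'], ['t'], ['v']]
print(sorted(sorted(s) for s in minimal_penalty))  # [['v']]
```

With these assumed weights the single hypothesis of least weight is the only minimal explanation, while the two preference orderings both keep exactly the four singletons.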

The basic problem of abduction is that of finding one or more explanations. However, we have already remarked that none may exist. Therefore, the first problem we consider is the existence one: given an instance of abduction, does an explanation exist? Another related problem is that of verifying, once a set of hypotheses has been found, whether it is really an explanation or not.

Other problems are related to the structure of the explanations. Namely, hypotheses that are in all explanations may be considered as “sure” conclusions of the abductive process. On the other hand, hypotheses that are part of some explanations can be regarded as “possible” conclusions.

The formal definition of these questions as decision problems is as follows.

Existence:

is there an explanation of the observed manifestations? That is, $SOL(H,M,T) \neq \emptyset$?

Verification:

given a set $H' \subseteq H$, is $H'$ a minimal solution? That is, $H' \in SOL_{\preceq}(H,M,T)$?

Relevance:

given a variable $h \in H$, is there a minimal solution containing $h$? That is, $\exists H'$ such that $H' \in SOL_{\preceq}(H,M,T)$ and $h \in H'$?

Necessity:

is $h$ in all, and at least one, minimal solutions? That is, $SOL_{\preceq}(H,M,T) \neq \emptyset$ and $\forall H'$ we have that $H' \in SOL_{\preceq}(H,M,T)$ implies $h \in H'$?

Dispensability:

is $h$ such that either there is no solution or there exists one that does not contain $h$? That is, $SOL_{\preceq}(H,M,T) = \emptyset$ or $\exists H'$ such that $H' \in SOL_{\preceq}(H,M,T)$ and $h \notin H'$?

Dispensability is the complement of the problem of necessity, since a hypothesis is dispensable if and only if it is not necessary. The problem of dispensability is not of much interest by itself, but is sometimes useful for simplifying the proofs.
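The decision problems above can be read as simple predicates over the set of minimal solutions. The sketch below assumes that this set is given explicitly as a list of Python sets, which is exponential in general and is done here only to make the definitions concrete; the function names are illustrative.

```python
def existence(min_solutions):
    """Is there at least one (minimal) explanation?"""
    return len(min_solutions) > 0


def verification(candidate, min_solutions):
    """Is the candidate set of hypotheses one of the minimal solutions?"""
    return frozenset(candidate) in {frozenset(s) for s in min_solutions}


def relevance(h, min_solutions):
    """Does some minimal solution contain h?"""
    return any(h in s for s in min_solutions)


def necessity(h, min_solutions):
    """Is h in every minimal solution, and is there at least one?"""
    return existence(min_solutions) and all(h in s for s in min_solutions)


def dispensability(h, min_solutions):
    """Complement of necessity: no solution, or one that avoids h."""
    return not necessity(h, min_solutions)


# Minimal solutions of the TeX example under subset-preference.
minimal_tex = [{"a"}, {"p"}, {"t"}, {"v"}]
print(relevance("a", minimal_tex), necessity("a", minimal_tex))
```

On the TeX example every hypothesis is relevant but none is necessary, since each one appears in exactly one of the four minimal explanations.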

Clearly, the ordering does not matter for the problem of existence, since we consider only well-founded orderings: therefore, an explanation exists if and only if a minimal explanation exists. For the other problems, the ordering must be taken into account. Different orderings may lead to different computational properties.

In this paper, we assume that $T$ is a 3CNF formula: this assumption does not cause a loss of generality unless we want to assume that .

## 3 Complexity and Compilability

The basic complexity classes of the polynomial hierarchy [Sto76, GJ79], such as P, NP, coNP, etc., are assumed known to the reader. We denote by C, C′, etc. arbitrary classes of the polynomial hierarchy. The length of a string $x$ is denoted by $||x||$.

We summarize some definitions and results proposed to formalize the on-line complexity of problems [CDLS02]. In computational complexity, problems whose solution can only be yes or no are the most commonly analyzed. Such problems are called decision problems. Any such problem can be formalized as a set of strings, those whose solution is yes. For example, the problem of propositional satisfiability (deciding whether a formula is satisfiable or not) is characterized by the set of the strings that represent exactly all satisfiable formulae.

The strings that compose the set associated with a problem represent the possible problem instances that produce a positive solution. Problems like abduction, however, have instances that can be naturally broken into two parts: one part is known in advance ($H$ and $T$) and one part is only known at run-time ($M$). The instances of such problems are better encoded as pairs of strings. Therefore, a problem like abduction is formalized by a set of pairs of strings, rather than a set of strings. We define a language of pairs as a subset of $\Sigma^* \times \Sigma^*$.

The difference between the first and second element of a pair is that some preprocessing time can be spent on the first string alone. This is done with the aim of solving the problem faster when the second string comes to be known. While our final aim is to reduce the running time of this second phase, some constraints have to be put on the preprocessing phase. Namely, we impose its result to be of polynomial size. Poly-size functions are introduced for this purpose: a function $f$ from strings to strings is called poly-size if there exists a polynomial $p$ such that, for all strings $x$, it holds $||f(x)|| \leq p(||x||)$. An exception to this definition is when $x$ represents a natural number: in this case, we impose $||f(x)|| \leq p(x)$. Any polytime function is polysize, but not vice versa. Indeed, a function $f$ is poly-time if there exists a polynomial $q$ such that, for all $x$, $f(x)$ can be computed in time less than or equal to $q(||x||)$. Clearly, the running time also bounds the size of the output string; on the other hand, even a function requiring exponential running time can produce a very short output. The definitions of polysize and polytime function extend to binary functions as usual.

Using the above definitions, we introduce a new hierarchy of classes of languages of pairs, the non-uniform compilability classes [CDLS02], denoted by ∥⇝C, where C is a generic uniform complexity class, such as P, NP, coNP, or $\Sigma^p_2$.

###### Definition 1 (∥⇝C classes, [CDLS02])

A language of pairs $S \subseteq \Sigma^* \times \Sigma^*$ belongs to ∥⇝C iff there exists a binary poly-size function $f$ and a language of pairs $S' \in$ C such that, for all $\langle x,y\rangle \in \Sigma^* \times \Sigma^*$, it holds:

$$\langle x,y\rangle \in S \quad\text{iff}\quad \langle f(x,||y||),y\rangle \in S'$$

Clearly, any problem whose time complexity is in C is also in ∥⇝C: just take $f(x,||y||) = x$ and $S' = S$. Some problems in C however belong to ∥⇝C′ with C′ ⊂ C; for example, some problems in NP are in ∥⇝P. These are in fact the problems we are most interested in, as the preprocessing phase, running on $x$ only, will produce $f(x,||y||)$, which allows solving the problem in polynomial time. This is important if these problems cannot be solved in polynomial time without the preprocessing phase (e.g., they are NP-complete).

The class ∥⇝C generalizes the non-uniform class C/poly, i.e., C/poly ⊆ ∥⇝C, by allowing for a fixed part $x$. We extend the definition of polynomial reduction to a concept that can be used with these classes.

###### Definition 2 (Non-uniform comp-reduction)

A non-uniform comp-reduction is a triple of functions $\langle f_1, f_2, g\rangle$, where $g$ is polytime and $f_1$ and $f_2$ are polysize. Given two problems $A$ and $B$, $A$ is non-uniformly comp-reducible to $B$ (denoted by $A \leq_{nucomp} B$) iff there exists a non-uniform comp-reduction $\langle f_1, f_2, g\rangle$ such that, for every pair $\langle x,y\rangle$ it holds that $\langle x,y\rangle \in A$ if and only if $\langle f_1(x,||y||), g(f_2(x,||y||),y)\rangle \in B$.

These reductions allow for a concept of hardness and completeness for the classes ∥⇝C.

###### Definition 3 (∥⇝C-completeness)

Let $S$ be a language of pairs and C a complexity class. $S$ is ∥⇝C-hard iff for all problems $A \in$ ∥⇝C we have that $A \leq_{nucomp} S$. Moreover, $S$ is ∥⇝C-complete if $S$ is in ∥⇝C and is ∥⇝C-hard.

The hierarchy formed by the compilability classes is proper if and only if the polynomial hierarchy is proper [CDLS02, KL80, Yap83] — a fact widely conjectured to be true.

Informally, ∥⇝NP-hard problems are “not compilable to P”. Indeed, if such compilation were possible, then it would be possible to define $f$ as the function that takes the fixed part of the problem and gives the result of compilation (ignoring the size of the input), and $S'$ as the language representing the on-line processing. This would imply that a ∥⇝NP-hard problem is in ∥⇝P, and this implies the collapse of the polynomial hierarchy. In general, a problem that is ∥⇝C-complete for a class C can be regarded as the “toughest” problem in C, under the assumption that preprocessing the fixed part is possible.

While ∥⇝C-completeness is adequate to show the compilability level of a given reasoning problem, proving it requires finding a nucomp reduction. We show a technique that lets us reuse, with simple modifications, the polytime reductions that were used to prove the usual (uniform) hardness of the problem. Namely, we present sufficient conditions allowing for a polynomial reduction to imply the existence of a nucomp reduction [Lib01].

Let us assume that we know a polynomial reduction from the problem $A$ to the problem $B$, and we want to prove the nucomp-hardness of $B$. Some conditions on $A$ should hold, as well as a condition over the reduction. If all these conditions are verified, then there exists a nucomp reduction from $A$ to $B$.

###### Definition 4 (Classification Function)

A classification function for a problem $A$ is a polynomial function $Class$ from instances of $A$ to nonnegative integers, such that $Class(y) \leq ||y||$.

###### Definition 5 (Representative Function)

A representative function for a problem $A$ is a polynomial function $Repr$ from nonnegative integers to instances of $A$, such that $Class(Repr(n)) = n$, and that $||Repr(n)||$ is bounded by some polynomial in $n$.

###### Definition 6 (Extension Function)

An extension function for a problem $A$ is a polynomial function $Ext$ from instances of $A$ and nonnegative integers to instances of $A$ such that, for any $y$ and $n \geq Class(y)$, the instance $y' = Ext(y,n)$ satisfies the following conditions:

1. $y \in A$ if and only if $y' \in A$;

2. $Class(y') = n$.

Let us give some intuitions about these functions. Usually, an instance of a problem is composed of a set of objects combined in some way. For problems on boolean formulas, we have a set of variables combined to form a formula. For graph problems, we have a set of nodes, and the graph is indeed a set of edges, which are pairs of nodes. The classification function gives the number of objects in an instance. The representative function thus gives an instance with the given number of objects. This instance should be in some way “symmetric”, in the sense that its elements should be interchangeable (this is because the representative function must be determined only from the number of objects.) Possible results of the representative function can be the set of all clauses of three literals over a given alphabet, the complete graph over a set of nodes, the graph with no edges, etc.

Let, for example, $A$ be the problem of propositional satisfiability. We can take $Class(F)$ as the number of variables in the formula $F$, while $Repr(n)$ can be the set of all clauses of three literals over an alphabet of $n$ variables. Finally, a possible extension function is obtained by adding tautological clauses to an instance.
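The three functions for propositional satisfiability can be sketched in a few lines. This is an illustrative sketch under assumptions that are not in the text: formulas are sets of clauses, clauses are tuples of non-zero integers (a negative integer is a negated variable), clauses may repeat a literal, and the names `classification`, `representative`, `extension`, and `satisfiable` are made up for the example.

```python
from itertools import combinations_with_replacement, product


def variables_of(formula):
    """Variables occurring in a formula given as a set of integer-literal clauses."""
    return {abs(lit) for clause in formula for lit in clause}


def classification(formula):
    """Class: the number of variables in the formula."""
    return len(variables_of(formula))


def representative(n):
    """Repr: all clauses of three literals over variables 1..n."""
    literals = [lit for var in range(1, n + 1) for lit in (var, -var)]
    return set(combinations_with_replacement(literals, 3))


def extension(formula, n):
    """Ext: add tautological clauses on fresh variables until the class is n."""
    used = variables_of(formula)
    extended = set(formula)
    fresh = 1
    while len(used) < n:
        while fresh in used:
            fresh += 1
        extended.add((fresh, -fresh, fresh))  # always true, so satisfiability is kept
        used.add(fresh)
    return extended


def satisfiable(formula):
    """Brute-force satisfiability test, used only to check the extension."""
    names = sorted(variables_of(formula))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in formula):
            return True
    return False


example = {(1, 2, 3), (-1, -2, -3)}
bigger = extension(example, 5)
print(classification(representative(3)), classification(bigger))  # 3 5
```

The representative instance depends only on the number `n`, which is the "symmetry" property discussed above, and the extension changes the class of an instance without changing its answer.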

Note that these functions are related to the problem $A$ only, and involve neither the specific problem $B$ we want to prove hard nor the specific reduction used. We now define a condition over the polytime reduction from $A$ to $B$. Since $B$ is a problem of pairs, we can define a reduction from $A$ to $B$ as a pair of polynomial functions $\langle r,h\rangle$ such that $x \in A$ if and only if $\langle r(x),h(x)\rangle \in B$.

###### Definition 7 (Representative Equivalence)

Given a problem $A$ (having the above three functions), a problem of pairs $B$, and a polynomial reduction $\langle r,h\rangle$ from $A$ to $B$, the condition of representative equivalence holds if, for any instance $y$ of $A$, it holds:

$$\langle r(y),h(y)\rangle \in B \quad\text{iff}\quad \langle r(Repr(Class(y))),h(y)\rangle \in B$$

The condition of representative equivalence can be proved to imply that the problem $B$ is ∥⇝C-hard, if $A$ is C-hard [Lib01].

## 4 Compilability of Abduction: No Ordering

In this section we analyze the problems of existence of explanation, explanation verification, relevance, and necessity, for the basic case in which no ordering is defined. Formally, we want to determine whether the complexity of the problems related to $SOL(H,M,T)$ decreases thanks to the preprocessing step on $H$ and $T$.

We first give a high-level explanation of the method we use to prove the incompilability of the considered problems. We begin by applying the method to the problem of existence of explanations, and then we use it for verification, relevance and necessity.

### 4.1 The Method

The problem of deciding whether there exists an explanation for a set of manifestations is $\Sigma^p_2$-hard [EG95]. Therefore, there exists a polynomial reduction from another $\Sigma^p_2$-hard problem to this one. In order to prove it is also ∥⇝$\Sigma^p_2$-hard we can show that the other problem has the three functions, and the reduction satisfies the condition of representative equivalence. Unfortunately, this is not the case. As a result, we have to look for another reduction.

Such a reduction should be as simple as possible. In general, the more similar two problems are, the easier it is to find a reduction. What is the $\Sigma^p_2$-hard problem that is the most similar to the problem of existence of explanation? Clearly, the problem itself is the most similar to itself.

The theorem of representative equivalence is indeed about a reduction between two problems $A$ and $B$, but it does not forbid using the same problem: it only tells that, if we have a reduction from an arbitrary C-hard problem $A$ to $B$, satisfying representative equivalence, then $B$ is ∥⇝C-hard. Nothing prevents us from choosing $A = B$. This technique can be formalized as follows:

• show that there exist classification, representative, and extension functions for the problem $B$;

• show that there exists a reduction from $B$ to $B$ satisfying representative equivalence.

The most obvious reduction from a problem to itself is the identity. In our case, however, identity does not satisfy the condition of representative equivalence. As a result, we have to look for another reduction.

Before showing the technical details of the reductions used, we point out an important feature of this technique. Since the condition of representative equivalence tells that $B$ is ∥⇝C-hard if $A$ is C-hard, using $A = B$ we prove that $B$ is ∥⇝C-hard whenever $B$ is C-hard. This result holds even if a precise complexity characterization of $B$ is not known. For example, if we only know that $B$ is in $\Sigma^p_2$, but do not have any hardness result, we can still conclude that $B$ is ∥⇝NP-hard if it is NP-hard, it is ∥⇝coNP-hard if it is coNP-hard, it is ∥⇝$\Sigma^p_2$-hard if it is $\Sigma^p_2$-hard, etc.

In order to simplify the following proofs, we denote with $\Pi(L)$ the set of all distinct clauses of length 3 on a given alphabet $L$. Since the theory $T$ is in 3CNF by assumption, we have that $T \subseteq \Pi(Var(T))$, where $Var(T)$ is the set of variables appearing in $T$.

### 4.2 Existence of Solutions

In order to define a reduction from the problem of existence of solutions to itself, we first consider the function $f$ from abduction instances to abduction instances defined as follows:

$$f(\langle H,M,T\rangle) = \langle H',M',T'\rangle \text{ where:}$$

$$\begin{aligned} H' &= H \cup C \cup D \\ M' &= M \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \\ T' &= \{\neg c_i \vee \neg d_i \mid \gamma_i \in \Pi(H \cup X)\} \cup \{c_i \rightarrow \gamma_i \mid \gamma_i \in \Pi(H \cup X)\} \end{aligned}$$

In these formulae, $H \cup X$ denotes the alphabet of $T$, while $C$ and $D$ are sets of new variables in one-to-one correspondence with the clauses in $\Pi(H \cup X)$. Note that, by definition, $T$ is a subset of $\Pi(H \cup X)$.

###### Lemma 1

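Let $f$ be the function defined above. For any $H$, $M$, $T$, it holds:

$$SOL(f(\langle H,M,T\rangle)) = \{S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \mid S \in SOL(\langle H,M,T\rangle)\}$$

Proof.  We divide the proof in three parts. In the first part, we prove that any solution of $f(\langle H,M,T\rangle)$ contains exactly the literals $c_i$ and $d_i$ that are in $M'$. In the second part, we prove that, if $S'$ is a solution of $f(\langle H,M,T\rangle)$, then $S' \setminus (C \cup D)$ is a solution of $\langle H,M,T\rangle$; the third part is the proof of the converse.

1. We prove that $S' \cap (C \cup D) = M' \cap (C \cup D)$. Let $S' \in SOL(f(\langle H,M,T\rangle))$. Since $S'$ is a solution, we have that $S' \cup T' \models M'$. If $\gamma_i \in T$, then $c_i \in M'$. Since $T'$ does not contain any positive occurrence of $c_i$, the theory $S' \cup T'$ can imply $c_i$ only if $c_i \in S'$. The same holds for any $d_i$ such that $\gamma_i \notin T$. This proves that $M' \cap (C \cup D) \subseteq S'$. Since $M'$ contains either $c_i$ or $d_i$ for any $i$, the same holds for $S'$. No other variable in $C \cup D$ can be in $S'$, otherwise $S'$ would be inconsistent with $T'$, which contains the clauses $\neg c_i \vee \neg d_i$.

2. Let $S'$ be an element of $SOL(f(\langle H,M,T\rangle))$. We prove that $S = S' \setminus (C \cup D) \in SOL(\langle H,M,T\rangle)$. The point proved above shows that, for each $i$, $S'$ contains either $c_i$ or $d_i$, depending on whether $\gamma_i \in T$. As a result:

$$S' \cup T' \equiv S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \cup \{\neg c_i \vee \neg d_i\} \cup \{c_i \rightarrow \gamma_i\} \equiv S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \cup T$$

As a result, $S \cup T$ is consistent because the above formula is. Moreover, since the above formula implies $M'$, and each variable in $C \cup D$ appears only once, it also holds $S \cup T \models M$. As a result, $S$ is a solution of $\langle H,M,T\rangle$.

3. Let $S \in SOL(\langle H,M,T\rangle)$, and let $S' = S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\}$. Since $S' \cup T'$ is equivalent to $S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \cup T$, then $S'$ is a solution of $f(\langle H,M,T\rangle)$.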

The claim is thus proved.

This lemma shows that any abduction instance can be converted into another one in which the set $H'$ and the theory $T'$ only depend on the number of variables of the original instance. This reduction can be used to build a reduction satisfying the condition of representative equivalence.
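The key property of $f$, namely that the fixed part of its output does not depend on the original theory, can be observed on a small instance. The sketch below is an illustration under assumptions: literals are `(variable, polarity)` pairs, clauses are frozensets of literals, the new variables are named `c0, c1, …` and `d0, d1, …` (names assumed not to clash with the alphabet), and `pi` and `reduction_f` are hypothetical names for $\Pi$ and $f$.

```python
from itertools import combinations


def pi(alphabet):
    """All clauses of three distinct literals over `alphabet`, in a fixed order."""
    literals = [(var, sign) for var in sorted(alphabet) for sign in (True, False)]
    return [frozenset(clause) for clause in combinations(literals, 3)]


def reduction_f(H, M, T, X):
    """Build <H', M', T'> from <H, M, T>, where T is a set of clauses of pi(H + X)."""
    gammas = pi(set(H) | set(X))
    C = [f"c{i}" for i in range(len(gammas))]
    D = [f"d{i}" for i in range(len(gammas))]
    new_H = set(H) | set(C) | set(D)
    # c_i is a manifestation when gamma_i is in T, d_i otherwise.
    new_M = set(M) | {C[i] if g in T else D[i] for i, g in enumerate(gammas)}
    new_T = set()
    for i, g in enumerate(gammas):
        new_T.add(frozenset({(C[i], False), (D[i], False)}))  # not c_i or not d_i
        new_T.add(frozenset({(C[i], False)}) | g)             # c_i -> gamma_i
    return new_H, new_M, new_T


H = ["h1"]
X = ["x1", "x2"]
clauses = pi(set(H) | set(X))

# Two different theories over the same alphabet.
H1, M1, T1 = reduction_f(H, ["x1"], {clauses[0], clauses[5]}, X)
H2, M2, T2 = reduction_f(H, ["x2"], {clauses[3]}, X)
print(H1 == H2, T1 == T2, M1 == M2)  # True True False
```

All the information about the original theory has moved into the manifestations, which is the varying part of the instance; the hypotheses and the theory of the result depend only on the alphabet.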

###### Lemma 2

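Let $c$ be a positive integer number, and let $g_c$ be the following function:

$$g_c(\langle H,M,T\rangle) = \langle H \cup \{h_{|H|+1},\ldots,h_c\},\ M,\ T \cup \{x_{r+1} \vee \neg x_{r+1},\ldots,x_c \vee \neg x_c\}\rangle$$

where $r = |Var(T) \setminus H|$. It holds

$$SOL(g_c(\langle H,M,T\rangle)) = \{S \cup H' \mid S \in SOL(\langle H,M,T\rangle) \text{ and } H' \subseteq \{h_{|H|+1},\ldots,h_c\}\}$$

Proof.  The instance $g_c(\langle H,M,T\rangle)$ only differs from $\langle H,M,T\rangle$ because of the new assumptions $\{h_{|H|+1},\ldots,h_c\}$, which are not even mentioned in $T$, and the new tautological clauses added to $T$. Therefore, any explanation of $\langle H,M,T\rangle$ is also an explanation of $g_c(\langle H,M,T\rangle)$. The only difference between these two problems is that assumptions in $\{h_{|H|+1},\ldots,h_c\}$ can be freely added to any explanation.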

We now define the classification, representative, and extension functions for the basic problems of abduction. First, the classification function is given by the maximum between the number of variables in $H$ and the number of variables in $T$ but not in $H$:

$$Class(\langle H,M,T\rangle) = \max(|H|, |Var(T) \setminus H|)$$

The representative instance of the class $c$ is given by an instance with $c$ possible assumptions, $c$ other variables, and $T$ composed of all possible clauses of three literals over these variables:

$$Repr(c) = \langle \{h_1,\ldots,h_c\},\ \emptyset,\ \Pi(\{h_1,\ldots,h_c\} \cup \{x_1,\ldots,x_c\})\rangle$$

The extension function is also easy to give. For example, we may add to $T$ a set of tautologies with new variables.

$$Ext(\langle H,M,T\rangle, m) = \langle H,\ M,\ T \cup \{x_{r+1} \vee \neg x_{r+1},\ldots,x_m \vee \neg x_m\}\rangle \text{ where } r = |Var(T) \setminus H|$$

These three functions are valid classification, representative, and extension functions for the problem of existence of explanation; they are also valid for the problems of relevance and necessity.
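The classification function and the padding performed by $g_c$ can be sketched as follows. This is an illustration under assumptions: clauses are frozensets of `(variable, polarity)` pairs, fresh variables are generated with the prefixes `h` and `x` instead of the canonical indices used in the formulas, and the names `classification`, `fresh_names`, and `pad` are made up for the example.

```python
def variables_of(theory):
    """Variables mentioned in a theory given as a collection of clauses."""
    return {var for clause in theory for var, _ in clause}


def classification(H, M, T):
    """Class(<H, M, T>): the larger of |H| and the number of variables of T not in H."""
    return max(len(H), len(variables_of(T) - set(H)))


def fresh_names(prefix, taken, count):
    """Return `count` names of the form prefix+number that are not in `taken`."""
    names, k = [], 1
    while len(names) < count:
        candidate = f"{prefix}{k}"
        if candidate not in taken:
            names.append(candidate)
        k += 1
    return names


def pad(H, M, T, c):
    """Sketch of g_c: add unused hypotheses and tautologies until both counts are c."""
    taken = set(H) | set(M) | variables_of(T)
    new_h = fresh_names("h", taken, c - len(H))
    r = len(variables_of(T) - set(H))
    new_x = fresh_names("x", taken | set(new_h), c - r)
    tautologies = {frozenset({(x, True), (x, False)}) for x in new_x}
    return set(H) | set(new_h), set(M), set(T) | tautologies


H = {"h1"}
M = {"x1"}
T = {frozenset({("h1", False), ("x1", True), ("x2", True)})}

c = classification(H, M, T)
padded_H, padded_M, padded_T = pad(H, M, T, c)
print(c, len(padded_H), len(variables_of(padded_T) - padded_H))  # 2 2 2
```

After padding with $c$ equal to the class of the instance, the number of hypotheses and the number of other variables are both $c$, which is what makes the alphabet of the padded instance depend only on the class.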

We are now able to show a reduction satisfying the condition of representative equivalence. Let $i$ be the reduction defined as follows.

$$i(\langle H,M,T\rangle) = f(g_{Class(\langle H,M,T\rangle)}(\langle H,M,T\rangle))$$

The following theorem is a consequence of the fact that $i$ satisfies the condition of representative equivalence.

###### Theorem 1

The problem of establishing the existence of a solution of an abductive problem is nucomp-$\Sigma^p_2$-hard.

Proof.  By the above two lemmas, $\langle H,M,T\rangle$ has solutions if and only if $i(\langle H,M,T\rangle)$ has solutions. Therefore, $i$ is a valid reduction from the problem of solution existence to itself. The fixed part of $i(\langle H,M,T\rangle)$ only depends on the class of the instance $\langle H,M,T\rangle$. As a result, this reduction satisfies the condition of representative equivalence. Since the problem of existence of solutions is $\Sigma^p_2$-hard [EG95], it is also nucomp-$\Sigma^p_2$-hard.

### 4.3 Verification

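We consider the problem of verifying whether a set of assumptions is a possible explanation, still in the case of no ordering. An instance of the problem is composed of a triple $\langle H,M,T\rangle$ and a specific subset $H_a \subseteq H$ that we want to check for being an explanation. Formally, this problem amounts to checking whether $T \cup H_a$ is consistent and $T \cup H_a \models M$. The varying part is composed of $H_a$ and $M$. Formally, an instance of the verification problem is a 4-tuple $\langle H,H_a,M,T\rangle$, where $H_a \subseteq H$.

The first step of the proof is to find the three functions (classification, representative, and extension). The functions of the previous proof require only minor changes to be used here.

$$\begin{aligned}
Class(\langle H,H_a,M,T\rangle) &= \max(|H|, |Var(T) \setminus H|) \\
Repr(c) &= \langle \{h_1,\ldots,h_c\},\ \emptyset,\ \emptyset,\ \Pi(\{h_1,\ldots,h_c\} \cup \{x_1,\ldots,x_c\})\rangle \\
Exte(\langle H,H_a,M,T\rangle, m) &= \langle H,\ H_a,\ M,\ T \cup \{x_{r+1} \vee \neg x_{r+1},\ldots,x_m \vee \neg x_m\}\rangle \quad \text{where } r = |Var(T) \setminus H|
\end{aligned}$$

We define two functions $f'$ and $g'_c$ to be similar to the functions $f$ and $g_c$ of the last section, except for the addition of a candidate explanation $H_a$.

$$\begin{aligned}
f'(\langle H,H_a,M,T\rangle) &= \langle H',\ H_a \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\},\ M',\ T'\rangle && \text{where } \langle H',M',T'\rangle = f(\langle H,M,T\rangle) \\
g'_c(\langle H,H_a,M,T\rangle) &= \langle H',\ H_a,\ M',\ T'\rangle && \text{where } \langle H',M',T'\rangle = g_c(\langle H,M,T\rangle)
\end{aligned}$$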

These functions can be composed to generate a function that satisfies representative equivalence. This way, we prove the nucomp-hardness of the problem of verification.

###### Theorem 2

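The problem of verification with no ordering is nucomp-$D^p$-complete.

Proof.  By Lemma 1 and Lemma 2, $H_a$ is a solution of $\langle H,M,T\rangle$ if and only if it is a solution of $g_c(\langle H,M,T\rangle)$, and $H_a$ is a solution of $\langle H,M,T\rangle$ if and only if $H_a \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\}$ is a solution of $f(\langle H,M,T\rangle)$.

As a result, both $f'$ and $g'_c$ are reductions from the problem of verification to itself. Moreover, their composition satisfies representative equivalence, since the fixed part of $f'(g'_c(\langle H,H_a,M,T\rangle))$, with $c = Class(\langle H,H_a,M,T\rangle)$, only depends on the class of the instance $\langle H,H_a,M,T\rangle$. We can then conclude that the problem of verification is hard for the compilability class that corresponds to the complexity class it is hard for.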

### 4.4 Relevance, Dispensability, and Necessity

We make the following simplifying assumption: given an instance of abduction $\langle H,M,T\rangle$, where $H = \{h_1,\ldots,h_m\}$, the problem is to decide whether the first assumption $h_1$ is relevant/dispensable/necessary. Clearly, this restriction does not change the complexity of these problems, as we can always rename the variables appropriately.

###### Theorem 3

The problems of relevance and dispensability with no ordering are nucomp-$\Sigma^p_2$-hard, while necessity is nucomp-$\Pi^p_2$-hard.

Proof.  By Lemma 1 and Lemma 2, $i$ is a reduction from the problem of relevance to the problem of relevance. Indeed, for any $S \subseteq H$, the set $S$ is a solution of $\langle H,M,T\rangle$ if and only if $S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\}$ is a solution of $i(\langle H,M,T\rangle)$. As a result, $h_1$ is relevant/dispensable/necessary for $\langle H,M,T\rangle$ if and only if it is so for $i(\langle H,M,T\rangle)$.

The function $i$ satisfies representative equivalence, since the fixed part of $i(\langle H,M,T\rangle)$ only depends on the class of $\langle H,M,T\rangle$. What is left to prove is the existence of the three functions. We can use the same three functions used for the problem of existence of solutions.

## 5 Compilability of Abduction: Preferences

In this section, we consider the problems of verification, relevance, and necessity when the ordering used is either $\subseteq$ or $\leq$. These orderings have in common the fact that the instance of an abduction problem is simply a triple $\langle H,M,T\rangle$, whereas the orderings of the next section employ classes of priority or weights that are part of the instances. The problem of existence is the same as with no ordering, as these orderings are well founded.

### 5.1 Some General Results

We give some general results about the problem of abduction in the case in which an ordering on explanations is given. In order to keep the results as general as possible, we consider an arbitrary ordering $\preceq$ satisfying the following natural conditions.

Meaningful.

The ordering $\preceq$ is meaningful if, for any variable $h$ and any pair of sets $H'$ and $H''$ such that $h \notin H' \cup H''$, it holds:

$$H' \cup \{h\} \preceq H'' \cup \{h\} \quad \text{iff} \quad H' \preceq H''$$

Intuitively, a meaningful ordering compares two explanations $H'$ and $H''$ only on the variables on which they differ.

Irredundant.

The ordering $\preceq$ is irredundant if, for any pair of sets $H'$ and $H''$, it holds:

$$H' \subset H'' \quad \Rightarrow \quad H' \prec H''$$

Irredundancy formalizes the natural assumption that hypotheses that are not necessary should be removed.
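The two conditions can be tested exhaustively on a small universe of assumptions. The sketch below is an illustration only: the function names are hypothetical, the meaningfulness check assumes the side condition that $h$ belongs to neither of the two compared sets, and the "prefer larger" ordering is an artificial example added to show an ordering that is meaningful but not irredundant.

```python
from itertools import combinations

def subsets(items):
    """All subsets of items, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def subset_leq(a, b):
    """Set-inclusion ordering."""
    return a <= b

def card_leq(a, b):
    """Cardinality ordering."""
    return len(a) <= len(b)

def larger_leq(a, b):
    """Artificial ordering that prefers larger sets (for contrast only)."""
    return len(a) >= len(b)

def strictly(leq, a, b):
    """Strict part of the ordering: a precedes b but not vice versa."""
    return leq(a, b) and not leq(b, a)

def is_meaningful(leq, universe):
    """Adding the same fresh variable h to both sets never changes the comparison."""
    for h in universe:
        rest = subsets(set(universe) - {h})
        for a in rest:
            for b in rest:
                if leq(a | {h}, b | {h}) != leq(a, b):
                    return False
    return True

def is_irredundant(leq, universe):
    """Every proper subset is strictly preferred to its supersets."""
    all_sets = subsets(universe)
    return all(strictly(leq, a, b)
               for a in all_sets for b in all_sets if a < b)

U = {"h1", "h2", "h3"}
report = {name: (is_meaningful(leq, U), is_irredundant(leq, U))
          for name, leq in [("subset", subset_leq),
                            ("cardinality", card_leq),
                            ("larger", larger_leq)]}
```

On this universe, set inclusion and cardinality pass both checks, while the artificial ordering fails irredundancy. A finite check of this kind is not a proof, but it makes the two definitions concrete.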

We determine the compilability of abduction with preference in the same way we did in the case of no ordering: we show that the function $i$ is a polynomial reduction from the problems of abduction to themselves, and that it satisfies the condition of representative equivalence. To this aim, we need the analogues of Lemma 1 and Lemma 2.

###### Lemma 3

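If $\preceq$ is a meaningful ordering, it holds:

$$SOL_{\preceq}(f(\langle H,M,T\rangle)) = \{S \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \mid S \in SOL_{\preceq}(\langle H,M,T\rangle)\}$$

Proof.  We use the result of Lemma 1. Namely, since all solutions of $f(\langle H,M,T\rangle)$ coincide on $C \cup D$, these variables are irrelevant thanks to the fact that $\preceq$ is meaningful.

Formally, we have:

$$\begin{aligned}
S \in SOL_{\preceq}(f(\langle H,M,T\rangle))
&\Leftrightarrow S \in SOL(f(\langle H,M,T\rangle)) \text{ and } \nexists S' \in SOL(f(\langle H,M,T\rangle)) \,.\, S' \prec S \\
&\Leftrightarrow S = S_1 \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\},\ S_1 \in SOL(\langle H,M,T\rangle) \text{ and } \nexists S'_1 \in SOL(\langle H,M,T\rangle) \text{ such that} \\
&\qquad S'_1 \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \prec S_1 \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \\
&\Leftrightarrow S = S_1 \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\},\ S_1 \in SOL(\langle H,M,T\rangle) \text{ and } \nexists S'_1 \in SOL(\langle H,M,T\rangle) \,.\, S'_1 \prec S_1 \\
&\Leftrightarrow S = S_1 \cup \{c_i \mid \gamma_i \in T\} \cup \{d_i \mid \gamma_i \notin T\} \text{ and } S_1 \in SOL_{\preceq}(\langle H,M,T\rangle)
\end{aligned}$$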

This proves the claim.

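We can also prove the analogue of Lemma 2.

###### Lemma 4

Let $c$ be a positive integer and let $g_c$ be the following function:

$$g_c(\langle H,M,T\rangle) = \langle H \cup \{h_{|H|+1},\ldots,h_c\},\ M,\ T \cup \{x_{r+1} \vee \neg x_{r+1},\ldots,x_c \vee \neg x_c\}\rangle$$

where $r = |Var(T) \setminus H|$. If $\preceq$ is an irredundant ordering, it holds:

$$SOL_{\preceq}(g_c(\langle H,M,T\rangle)) = SOL_{\preceq}(\langle H,M,T\rangle)$$

Proof.  Similar to the proof of Lemma 2, but now the hypotheses in $\{h_{|H|+1},\ldots,h_c\}$ are all irrelevant; therefore, they are not part of any minimal explanation.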
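Lemma 4 can be illustrated by brute force: padding an instance with $g_c$ should leave the set of minimal explanations unchanged under an irredundant ordering. The Python sketch below checks this for set inclusion and cardinality on one small instance. The clause encoding as `(variable, sign)` pairs, the variable naming `h1, h2, ...` and `x1, x2, ...`, and all helper names are assumptions of this example, not part of the paper.

```python
from itertools import combinations, product

def subsets(items):
    """All subsets of items, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def variables_of(T):
    """Variables mentioned in a list of clauses."""
    return {v for clause in T for v, _ in clause}

def sol(H, M, T):
    """Brute-force SOL: all S within H with T + S consistent and T + S entailing M."""
    names = sorted(set(H) | set(M) | variables_of(T))
    found = set()
    for S in subsets(H):
        models = []
        for bits in product([False, True], repeat=len(names)):
            a = dict(zip(names, bits))
            if all(a[h] for h in S) and all(
                any(a[v] == sign for v, sign in clause) for clause in T
            ):
                models.append(a)
        if models and all(a[m] for a in models for m in M):
            found.add(S)
    return found

def minimal(solutions, leq):
    """Solutions with no strictly preferred alternative."""
    return {S for S in solutions
            if not any(leq(S2, S) and not leq(S, S2) for S2 in solutions)}

def g(c, H, M, T):
    """Pad the instance to c assumptions and c other variables (assumed naming)."""
    r = len(variables_of(T) - set(H))
    new_h = {f"h{i}" for i in range(len(H) + 1, c + 1)}
    tautologies = [[(f"x{i}", True), (f"x{i}", False)]
                   for i in range(r + 1, c + 1)]
    return set(H) | new_h, set(M), list(T) + tautologies

def subset_leq(a, b):
    return a <= b

def card_leq(a, b):
    return len(a) <= len(b)

H, M = {"h1", "h2"}, {"x1"}
T = [[("h1", False), ("x1", True)],   # h1 -> x1
     [("h2", False), ("x1", True)]]   # h2 -> x1
H3, M3, T3 = g(3, H, M, T)
results = {name: (minimal(sol(H, M, T), leq), minimal(sol(H3, M3, T3), leq))
           for name, leq in [("subset", subset_leq), ("cardinality", card_leq)]}
```

For both orderings the two components of each pair in `results` are the same set, consisting of the explanations containing exactly one of `h1` and `h2`; the added assumption `h3` appears in no minimal explanation.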

These lemmas can be used to prove incompilability of abduction when an irredundant and meaningful ordering is used.

### 5.2 Verification

We consider the problem of verifying whether a set of assumptions is a minimal explanation according to the orderings $\subseteq$ and $\leq$. More generally, we prove the following theorem for any meaningful and irredundant ordering.

###### Theorem 4

If $\preceq$ is a meaningful and irredundant ordering, verifying whether a set of assumptions is a minimal explanation is nucomp-C-hard for any class C for which the problem is C-hard.

Proof.  The same classification, representative, and extension functions used for the case of no ordering can be used for this case as well.

Let us now consider the functions $f'$ and $g'_c$. From Lemma 3 and Lemma 4 it follows that they are reductions from the problem of verification to itself. Moreover, their composition satisfies representative equivalence.

### 5.3 Relevance, Dispensability, and Necessity

We make the following simplifying assumption: given an instance of abduction $\langle H,M,T\rangle$, where $H = \{h_1,\ldots,h_m\}$, the problem is to decide whether the first assumption $h_1$ is relevant/dispensable/necessary. There is no loss of generality in making this assumption, as we can always rename the variables appropriately.

###### Theorem 5

If $\preceq$ is a meaningful and irredundant ordering, then the problems of relevance/dispensability/necessity are nucomp-C-hard for any class C of the polynomial hierarchy for which they are C-hard.

Proof.  From Lemma 3 and Lemma 4, it follows that $i$ is a reduction from the problems of relevance/dispensability/necessity to themselves, if $\preceq$ is meaningful and irredundant, and that it also satisfies representative equivalence.

Since $\subseteq$ and $\leq$ are meaningful irredundant orderings, the complexity of the corresponding problems implies their compilability characterization.

###### Corollary 1

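Relevance and dispensability using $\subseteq$ are nucomp-$\Sigma^p_2$-hard, while using $\leq$ they are nucomp-$\Delta^p_3[\log n]$-hard. Necessity is nucomp-$\Pi^p_2$-hard and nucomp-$\Delta^p_3[\log n]$-hard, using $\subseteq$ and $\leq$, respectively.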

## 6 Compilability of Abduction: Prioritization and Penalization

We consider the cases in which the ordering over the explanations is defined in terms of a prioritization. The instances of the problem are different from those of the previous section, since $H$ is replaced by a partition of assumptions $\langle H_1,\ldots,H_m\rangle$.

In the cases of $\subseteq$-prioritization and $\leq$-prioritization, the induced ordering is meaningful and irredundant. However, the results on meaningful irredundant orderings cannot be directly applied because, in Theorem 4 and Theorem 5, we assumed that the instances have the form $\langle H,M,T\rangle$, while now they have the form $\langle\langle H_1,\ldots,H_m\rangle,M,T\rangle$. Therefore, we have to find new classification, representative, and extension functions.

We first consider the problem of verification, and prove its nucomp-hardness. Then, we move to the problems of relevance, dispensability, and necessity. As for the case of $\subseteq$-preference and $\leq$-preference, we employ a sort of normal form, in which the assumption we check is the first one.

### 6.1 Verification

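First of all, we show the classification, representative, and extension functions for the problem of verification. The instances of the problem include a “candidate explanation” $H_a$.

$$\begin{aligned}
Class(\langle\langle H_1,\ldots,H_m\rangle,H_a,M,T\rangle) &= \max(m, |H_1|,\ldots,|H_m|, |Var(T) \setminus \textstyle\bigcup_i H_i|) \\
Repr(c) &= \langle\langle \{h_{1,1},\ldots,h_{1,c}\},\ldots,\{h_{c,1},\ldots,h_{c,c}\}\rangle,\ \emptyset,\ \emptyset,\ \Pi(\{h_{1,1},\ldots,h_{1,c}\} \cup \cdots \cup \{h_{c,1},\ldots,h_{c,c}\} \cup \{x_1,\ldots,x_c\})\rangle \\
Exte(\langle\langle H_1,\ldots,H_m\rangle,H_a,M,T\rangle, k) &= \langle\langle H_1,\ldots,H_m\rangle,\ H_a,\ M,\ T \cup \{x_{r+1} \vee \neg x_{r+1},\ldots,x_k \vee \neg x_k\}\rangle \quad \text{where } r = |Var(T) \setminus \textstyle\bigcup_i H_i|
\end{aligned}$$

Here $k$ is the target class of the extension and $m$ is the number of priority classes.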

These functions can be easily proved to be valid classification, representative, and extension functions. What is missing is a reduction from the problem of verification to itself satisfying the condition of representative equivalence.

To this extent, we use two functions and that are similar to and , respectively. In particular, , where:

$$\begin{aligned}
H'_1 &= H_1 \cup C \cup D \\
H'_2 &= H_2 \\
&\ \,\vdots \\
H'_m &= H_m
\end{aligned}$$