Learning Abduction under Partial Observability

Juba recently proposed a formulation of learning abductive reasoning from examples, in which both the relative plausibility of various explanations, as well as which explanations are valid, are learned directly from data. The main shortcoming of this formulation of the task is that it assumes access to full-information (i.e., fully specified) examples; relatedly, it offers no role for declarative background knowledge, as such knowledge is rendered redundant in the abduction task by complete information. In this work, we extend the formulation to utilize such partially specified examples, along with declarative background knowledge about the missing data. We show that it is possible to use implicitly learned rules together with the explicitly given declarative knowledge to support hypotheses in the course of abduction. We observe that when a small explanation exists, it is possible to obtain a much-improved guarantee in the challenging exception-tolerant setting. Such small, human-understandable explanations are of particular interest for potential applications of the task.

1 Introduction

Abduction is the task of inferring a plausible hypothesis to explain an observed or hypothetical condition. Although it is most prominently observed in scientific inquiry as the step of proposing a hypothesis to be investigated, it is also an everyday mode of inference. Simple tasks such as understanding stories from Hobbs et al. (1990) and images from Cox and Pietrzykowski (1986) and Poole (1990) involve a process of abduction to infer an interpretation of the larger events, context, and motivations that are only partially depicted. Its significance to AI was first recognized by Charniak and McDermott (1985).

In this work, we consider a PAC-learning (Valiant, 1984, 2000) formulation of the combined task of learning to abduce, introduced by Juba (2016). In this formulation, one is given a collection of examples drawn from the prior distribution (i.e., jointly sampled values of the attributes) together with a condition to explain, represented as a Boolean formula on the attributes. The task is then to propose a formula h, which essentially must be a k-DNF for computational reasons, satisfying the following two criteria:

1. Plausibility: the probability that h is satisfied on the prior distribution must be at least some (given) minimum value μ.

2. Entailment: the probability that the condition to explain is satisfied, conditioned on the hypothesis h holding, is at least 1 − ε for some given error tolerance ε.

By casting the task as operating directly on examples, Juba avoids the problem of explicitly learning and representing the prior distribution. The main shortcoming of this formulation is that it assumes access to complete information, so any attributes to be invoked in the explanation must be recorded in all of the examples. This is a problem, for example, when we wish to infer the intentions of characters in stories, which are frequently either left ambiguous or are assumed to be clear from the given context. It is also a problem if, for example, we would like to use the abduced hypothesis to guide further exploration that may include attributes that we previously were not measuring.

Our work extends Juba’s formulation of the abduction task to use partial examples and draw on declaratively specified background knowledge. We observe that by using a covering algorithm, it is possible to guarantee significantly better explanations when a small hypothesis (using relatively few terms) is adequate. Concretely, when some r-term k-DNF explanation on n attributes has an error rate of ε, we obtain an error rate of Õ(rε), in contrast to the bound obtained for the state-of-the-art algorithm of Zhang, Mathew, and Juba (2017), which does not benefit from the small size of the hypothesis.

2 Preliminaries

We work in a standard machine learning model in which the data consists of many examples, each assigning Boolean values to a variety of attributes. For example, if our data is about birds, each bird may correspond to an example, and there can be attributes such as whether the bird has feathers, whether it eats bugs, and other properties.

2.1 Partial Observability

In the real world, it is hard to require each example to contain all of the attributes, so we want to make inferences with incomplete data. Partial observability means that some attributes of examples may be unknown. We represent this by allowing the value of each attribute to be 1 (true), 0 (false), or ∗ (unobserved). For instance, with three attributes, an example could be [1, 0, ∗]. (For convenience, we denote by ρ(i) the ith example and by ρj the jth coordinate of an example ρ.) In our abduction task, we say our partial examples are drawn from such a masked distribution M(D).
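As a minimal sketch (our own illustration, not code from the paper), partial examples can be represented with Python's None standing in for the unobserved value ∗:

```python
from typing import List, Optional

# A partial example assigns each attribute 1 (true), 0 (false),
# or None (unobserved -- the "*" of the text).
PartialExample = List[Optional[int]]

def observed(rho: PartialExample, j: int) -> bool:
    """The j-th attribute is observed when it is masked to 0 or 1."""
    return rho[j] is not None

rho: PartialExample = [1, 0, None]  # x1 = 1, x2 = 0, x3 unobserved
assert observed(rho, 0) and not observed(rho, 2)
```

The names `PartialExample` and `observed` are our own; any encoding distinguishing the three values would do.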

2.2 Implicit Learning

The main tool for dealing with partial observability is implicit learning: learning without producing explicit representations. Given a knowledge base (a set of formulas) and a query formula, we want to know whether the knowledge base can derive the query formula. The main theorem of implicit learning says that, as long as the formulas in a knowledge base are witnessed sufficiently often in the partial examples, we can determine whether the knowledge base can derive the query without explicitly constructing or representing the knowledge base.

Definition 1 (Witnessed Formula)

Given a partial example ρ, we say a formula φ is witnessed if φ|ρ is 0 or 1, where φ|ρ denotes the formula φ restricted to the example ρ.

Formally, a restricted formula is defined recursively: we break the formula down at its logical connectives, all the way to single variables; in the base cases, these singletons are set to the values given by the example ρ (when observed). The explicit expression of a restricted formula can be computed in linear time.

Informally speaking, we get the restricted formula by plugging in the observed values of the given example, resulting in a (shorter) formula over the unobserved variables. For instance, let φ = x1 ∨ x2, and suppose that in a partial example ρ, x1 is observed true while x2 is unobserved. Then φ is witnessed (true) even though x2 is not observed. But it can be hard to determine the value of more complex formulas; in general, this is as hard as deciding whether a formula is a tautology, which is coNP-hard. Notice that each formula is either witnessed true, witnessed false, or not witnessed.
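The recursive restriction just described can be sketched as follows; the formula encoding (nested tuples) is our own assumption for illustration. Note that the per-connective simplification below does not detect tautologies, consistent with the hardness caveat above:

```python
# Formulas are nested tuples: ('var', j), ('not', f), ('and', f, g), ('or', f, g).
def restrict(f, rho):
    """Return 0, 1, or a residual formula over the unobserved variables."""
    op = f[0]
    if op == 'var':
        v = rho[f[1]]
        return f if v is None else v        # plug in an observed value
    if op == 'not':
        g = restrict(f[1], rho)
        return ('not', g) if isinstance(g, tuple) else 1 - g
    a, b = restrict(f[1], rho), restrict(f[2], rho)
    if op == 'or':
        if a == 1 or b == 1: return 1
        if a == 0: return b
        if b == 0: return a
        return ('or', a, b)
    # op == 'and'
    if a == 0 or b == 0: return 0
    if a == 1: return b
    if b == 1: return a
    return ('and', a, b)

def witnessed(f, rho):
    return restrict(f, rho) in (0, 1)

# phi = x1 v x2 with x1 observed true, x2 unobserved: witnessed true.
phi = ('or', ('var', 0), ('var', 1))
assert restrict(phi, [1, None]) == 1
```

Each connective is visited once, so the restriction runs in linear time, as the text states.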

Proof system

Given a knowledge base KB (a set of formulas) and a query formula φ, for our purposes a proof system is an algorithm that can determine whether we can derive φ from KB. If we can derive φ, we say φ is provable and denote this by KB ⊢ φ.

A proof system is restriction-closed if, whenever there is a proof of a formula φ from KB, there is also a proof of φ|ρ from KB|ρ for any partial assignment ρ. In general, if there is a proof of φ|ρ from KB|ρ, we say that φ is provable from KB under ρ. The formal language may look heavy, but the definition is intuitive: restricting every formula by the same partial example preserves derivability. For instance (our own illustration, as the original example was lost), suppose KB contains x1 and x1 → x2, and in ρ, x1 is observed true while x2 is unobserved. Then (x1 → x2)|ρ simplifies to x2, so x2|ρ is derivable from KB|ρ; we thus anticipate that x2 is provable from KB under ρ.

Notice that most common propositional proof systems, such as Resolution, (Forward) Chaining, Cutting Planes, and Polynomial Calculus, are indeed restriction-closed.

2.3 DecidePAC Algorithm

Besides the information we directly witness from the examples, we want to know what more we can infer, given some knowledge base (a set of additional formulas). From previous work by Juba (2013), we have an algorithm, DecidePAC, that can tell whether a formula is provable or not. Given a knowledge base KB and partial examples {ρ(1), …, ρ(m)} drawn from M(D), for a query formula φ, DecidePAC can tell whether there is a proof of φ when the knowledge we need is witnessed sufficiently often: DecidePAC will Accept if there exists a proof of φ from KB together with formulas that are simultaneously witnessed true with probability at least 1 − ε on M(D); otherwise, if φ is not true (given KB) with probability at least ε, then DecidePAC will Reject φ.

Notice that there are three different notions of being true: 1. witnessed (or observed), 2. provable, and 3. true. For example (our own illustration, as the original was lost), take the query x2. In an example ρ(1) where x2 is observed to be 1, x2 is witnessed true; in an example ρ(2) where x2 is unobserved but KB contains x1 → x2 and x1 is observed true, x2 is provable; in an example ρ(3) where nothing is observed, we know nothing, but x2 can nevertheless be true. Being witnessed implies being provable (in any reasonable proof system), and being provable implies truth whenever the knowledge base holds. We want to bridge from the witnessed values of examples to their ground truth through logical inference.
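The accept/reject rule of DecidePAC can be caricatured as follows. This is only a rough sketch under our own simplifications: "derivability" here is a single forward-chaining step over implication rules between attributes, not a general proof system, and the simple threshold test stands in for the full statistical analysis:

```python
# A rough sketch (an assumption about the shape of the test, not Juba's
# code): accept the query attribute if, on all but an eps fraction of the
# partial examples, it can be derived from witnessed facts.
def derivable(j, rho, rules):
    """True if attribute j is witnessed true, or follows from a witnessed
    premise by one application of a rule (a, b) meaning x_a -> x_b."""
    if rho[j] == 1:
        return True
    return any(rho[a] == 1 and b == j for (a, b) in rules)

def decide_pac(j, examples, rules, eps):
    bad = sum(1 for rho in examples if not derivable(j, rho, rules))
    return bad <= eps * len(examples)

examples = [[1, None], [1, 1], [None, 1], [0, None]]
rules = [(0, 1)]                                  # KB: x1 -> x2
assert decide_pac(1, examples, rules, eps=0.25)   # fails only in the last example
```

In the first example x2 is provable but not witnessed; in the second and third it is witnessed; only the last example supports neither, which the ε tolerance absorbs.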

DecidePAC was analyzed by Juba using an additive Chernoff bound. We can obtain an analogous multiplicative guarantee by instead using the multiplicative Chernoff bound:

Lemma 2 (Multiplicative Chernoff Bound)

Let X1, …, Xm be independent random variables taking values in {0, 1}, such that E[(1/m)Σi Xi] = p. Then for γ ∈ (0, 1),

\[ \Pr\Big[\frac{1}{m}\sum_i X_i > (1+\gamma)p\Big] \le e^{-mp\gamma^2/3} \quad\text{and}\quad \Pr\Big[\frac{1}{m}\sum_i X_i < (1-\gamma)p\Big] \le e^{-mp\gamma^2/2}. \]
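As a quick numerical sanity check of the upper-tail inequality (an illustration only, with arbitrarily chosen parameters), we can compare the empirical tail frequency of a Bernoulli sum against the bound e^{−mpγ²/3}:

```python
import math
import random

def tail_freq(m, p, gamma, trials=5000, seed=0):
    """Empirical frequency of the upper-tail event over many seeded trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() < p for _ in range(m))   # Binomial(m, p) draw
        if s / m > (1 + gamma) * p:
            hits += 1
    return hits / trials

m, p, gamma = 200, 0.3, 0.5
bound = math.exp(-m * p * gamma ** 2 / 3)   # e^{-m p gamma^2 / 3}
assert tail_freq(m, p, gamma) <= bound
```

With these parameters the true tail probability is far below the bound, so the empirical frequency is essentially zero.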

3 Abduction under Partial Observability

Given a query or an event, abduction is the task of finding an explanation for it. An explanation is a combination of conditions that may have caused the query. For example, when the query is “Engine does not run,” an explanation can be “No gas, or key is not turned.”

We require the resulting explanation to satisfy two conditions, “plausibility” and “entailment.” Entailment means that when the conditions in the explanation are true, the query should also often be true, or at least rarely false; thus, the explanation is a (potential) cause of the query. Plausibility means the explanation is often true; in other words, its conditions are observed in many examples. This suppresses unlikely explanations such as “A comet hit the car,” which is a valid entailment but not plausible.

Definition 3 (Partial Information Abduction)

For any fixed proof system, abduction is the following task: given a query formula c and independent partial examples ρ(1), …, ρ(m) drawn from a masked distribution M(D), we want to find a k-DNF explanation h such that:

1. (Plausibility) Pr[∃t ∈ h : ⊢ t|ρ] ≥ μ

2. (Weak Entailment) Pr[⊢ (¬c)|ρ | ∃t ∈ h : ⊢ t|ρ] ≤ ε

Recall that a k-DNF explanation h with r terms has the form h = t1 ∨ ⋯ ∨ tr, where each term ti is a conjunction of at most k literals. For convenience, we write t ∈ h when t is one of the terms of h.
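For constant k, the space of candidate terms is polynomial in n, which is what makes enumerating them feasible; the following sketch (with our own encoding of literals as (variable, sign) pairs) counts them explicitly:

```python
from itertools import combinations

# A term is a conjunction of at most k literals over n attributes; there
# are O((2n)^k) of them, polynomial for constant k -- the |T| of the analysis.
def candidate_terms(n, k):
    lits = [(j, s) for j in range(n) for s in (True, False)]  # (variable, sign)
    for size in range(1, k + 1):
        for term in combinations(lits, size):
            vars_used = [j for j, _ in term]
            if len(set(vars_used)) == size:   # skip x and not-x in one term
                yield term

assert len(list(candidate_terms(n=3, k=1))) == 6       # the 2n literals
assert len(list(candidate_terms(n=3, k=2))) == 6 + 12  # plus the valid pairs
```

Terms pairing a variable with its own negation are filtered out, since such conjunctions are unsatisfiable.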

3.1 Choice of Formulation

We have chosen to relax the condition that h is witnessed in Juba’s complete-information abduction task to the condition that some term of h is provable under ρ. This is of intermediate strength between h being witnessed and h being provable. Provability captures whether or not an agent “knows” h is true of a given partial example ρ. Our choice is somewhat like the notion of vivid knowledge of Levesque (1986), in that the individual literals of some definite term should be known. The weaker condition that h is merely provable is also interesting, but seems much harder to work with; we leave it as a direction for future work. We could also have relaxed this further to the cases where ¬h is not provable, but observe that this counts the cases where h is unknown in its favor. Note that this may “mix” many cases where h was actually false into our estimate of the effect of h occurring, which is not desirable, and we anticipate that it would harm the quality of the inferences we can draw.

We made the opposite decision for c, relaxing it to the condition that ¬c is not provable. The main reason for this choice is that we do not wish to penalize a good h when it is often impossible to check whether or not c holds. We use this liberal notion of entailment for our explanations because the intended semantics of the task is merely to propose possible causes given some tentative partial knowledge of the world, perhaps to guide further investigation. At the same time, we would like to take ε to be very small, so that we can aggressively rule out h’s for which c is frequently known to fail to occur. But if we were to count the outcome of c being unknown as a “failure” of h, then in the cases where c is indeed often unknown, ε would have to be large, even for a good h.

4 Implicit Abduction Algorithm

A k-DNF explanation is a disjunction of terms, h = t1 ∨ ⋯ ∨ tr. Each term represents a condition, or a possibility. Our goal is to find a formula h that covers as many such conditions as possible while still being a potential cause of the query c.

We observe that there is a natural correspondence between our k-DNF abduction task and set cover: each example in the abduction task is an element of the set cover instance, and each term is a set. We say a term covers an example when the term is provable in that example. The number of examples a term covers corresponds to its empirical probability with respect to the distribution M(D). If the resulting explanation consists of terms that together are provable in most of the examples, then we can conclude that our explanation is provable with high probability.

In the implicit abduction algorithm, we enumerate all possible terms of at most k literals:

1. Check each term t using the same technique underlying DecidePAC: we count the number of bad examples in which ¬c and t are both provable. If the bad examples make up more than an ε-fraction, then we delete this term.

By the Chernoff bound, all the terms that pass the test then satisfy weak entailment: the error condition has probability at most (1 + γ)ε.

2. Then use the greedy algorithm to choose an explanation from the surviving terms. If the algorithm can find an explanation covering a sufficiently large fraction of the examples, then we can argue that the explanation has probability at least (1 − γ)μ by the Chernoff bound.

Thus, if there exists a good explanation, we can find an explanation satisfying entailment and plausibility.
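The two steps above can be combined into an end-to-end sketch. Everything here is our own simplified illustration: a term counts as "provable" in an example only when all of its literals are observed with the right values (no background knowledge), `not_c_provable` is a caller-supplied stand-in for the test that ¬c is provable, and `goal_fraction` plays the role of the coverage target:

```python
def term_provable(term, rho):
    """Simplification: a term is 'provable' when all its literals are observed."""
    return all(rho[j] == (1 if sign else 0) for j, sign in term)

def implicit_abduce(terms, examples, not_c_provable, eps, goal_fraction):
    m = len(examples)
    # Step 1: keep terms with at most an eps fraction of bad examples,
    # i.e. examples where the term and "not c" are both provable.
    kept = [t for t in terms
            if sum(term_provable(t, rho) and not_c_provable(rho)
                   for rho in examples) <= eps * m]
    # Step 2: greedily cover the examples with the surviving terms.
    uncovered, h = set(range(m)), []
    while len(uncovered) > (1 - goal_fraction) * m and kept:
        best = max(kept, key=lambda t: sum(term_provable(t, examples[i])
                                           for i in uncovered))
        h.append(best)
        uncovered -= {i for i in uncovered if term_provable(best, examples[i])}
        kept.remove(best)
    return h

# Tiny usage example: single-literal terms over two attributes.
terms = [((0, True),), ((0, False),), ((1, True),), ((1, False),)]
examples = [[1, 1], [1, 1], [0, 1], [1, 0]]
not_c = lambda rho: rho[1] == 0           # stand-in for "not c is provable"
h = implicit_abduce(terms, examples, not_c, eps=0.3, goal_fraction=0.75)
assert sum(term_provable(h[0], rho) for rho in examples) >= 3
```

The greedy step mirrors the partial-cover guarantee invoked later in the analysis: if some r terms cover the target fraction, greedy selection needs only modestly more terms to cover as much.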

Remark    If μ is the optimal probability with which the terms of a potential explanation can be provable, Juba (2016) showed that a multiplicative approximation to μ can easily be found by binary search. We assume that such an estimate is given as input.

Theorem 4 (Implicit Abduction)

Given a query c, partial examples drawn from a masked distribution M(D), and an efficient restriction-closed proof system with knowledge base KB, for constant k:

If there exists an r-term k-DNF h∗ satisfying:

1. With probability at least μ over ρ from M(D), there is some t ∈ h∗ such that t is provable from KB under ρ (Plausibility).

2. Under ρ drawn from M(D), if some term of h∗ is provable, then ¬c is provable with probability at most ε (Weak Entailment).

Then, we can find a k-DNF h in polynomial time such that, with probability 1 − δ,

1. Pr[∃t ∈ h : ⊢ t|ρ] ≥ (1 − γ)μ (Plausibility)

2. Pr[⊢ (¬c)|ρ | ∃t ∈ h : ⊢ t|ρ] ≤ Õ(r log(μm)(1 + γ)ε) (Weak Entailment).

4.1 Proof of the Main Theorem

Soundness.

We first show that if the implicit abduction algorithm returns an explanation , then satisfies weak entailment. Plausibility will follow from the assumption that a good explanation exists, so we postpone its discussion to our discussion of completeness, below.

Each term of the explanation is checked by the implicit learning test, so all surviving terms have low error rates: for each term t that passes,

Claim 5

For our choice of m, we can guarantee that, with probability 1 − δ, for all terms t that pass the first test, Pr[⊢ (¬c)|ρ ∧ ⊢ t|ρ] ≤ (1 + γ)ε.

Proof of Claim 5    In the implicit learning step, we enumerate all possible terms of at most k literals over n attributes, so there are at most O(n^k) possible terms. In the algorithm, for every term t that passes the first test, the bad event, that ¬c and t are both provable, happens in less than an ε-fraction of the examples. By the multiplicative Chernoff bound, when we take enough examples, we can guarantee that any term with at most an ε-fraction of bad examples has true error at most (1 + γ)ε, i.e., with high probability. For each term, the Chernoff bound requires O((1/(εγ²)) log(1/δ′)) examples to be correct with probability 1 − δ′. We choose δ′ so that, after a union bound over the O(n^k) terms, the total failure probability is at most δ. Thus, m = O((1/(εγ²))(k log n + log(1/δ))) examples suffice.

Completeness.

We just proved that every output satisfies weak entailment with probability 1 − δ. Now, we want to show that if there is an optimal r-term k-DNF explanation h∗ satisfying

1. (Plausibility) for a μ-fraction of examples, some term t ∈ h∗ is provable, and

2. (Weak Entailment) if some t ∈ h∗ is provable, then with high probability ¬c is not provable,

then we are able to find a good solution h that satisfies plausibility and weak entailment.

Claim 6

If there exists a solution h∗ such that the event that ¬c is provable when some t ∈ h∗ is provable has probability at most (1 − γ)ε, then all of its terms pass the first test with high probability.

Proof of Claim 6    We are given that, for each t ∈ h∗, Pr[⊢ (¬c)|ρ ∧ ⊢ t|ρ] ≤ (1 − γ)ε. By a Chernoff bound, for our choice of m, with high probability the bad event occurs for any t ∈ h∗ in less than an ε-fraction of the examples, so all these terms pass the first test.

Next, we show that the number of terms is controlled, since it depends upon the size of the solution to the set cover problem.

Claim 7

If there exists a solution h∗ that satisfies the plausibility and weak entailment conditions above,

then Implicit Abduction finds an h using at most O(r log(μm)) terms such that Pr[∃t ∈ h : ⊢ t|ρ] ≥ (1 − γ)μ. Furthermore, by a union bound on the error of each term, Pr[⊢ (¬c)|ρ | ∃t ∈ h : ⊢ t|ρ] ≤ O(r log(μm)(1 + γ)ε). We thus find, with probability at least 1 − δ, an h that satisfies plausibility (with (1 − γ)μ) and weak entailment.

Proof of Claim 7    Following Claims 5 and 6, with high probability, all terms in h∗ pass the first test, so they are available for set cover. Moreover, by another Chernoff bound, since at least one of the terms of h∗ is provable with probability μ in each example, with high probability at least one of the terms is provable in at least a (1 − γ)μ-fraction of the examples. Thus, there is a set of r terms (the terms of h∗) that pass these tests and indeed cover a (1 − γ)μ-fraction of the examples. For the greedy algorithm, if the optimal solution covers these examples using r sets, then our greedy algorithm can find a cover using O(r log(μm)) sets that covers just as many examples (Slavík, 1997).

Recall that h has at most O(r log(μm)) terms. For each term t ∈ h, by Claim 5, Pr[⊢ (¬c)|ρ ∧ ⊢ t|ρ] ≤ (1 + γ)ε, so if we take a union bound over the terms of h, the error is at most O(r log(μm)(1 + γ)ε) in total. Plugging in our bound on m, the resulting error is Õ(r(1 + γ)ε).

To see that the returned h satisfies plausibility, we consider a Chernoff bound for the fraction of examples in which each possible r′-term k-DNF has a provable term, where r′ = O(r log(μm)). Taking a union bound over all such k-DNF explanations, any r′-term explanation found will actually have plausibility (1 − γ)μ with probability 1 − δ. Therefore, it suffices to have

\[ m \ge \frac{3}{\mu\gamma^2}\log\frac{2|T|^{r'}}{\delta} \qquad\text{or}\qquad m \ge \frac{3r\log(\mu m)}{\mu\gamma^2}\log\frac{2|T|}{\delta}. \]

Here we apply the following inequality:

Lemma 8

For a, b > 0, if m ≥ 2a log(ab), then m ≥ a log(bm).

By plugging in the appropriate a and b, we find that polynomially many examples suffice to satisfy these bounds; the terms depending on log(μm) are dominated by the others.

Since we condition on the event that some t ∈ h is provable, whose probability plausibility bounds from below,

\[ \Pr\big[\vdash(\neg c)|_\rho \,\big|\, \exists t\in h:\ \vdash t|_\rho\big] \;=\; \frac{\Pr\big[(\exists t\in h:\ \vdash t|_\rho)\wedge(\vdash(\neg c)|_\rho)\big]}{\Pr\big[\exists t\in h:\ \vdash t|_\rho\big]} \]

and thus we indeed find an h satisfying weak entailment with the claimed error rate, with probability 1 − δ.

Finally, when we plug in m,

\[ O\big(r\log(\mu m)(1+\gamma)\epsilon\big) \;=\; \tilde{O}\Big(r\log\Big(\mu\cdot\frac{r}{\mu\gamma^2}\log\frac{n^k}{\delta}\Big)(1+\gamma)\epsilon\Big) \;=\; \tilde{O}\Big(r\Big(\log\log n+\log k+\log\log\frac{1}{\delta}+\log\frac{1}{\gamma}\Big)(1+\gamma)\epsilon\Big). \]

We conclude that the stated guarantees hold with probability 1 − δ.

Running time.

The test is run for each term of size at most k, of which there may be O(n^k). DecidePAC runs in time polynomial in n, m, the running time of the underlying algorithm for the proof system, and 1/ε; the overall running time is therefore also polynomial, as needed.

Acknowledgements

B. Juba and E. Miller were partially supported by an AFOSR Young Investigator Award.

References

• Charniak and McDermott (1985) Charniak, E., and McDermott, D. 1985. Introduction to Artificial Intelligence. Addison-Wesley.