Recent work has shown that some common machine learning classifiers can be compiled into Boolean circuits that make the same decisions. This includes Bayesian network classifiers with discrete features[ChanD03, ShihCD19]
and some types of neural networks[ShihDC19b, ChoiShiShihDarwiche19]. Proposals were also extended to explain and verify these numeric classifiers by operating on their compiled circuits [ShihCD18, ShihCD18b]. We extend this previous work by proposing a theory for reasoning about the decisions made by classifiers and discus its theoretical and practical implications.
In the proposed theory, a classifier is a Boolean function. Its variables are called features, a particular input is called an instance and the function output on some instance is called a decision. If the function outputs on an instance, the instance and decision are said to be positive; otherwise, they are negative. Figure 2 depicts a classifier () for college admission, represented as an Ordered Binary Decision Diagram (OBDD) [Bryant86]. This OBDD was compiled from the Bayesian network (BN) classifier in Figure 2 using the algorithm in [ShihCD19]. The OBDD is guaranteed to make the same decision as the BN classifier on every instance (same input-output behavior).
Our main goal is to explain the decisions made by a classifier on specific instances by way of providing various insights into what caused these decisions. Consider Susan who passed the entrance exam, is a first-time applicant, has no work experience and a high GPA. Susan will be admitted by classifier depicted in Figure 2. She also comes from a rich hometown and will be admitted by classifier depicted in the same figure. We can say that Susan was admitted by classifier because she passed the entrance exam and has a high GPA. We can also say that one reason why classifier admitted Susan is that she passed the entrance exam and has a high GPA (there are other reasons in this case). Moreover, we can say that classifier would still admit Susan even if she did not have a high GPA because she passed the entrance exam and comes from a rich hometown. Finally, we can say that classifier is biased as it can make biased decisions: ones that are based on protected features. For example, it will make different decisions on two applicants who have the same characteristics except that one comes from a rich hometown and the other does not. We will also show that one can sometimes prove classifier bias by inspecting the reasons behind one of its unbiased decisions.
We will give formal definitions and justifications for the statements exemplified above and show how to compute them algorithmically. As far as semantics, the main tool we will employ is the classical notion of prime implicants [BooleanFunctions, quine1, mccluskey, quine2]. On the computational side, we will exploit tractable Boolean circuits [DarwicheJAIR02] while providing some new fundamental results that further extend the reach of these circuits to computing explanations.
This paper is structured as follows. We review prime implicants in Section 2. We follow by introducing the notions of sufficient, necessary and complete reasons in Sections 3–5. Counterfactual statements about decisions are discussed in Section 6, followed by a discussion of decision and classifier bias in Section 7. We dedicate Section 8 to algorithms that compute the introduced notions while illustrating them using a case study in Section 9. We finally close with some concluding remarks in Section 10.
2 Classifiers, Decisions and Prime Implicants
We represent a classifier by a propositional formula whose models (i.e., satisfying assignments) correspond to positive instances. The negation of the formula characterizes negative instances. Classifiers and of Figure 2 are represented by the following formulas:
We use to denote the decision ( or ) of classifier on instance (that is, iff and iff ). We also define if the decision is positive and if the decision is negative. This notation is critical and we use it frequently later noting that and iff .
A literal is a variable (positive literal) or its negation (negative literal). A term is a consistent conjunction of literals. Term subsumes term , written , iff includes the literals of . For example, term subsumes term . We treat a term as the set of its literals so we may write to also mean that subsumes . We sometimes refer to a literal as a characteristic and to a term as a property (of an instance). We use to denote the property resulting from negating every characteristic in property . We sometimes use a comma () instead of a conjunction () when describing properties and instances (e.g., instead of ).
An implicant of propositional formula is a term that satisfies , written . A prime implicant is an implicant that is not subsumed by any other implicant. For example, is an implicant of but is not prime since it is subsumed by another implicant , which happens to be prime. Classifier has the following prime implicants:
Classifier has the following prime implicants:
The set of prime implicants for a propositional formula can be quite large, which motivated the notion of a prime implicant cover [quine1, mccluskey, quine2]. A set of terms is prime implicant cover for propositional formula if each term is a prime implicant of and is equivalent to . A cover may not include all prime implicants, with the missing ones called redundant. While covers can be useful computationally, they may not always be appropriate for explaining classifiers as they may lead to incomplete explanations (more on this later).
We will make use of the conditioning operation on propositional formula. To condition formula on literals , denoted , is to replace every literal in with if and with if . We will also use existential quantification: .
In the next few sections, we introduce the notions of sufficient, complete and necessary reasons behind a decision. We use these notions later to define decision and classifier bias in addition to giving semantics to counterfactual statements relating to decisions.
3 Sufficient Reasons
Prime implicants have been studied and utilized extensively in the AI and computer science literature.222One classical application of prime implicants in AI has been in the area of model-based diagnosis, where they have been used to formalize the notion of kernel diagnoses [diagnosisPI]. A kernel diagnosis is defined for a given device behavior and is a minimal term representing the health of some device components. Any system state that is compatible with a kernel diagnosis is feasible under the given system behavior. Moreover, the set of kernel diagnoses characterize all feasible system states under the given behavior. However, their active utilization in explaining decisions is more recent, e.g., [ShihCD18, IgnatievNM19, JoaoNIPS19, ReasonsMoral], and introduced a key connection to properties of instances that we highlight next and exploit computationally later.
Definition 1 (Sufficient Reason [ShihCD18]).
A sufficient reason for decision is a property of instance that is also a prime implicant of (recall is if the decision is positive and otherwise).
A sufficient reason identifies characteristics of an instance that justify the decision: The decision will stick even if other characteristics of the instance were different. A sufficient reason is minimal: None of its strict subsets can justify the decision. A decision can have multiple sufficient reasons, sometimes a very large number of them.333The LIME [LIME] and Anchor [ANCHOR] systems can be viewed as computing approximations of sufficient reasons. The quality of these approximations has been evaluated on some datasets and corresponding classifiers in [JoaoApp], where an approximation is called optimistic if it is a strict subset of a sufficient reason and pessimistic if it is a strict superset of a sufficient reason.
There is a key difference between prime implicants and sufficient reasons: the latter must be properties of the given instance. This has significant computational implications that we exploit in Section 8.
Sufficient reasons were introduced in [ShihCD18] under the name of PI-explanations. The new name we adopt is motivated by further distinctions that we draw later and was also used in [ReasonsMoral]. We will also sometimes say “a reason” to mean “a sufficient reason.”
Greg passed the entrance exam, is not a first time applicant, does not have a high GPA but has work experience (). Classifier admits Greg, a decision that can be explained using either of the following sufficient reasons:
Passed the entrance exam and is not a first time applicant ().
Passed the entrance exam and has work experience ().
Since Greg passed the entrance exam and has applied before, he will be admitted even if his other characteristics were different. Similarly, since Greg passed the entrance exam and has work experience, he will be admitted even if his other characteristics were different.
Every decision has at least one sufficient reason.
Consider decision . We have , which means is consistent and must have at least one prime implicant (the empty term if is valid). Moreover, at least one of these prime implicants must be a property of instance since and since is equivalent to the disjunction of its prime implicants. Hence, we have at least one sufficient reason for the decision. ∎
A classifier may make the same decision on two instances but for different reasons (i.e., disjoint sufficient reasons). However, if two decisions on distinct instances share a reason, they must be equal.
If decisions and share a sufficient reason, the decisions must be equal .
Suppose the decisions share sufficient reason . Then is property of both and and is a prime implicant of both and . Hence, since is consistent and . ∎
We will see later that sufficient reasons can provide insights about a classifier that go well beyond explaining its decisions.
4 Complete Reasons
A sufficient reason identifies a minimal property of an instance that can trigger a decision. The complete reason behind a decision characterizes all properties of an instance that can trigger the decision.
Definition 2 (Complete Reason).
The complete reason for a decision is the disjunction of all its sufficient reasons.
The complete reason for decision captures every property of instance , and only properties of instance , that can trigger the decision. It precisely captures why the particular decision is made.
Let be the complete reason for decision . If instance does not satisfy and , then no sufficient reason for decision can be a property of instance .
Suppose and . Then . Let be a sufficient reason for decision . Then is a property of instance and a prime implicant of both and . If were a property of instance , then is a sufficient reason for decision , and , a contradiction. Hence, cannot be a property of instance . ∎
We will sometimes say “the reason” to mean “the complete reason.” Classifier admits Greg () for the reason . Greg was admitted because he passed the entrance exam and satisfied one of two additional requirements: he applied before and has work experience. Classifier also admits Susan (). Susan does not satisfy the reason . There is one sufficient reason for admitting Susan: she passed the entrance exam and has a good GPA (), which is not a property of Greg. The classifier admitted Greg and Susan for different reasons.
The complete reason for a decision is unique up to logical equivalence and can be used to enumerate its sufficient reasons.444Pierre Marquis observed that the complete reason can be formulated using the notion of literal forgetting which is a more fine grained notion than variable forgetting (also known as existential quantification) [Marquis00, DBLP:journals/jair/LangLM03, DBLP:journals/tocl/HerzigLM13].
Let be the complete reason for decision . The prime implicants of are the sufficient reasons for decision .
Let be the sufficient reasons for decision and hence . The key observation is that terms are properties of instance . Hence, for every two terms and , term cannot contain some literal while term containing literal . The DNF is then closed under consensus.555The consensus rule infers the term from terms and . One can convert a DNF into its set of prime implicants by closing the DNF under consensus and then removing subsumed terms; see [BooleanFunctions, Chapter 3]. Since no term subsumes another term , the DNF contains all prime implicants of . Hence, the prime implicants of complete reason are precisely the sufficient reasons of decision . ∎
5 Necessary Properties and Reasons
The necessary property of a decision is a maximal property of an instance that is essential for explaining the decision on that instance.
Definition 3 (Necessary Characteristics and Properties).
A characteristic is necessary for a decision iff it appears in every sufficient reason for the decision. The necessary property for a decision is the set of all its necessary characteristics.
The necessary property is unique but could be empty (when the decision has no necessary characteristics).
If an instance ceases to satisfy one necessary characteristic, the corresponding decision is guaranteed to change.
If instance disagrees with instance on only one characteristic necessary for decision , then .
Suppose and are as premised. If then and is an implicant of by consensus on the flipped characteristic . Moreover, does not contain characteristic so it cannot be necessary, a contradiction. ∎
If an instance ceases to satisfy more than one necessary characteristic, the decision does not necessarily change. However, if the decision sticks then it would be for completely different reasons.
Let be an instance that disagrees with instance on at least one characteristic necessary for decision . Decisions and must have disjoint sufficient reasons.
Let be the necessary characteristics of decision that instances and disagree on. A sufficient reason of cannot be a property of instance since and contains . Hence, cannot be a sufficient reason for decision and the two decisions must have disjoint sufficient reasons. ∎
Consider a classifier and instance . The decision is positive with as the only sufficient reason. Hence, all three characteristics of are necessary: Flipping any single characteristic of instance will lead to a negative decision. However, flipping the two characteristics and preserves the positive decision but leads to a new, single sufficient reason .
The complete reason for a decision has enough information to compute its necessary characteristics and necessary property.
A characteristic is necessary for a decision iff it is implied by the decision’s complete reason.
We can now define the notion of necessary reason.
Definition 4 (Necessary Reason).
The necessary property of a decision is called the necessary reason for the decision iff it is the only sufficient reason for the decision.
There may be no necessary reason for a decision as there may be no instance property that is both sufficient and necessary for triggering the decision. We next highlight how the complete reason for a decision, being a condition on an instance instead of a property, is always necessary and sufficient for explaining the decision.
Consider the complete reason for decision and recall that it characterizes all properties of instance that can trigger the decision: where is a property of instance . The reason is then a logical condition that triggers the decision (). If the complete reason is weakened into a condition that continues to trigger the decision (), then will admit properties not satisfied by instance . Moreover, if it is strengthened into a condition , then will continue to trigger the decision () but will stop admitting some properties of instance that can trigger the decision. Hence, the complete reason is a necessary and sufficient condition (not necessarily a property) for explaining the decision on instance .
6 Decision Counterfactuals
We mentioned Susan earlier who passed the entrance exam, is a first time applicant, has a high GPA but no work experience (). Classifier admits Susan because she passed the entrance exam and has a high GPA. Greg was also admitted by this classifier. His application is similar to Susan’s except that he applied before and has work experience (). We cannot pinpoint a single property of Greg that triggered admission, so we cannot issue a “because” statement when explaining this decision.
Definition 5 (Because).
Consider decision and let be a property of instance . The decision is made “because ” iff is the complete reason for the decision.
A decision is made because iff is the necessary reason for the decision (i.e., the only sufficient reason).
One may be interested in statements that provide insights into a decision beyond the reasons behind it. For example, how the classifier may have decided if some instance characteristics were different.
An example statement is the one we mentioned in Section 1: Susan would have been admitted even if she did not have a high GPA because she comes from a rich hometown and passed the entrance exam. This statement exemplifies counterfactuals of the following form: The decision will stick even if because , where and are properties of the given instance.
Definition 6 (Even-If-Because).
Consider decision and let and be properties of instance . The decision sticks “even if because ” iff is the complete reason for the decision after changing property of instance to (i.e., flipping all characteristics in ).
Let be the result of replacing property of instance by and suppose that is the complete reason for decision . Then is the only sufficient reason for decision by Definition 2. Hence and properties and must be disjoint. Moreover, so and . Hence, the decision sticks “even if because .”
Applicant Susan discussed earlier () is admitted by classifier . The decision will stick even if Susan had a low GPA () because she comes from a rich hometown and passed the entrance exam (). This statement is justified since is the complete reason for decision where is the result of replacing characteristic by in instance .
Jackie did not pass the entrance exam, is not a first time applicant, has a low GPA but has work experience (). Jackie is denied admission by classifier . The decision will stick even if Jackie had a high GPA () because she did not pass the entrance exam (). This statement is justified since is the complete reason for decision where is the result of replacing characteristic by in instance .
7 Decision Bias and Classifier Bias
We will now discuss the dependence of decisions on certain features, with a particular application to detecting decision and classifier bias.
Intuitively, a decision is biased if it depends on a protected feature: one that should not be used when making the decision (e.g., gender, zip code, or ethnicity).666A protected feature may have been unprotected during classifier design. We formalize bias next while making a distinction between classifier bias and decision bias: A classifier may be biased in that it could make biased decisions, but the particular decisions it already made may have been unbiased. While classifier bias can always be detected by examining its decision function, we will show that it can sometimes be detected by examining the complete reason behind one of its unbiased decisions.
Definition 7 (Decision Bias).
Decision is biased iff for some that disagrees with on only protected features.
Bias can be positive or negative. For example, an applicant may be admitted because they come from a rich hometown, or may be denied admission because they did not come from a rich hometown.
The following result provides a necessary and sufficient condition for detecting decision bias.
A decision is biased iff each of its sufficient reasons contains at least one protected feature.
Suppose decision is biased yet has a sufficient reason with no protected features. We will now show a contradiction. Since the decision is biased, there must exist an instance that disagrees with instance on only protected features and . Since is a property of and , we have and . Hence, and , which is a contradiction.
Suppose every sufficient reason of decision contains at least one protected feature. Let be these protected features and be the characteristics of instance that do not involve features . Assume for every instance that agrees with instance on characteristics (that is, disagrees with only on features in ). Term must then be an implicant of and a subset of must be a prime implicant of (could be itself). Since is a property of instance , decision has sufficient reason that does not include a protected feature in , which is a contradiction. Hence, for some instance that disagrees with instance on only protected features in , and decision is biased. ∎
Theorem 4 does not require sufficient reasons to share a protected feature, only that each must contain at least one protected feature.
Consider classifier , which admits applicants who have a good GPA () as long as they pass the entrance exam (), are male () or come from a rich hometown ():
Bob has a good GPA, did not pass the entrance exam and comes from a rich hometown (). He is admitted with two sufficient reasons: and . The decision is biased since each sufficient reason contains a protected feature. This classifier will not admit Nancy who has similar characteristics but does not come from a rich hometown: . It will also admit Scott who has the same characteristics as Nancy: .
Even though this classifier is biased, some of its decisions may be unbiased. If an applicant has a good GPA and passes the entrance exam (), they will be admitted regardless of their protected characteristics. Moreover, if an applicant does not have a good GPA (), they will be denied admission regardless of their other characteristics, including protected ones.
Definition 8 (Classifier Bias).
A classifier is biased iff at least one of its decisions is biased.
A classifier may be biased, but some of its decisions may be unbiased. Moreover, one can sometimes infer classifier bias by inspecting the sufficient reasons behind one of its unbiased decisions.
A classifier is biased iff one of its decisions has a sufficient reason that includes a protected feature.
Suppose some decision has a sufficient reason that contains protected features . For any instance such that , we must have . We now show that there is an instance and instance that disagrees with on only features such that . Suppose the contrary is true: for all such and , we have . Then is an implicant of , where are the protected characteristics in . This is impossible since is a prime implicant of . Hence, for some and with the stated properties and the classifier is biased. ∎
If decision has protected features in some but not all of its sufficient reasons, the decision is not biased according to Theorem 4. But classifier is biased according to Theorem 5 as we can prove that it will make a biased decision on some other instance .
Consider classifier in (1) and Lisa who has a good GPA, passed the entrance exam and comes from a rich hometown (). The classifier will admit Lisa for two sufficient reasons: and . The decision is unbiased: any applicant who has similar unprotected characteristics will be admitted. However, since one of the sufficient reasons contains a protected feature, the classifier is biased as it can make a biased decision on a different applicant. The proof of Theorem 5 suggests that the classifier will make different decisions on two applicants with a good GPA that disagree only on whether they come from a rich hometown. Nancy () and Heather () are such applicants.
A decision is biased iff is not valid where are all unprotected features and is the complete reason behind the decision.
Let be the decision’s sufficient reasons and hence . Existentially quantifying variables from a DNF is done by replacing their literals with . The result is valid iff some term contains only variables in . Hence, is not valid iff each term contains variables beyond (i.e., each sufficient reason contains protected features). ∎
The following result shows how classifier bias can sometimes be detected based on the complete reason behind an unbiased decision.
A classifier is biased if where is a protected feature and is the complete reason for some decision.
Given Theorems 2 and 5, it is sufficient to show that iff feature appears in some prime implicant of . Let be the prime implicants of . Feature appears either positively or negatively in these prime implicants since terms are all properties of the same instance. Suppose without loss of generality that feature appears positively in terms (if any). Then and . Hence iff for some prime implicant . ∎
8 Computing Reasons and Related Queries
The enumeration of PI-explanations (sufficient reasons) was treated in [ShihCD18] by modifying the algorithm in [CoudertMadre93] for computing prime implicant covers; see also [PrimeOverview93, minato1993fast]. The modified algorithm optimizes the original one by integrating the instance into the prime implicant enumeration process, but we are unaware of a complexity bound for the original algorithm or its modification. Moreover, since the algorithm is based on prime implicant covers, it is incomplete. Consider classifier , which has three prime implicants: , and . The last prime implicant is redundant and may not be generated when computing a cover. Instance leads to a positive decision and two sufficient reasons: and . An algorithm based on covers may miss the sufficient reason and is therefore incomplete. This can be problematic for queries that rely on examining all sufficient reasons, such as decision and classifier bias (Definitions 7 and 8).
We next propose a new approach based on computing the complete reason for a decision (Definition 2), which characterizes all sufficient reasons, and then use it to compute multiple queries. For example, we can enumerate all sufficient reasons using the reason (Theorem 2). We can also use it to compute the necessary reason for a decision (Proposition 4) and to detect decision bias (Theorem 6). Even classifier bias can sometimes be inferred directly using the reason (Theorem 7) among other queries.
Assuming the classifier is represented using a suitable tractable circuit (e.g., OBDD), our approach will compute the complete reason for a decision in linear time regardless of how many sufficient reasons it may have (could be exponential). Moreover, it will ensure that the computed complete reason is represented by a tractable circuit, allowing us to answer many queries in polytime.
8.1 Computing Complete Reasons
Our approach is based on Decision-DNNF circuits, obtained using compilers such as c2d777http://reasoning.cs.ucla.edu/c2d/ [Darwiche04], mini_c2d888http://reasoning.cs.ucla.edu/minic2d/ [OztokD14, OztokD18] and d4999http://www.cril.univ-artois.fr/kc/d4.html [LagniezM17].
Definition 9 (Decision-NNF Circuit).
A DNNF circuit has literals or constants as inputs and two type of gates: and-gates and or-gates, where the subcircuits feeding into each and-gate share no variables. It is called a Decision-DNNF circuit if every or-gate has exactly two inputs of the form: and , where is a variable.
DNNF circuits were introduced in [DarwicheJACM01]. Decision-DNNF circuits were identified in [HuangD05, HuangD07] and include Ordered Binary Decision Diagrams (OBDDs) [Bryant86, HuangD07]. Figure 3 depicts an OBDD and its corresponding Decision-DNNF circuit. The circuit is obtained by mapping each OBDD node with variable , high child and low child into the circuit fragment (two and-gates and one or-gate). For more on DNNF circuits and OBDD, see [DarwicheJAIR02, OztokD14].
We compute the reason behind decision by applying two operations to a Decision-DNNF circuit : consenus then filtering.
Definition 10 (Consensus Circuit).
The consensus circuit of Decision-DNNF circuit is denoted and obtained by adding input to every or-gate with inputs and .
Figure 3 depicts a Decision-DNNF circuit and its consensus circuit (third from left). The consensus operation adds four and-gates denoted with double circles.
A Decision-DNNF circuit has the same satisfying assignments as its consensus circuit .
A consensus circuit can be obtained in time linear in the size of Decision-DNNF circuit, but is not a DNNF circuit. We next discuss the filtering of a consensus circuit, which leads to a tractable circuit.
Definition 11 (Filtered Circuit).
The filtering of consensus circuit by instance , where , is denoted and obtained by replacing every literal by constant .
Filtering is only defined on consensus circuits and requires an instance that satisfies the circuit. Figure 3 depicts an example. The filtered circuit is on the far right of the figure, where grayed out nodes and edges can be dropped due to replacing literals by constant .
Filtering is also a linear time operation. Consensus preserves models, but filtering drops some of them. We will characterize the models preserved by filtering after presenting two required results.
Let be a circuit that results from filtering by instance . The circuit is monotone in the following sense. If instance agrees with instance no less than instance does, then implies . For example, if , and .
If circuit results from filtering by instance then every literal in appears in , and if .
Filtering removes every literal not in instance . Hence, every literal in the filtered circuit is in , which implies the next result.
Suppose that and . When evaluating circuit at compared to , the only literals that change values are and . Literals change values from to and literals change values from to . Changes to the values of cannot decrease the output of circuit since it is an NNF circuit. Literals are not in since so do not appear in circuit and changes to their values do not matter. Hence, . ∎
We also need the following result which identifies circuit models that are preserved by filtering due to having applied consensus.
Consider a Decision-DNNF circuit and instance such that . If is an implicant of and then is also an implicant of .
Let , and . We need to show that . That is, preserves the implicants of that are satisfied by . The proof is by induction on the structure of .
(Base Case) If is a literal or a constant, then since consensus is not applicable and filtering will not replace literal by constant ( since ). Hence, .
(Inductive Step) If then where and . Since and do not share variables (decomposability), (Cartesian product). Similarly, . By the induction hypothesis, and . Hence,
(Inductive Step) If and literal then where and . Due to decomposability, and do not appear in or . Hence, where
Since we have
Moreover, . By the induction hypothesis, and , which gives and . Hence, . ∎
The following fundamental result reveals the role of filtering a consensus circuit. It also reveals our linear-time procedure for computing the complete reason behind a decision as a (tractable) circuit that compactly characterizes all sufficient reasons.
Consider a Decision-DNNF circuit and instance such that . Term is a prime implicant of and (that is, is a sufficient reason for decision ) iff is a prime implicant of .
Let and observe that since and is the result of replacing some inputs of NNF circuit with constant .
Suppose is a prime implicant of circuit and . Then is an implicant of circuit by Proposition 7, . If is not a prime implicant of , we must have some term such that . Therefore since , which means that is not a prime implicant of , a contradiction. Hence, is a prime implicant of .
Suppose is a prime implicant of circuit . Then is an implicant of since . We next show that is a prime implicant of and . Let be an instance such that and disagrees with on all variables outside . Then and . Every instance such that must satisfy since , leading to by Theorem 8. Hence, is an implicant of . Since is a prime implicant of , we must have and hence . Suppose now is not a prime implicant of . Some term is then a prime implicant of and . By the first part of this theorem, is a prime implicant of , a contradiction. Therefore, is a prime implicant of . ∎
Definition 12 (Reason Circuit).
For classifier , instance and a Decision-DNNF circuit for , circuit is called a “reason circuit” and denoted .
The circuit depends on the specific Decision-DNNF circuit used to represent but will always have the same models.
8.2 Tractability of Reason Circuits
We next show that reason circuits are tractable. Since we represent the complete reason for a decision as a reason circuit, many queries relating to the decision can then be answered efficiently.
Definition 13 (Monotone).
An NNF circuit is monotone if every variable appears only positively or only negatively in the circuit.
Reason circuits are filtered circuits and hence monotone as shown by Theorem 8. The following theorem mirrors what is known on monotone propositional formula, but we include it for completeness.
The satisfiability of a monotone NNF circuit can be decided in linear time. A monotone NNF circuit can be negated and conditioned in linear time to yield a monotone NNF circuit.
The satisfiability of a monotone NNF circuit can be decided using the following procedure. Constant is not satisfiable. Constant and literals are satisfiable. An or-gate is satisfiable iff any of its inputs is satisfiable. An and-gate is satisfiable iff all its inputs are satisfiable. All previous statements are always correct except the last one which depends on monotonicity. Consider a conjunction and suppose every variable shared between the conjuncts appears either positively or negatively in both. Any model of can be combined with any model of to form a model for . Hence, the conjunction is satisfiable iff each of the conjuncts is satisfiable. Conditioning replaces literals by constants so it preserves monotonicity. To negate a monotone circuit, replace and-gates by or-gates, or-gates by and-gates and literals by their negations. Monotonicity is preserved. ∎
Given Theorem 10, the validity of a monotone NNF circuit can be decided in linear time (we check whether the negated circuit is unsatisfiable).101010Validity can be checked more directly as follows. Constant is valid. Constant and literals are not valid. An and-gate is valid iff all its inputs are valid. An or-gate is valid iff any of its inputs is valid. The previous statements are always correct except the last one which requires monotonicity. We can also conjoin the circuit with a literal in linear time to yield a monotone circuit since .
Variables can be existentially quantified from a monotone circuit in linear time, with the resulting circuit remaining monotone. This is critical for efficiently detecting decision bias as shown by Theorem 6.
Replacing every literal of variable with constant in monotone NNF circuit yields a circuit equivalent to .
If variable appears only positively in circuit then and . If variable appears only negatively in then and . Variable can therefore be existentially quantified by replacing its literals with constant . ∎
8.3 Computing Queries
We can now discuss algorithms. To compute the sufficient reasons for a decision : get a Decision-DNNF circuit for , transform it into a consensus circuit, filter it by instance and finally compute the prime implicants of filtered circuit. Algorithm 1 does this in place, that is without explicitly constructing the consensus or filtered circuit. It assumes a positive decision (otherwise we pass ).
Algorithm 1 uses subroutine which conjoins two DNFs by computing the Cartesian product of their terms. It also uses to remove subsumed terms from a DNF.
Consider a Decision-DNNF and instance . If then a call to Algorithm 1 returns the prime implicants of circuit .
Consider a decision and its complete reason as a monotone NNF circuit obtained by consensus then filtering. Let be the size of circuit and be the number of features. We next show how to compute various queries using circuit .
Sufficient Reasons. By Theorems 2 and 12, the call to Algorithm 1 will return all sufficient reasons for decision , assuming is a Decision-DNNF circuit. The number of sufficient reasons can be exponential, but we can actually answer many questions about them without enumerating them directly as shown below.
Necessary Property. By Proposition 4, characteristic (literal) is necessary for decision iff . This is equivalent to being unsatisfiable, which can be decided in time given Theorem 10. The necessary property (all necessary characteristics) can then be computed in time.
Necessary Reason. To compute the necessary reason (if any) we compute the necessary property and check whether it satisfies the complete reason. This can be done in time.
Because Statements. To decide whether decision was made “because ” we check whether property is the complete reason for the decision (Definition 5): and . We have iff is unsatisfiable. Moreover, iff is unsatisfiable for every literal in . All of this can be done in time.
Even if, Because Statements. To decide whether decision would stick “even if because ” we replace property with in instance to yield instance (Definition 6). We then compute the complete reason for decision and check whether it is equivalent to . All of this can be done time.
9 Another Admissions Classifier
We now consider a more refined admission classifier to illustrate the notions and concepts we introduced more comprehensively.
This classifier highly values passing the entrance exam and being a first time applicant. However, it also gives significant leeway to students from a rich hometown. In fact, being from a rich hometown unlocks the only path to acceptance for those who failed the entrance exam. The classifier is depicted as an OBDD in Figure 6. It corresponds to the following Boolean formula, which is not monotone (the previous classifiers we considered were all monotone):
The classifier has the following prime implicants, some are not essential (all prime implicants of a monotone formula are essential):
We will consider applicants Scott, Robin and April in Figure 6, where feature is protected (whether the applicant comes from a rich hometown). The complete reasons for the decisions on these applicants are shown in Figure 6. These are reason circuits produced as suggested by Definition 12, except that we simplified the circuits by propagating and removing constant values (a reason circuit is satisfiable as it must be satisfied by the instance underlying the decision).
The decision on applicant Scott is biased. To check this, we can existentially quantify unprotected features from the reason circuit in Figure 6 and then check its validity (Theorem 6). Existential quantification is done by replacing the literals in the circuit with constant . The resulting circuit is not valid. We can also confirm decision bias by considering the sufficient reasons for this decision, which all contain the protected feature (Theorem 4):
If we flip the protected characteristic to , the decision will flip with the complete reason being so Scott would be denied admission because he is not a first time applicant and does not come from a rich hometown (Definition 5).
The decision on Robin is not biased. If we existentially quantify unprotected features from the reason circuit (by replacing their literals with constant ), the circuit becomes valid. We can confirm this by examining the decision’s sufficient reasons:
Two of these sufficient reasons do not contain the protected feature so the decision cannot be biased (Theorem 4). The decision will be the same on any applicant with the same characteristics as Robin except for the protected feature . However, since some of the sufficient reasons contain a protected feature, the classifier must be biased (Theorem 5): It will make a biased decision on some other applicant. This illustrates how classifier bias can be inferred from the complete reason behind one of its unbiased decisions. This method is not complete though: the classifier may still be biased even if no protected feature appears in a sufficient reason for one of its decisions.
The decision on April is not biased even though the protected feature appears in the reason circuit (the circuit is valid if we existentially quantify all features but ). Moreover, are all the necessary characteristics for this decision (i.e., the necessary property). Flipping either of these characteristics will flip the decision. Recall that violating the necessary property may either flip the decision or change the reason behind it (Theorem 3) but flipping only one necessary characteristic is guaranteed to flip the decision (Proposition 3).
The decision on April would stick even if she were not to have work experience () because she passed the entrance exam (), has a good GPA () and is a first time applicant (). April would be denied admission if she were to also violate one of these characteristics (Definition 6 and Proposition 3).
We close this section by an important remark. Even though most of the notions we defined are based on prime implicants, our proposed theory does not necessarily require the computation of prime implicants which can be prohibitive. Reason circuits characterize all relevant prime implicants and can be obtained in linear time from Decision-DNNF circuits. Reason circuits are also monotone, allowing one to answer many queries about the embedded prime implicants in polytime. This is a major contribution of this work.
We introduced a theory for reasoning about the decisions of Boolean classifiers, which is based on the notions of sufficient, necessary and complete reasons. We presented applications of the theory to explaining decisions, evaluating counterfactual statements about decisions and identifying decision bias and classifier bias. We also presented polytime and linear-time algorithms for computing most of the introduced notions based on the new and tractable class of reason circuits.
We wish to thank Arthur Choi and Jason Shen for providing valuable feedback. This work has been partially supported by NSF grant #ISS-1910317, ONR grant #N00014-18-1-2561, DARPA XAI grant #N66001-17-2-4032 and a gift from JP Morgan. The views in this paper do not necessarily represent those of sponsors.