
Local Rule-Based Explanations of Black Box Decision Systems

Recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes from the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. This calls for explanations that reveal the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons of the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first learns a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons of the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Extensive experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box.



1. Introduction

Popular magazines and newspapers are full of commentaries about algorithms taking critical decisions that heavily impact our lives and society, from granting a loan to finding a job or driving our car. The worry is not only due to the increasing automation of decision making, but mostly to the fact that the algorithms are opaque and their logic unexplained. The main cause for this lack of transparency is that often the algorithm itself has not been directly coded by a human but has been generated from data through machine learning. Machine learning allows building predictive models which map user features into a class (outcome or decision), obtained by generalizing from a training set of examples. This learning process is made possible by the digital records of past decisions and classification outcomes, typically provided by human experts and decision makers. The process of inferring a classification model from examples cannot be controlled step by step because the size of training data and the complexity of the learned model are too big for humans. This is how we got trapped in a paradoxical situation in which, on one side, the legislator defines new regulations requiring that automated decisions should be explained to affected people (we refer here to the so-called "right to explanation" established in the European General Data Protection Regulation (GDPR), entering into force in May 2018) while, on the other side, even more sophisticated and obscure algorithms for decision making are generated (wachter2017right; goodman2016eu).

The lack of transparency in algorithms generated through machine learning grants them the power to perpetuate or reinforce forms of injustice by learning bad habits from the data. In fact, if the training data contains a number of biased decision records, or misleading classification examples due to data collection mistakes or artifacts, it is likely that the resulting algorithm inherits the biases and recommends discriminatory or simply wrong decisions (Barocas2016; Berk2017). The inability to obtain an explanation for what one considers a biased decision is a profound drawback of learning from big data, limiting social acceptance and trust in its adoption in many sensitive contexts. Starting from (pedreshi2008discrimination), a rich literature has been flourishing on discrimination discovery and avoidance. Some of the ideas developed in that context can be reinterpreted for addressing the more general problem of explaining the logic driving a decision taken by an obscure algorithm, which is precisely the problem tackled in this paper.

In particular, in this paper we address the problem of explaining the decision outcome taken by an obscure algorithm by providing "meaningful explanations of the logic involved" when automated decision making takes place, as prescribed by the GDPR. The decision system can be obscure because it is based on a deep learning approach, or because of inaccessibility of the source code, or for other reasons. We perform our research under some specific assumptions. First, we assume that an explanation is interesting for a user if it clarifies why a specific decision pertaining to that user has been made, i.e., we aim for local explanations, not general, global descriptions of how the overall system works (guidotti2018survey). Second, we assume that the vehicle for offering explanations should be as close as possible to the language of reasoning, that is logic. Thus, we are also assuming that the user can understand elementary logic rules. Finally, we assume that the black box decision system can be queried as many times as necessary, to probe its decision behavior with the aim of reconstructing its logic; this is certainly the case in a legal argumentation in court, or in an industrial setting where a company wants to stress-test a machine learning component of a manufactured product, to minimize the risk of failures and consequent industrial liability. On the other hand, we make no assumptions on the specific algorithms used in the obscure classifier: we aim at an agnostic explanation method, one that works by analyzing the input-output behavior of the black box system, disregarding its internals.

We propose a solution to the black box outcome explanation problem suitable for relational, tabular data, called LORE (for LOcal Rule-based Explanations). Given a black box binary predictor $b$ and a specific instance $x$ labeled with outcome $y$ by $b$, we build a simple, interpretable predictor by first generating a balanced set of neighbor instances of the given instance $x$ through an ad-hoc genetic algorithm, and then extracting from such a set a decision tree classifier. A local explanation is then extracted from the obtained decision tree. The local explanation is a pair composed of (i) a logic rule, corresponding to the path in the tree that explains why $x$ has been labeled as $y$ by $b$, and (ii) a set of counterfactual rules, explaining which conditions should be changed by $x$ so as to invert the class $y$ assigned by $b$. For example, from the compas dataset (Barocas2016; Berk2017) we may obtain an explanation consisting of a rule $r$ and a set of counterfactuals $\Phi$.

The intuition behind our method, common to other approaches such as LIME (ribeiro2016should) and Anchor (ribeiro2018anchors), is that the decision boundary for the black box can be arbitrarily complex over the whole data space, but in the neighborhood of a data point there is a high chance that the decision boundary is clear and simple, hence amenable to be captured by an interpretable model. The novelty of our method lies in (i) a focused procedure, based on a genetic algorithm, to explore the decision boundary in the neighborhood of the data point, which produces high-quality training data to learn the local decision tree, and (ii) a high expressiveness of the proposed local explanations, which surpass state-of-the-art methods by providing not only succinct evidence of why a data point has been assigned a specific class, but also counterfactuals suggesting what should be different in the vicinity of the data point to reverse the predicted outcome. We present extensive experiments to assess both quantitatively and qualitatively the accuracy of our explanation method.

In the rest of this paper, after describing the state of the art in the field of explanation of black box decision models (Section 2), we offer a formalization of the problem by defining the notions of black box outcome explanation, explanation through interpretable models, and local explanation (Section 3). We then define our method LORE in Section 4. Section 5 is devoted to the experiments, the set up of which requires the definition of appropriate validation measures. We critically compare local versus global explanations, rule-based versus linear explanations, different types of rule-based explanations with respect to the state of the art, and discuss the advantages of genetic algorithms for neighborhood generation. Conclusions and future research directions are discussed in Section 6.

2. Related Work

Recently, research on methods for explaining black box decision systems has attracted much attention (guidotti2018survey). A large number of papers propose approaches for understanding the global logic of the black box by providing an interpretable classifier able to mimic the obscure decision system. Generally, these methods are designed for explaining specific black box models, i.e., they are not black box agnostic. Decision trees have been adopted to globally explain neural networks (craven1996extracting; krishnan1999extracting) and tree ensembles (hara2016making; tan2016tree). Classification rules have been widely used to explain neural networks (johansson2004accuracy; augasta2012reverse; andrews1995survey) but also to understand the global behavior of SVMs (fung2005rule; nunez2002rule). Only a few methods for global explanation are agnostic with respect to the black box (lou2012intelligible; henelius2014peek). In the cases in which the training set is available, classification rules are also widely used to avoid building black boxes by directly designing a transparent classifier (guidotti2018survey) which is locally or globally interpretable on its own (wang2015falling; lakkaraju2016interpretable; malioutov2017learning).

Other approaches, more related to the one we propose, address the problem of explaining the local behavior of a black box (guidotti2018survey). In other words, they provide an explanation for the decision assigned to a specific instance. In this context there are two kinds of approaches: the model-dependent approaches and the agnostic ones. In the first category most of the papers aim at explaining neural networks and base their explanation on saliency masks, i.e., a subset of the instance that explains what is mainly responsible for the prediction (xu2015show; zhou2016learning). Examples of saliency masks are parts of an image, or words or sentences in a text. On the other hand, agnostic approaches provide explanations for any type of black box. In (ribeiro2016should) the authors present LIME, which starts from instances randomly generated in the neighborhood of the instance to be explained. The method infers from them linear models as comprehensible local predictors. The importance of a feature in the linear model represents the explanation finally provided to the user. As a limitation of the approach, a random generation of the neighborhood does not take into account the density of black box outcomes in the neighborhood instances. Hence, the linear classifiers inferred from them may not correctly characterize outcome values as a function of the predictive features. We will instead use a genetic algorithm that exploits the black box for instance generation.

Extensions of LIME using decision rules (called anchors) and program expression trees are presented in (ribeiro2018anchors) and (singh2016programs) respectively. The method in (ribeiro2018anchors) uses a bandit algorithm that randomly constructs the anchors with the highest coverage and respecting a precision threshold. The method in (singh2016programs) adopts a simulated annealing approach that randomly grows, shrinks, or replaces nodes in an expression tree. The neighborhood generation process adopted is the same as in LIME. Another crucial weak point of those approaches is the need for user-specified parameters for desired explanations: the number of features (ribeiro2016should), the level of precision (ribeiro2018anchors), the maximum expression tree depth (singh2016programs). Our approach is instead parameter-free.

Concerning the counterfactual part of our notion of explanation, (wachter2017counterfactual) computes a counterfactual for an instance $x$ by solving an optimization problem over the space of instances. The solution is an instance close to $x$ but with a different outcome assigned by the black box (if instead of a black box we are given a machine learning model, this problem is known as the inverse classification problem (DBLP:journals/jcst/AggarwalCH10)). Our approach provides a more abstract notion of counterfactuals, consisting of logic rules rather than flips of feature values. Thus, the user is given not only a specific example of how to obtain actionable recourse (e.g., how to improve an application for getting a benefit), but also an abstract characterization of its neighborhood instances with reversed black box outcome.

To the best of our knowledge, in the literature there is no work proposing a black box agnostic method for local decision explanation based on both decision and counterfactual rules.

3. Problem and Explanations

Let us start recalling basic notation on classification of tabular data. Afterwards, we define the black box outcome explanation problem, and the notion of explanation for which we propose a solution.

Classification, black boxes, and interpretable predictors.

A predictor, or classifier, is a function $b: \mathcal{X}^{(m)} \rightarrow \mathcal{Y}$ which maps data instances (tuples) $x$ from a feature space $\mathcal{X}^{(m)}$ with $m$ input features to a decision $y$ in a target space $\mathcal{Y}$. We write $b(x) = y$ to denote the decision $y$ predicted by $b$, and $b(X) = Y$ as a shorthand for $\{b(x) \mid x \in X\} = Y$. We restrict here to binary decisions. An instance $x$ consists of a set of $m$ attribute-value pairs $(a_i, v_i)$, where $a_i$ is a feature (or attribute) and $v_i$ is a value from the domain of $a_i$. The domain of a feature can be continuous or categorical. A predictor can be a machine learning model, a domain-expert rule-based system, or any combination of algorithmic and human knowledge processing. We assume that a predictor is available as a software function that can be queried at will. In the following, we denote by $b$ a black box predictor, whose internals are either unknown to the observer or known but uninterpretable by humans. Examples include neural networks, SVMs, ensemble classifiers, or a composition of data mining, legacy software, and hard-coded expert systems. Instead, we denote with $c$ an interpretable predictor, whose internal processing yielding a decision $c(x) = y$ can be given a symbolic interpretation understandable by a human. Examples of such predictors include rule-based classifiers, decision trees, decision sets, and rational functions.

Black Box Outcome Explanation.

Given a black box predictor $b$ and an instance $x$, the black box outcome explanation problem consists in providing an explanation $e$ for the decision $b(x) = y$. We approach the problem by learning an interpretable predictor $c$ that reproduces and accurately mimics the local behavior of the black box. An explanation of the decision is then derived from $c$. By local, we mean focusing on the behavior of the black box in the neighborhood of the specific instance $x$, without aiming at providing a single description of the logic of the black box for all possible instances. The neighborhood of $x$ is not given, but rather it has to be generated as part of the explanation process. However, we assume that some knowledge is available about the characteristics of the feature space $\mathcal{X}^{(m)}$, in particular the ranges of admissible values for the domains of features and, possibly, the (empirical) distribution of features. Nothing is instead assumed about the process of constructing the black box $b$. Let us formalize the problem, and the approach based on interpretable models.

Definition 3.1 (Black Box Outcome Explanation).

Let $b$ be a black box, and $x$ an instance whose decision $b(x)$ has to be explained. The black box outcome explanation problem consists in finding an explanation $e \in E$ belonging to a human-interpretable domain $E$.

Definition 3.2 (Explanation Through Interpretable Models).

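Let $c = \zeta(b, x)$ be an interpretable predictor derived from the black box $b$ and the instance $x$ using some process $\zeta(\cdot, \cdot)$. An explanation $e \in E$ is obtained through $c$, if $e = \varepsilon(c, x)$ for some explanation logic $\varepsilon(\cdot, \cdot)$ which reasons over $c$ and $x$.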

One point is still missing: what is a comprehensible domain of explanations? We will define an explanation $e$ as a pair of objects:

$$e = \langle r = p \rightarrow y, \; \Phi \rangle$$

The first component $r = p \rightarrow y$ is a decision rule describing the reason for the decision value $y = c(x)$. The second component $\Phi$ is a set of counterfactual rules, namely the minimal number of changes in the feature values of $x$ that would reverse the decision of the predictor. Let us consider as an example the following explanation for a loan request for a user $x$:

$$e = \langle r = \{age \leq 25, job = none, amount > 5k\} \rightarrow deny, \; \Phi = \{(\{age > 25, amount \leq 5k\} \rightarrow grant), (\{job = clerk, car = yes\} \rightarrow grant)\} \rangle$$

Here, the decision $deny$ is due to an age not above 25, the absence of a job and an amount greater than 5k (see component $r$). To change the decision, instead, what is required is either an age higher than 25 and a smaller amount, or having a clerk job and owning a car (see component $\Phi$). Details are provided in the rest of the section.

In a decision rule (simply, a rule) $r$ of the form $p \rightarrow y$, the decision $y$ is the consequence of the rule, while the premise $p$ is a boolean condition on feature values. We assume that $p$ is the conjunction of split conditions $sc$ of the form $a \in [v_1, v_2]$, where $a$ is a feature and $v_1, v_2$ are values in the domain of $a$ extended with $\pm\infty$. (Using $\pm\infty$ we can model with a single notation typical univariate split conditions, such as equality ($a = v$ as $a \in [v, v]$), upper bounds ($a \leq v$ as $a \in [-\infty, v]$), and strict lower bounds ($a > v$ as $a \in [v + \epsilon, \infty]$ for a sufficiently small $\epsilon > 0$). However, since our method is parametric to a decision tree induction algorithm, split conditions can also be multivariate, e.g., $a \leq b + v$ for features $a, b$, as in oblique decision trees.) An instance $x$ satisfies $r$, or $r$ covers $x$, if the boolean condition $p$ evaluates to true for $x$, i.e., if $sc(x)$ is true for every $sc \in p$. For example, the rule $\{age \leq 25, job = none, amount > 5k\} \rightarrow deny$ is satisfied by any instance with age at most 25, no job and an amount above 5k, and is not satisfied by an instance with age greater than 25. We say that $r$ is consistent with $c$, if $c(x) = y$ for every instance $x$ that satisfies $r$. Consistency means that the rule specifies some conditions for which the predictor makes a specific decision. When the instance $x$ for which we have to explain the decision satisfies $p$, the rule $p \rightarrow y$ represents a motivation for taking a decision value, i.e., $p$ locally explains why $b$ returned $y$.

Consider now a set $\delta$ of split conditions. We denote the update of $p$ by $\delta$ as $p[\delta] = \delta \cup \{(a \in [v_1, v_2]) \in p \mid \nexists w_1, w_2.\ (a \in [w_1, w_2]) \in \delta\}$. Intuitively, $p[\delta]$ is the logical condition $p$ with ranges for attributes overwritten as stated in $\delta$, e.g. $\{age \leq 25, job = none\}[age > 25]$ is $\{age > 25, job = none\}$. A counterfactual rule for $p$ is a rule of the form $p[\delta] \rightarrow \hat{y}$, for $\hat{y} \neq y$. We call $\delta$ a counterfactual. Consistency is meaningful also for counterfactual rules, meaning that the rule is an instance of the decision logic of $c$. A counterfactual $\delta$ describes what features to change and how to change them to get an outcome different from $y$. Since $c$ predicts either $y$ or $\hat{y}$, if such changes are applied to the given instance $x$, the predictor $c$ will return a different decision. Continuing the example before, changing the age feature of $x$ to any value greater than 25 will change the predicted outcome from $deny$ to $grant$. An expected property of a consistent counterfactual rule $p[\delta] \rightarrow \hat{y}$ is that it should be minimal w.r.t. $x$. Minimality is measured w.r.t. the number of split conditions in $p[\delta]$ not satisfied by $x$. (Such a measure can be extended to exploit additional knowledge on the feature domains in order not to generate invalid or unrealistic rules. E.g., a split condition requiring a lower age may appear closer than one requiring a higher age for a given instance. However, it is not actionable: an individual cannot lower her age, or change her race or gender.) Formally, we define $nf(p[\delta], x) = |\{sc \in p[\delta] \mid \neg sc(x)\}|$ (where $nf(\cdot, \cdot)$ stands for the number of falsified split conditions), and, when clear from the context, we simply write $nf$. For example, a counterfactual rule with two conditions falsified by $x$ is not minimal if another counterfactual rule, with only one falsified condition, is consistent with $c$. In summary, a counterfactual rule is a (minimal) motivation for reversing a decision value.
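The notions of split condition, rule satisfaction, update of a premise, and number of falsified conditions can be made concrete with a small sketch. The encoding below is an assumption made for illustration, not the authors' implementation: a premise is a dictionary with one condition per feature, continuous conditions are closed ranges, categorical conditions are sets of admitted values, and the loan premise and instance are hypothetical.

```python
import math

EPS = 1e-9  # stands in for the "sufficiently small" epsilon of strict bounds


def holds(cond, value):
    """Check one split condition: a (low, high) range or a set of categories."""
    if isinstance(cond, (set, frozenset)):
        return value in cond
    low, high = cond
    return low <= value <= high


def satisfies(premise, instance):
    """An instance satisfies a premise if every split condition holds."""
    return all(holds(cond, instance[feat]) for feat, cond in premise.items())


def update(premise, delta):
    """Update of a premise: conditions in `delta` overwrite those in `premise`."""
    return {**premise, **delta}


def nf(premise, instance):
    """Number of split conditions in `premise` falsified by `instance`."""
    return sum(not holds(cond, instance[feat]) for feat, cond in premise.items())


# Hypothetical loan premise and instance, for illustration only.
p = {"age": (-math.inf, 25), "job": {"none"}, "amount": (5000 + EPS, math.inf)}
x = {"age": 22, "job": "none", "amount": 10000, "car": "no"}

delta = {"age": (25 + EPS, math.inf)}  # counterfactual: age > 25
q = update(p, delta)

print(satisfies(p, x), satisfies(q, x), nf(q, x))
```

Keeping one condition per feature makes the update a plain dictionary merge; a premise with several conditions on the same feature would first need them intersected into a single range.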

We are now in the position to formally introduce the notion of explanation that we are able to provide.

Definition 3.3 (Local Explanation).

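Let $x$ be an instance, and $c(x) = y$ be the decision of $c$. A local explanation $e = \langle r, \Phi \rangle$ is a pair of: a decision rule $r = (p \rightarrow y)$ consistent with $c$ and satisfied by $x$; and, a set $\Phi = \{p[\delta_1] \rightarrow \hat{y}, \ldots, p[\delta_v] \rightarrow \hat{y}\}$ of counterfactual rules for $p$ consistent with $c$.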

This definition completes the elements of the black box outcome explanation problem. A solution to the problem will then consist of: (i) computing an interpretable predictor $c$ for a given black box $b$ and an instance $x$, i.e., defining the function $\zeta(\cdot, \cdot)$ according to Definition 3.2; (ii) deriving a local explanation from $c$ and $x$, i.e., defining the explanation logic $\varepsilon(\cdot, \cdot)$ according to Definition 3.2.

4. Proposed Method

We propose LORE (LOcal Rule-based Explanations) as a solution to the black box outcome explanation problem. An interpretable predictor $c$ is built for a given black box $b$ and instance $x$ by first generating a set of $N$ neighbor instances of $x$ through a genetic algorithm, and then extracting from such a set a decision tree $c$. A local explanation, consisting of a single rule $r$ and a set of counterfactual rules $\Phi$, is then derived from the structure of $c$.


4.1. Neighborhood Generation

The goal of this phase is to identify a set of instances $Z$, with feature characteristics close to the ones of $x$, that is able to reproduce the local decision behavior of the black box $b$. Since the objective is to learn a predictor, the neighborhood should be flexible enough to include instances with both decision values, namely $Z = Z_{=} \cup Z_{\neq}$ where instances $z \in Z_{=}$ are such that $b(z) = b(x)$, and instances $z \in Z_{\neq}$ are such that $b(z) \neq b(x)$. In the LORE algorithm, we extract balanced subsets $Z_{=}$ and $Z_{\neq}$ (lines 2–3), and then put $Z = Z_{=} \cup Z_{\neq}$ (line 4). This task differs from approaches to instance selection (DBLP:journals/air/Olvera-LopezCTK10), based on genetic algorithms (DBLP:journals/kbs/TsaiEC13) (also specialized for decision trees (geneticselection2009)), in that their objective is to select a subset of instances from an available training set. In our case, instead, we cannot assume that the training set used to train $b$ is available, nor even that $b$ is a supervised machine learning predictor for which a training set exists. Our task is instead similar to instance generation in the field of active learning (DBLP:journals/kais/FuZL13), also including evolutionary approaches (DBLP:journals/ijamc-igi/DerracGH10). We adopt an approach based on a genetic algorithm which generates $z \in Z_{=} \cup Z_{\neq}$ by maximizing the following fitness functions:

$$fitness^{x}_{=}(z) = I_{b(x) = b(z)} + (1 - d(x, z)) - I_{x = z}$$

$$fitness^{x}_{\neq}(z) = I_{b(x) \neq b(z)} + (1 - d(x, z)) - I_{x = z}$$

where $d: \mathcal{X}^{(m)} \times \mathcal{X}^{(m)} \rightarrow [0, 1]$ is a distance function, $I_{true} = 1$, and $I_{false} = 0$. The first fitness function looks for instances $z$ similar to $x$ (term $1 - d(x, z)$), but not equal to $x$ (term $I_{x = z}$), for which the black box $b$ produces the same outcome as $x$ (term $I_{b(x) = b(z)}$). The second one leads to the generation of instances $z$ similar to $x$, but not equal to it, for which $b$ returns a different decision. Intuitively, for an instance $z$ such that $b(x) \neq b(z)$ and $x \neq z$, it turns out $fitness^{x}_{=}(z) < 1$. For any instance $z \neq x$ such that $b(x) = b(z)$, instead, we have $fitness^{x}_{=}(z) \geq 1$. Finally, $fitness^{x}_{=}(x) = 1$. Thus, maximization of $fitness^{x}_{=}$ occurs necessarily for instances different from $x$ and whose prediction is equal to $b(x)$.
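The three terms of each fitness function (agreement or disagreement with the black box outcome, similarity, and the penalty for being identical to the instance) can be sketched directly. The black box and the distance used below are toy stand-ins chosen only to exercise the functions; they are assumptions, not part of the method.

```python
def fitness_same(z, x, black_box, distance):
    """Rewards z for being close to x, different from x, with the same outcome."""
    same_outcome = 1.0 if black_box(z) == black_box(x) else 0.0
    identical = 1.0 if z == x else 0.0
    return same_outcome + (1.0 - distance(x, z)) - identical


def fitness_diff(z, x, black_box, distance):
    """Rewards z for being close to x, different from x, with another outcome."""
    diff_outcome = 1.0 if black_box(z) != black_box(x) else 0.0
    identical = 1.0 if z == x else 0.0
    return diff_outcome + (1.0 - distance(x, z)) - identical


# Toy stand-ins (assumptions): a threshold classifier and a distance in [0, 1].
def toy_black_box(z):
    return int(z[0] + z[1] > 1.0)


def toy_distance(a, b):
    return min(1.0, sum(abs(u - v) for u, v in zip(a, b)) / len(a))


x = (0.4, 0.4)
near_same = (0.5, 0.4)   # same side of the toy decision boundary as x
near_diff = (0.7, 0.4)   # other side of the toy decision boundary

for z in (x, near_same, near_diff):
    print(z, fitness_same(z, x, toy_black_box, toy_distance),
          fitness_diff(z, x, toy_black_box, toy_distance))
```

With these definitions the instance itself scores exactly 1 under the first function, any distinct instance with the same outcome scores at least 1, and a distinct instance with a different outcome scores below 1, which matches the case analysis in the text.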


Like neural networks, genetic algorithms (holland1992adaptation) are inspired by a biological metaphor, in their case that of evolution. They have three distinct aspects. (i) The potential solutions of the problem are encoded into representations that support the variation and selection operations. In our case these representations, generally called chromosomes, correspond to instances in the feature space $\mathcal{X}^{(m)}$. (ii) A fitness function evaluates which chromosomes are the "best life forms", that is, most appropriate for the result. These are then favored in survival and reproduction, thus shaping the next generation according to the fitness function. In our case, these instances correspond to those similar to $x$, according to distance $d(\cdot, \cdot)$, and with the same/different outcome returned by the black box $b$ depending on the fitness function $fitness^{x}_{=}$ or $fitness^{x}_{\neq}$. (iii) Mating (called crossover) and mutation produce a new generation of chromosomes by recombining features of their parents. The final generation of chromosomes, according to a stopping criterion, is the one that best fits the solution.

The genetic neighborhood generation algorithm generates the neighborhoods $Z_{=}$ and $Z_{\neq}$ of $x$ by instantiating the evolutionary approach of (back2000evolutionary). Using the terminology of (DBLP:journals/ijamc-igi/DerracGH10), it is an instance of generational genetic algorithms for evolutionary prototype generation. However, prototypes are a condensed subset of a training set that enable optimization in predictor learning. We aim instead at generating new instances that separate well the decision boundary of the black box $b$. The usage of classifiers within fitness functions of genetic algorithms can be found in (eshelman1991chc; baluja1994population; wu2006optimal; cano2005stratification). However, the classifier they use is always the one for which the population must be selected or generated, and not another one (the black box) as in our case. The algorithm first initializes the population $P_0$ with $N$ copies of the instance $x$ to explain. Then it enters the evolution loop, which begins by selecting the sub-population $P_i$ having the highest fitness score. After that, the crossover operator is applied on a proportion of $P_i$ according to the crossover probability $pc$, and the resulting and the untouched individuals are placed in $P'_i$. We use a two-point crossover which selects two parents and two crossover features at random, and then swaps the crossover feature values of the parents (see Figure 1). Thereafter, a proportion of $P'_i$, determined by the mutation probability $pm$, is mutated and placed in $P''_i$. The unmutated individuals are also added to $P''_i$. Mutation consists of replacing feature values at random according to the empirical distribution of a feature (see Figure 2); in experiments, we derive such a distribution from the test set of instances to explain. Individuals in $P''_i$ are evaluated according to the fitness function, and the evolution loop continues until $G$ generations are completed. Finally, the best individuals according to the fitness function are returned.
The algorithm is run twice, once using the fitness function $fitness^{x}_{=}$ to derive neighborhood instances $Z_{=}$ with the same decision as $x$, and once using $fitness^{x}_{\neq}$ to derive neighborhood instances $Z_{\neq}$ with a different decision than $x$. Finally, we set $Z = Z_{=} \cup Z_{\neq}$.
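The evolution loop described above (initialization with copies of the instance, selection, two-point crossover, mutation, and a fixed number of generations) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation, which relies on the deap library: the truncation selection scheme, the parameter defaults, the toy black box, the toy distance, and the uniform `sample_feature` sampler are all hypothetical choices, and the whole final population is returned ranked by fitness.

```python
import random


def genetic_neighborhood(x, fitness, sample_feature, n=50, generations=10,
                         pc=0.5, pm=0.2, seed=0):
    """Evolve n instances around x; returns them ranked by decreasing fitness."""
    rng = random.Random(seed)
    m = len(x)
    population = [list(x) for _ in range(n)]  # start from copies of x
    for _ in range(generations):
        # Selection (assumed truncation scheme): keep the fitter half and
        # duplicate it to restore the population size.
        ranked = sorted(population, key=fitness, reverse=True)
        half = max(1, n // 2)
        selected = [list(ranked[k % half]) for k in range(n)]
        # Two-point crossover: swap the features between two random positions.
        rng.shuffle(selected)
        for a, b in zip(selected[0::2], selected[1::2]):
            if m >= 2 and rng.random() < pc:
                i, j = sorted(rng.sample(range(m), 2))
                a[i:j + 1], b[i:j + 1] = b[i:j + 1], a[i:j + 1]
        # Mutation: resample one feature value of the chosen individuals.
        for ind in selected:
            if rng.random() < pm:
                k = rng.randrange(m)
                ind[k] = sample_feature(k, rng)
        population = selected
    return sorted(population, key=fitness, reverse=True)


# Toy stand-ins (assumptions): a threshold black box and a distance in [0, 1].
def black_box(z):
    return int(sum(z) > 1.5)


def distance(a, b):
    return min(1.0, sum(abs(u - v) for u, v in zip(a, b)) / len(a))


x = [0.4, 0.4, 0.4]


def fit(z, want_same):
    agree = (black_box(z) == black_box(x)) == want_same
    return float(agree) + (1.0 - distance(x, z)) - float(z == x)


def uniform_sampler(k, rng):
    return rng.random()  # placeholder for an empirical feature distribution


z_same = genetic_neighborhood(x, lambda z: fit(z, True), uniform_sampler)
z_diff = genetic_neighborhood(x, lambda z: fit(z, False), uniform_sampler)
neighborhood = z_same + z_diff  # balanced union of the two runs
```

Running the procedure once per fitness function and concatenating the results mirrors the two runs described in the text; a decision tree would then be trained on `neighborhood` labeled by the black box.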

parent 1    25   clerk   10k   yes
parent 2    30   other   5k    no
child 1     25   other   5k    yes
child 2     30   clerk   10k   no
Figure 1. Crossover.

parent      25   clerk   10k   yes
child       27   clerk   7k    yes
Figure 2. Mutation.

Figure 3. Random forest black box: purple vs green decision. Starred instance $x$. (Top) Uniformly random (left) and genetic generation (right). (Bottom) Density of random (left) and genetic (right) generation. (Best viewed in color.)

Figure 3 shows an example of neighborhood generation for a black box consisting of a random forest model and a bi-dimensional feature space. The figure contrasts uniform random generation around a specific instance (starred) to our genetic approach. The latter yields a neighborhood that is denser in the boundary region of the predictor. The density of generated instances will be a key factor in extracting local (interpretable) predictors.

Distance Function

A key element in the definition of the fitness functions is the distance $d(x, z)$. We account for the presence of mixed types of features by a weighted sum of the simple matching coefficient for categorical features, and of the normalized Euclidean distance for continuous features. Formally, assuming $h$ categorical features and $m - h$ continuous ones, we use:

$$d(x, z) = \frac{h}{m} \cdot SimpleMatch(x, z) + \frac{m - h}{m} \cdot NormEuclid(x, z)$$

Our approach is parametric to $d(\cdot, \cdot)$, and it can readily be applied to improved heterogeneous distance functions (DBLP:journals/prl/McCaneA08). Empirical results with different distance functions are reported in Section 5.3.
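A mixed-type distance of this kind, weighting a simple matching term by the share of categorical features and a normalized Euclidean term by the share of continuous ones, can be sketched as follows. The exact normalization is not spelled out in the text, so the range-based scaling below (the `ranges` argument) is an assumption, as are the feature names and values.

```python
import math


def mixed_distance(x, z, categorical, continuous, ranges):
    """Weighted sum of simple matching and range-normalized Euclidean distance."""
    h = len(categorical)
    m = h + len(continuous)
    # Simple matching: fraction of categorical features on which x and z differ.
    mismatch = sum(x[f] != z[f] for f in categorical) / h if h else 0.0
    # Normalized Euclidean (assumed form): differences scaled by feature range,
    # then root mean square so that the term stays in [0, 1].
    if continuous:
        squares = sum(((x[f] - z[f]) / ranges[f]) ** 2 for f in continuous)
        euclid = math.sqrt(squares / len(continuous))
    else:
        euclid = 0.0
    return (h / m) * mismatch + ((m - h) / m) * euclid


# Hypothetical instances and feature ranges, for illustration only.
x = {"age": 22, "job": "none", "amount": 10000, "car": "no"}
z = {"age": 32, "job": "clerk", "amount": 10000, "car": "no"}
ranges = {"age": 50, "amount": 20000}

d = mixed_distance(x, z, ["job", "car"], ["age", "amount"], ranges)
print(d)
```

Both terms lie in [0, 1] as long as differences never exceed the declared ranges, so the weighted sum is also in [0, 1], which is what the fitness functions expect of the distance.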

4.2. Local Rule-Based Classifier and Explanation Extraction

Given the neighborhood $Z$ of $x$, the second step is to build an interpretable predictor $c$ trained on the instances $z \in Z$ labeled with the black box decision $b(z)$. Such a predictor is intended to mimic the behavior of $b$ locally in the neighborhood. Also, $c$ must be interpretable, so that an explanation for $x$ (decision rule and counterfactuals) can be extracted from it. The LORE approach considers decision tree classifiers for the following reasons: (i) decision rules can naturally be derived from a root-leaf path in a decision tree; and, (ii) counterfactuals can be extracted by symbolic reasoning over a decision tree. For a decision tree $c$, we derive an explanation $e = \langle r, \Phi \rangle$ as follows. The decision rule $r = (p \rightarrow y)$ is formed by including in $p$ the split conditions on the path from the root to the leaf node that is satisfied by the instance $x$, and setting $y = c(x)$. By construction, the rule $r$ is consistent with $c$ and satisfied by $x$. Consider now the counterfactual rules in $\Phi$. The counterfactual extraction algorithm looks for all paths in the decision tree $c$ leading to a decision $\hat{y} \neq y$. Fix one such path, and let $q$ be the conjunction of split conditions in it. Again by construction, $q \rightarrow \hat{y}$ is a counterfactual rule consistent with $c$. Notice that the counterfactual $\delta$ for which $q = p[\delta]$ does not have to be explicitly computed, which is a benefit of using decision trees. (However, it can be done as follows. Consider the path from the leaf of $p$ to the leaf of $q$. When moving from a child to a father node, we retract the split condition, i.e., we add to $\delta$ the corresponding range with the retracted bound relaxed to $\pm\infty$. When moving from a father node to a child, we add the split condition to $\delta$.) Among all such $q$'s, only the ones with the minimum number of split conditions not satisfied by $x$ (line 4 of the counterfactual extraction algorithm) are kept in $\Phi$. As an example, consider the decision tree in Figure 4, and an instance $x$ for which the decision (e.g., of a loan) has to be explained. The path followed by $x$ is the leftmost one in the tree, and the decision rule $r$ is extracted from that path.
There are four paths leading to the opposite decision: $q_1$, $q_2$, $q_3$, and $q_4$. The values $nf(q_1, x)$, $nf(q_2, x)$, $nf(q_3, x)$, and $nf(q_4, x)$ are computed, and $\Phi$ contains the counterfactual rules of the paths attaining the minimum.
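The extraction step can be sketched over a tree that has already been flattened into its root-leaf paths. The representation below (a premise as a dictionary of per-feature predicates paired with a leaf decision) and the toy loan tree are assumptions made for illustration; the sketch only shows how the decision rule and the minimal counterfactual rules are selected.

```python
def falsified(premise, x):
    """Features whose split condition in `premise` is not met by `x`."""
    return [feat for feat, ok in premise.items() if not ok(x[feat])]


def explain(paths, x):
    """Return the rule covering x and the opposite-decision paths with fewest
    falsified split conditions."""
    rule = next((p, y) for p, y in paths if not falsified(p, x))
    y = rule[1]
    others = [(q, y_hat) for q, y_hat in paths if y_hat != y]
    if not others:
        return rule, []
    fewest = min(len(falsified(q, x)) for q, _ in others)
    minimal = [(q, y_hat) for q, y_hat in others
               if len(falsified(q, x)) == fewest]
    return rule, minimal


# Hypothetical root-leaf paths of a small loan decision tree.
def young(v):
    return v <= 25


def old(v):
    return v > 25


paths = [
    ({"age": young, "job": lambda v: v == "none",
      "amount": lambda v: v > 5000}, "deny"),
    ({"age": young, "job": lambda v: v == "none",
      "amount": lambda v: v <= 5000}, "grant"),
    ({"age": young, "job": lambda v: v == "clerk",
      "car": lambda v: v == "yes"}, "grant"),
    ({"age": old, "amount": lambda v: v <= 5000}, "grant"),
]
x = {"age": 22, "job": "none", "amount": 10000, "car": "no"}

rule, counterfactuals = explain(paths, x)
print(rule[1], [falsified(q, x) for q, _ in counterfactuals])
```

In this toy tree the instance follows the first path, and only the second path is kept as a counterfactual rule because it falsifies a single condition (the amount), while the other two opposite-decision paths falsify two.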


Figure 4. Example decision tree.

As a further output, LORE computes a counterfactual instance starting from a counterfactual rule $q \rightarrow \hat{y}$ and $x$. Among all possible instances that satisfy $q$, we choose the one that minimally changes attributes from $x$ according to $q$. This is done by looking at the split conditions falsified by $x$, i.e., $\{sc \in q \mid \neg sc(x)\}$, and modifying the lower/upper bound in $sc$ that is closer to the value in $x$. As an example, the counterfactual instance of $x$ for each of the above paths differs from $x$ only in the features whose split conditions $x$ falsifies.
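The construction of a counterfactual instance can be sketched by moving each falsified feature to the nearest bound of its split condition and leaving everything else untouched. The encoding (closed ranges for continuous features, sets of admitted values for categorical ones) and the example premise are assumptions for illustration; for a categorical condition the sketch arbitrarily picks the first admitted value in sorted order.

```python
import math


def counterfactual_instance(q, x):
    """Copy x, changing only the features whose condition in q is falsified."""
    z = dict(x)
    for feat, cond in q.items():
        if isinstance(cond, (set, frozenset)):
            if x[feat] not in cond:
                z[feat] = sorted(cond)[0]  # arbitrary admitted category
        else:
            low, high = cond
            # Clamping moves a value outside [low, high] to the closer bound
            # and leaves a value already inside the range unchanged.
            z[feat] = min(max(x[feat], low), high)
    return z


# Hypothetical counterfactual premise and instance, for illustration only.
q = {"age": (-math.inf, 25), "job": {"none"}, "amount": (-math.inf, 5000)}
x = {"age": 22, "job": "none", "amount": 10000, "car": "no"}

z = counterfactual_instance(q, x)
print(z)
```

Only the amount is changed here, since it is the single condition of the premise that the instance falsifies; features not mentioned in the premise, such as `car`, are carried over as they are.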

5. Experiments

LORE has been developed in Python (the source code and datasets will be available at an anonymized URL). The experiments were performed on Ubuntu 16.04.1 LTS 64 bit, 32 GB RAM, 3.30GHz Intel Core i7, using, for the genetic neighborhood generation, the deap library (DEAP_JMLR2012), and for decision tree induction (the interpretable predictor), the yadt system (ruggieri2004yadt), which is a C4.5 implementation with multi-way splits of categorical attributes. After presenting the experimental setup, we report next: (i) some analyses on the effect of the genetic algorithm parameters for the neighborhood generation; (ii) evidence that the local genetic neighborhood is more effective than a global approach; (iii) a qualitative and quantitative comparison with naïve baselines and state of the art competitors.

5.1. Experimental Setup

We ran experiments on three real-world tabular datasets: adult, compas and german. In each of them, an instance represents attributes of an individual person. All datasets include both categorical and continuous features.

The adult dataset, from the UCI Machine Learning Repository, includes instances with demographic information like age, workclass, marital-status, race, capital-loss, capital-gain, etc. The income divides the population into two classes, “<=50K” and “>50K”.

The compas dataset from ProPublica contains the features used by the COMPAS algorithm for scoring defendants and their risk (Low, Medium and High), for over individuals. We considered two classes, “Low-Medium” and “High” risk, and we use the following features: age, sex, race, priors_count, days_b_screening_arrest, length_of_stay, c_charge_degree, is_recid.

In the german dataset from UCI Machine Learning Repository each person of the entries is classified as a “good” or “bad” creditor according to attributes like age, sex, checking_account, credit_amount, duration, purpose, etc.

We experimented with the following predictors as black boxes: support vector machines with RBF kernel (SVM), random forests with 100 trees (RF), and multi-layer neural networks with ‘lbfgs’ solver (NN). Implementations of the predictors are from the scikit-learn library. Unless otherwise stated, default parameters were used for both the black boxes and the libraries of LORE. Missing values were replaced by the mean for continuous features and by the mode for categorical ones.
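
The missing-value treatment can be sketched in a few lines of plain Python. This is only an illustration under assumptions: records are dicts, `None` marks a missing value, and the function and feature names are hypothetical.

```python
from statistics import mean, mode


def impute(rows, continuous, categorical):
    """Replace None by the column mean (continuous features)
    or by the column mode (categorical features)."""
    fill = {}
    for f in continuous:
        fill[f] = mean(r[f] for r in rows if r[f] is not None)
    for f in categorical:
        fill[f] = mode(r[f] for r in rows if r[f] is not None)
    return [
        {f: (fill[f] if v is None and f in fill else v) for f, v in r.items()}
        for r in rows
    ]


# Hypothetical records with two missing values.
rows = [
    {"age": 30, "sex": "male"},
    {"age": None, "sex": "female"},
    {"age": 50, "sex": None},
    {"age": 40, "sex": "female"},
]
filled = impute(rows, continuous=["age"], categorical=["sex"])
```

In a scikit-learn pipeline the same effect is usually obtained with `SimpleImputer`, using the `mean` and `most_frequent` strategies.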

Each dataset was randomly split into train (80% of the instances) and test (20% of the instances). The former is used to train the black box predictors. The latter is the set of instances for which the black box decisions have to be explained. In the following, for a fixed set of instances, we distinguish the decisions provided by the interpretable predictor from the decisions provided by the black box on the same set.

We consider the following properties in evaluating the mimic performance of the decision tree inferred by LORE, and of the explanations returned by it, against the black box classifier:

  • fidelity. It compares the predictions of the decision tree and of the black box on the instances used to train the decision tree (doshi2017towards). It answers the question: how good is the decision tree at mimicking the black box?

  • l-fidelity. It compares the predictions of the decision tree and of the black box on the instances covered by the decision rule in a local (hence “l-”) explanation for the instance under analysis. It answers the question: how good is the decision rule at mimicking the black box?

  • cl-fidelity. It compares the predictions of the decision tree and of the black box on the instances covered by the counterfactual rules in a local explanation for the instance under analysis.

  • hit. It compares the predictions of the decision tree and of the black box on the instance under analysis. It returns 1 if the two predictions are equal, and 0 otherwise.

  • c-hit. It compares the predictions of the decision tree and of the black box on a counterfactual instance of the instance under analysis, built from the counterfactual rules in its local explanation.

We measure the first three of them by the f1-measure (tan2005introduction). Aggregated values of f1 and hit/c-hit are reported by averaging them over the set of test instances.
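
As an illustration of how such scores can be computed, the sketch below evaluates an f1-based fidelity and a hit score in plain Python. It assumes binary decisions, treats the black box decisions as the reference labels, and fixes one class as positive; these choices, like the function names, are assumptions of the sketch rather than details stated in the text.

```python
def f1_measure(reference, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fidelity(black_box_decisions, tree_decisions, positive=1):
    """f1-measure of the tree decisions against the black box decisions."""
    return f1_measure(black_box_decisions, tree_decisions, positive)


def hit(black_box_decision, tree_decision):
    """1 if both predictors agree on a single instance, 0 otherwise."""
    return 1 if black_box_decision == tree_decision else 0


# Hypothetical decisions on six neighborhood instances.
bb = [1, 1, 0, 1, 0, 0]
tree = [1, 0, 0, 1, 1, 0]
score = fidelity(bb, tree)
```

The local variants would apply the same function after restricting both lists to the instances covered by the decision rule or by the counterfactual rules.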

Figure 5. Impact of the number of generations and of population size parameters of the genetic neighborhood generation. Bottom plots also report elapsed running times.
Figure 6. Comparison of neighborhood generation methods.

5.2. Analysis of Neighborhood Generation

We analyze here the impact of the number of generations and of the neighborhood size on the performance of instance generation and on the size complexity of the LORE output. We report results only for the german dataset, since we obtain similar results for the other ones. The other parameters of the genetic neighborhood generation algorithm (probabilities of crossover and mutation ) are set with the default values of and respectively (back2000evolutionary). Figure 5 shows in the top plots the value of the fitness functions and measures of the sizes of the local classifier (decision tree depth), of the decision rule (size of the antecedent ), and of the counterfactual rules (number of falsified split conditions). The bottom plots show fidelity (f1-measure) and hit (rate) as well as running times of neighborhood generation. Fixed , after generations, the fitness function converges around the optimal value (top left), fidelity is almost maximized (bottom left), and the measures of sizes (top right) also become stable and small. We then set in all other experiments. Figure 5-(bottom right) shows instead that the size of the neighborhood instances to be generated is relevant for the rate but not for . By taking into account also the running time (right side scale of the bottom plots), a good trade-off is obtained by setting .

Table 1. Comparison of distance measures.

5.3. Comparing Distance Functions

A key element of the neighborhood generation is the distance function used by the genetic algorithm. A legitimate question is whether the results of the approach are affected by the choice of the distance function (see Section 4.1). For instance, (wachter2017counterfactual) reports considerable differences in the counterfactual instances produced when varying the choice of the distance in their stochastic optimization approach. Table 1 reports basic measures contrasting the normalized Euclidean distance adopted by LORE with the cosine and min-max distances on the german dataset. The table does not highlight any considerable difference. This can be explained by the fact that, following instance generation, there are phases, such as decision tree building, that abstract instances to patterns, resulting in resilience against variability due to the distance function adopted.

5.4. Validation of Local Explanations

We now compare our local approach with a global approach, and discuss alternative neighborhood instance generation methods.

Dataset  Method  fidelity     l-fidelity   cl-fidelity  tree depth
adult    lore    .912 ± .29   .959 ± .17   .892 ± .29    4.16 ± 0.21
         global  .901 ± .28   .750 ± .00   .873 ± .27   12.00 ± 0.00
compas   lore    .942 ± .23   .992 ± .03   .937 ± .23    4.72 ± 2.15
         global  .902 ± .29   .935 ± .00   .857 ± .29   12.00 ± 0.00
german   lore    .925 ± .26   .988 ± .07   .920 ± .26    4.95 ± 2.54
         global  .880 ± .32   .571 ± .00   .824 ± .31    6.00 ± 0.00
Table 2. Local vs global approach (mean ± standard deviation).

Local vs Global Explanations

Extracting a predictor from the neighborhood of an instance is a winning strategy when contrasted with an approach that builds a single predictor from all instances in the test set. With such a global approach, the interpretable predictor is the same for all instances in the test set. Let us compare our approach with such a global approach. Table 2 reports the mean and standard deviation values of fidelity, l-fidelity, cl-fidelity and tree depth for each dataset, aggregating over the results of the various black boxes. While for fidelity both LORE and global obtain similarly high performance, for the other scores LORE considerably outperforms global. In particular, the size and depth of the decision tree of the global approach may lead to explanations (decision rules and counterfactuals) that are more complex to understand than those returned by the proposed local approach LORE.

Comparing Neighborhood Generations

After concluding that “local is better than global”, we now show that our genetic programming approach improves over the following baselines in the generation of neighborhoods:

  • crn returns as neighborhood the instances from the test set that are closest to the instance to explain;

  • rnd augments the output of crn with additional randomly generated instances so that a stratified neighborhood is obtained;

  • ris, starting from the output of rnd, performs the CNN instance selection procedure (DBLP:journals/air/Olvera-LopezCTK10);

  • ros, starting from the output of rnd, performs a random oversampling to balance the decision outcomes in the neighborhood.
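
Two of these baselines are simple enough to sketch directly. The code below is a hedged illustration of crn-style selection and of the random oversampling step of ros; it assumes numeric feature vectors and plain Euclidean distance, which is a simplification of the distance actually used by the method, and all names and data are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Plain Euclidean distance between two numeric tuples (an assumption)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def closest_neighbors(x, instances, k):
    """crn-style selection: the k instances closest to x."""
    return sorted(instances, key=lambda z: euclidean(x, z))[:k]


def random_oversample(instances, labels, seed=0):
    """Duplicate randomly chosen minority instances until every
    decision outcome is equally represented."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for z, y in zip(instances, labels):
        by_label[y].append(z)
    target = max(len(group) for group in by_label.values())
    out_x, out_y = [], []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for z in group + extra:
            out_x.append(z)
            out_y.append(y)
    return out_x, out_y


# Hypothetical test set and instance to explain.
test_set = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0), (0.5, 0.2)]
x = (0.0, 0.0)
neighbors = closest_neighbors(x, test_set, k=3)

# Hypothetical black box decisions for the three neighbors.
balanced_x, balanced_y = random_oversample(neighbors, [0, 0, 1])
```

After oversampling, both decision outcomes appear twice, so a decision tree trained on the result does not see a skewed class distribution.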

lore  .962 ± .19   .993 ± .04   .959 ± .19   .588 ± .42   .756 ± .40
crn   .924 ± .26   .855 ± .23   .894 ± .25   .349 ± .26   .583 ± .48
rnd   .946 ± .22   .904 ± .15   .920 ± .22   .494 ± .24   .712 ± .40
ris   .916 ± .27   .869 ± .05   .870 ± .26   .501 ± .22   .708 ± .39
ros   .968 ± .17   .965 ± .03   .953 ± .17   .491 ± .22   .733 ± .34
Table 3. Comparison of neighborhood generation methods (mean ± standard deviation).

Table 3 reports the aggregated evaluation measures over the various black boxes and datasets. LORE outperforms the other neighborhood generators on most measures. Intuitively, this means that LORE’s genetic programming approach contributes more than the other methods to capturing and explaining the behavior of the black box, both for direct and counterfactual decisions. Such a conclusion is reinforced by Figure 6, which shows the box plots of the distributions of , and , and some summary data on the size of decision trees, of decision rule premises, and of the number of falsified split conditions in counterfactual rules. LORE has the highest mean and median f1-measures (high mimic of the black box), the smallest interquartile ranges (low variability of results), and the lowest complexity sizes. Only for LORE has the largest variability, but a median value that is higher than the percentile of the competitors.

5.5. Comparison with the State of the Art

In this section we compare our approach with the state of the art.

5.5.1. Rules vs Linear Regression for Explanations

We first present a quantitative and qualitative comparison with the linear explanations of LIME (ribeiro2016should). A first crucial difference is that in LIME the number of features composing an explanation is an input parameter that must be specified by the user. LORE, instead, automatically provides the user with an explanation including only the features useful to justify the black box decision. This is a clear improvement over LIME. In the experiments, unless otherwise stated, we vary the number of features of LIME explanations from two to ten and we consider the performance with the highest score.

Dataset     german                   compas                   adult
Black box   lore        lime         lore        lime         lore        lime
RF          .925 ± .2   .880 ± .3    .941 ± .2   .826 ± .4    .901 ± .3   .824 ± .4
NN          .980 ± .1   1.00 ± .0    .987 ± .1   .902 ± .3    .918 ± .3   .998 ± .1
SVM         1.00 ± .0   .966 ± .1    .997 ± .1   .900 ± .3    .985 ± .1   .987 ± .1
Table 4. LORE vs LIME: scores.
Figure 7. LORE vs LIME: box plots of and . Numbers on top are the mean values.
Quantitative Comparison

Table 4 reports the mean and standard deviation of for each black box predictor and dataset. Moreover, Figure 7 details the box plots of (top) and (bottom). Results show that LORE outperforms LIME from various viewpoints. Regarding the score, even when LORE is worse than LIME, it has a score close to . For the RF black box, instead, LIME performs considerably worse than LORE. The box plots show that, in addition, LORE has better (local) fidelity scores and is more robust than LIME, which, on the contrary, exhibits very high variability in the neighborhood of the instance to explain (i.e., for ). This can be traced back to the genetic instance generation of LORE. Figure 8 reports a multidimensional scaling of the neighborhood of a sample instance generated by the two approaches. LORE computes a dense and compact neighborhood. The instances generated by LIME, instead, can be very distant from each other and always have a low density around the instance to explain.

Figure 8. Neighborhoods of LORE (left) and LIME (right).
Qualitative Comparison

We claim that the explanations provided by LORE are more abstract and comprehensible than the ones of LIME. Consider the example in Figure 9. The top part reports a LORE local explanation for an instance from the german dataset. The central part is a LIME explanation. Weights are associated to the categorical values in the instance to explain, and to continuous upper/lower bounds where the bounding values are taken from . Each weight tells the user how much the decision would have changed for different (resp., smaller/greater) values of a specific categorical (resp., continuous) feature. In the example, the weight has the following meaning (ribeiro2016should): if the duration in months had been higher than the value it has for the instance to explain, the prediction would have been, on average, 0.11 less “0” (or 0.11 more “1”). This logic is not easy to follow when compared to a single decision rule, which characterizes the contextual conditions for the decision of the black box. Another major advantage of our notion of explanation is the set of counterfactual rules. LIME provides a rough indication of where to look for a different decision: different categorical values or lower/higher continuous values of some feature. LORE’s counterfactual rules provide high-level and minimal-change contexts for reversing the outcome prediction of the black box.

- LORE r = ({credit_amount > 836, housing = own, other_debtors = none, credit_history = critical account} → decision = 0)
  counterfactual rules = { ({credit_amount ≤ 836, housing = own, other_debtors = none, credit_history = critical account} → decision = 1), ({credit_amount > 836, housing = own, other_debtors = none, credit_history = all paid back} → decision = 1) }
- Anchor a = ({credit_history = critical account, duration_in_month ∈ [0, 18.00]} → decision = 0)

Figure 9. Explanations of LORE, LIME and Anchor.

5.5.2. Rules vs Anchors for Explanations

A recent extension of LIME is the Anchor approach (ribeiro2018anchors). It provides explanations in the form of decision rules, called anchors. Rules are computed by incrementally adding equality conditions in the premise, while an estimate of the rule precision is above a minimum threshold (set to 95%). Such an estimation relies on neighborhood generation through pure-exploration multi-armed bandit.

On a qualitative level of comparison, the Anchor approach requires the a priori discretization of continuous features, while the decision rule of LORE benefits from the ability of decision trees to split continuous features. Contrast, for instance, the example rules in Figure 9. Moreover, the approach of Anchor does not clearly extend to computing counterfactuals.

Let us now compare the two approaches on a quantitative level. Figure 10 reports the average precision of decision rules, where the precision of a rule is the fraction of instances in the neighborhood set that are correctly classified by the rule. Although LORE does not require the level of precision to be set as a parameter, the rule precision is on average high and very similar to that obtained by Anchor, which is by construction at least 95%. This can be attributed to the performance of the decision tree induction algorithm, and of the instance generation procedure, which produces balanced neighborhoods. Figure 10 also shows the average coverage of decision rules, where the coverage of a rule is the fraction of instances to explain covered by the rule. As reported in (ribeiro2018anchors), large values of coverage are preferable, since this means that the set of decision rules produced over the instances to explain can be condensed/restricted to a subset of it. LORE shows a consistently better coverage than Anchor. Finally, we compare the stability of the two approaches with respect to the randomness introduced in the neighborhood generation. We measure stability using the Jaccard coefficient of the feature sets used in the 10 decision rules computed for the same instance in 10 runs of the system. Table 5 reports the mean and standard deviation of the Jaccard coefficient. LORE has a better stability than Anchor for all datasets and black boxes.
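
The three quantities used in this comparison can be sketched as follows. The sketch makes two assumptions that the text leaves open: precision is computed over the instances covered by the rule, and stability is the mean Jaccard coefficient over all pairs of runs. Function names and data are hypothetical.

```python
from itertools import combinations


def rule_precision(covered, rule_decision, black_box_decisions):
    """Fraction of covered instances on which the rule's decision
    matches the black box (assumed reading of 'correctly classified')."""
    hits = [d == rule_decision for c, d in zip(covered, black_box_decisions) if c]
    return sum(hits) / len(hits) if hits else 0.0


def rule_coverage(covered):
    """Fraction of instances whose features satisfy the rule premise."""
    return sum(covered) / len(covered)


def jaccard(a, b):
    """Jaccard coefficient of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(feature_sets):
    """Mean pairwise Jaccard coefficient over the runs (an assumption)."""
    pairs = list(combinations(feature_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: four instances, one rule predicting decision 0.
covered = [True, True, False, True]
black_box = [0, 0, 1, 1]
precision = rule_precision(covered, 0, black_box)
coverage = rule_coverage(covered)

# Hypothetical feature sets of the rules obtained in three runs.
runs = [{"duration", "housing"}, {"duration", "housing"}, {"duration", "age"}]
score = stability(runs)
```

With 10 runs per instance, as in the experiment, the same function would average over 45 pairs of rules.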

Figure 10. LORE vs Anchor: box plots of precision and coverage. Numbers on top are the mean values.
Dataset     german                   compas                   adult
Black box   lore        anchor       lore        anchor       lore        anchor
RF          .76 ± .15   .61 ± .15    .75 ± .12   .73 ± .14    .70 ± .15   .69 ± .15
NN          .69 ± .18   .53 ± .21    .83 ± .13   .79 ± .16    .81 ± .12   .65 ± .16
SVM         .82 ± .16   .32 ± .16    .71 ± .16   .70 ± .20    .87 ± .14   .67 ± .13
Table 5. LORE vs Anchor: Jaccard measure of stability (mean ± standard deviation).

6. Conclusion

We have proposed a local, black box agnostic explanation approach based on logic rules. LORE builds an interpretable predictor for a given black box and instance to be explained. The local interpretable predictor, a decision tree, is trained on a dense set of artificial instances, similar to the one to explain, generated by a genetic algorithm. The decision tree enables the extraction of a local explanation, consisting of a single rule for the decision and a set of counterfactual rules for the reversed decision. An ample experimental evaluation of the proposed approach has demonstrated the effectiveness of the genetic neighborhood procedure, which leads LORE to outperform the state-of-the-art proposals. A number of extensions and additional experiments can be mentioned as future work. First, LORE currently works on tabular data. An interesting future research direction is to make the method suitable for image and text data, for example by applying a pre-processing step for extracting semantic tags/concepts that may be mapped to a tabular format. Second, another study might focus on the possibility of deriving a global description of the black box bottom-up, by composing the local explanations and minimizing the size (complexity) of the global description. Third, research lab experiments would be useful for evaluating the human comprehensibility of the provided explanations. Finally, LORE explanations can be used for identifying data and/or algorithmic biases. After the local explanations are retrieved, it would be interesting to develop an approach for deriving an unbiased dataset for safely training the obscure classifier, or to prevent the black box from introducing an algorithmic bias.