MAGIX: Model Agnostic Globally Interpretable Explanations

06/22/2017 ∙ by Nikaash Puri, et al. ∙ Adobe

Explaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, what is also important is understanding how the model behaves globally. Such an understanding provides insight into both the data on which the model was trained and the generalization power of the rules it learned. We present here an approach that learns rules to explain globally the behavior of black box machine learning models. Collectively these rules represent the logic learned by the model and are hence useful for gaining insight into its behavior. We demonstrate the power of the approach on three publicly available data sets.




1 Introduction

Machine Learning and Artificial Intelligence have a large number of applications today. These include but are not restricted to applications in medicine, finance, imaging, operations in e-commerce, audio and so on. In such a situation, it becomes more important than ever to understand and interpret these algorithms.

Broadly, when classified on the basis of complexity, there are two types of machine learning algorithms. The first are based on relatively straightforward formulations and are interpretable to a large extent. Under this category we have algorithms such as Linear Regression, Logistic Regression, Decision Trees and similar approaches. The main advantage of such algorithms is that they are simple and hence easy to interpret. For instance, by looking at the weights learned by a Linear Regression model, it is possible to determine the relative importance of different features used by the model. Hence, decisions made by such models are relatively easier to understand.

The second type of algorithms are based on more complex formulations that are able to represent non-linear functions and higher order feature interactions. Popular among these are Neural Networks, Random Forests, Gradient Boosted Trees and various types of ensembles. One of the key characteristics of these approaches is that they are able to model very complex patterns and hence achieve higher accuracy on most data sets than their simpler counterparts. However, the cost of this gain in accuracy is a loss of model interpretability. Neural Networks, for instance, often have several hidden layers with different activations and dropout. Random Forests can have thousands of trees, and the final decision is a function of the combination of the individual predictions made by these trees.

Model Interpretation is important in several industry verticals. Its importance has been discussed in detail in [(Lipton, 2016), (Ribeiro et al., 2016a), (Doshi-Velez & Kim, 2017)] in the fields of medicine, finance, etc. However, an application that has not been covered in sufficient depth previously is Digital Marketing. Marketers spend a significant amount of money on personalization algorithms and platforms to optimize content delivery. In such a scenario, it is important to have model interpretation techniques that can explain the behavior of the complex personalization algorithm to the marketer. Such a technique would increase the transparency of, and trust in, the personalization platform. The approach we describe will be available to marketers as a part of a leading digital marketing suite of products.

We present here an approach that is global, model agnostic and that explains model behavior in an easy to understand way. Our approach learns rules that explain the behavior of a classification model. Each rule is independent, and is of the form If C1 AND C2 AND … Then Predict Class y. Here, each Ci refers to a specific condition on one feature (for example, a range of values that an attribute can take) and y is some class in the data set. Our approach is similar in objective to [(Lakkaraju et al., 2017), (Bastani et al., 2017), (Kim & Seo, 2017)]. We demonstrate the results of our approach on data sets from medicine, finance and Digital Marketing.

2 Literature Survey

There has been a variety of work in the field of model interpretation. The types of approaches applied can be characterized along several dimensions. They are:

  1. The complexity of the response function that we want to explain. These include simple linear functions such as the ones learned by linear regression algorithms. Such functions are the most straightforward to interpret. The advantage of such functions is that it is easy to see what effect (both magnitude and direction) a particular attribute would have on the output variable. Non linear functions on the other hand involve complex combinations of the input features and are hence much harder to interpret. The advantage of representing model interpretations in the form of rule sets is that they can explain non-linear functions as well.

  2. Scope of interpretation. Some approaches allow global interpretation of machine learning models [(Lakkaraju et al., 2017), (Bastani et al., 2017), (Kim & Seo, 2017), (Shrikumar et al., 2016)], while others focus on instance level or local explanations [(Ribeiro et al., 2016b), (Fong & Vedaldi, 2017), (Lundberg & Lee, 2017), (Robnik-Šikonja & Kononenko, 2008), (Kononenko et al., 2010)]. By local explanations, we mean explaining model behavior in a limited region of the input space.

  3. Whether the approach depends on the model. There are two types of approaches under this category. The first are model specific, i.e. those approaches that are designed to exploit model specific properties in constructing explanations[(Foerster et al., 2017), (Selvaraju et al., 2016), (Goyal et al., 2016), (Sundararajan et al., 2017), (Setiono & Thong, 2004), (Shrikumar et al., 2016)]. The second are model agnostic, i.e. those approaches that do not leverage underlying details of the machine learning model in constructing explanations[(Lakkaraju et al., 2017), (Bastani et al., 2017), (Kim & Seo, 2017), (Koh & Liang, 2017), (Henelius et al., 2017), (Phillips et al., 2017), (Fong & Vedaldi, 2017)]. Rather, they treat the model as a black box and rely on the predict function of the model.

Techniques such as partial dependence plots, residual analysis and generalized additive models have been used to understand model behavior as described in (Patrick Hall & Ambati, 2017). These techniques are useful for models that learn monotonic response functions. However, complex models such as neural networks and gradient boosted trees learn non-monotonic response functions. In such functions, changes in the input variables in the same direction can lead to the response variable changing in different directions. For example, in a loan rejection model, the age increasing from 20 to 25 may decrease the risk of loan rejection, whereas the age increasing from 60 to 65 might produce the opposite effect and increase the risk of loan rejection. Interpreting the behavior of such models at a global level is a non-trivial problem.

(Tan et al., 2017) and (Tan et al., 2018) investigate how model distillation can be used to distill complex models into models that are transparent or interpretable in some sense.

(Bastani et al., 2017) have used a surrogate model approach where they extract a decision tree that represents model behavior. The problem with this approach is that each path to a leaf node goes through the first attribute that the tree splits on. As a result, each rule includes a condition involving this attribute. This holds recursively, such that the rules derived from the left subtree would have two common attributes and so on. Further, the tree itself can often be several levels deep, leading to complex paths that are not easy to understand. Hence, we have focused our work on output in the form of rule lists as described in (Letham et al., 2015) and (Lakkaraju et al., 2016). The latter describe a Joint Framework for Description and Prediction in the form of interpretable decision sets. They show that decision sets are more comprehensible to humans than decision lists because rules apply independently. The output of our approach is in the form of a decision set, where each rule applies independently of the others.

The global black box model interpretation approaches discussed in [(Lakkaraju et al., 2017), (Lakkaraju et al., 2016)] use the Apriori algorithm (Agrawal et al., 1996) to generate conditions. The problem with this approach is that it generates conditions that have high frequency in the data set and not necessarily those that are useful for distinguishing between classes. In our approach we have used the LIME algorithm proposed by (Ribeiro et al., 2016b) to learn conditions that are important to explain the classification of certain instances. Rules are learned by a genetic algorithm that tries different combinations of conditions to optimize a fitness function. Hence, our approach can be thought of as an extension to LIME that explains model behavior globally in the form of independent if then rules.

3 Problem Description

Our method takes as input a data set and a model trained using that data set.
The data set X consists of rows, where each row is an instance x having a fixed set of columns together with a class label y in Y, the set of classes.
The classification model M has a function predict-proba(x) that assigns a |Y|-vector of probabilities to each instance x. Each element in the vector represents the probability that the instance belongs to the corresponding class.
The algorithm outputs a set of rules for each class y in Y. Each rule is a conjunction of conditions, and a condition is a set of constraints on the values of one feature column.

4 Notations and Definitions

Please refer to Table 1 for a reference of the notations and definitions used in the paper.

| Symbol | Definition | Example |
| --- | --- | --- |
| X | Training data, consisting of instances | [(1, John, 25), (2, Jane, 30) …] |
| Y | Set of predicted classes | [approve, reject] |
| x | Particular instance in the training set | (1, John, 25) |
| y | Particular predicted class | approve |
| C | Condition, which is a combination of an attribute and a range of values the attribute can take | age<30, state ∈ {New York, Florida} |
| r | A classification rule which classifies an instance to belong to its associated class if a predicate consisting of a conjunction (AND) of conditions holds | IF age<30 and salary>100 and state=New York Then Predict Class approve |
| conditions(r) | Set of conditions for rule r | {age<30, salary>100, state=New York} |
| class(r) | Associated target class for rule r | Predict Class: approve |
| cover(r) | Set of instances rule r covers in the training data. These are instances for which the conditions of r are true | {} |
| correct-cover(r) | Set of instances for which the rule's class prediction agrees with the model's prediction. Formally, it is the set of instances x in cover(r) for which the class of r equals the class predicted by the classifier for x | {} |
| incorrect-cover(r) | Set of instances that are incorrectly covered by r | {} |
| R | Rule set. Consists of a set of rules of the form r | {r1, r2, …} |
| correct-cover(R) | Correct cover of rule set R. It is defined as the union of the correct covers of r for each rule r in R | {} |
| cover(R) | Cover of rule set R. It is defined as the union of the covers of r for each rule r in R | {} |

Table 1: Notations and Definitions Used
Definition 4.1.

Rule Precision: The precision of a rule r is the ratio of the number of instances in correct-cover(r) to the number of instances in cover(r).

Definition 4.2.

Rule Length: The length of a rule r is the cardinality of the set of conditions of r.

Definition 4.3.

Rule Class Coverage: The class coverage of a rule r is the ratio of the number of instances in correct-cover(r) to the number of instances in the training set that the classifier has predicted to have label y, where y is the class associated with r.
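Definitions 4.1 to 4.3 can be illustrated with a minimal Python sketch. It assumes the features have already been made categorical, and it represents a rule as a mapping from feature name to a set of allowed values; the function names, the toy instances and the toy predictions are all hypothetical and are not taken from the paper.

```python
def covers(conditions, instance):
    """True when every condition (feature -> set of allowed values) holds for the instance."""
    return all(instance[feature] in allowed for feature, allowed in conditions.items())


def rule_metrics(conditions, target, instances, predicted):
    """Length, precision and class coverage of one rule, measured against model predictions."""
    cover = [i for i, x in enumerate(instances) if covers(conditions, x)]
    correct_cover = [i for i in cover if predicted[i] == target]
    n_target = sum(1 for p in predicted if p == target)
    return {
        "length": len(conditions),
        "precision": len(correct_cover) / len(cover) if cover else 0.0,
        "class_coverage": len(correct_cover) / n_target if n_target else 0.0,
    }


# Hypothetical binned instances and classifier predictions, for illustration only.
instances = [
    {"age": "<30", "state": "New York"},
    {"age": "<30", "state": "Florida"},
    {"age": ">=30", "state": "New York"},
    {"age": "<30", "state": "Texas"},
]
predicted = ["approve", "approve", "approve", "reject"]

metrics = rule_metrics({"age": {"<30"}}, "approve", instances, predicted)
print(metrics)
```

In this toy case the rule covers three instances, two of which the model labels approve, and the model predicts approve three times overall, so both precision and class coverage are 2/3.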

Definition 4.4.

Rule Mutual-Information (RMI): The Rule Mutual-Information or RMI of a rule r captures its mutual dependence with the class predicted. It is derived using the Mutual Information (MI) of the contingency table shown in Table 2.

| Rule / Class | Predicted class | Other class(es) |
| --- | --- | --- |
| r | Cardinality of correct-cover(r) | Cardinality of incorrect-cover(r) |
| NOT r | Count of instances with the same label as the class predicted by r but not covered by r | Count of instances not covered by r and with a label different from the class predicted by r |

Table 2: Contingency table used to compute RMI of rule r
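As a rough illustration, the standard mutual information of a 2x2 contingency table such as Table 2 can be computed as below. This is only the textbook quantity (in bits); any sign or scaling convention applied on top of it to obtain RMI is not covered by this sketch, and the function and argument names are assumptions.

```python
import math


def mutual_information(n11, n12, n21, n22):
    """Mutual information (in bits) between the rows and columns of a 2x2 count table.

    n11: covered by the rule, predicted class     n12: covered, other class(es)
    n21: not covered, predicted class             n22: not covered, other class(es)
    """
    cells = [[n11, n12], [n21, n22]]
    total = n11 + n12 + n21 + n22
    row_totals = [n11 + n12, n21 + n22]
    col_totals = [n11 + n21, n12 + n22]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if cells[i][j] > 0:  # empty cells contribute nothing
                p_joint = cells[i][j] / total
                p_row = row_totals[i] / total
                p_col = col_totals[j] / total
                mi += p_joint * math.log2(p_joint / (p_row * p_col))
    return mi


print(mutual_information(50, 0, 0, 50))    # rule perfectly separates the class
print(mutual_information(25, 25, 25, 25))  # rule is independent of the class
```

A rule that is very precise but covers few instances, or one that covers many instances with low precision, yields a table that is close to independent and hence a small value.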
Definition 4.5.

Rule Set Precision: The precision of R is the ratio of number of instances in correct-cover(R) to the number of instances in cover(R).

Definition 4.6.

Rule Set Class Coverage: The class coverage of R is the ratio of the number of instances in correct-cover(R) to the number of instances in the training set that the classifier has predicted to have label y, where y is the class that the rules in R predict.
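The rule set versions of these measures differ from the single-rule versions only in that covers are combined by set union. A small sketch, assuming each rule has already been reduced to a pair of instance-index sets (its cover and its correct cover); the data layout and names are hypothetical.

```python
def rule_set_metrics(rules, n_predicted_as_class):
    """Precision and class coverage of a rule set whose rules all predict the same class.

    rules: list of (cover, correct_cover) pairs, each a set of instance indices.
    n_predicted_as_class: number of training instances the model assigns to that class.
    """
    cover = set().union(*(c for c, _ in rules))
    correct_cover = set().union(*(cc for _, cc in rules))
    precision = len(correct_cover) / len(cover) if cover else 0.0
    coverage = len(correct_cover) / n_predicted_as_class if n_predicted_as_class else 0.0
    return precision, coverage


# Two hypothetical rules for one class; instance 1 is covered by both.
example_rules = [({0, 1, 2}, {0, 1}), ({1, 3}, {1, 3})]
set_precision, set_coverage = rule_set_metrics(example_rules, n_predicted_as_class=4)
print(set_precision, set_coverage)
```

Because of the union, an instance covered by several rules is counted once, so adding a rule that overlaps heavily with existing ones changes these numbers very little.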

5 Approach

Our approach is outlined in Algorithm 1 and the subsequent sections explain each step in more detail.

    …
    for each class y in Y do
        …
    end for
    return R
Algorithm 1

5.1 Preprocessing the Input Data

In this step we pre-process the input data to make each feature categorical. Concretely, categorical features are left unmodified. For numerical features, we perform entropy based binning to split the feature values into discrete bins [(Ribeiro et al., 2016b)]. For instance, if the attribute age takes values from 10 to 85 (inclusive), then our binning step could produce three ranges that together cover this interval. After this step, the input data comprises only categorical features. Further, we split the original data set into training and testing data sets.

These preprocessing steps correspond to the PreProcessInputData(X, Y) function in Algorithm 1.
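The exact binning procedure is not spelled out here, so the following is only a sketch of the core idea behind entropy based binning: choose the cut point on a numerical feature that gives the largest reduction in class entropy. Applying it recursively to each side would yield several bins. The function names and the toy ages and labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, information_gain) of the best single cut on one feature."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_threshold, best_gain = None, 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cut possible between equal values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_gain = gain
    return best_threshold, best_gain


# Hypothetical ages and class labels.
ages = [12, 18, 22, 40, 55, 70]
classes = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_split(ages, classes)
print(threshold, gain)
```

Here the cut falls midway between 22 and 40, which separates the two classes completely.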

Figure 1: Framework to get rules for one class

5.2 Generating Instance Level Conditions

The algorithm iterates over each instance in the training data. For each instance x there is a particular class that the instance has been classified into. We compute the marginal contribution of each feature-value used by the model in arriving at this classification. We have used the approach described by (Ribeiro et al., 2016b) to compute instance level marginal contributions. This corresponds to the procedure Marg(x, M) in Algorithm 2. The approach perturbs the training instance in question and trains a locally faithful linear model in the neighborhood of that instance. The weights of the different features then approximate the marginal contribution values. The output of this step is a list of conditions. Each condition consists of a single feature and a value. The complete algorithm is outlined in Algorithm 2. This corresponds to the GenInstConds function in Algorithm 1.

    …
    for each instance in X do
        …
        for each condition in instance-level-conditions do
            …
        end for
        …
        if … then
            break
        end if
    end for
    return conditions
Algorithm 2 GEN-INST-CONDS
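To make the idea of an instance level contribution concrete without reimplementing LIME, the sketch below uses a much simpler stand-in: for each feature, it replaces the instance's value with other values seen in the data and records the average drop in the predicted probability of the instance's class. This is not the locally fitted linear model described above; it is a hypothetical simplification, and the toy model and all names are assumptions.

```python
def marginal_contributions(instance, predict_proba, background, target):
    """Average drop in P(target) when one feature at a time is swapped for other observed values."""
    base = predict_proba(instance)[target]
    contributions = {}
    for feature, value in instance.items():
        drops = []
        for other in background:
            if other[feature] == value:
                continue  # swapping in the same value is not a perturbation
            perturbed = {**instance, feature: other[feature]}
            drops.append(base - predict_proba(perturbed)[target])
        contributions[feature] = sum(drops) / len(drops) if drops else 0.0
    return contributions


def toy_predict_proba(x):
    """Hypothetical black box: only the age bin matters."""
    p = 0.9 if x["age"] == "<30" else 0.2
    return {"approve": p, "reject": 1.0 - p}


background = [
    {"age": ">=30", "state": "Florida"},
    {"age": "<30", "state": "Texas"},
]
instance = {"age": "<30", "state": "New York"}
contrib = marginal_contributions(instance, toy_predict_proba, background, "approve")

# Keep feature-value pairs with a positive contribution as instance level conditions.
conditions = [(f, instance[f]) for f, c in contrib.items() if c > 0]
print(contrib, conditions)
```

Only the model's prediction function is queried, which is what keeps this style of explanation model agnostic.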

5.3 Learning Rules From Conditions

The output of the previous step of generating instance level conditions is a list of conditions that are important at the instance level. For each class, we have a set of conditions that were important in classifying instances of that class. We want to learn rules for that class. Each rule has an associated coverage and precision as defined in Definitions 4.3 and 4.1 respectively. Hence, the problem is the following: how can we build rules having high precision and coverage from these base conditions? This could be done by brute force. For instance, we could first evaluate all rules of length 1, then length 2 and so on. However, if the previous step gives N conditions, there are 2^N - 1 non-empty combinations of conditions to evaluate, so the cost of this algorithm grows exponentially in N. This is not computationally feasible for larger data sets.

We need an approach that can learn optimal combinations of conditions, and we want the notion of optimality to be abstracted away from the particular approach. Concretely, we want an algorithm that can take as input a notion of optimality and subsequently combine conditions to generate rules that are optimal. Another point that needs to be considered is that if there are any categorical variables in the data, then we want the rules to allow these categorical variables to take more than one value. For example, if several of the conditions for class 2 involve the same categorical variable called country, each with a different value, then one candidate rule could allow country to take any of those values, in conjunction with a condition on another feature, and predict class 2. Therefore, the combinations that we need are not only 'AND's of conditions but also 'OR's within a condition involving a categorical variable, so that such a variable can take on multiple possible values.
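One way to picture this combination step is sketched below: conditions on different features are AND-ed, while conditions on the same categorical feature are merged into a set of allowed values, which acts as an OR. The representation of a condition as a (feature, value) pair and the example values are assumptions made for illustration.

```python
def combine_conditions(selected):
    """Merge (feature, value) conditions into a rule: feature -> set of allowed values.

    Different features are implicitly AND-ed; several values for one feature are OR-ed.
    """
    rule = {}
    for feature, value in selected:
        rule.setdefault(feature, set()).add(value)
    return rule


def rule_holds(rule, instance):
    """A rule holds when every feature takes one of its allowed values."""
    return all(instance[feature] in allowed for feature, allowed in rule.items())


# Hypothetical conditions selected for one class.
selected = [("country", "US"), ("age", "<30"), ("country", "India")]
rule = combine_conditions(selected)
print(rule)
print(rule_holds(rule, {"country": "India", "age": "<30"}))
print(rule_holds(rule, {"country": "France", "age": "<30"}))
```

With this representation, the length of a rule is the number of distinct features it constrains rather than the number of raw conditions that were selected.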

Similar to the work done in (Fidelis et al., 2000) and (Ghosh et al., 2010), we use a Genetic Algorithm to learn such rules under the given conditions. The algorithm is run independently for each class. Hence, we are trying to learn class level rules from class level conditions. Each individual of the Genetic Algorithm represents a possible rule. For example, if the number of conditions generated in the previous step (learning instance level conditions) was 100, then each individual of our population would be a bit string of length 100. One example string might be '100100000….000'. This example individual represents a rule of the form 'If condition1 AND condition4 THEN predict Class 1'. Each condition that involves a categorical variable is allowed to take more than one value. The fitness function should capture several requirements:

  1. Precision and Coverage: Rule Mutual-Information (RMI) as defined in definition 4.4, captures this notion of simultaneously trying to optimize for both precision and coverage. The rules having a high RMI are neither highly precise with low coverage nor overly generic and imprecise with high coverage.

  2. Length: We want the fitness function to add a length factor. The intuition here is that shorter rules are easier to interpret. While it is true that rule length often inversely correlates with coverage, we find that making this parameter explicit leads to shorter, more interpretable rules.

  3. Overlap: It may happen that an instance is covered by one rule that says it should belong to Class 1 and another rule that says it should belong to Class 2. Clearly, only one of these is correct and we want to minimize the amount of ambiguity in our final rule set. Since we plan to go for rules that have high precision, meaning that they were correctly copying model behavior, the amount of overlap in the final rule set will be minimized as we optimize precision.

Keeping all these in mind, we designed the following fitness function for our genetic algorithm:

Here, ind stands for the individual represented by a bit string, r is the rule that this individual represents and N is the length of the bit string. A weight can be given to either of these terms as per the requirement of the user. We take our population to be 1200 individuals with a cross-over probability of 50% and a mutation probability set in such a manner that on average 2 bits of an individual are flipped during mutation. This gives a reasonable trade-off between exploration and exploitation. We initialize the population with individuals that have a high probability of being 'fit'. These are individuals with only one bit set in the bit string (such as 1000, 0100, 0010 and 0001), followed by those with only two bits set (1100, 0110, 0101…) and so on, until the entire population size, i.e. 1200 individuals, is reached. We run the algorithm for 600 generations. All the individuals of the last generation are finally selected as our rule set for one class.
Hence, the output of this step is an exhaustive rule set for each class. There will be several rules that do not add value to the set and need to be filtered out.

This step corresponds to the LrnCls function in Algorithm 1.
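The population seeding and mutation settings described in this section can be sketched as follows. The sketch covers only those two pieces: the fitness function, cross-over and selection are not shown. The function names are hypothetical, and the order in which two-bit individuals are enumerated is an arbitrary choice.

```python
import random
from itertools import combinations


def seed_population(n_bits, size):
    """Bit strings with one bit set, then two bits set, and so on, until `size` is reached."""
    population = []
    for k in range(1, n_bits + 1):
        for ones in combinations(range(n_bits), k):
            if len(population) == size:
                return population
            population.append("".join("1" if i in ones else "0" for i in range(n_bits)))
    return population


def mutate(individual, rng, expected_flips=2.0):
    """Flip each bit independently with probability expected_flips / N."""
    p = expected_flips / len(individual)
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p else bit
        for bit in individual
    )


seeded = seed_population(n_bits=4, size=6)
print(seeded)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
mutated = mutate("0" * 100, rng)
print(mutated.count("1"))
```

With 100 conditions and a population of 1200, this seeding would place all 100 single-condition rules in the first generation and fill the rest with two-condition rules.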

5.4 Post-processing Rules

The rules generated in the previous step include several redundant rules. To eliminate them, we sort the generated rules in descending order of precision. Then, for each rule r, we check whether there is a more precise rule r', not yet removed, such that correct-cover(r) ⊆ correct-cover(r') and Precision(r) ≤ Precision(r'). If so, we do not consider r for the next step. Otherwise, we retain this rule for consideration.
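A compact sketch of this redundancy filter is given below, assuming each rule is a dictionary with a precision and a correct cover given as a set of instance indices. The field names and the example rules are hypothetical.

```python
def drop_redundant(rules):
    """Keep a rule only if no retained, at least as precise rule already covers its correct cover."""
    kept = []
    # Descending precision, so every rule in `kept` is at least as precise as the current one.
    for rule in sorted(rules, key=lambda r: r["precision"], reverse=True):
        dominated = any(rule["correct_cover"] <= k["correct_cover"] for k in kept)
        if not dominated:
            kept.append(rule)
    return kept


# Hypothetical rules for one class.
candidate_rules = [
    {"name": "B", "precision": 0.9, "correct_cover": {2, 3}},
    {"name": "A", "precision": 1.0, "correct_cover": {1, 2, 3}},
    {"name": "C", "precision": 0.8, "correct_cover": {3, 4}},
]
retained = [r["name"] for r in drop_redundant(candidate_rules)]
print(retained)
```

Rule B is dropped because everything it gets right is already captured by the more precise rule A, while rule C stays because it correctly covers instance 4, which A does not.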

We want to avoid getting spurious rules that do not represent the patterns that the black box model learnt. Such rules are possible due to the Texas sharpshooter fallacy [(Popik, 2013)] and do not describe the model's predictions on unseen data. The original data set is split into training and testing subsets. The rules are learned by observing model behavior on the training data set. Then, for each rule, the precision on the testing data set is calculated. If the precision is less than the baseline precision on the test data, the rule is discarded at this step. This removes noisy rules that have a positive RMI in the training data set but low precision in the test data set. For a rule r, the baseline precision is the fraction of the instances in the test set for which the model predicts y, the class associated with r.
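The test-set check can be sketched as a single predicate. The handling of a rule that covers no test instance is not specified above, so discarding it here is an assumption, as are the function and argument names.

```python
def passes_baseline(rule_covers, target, test_predictions):
    """True when the rule's precision on the test set is at least the baseline precision.

    rule_covers: one boolean per test instance, True where the rule's conditions hold.
    test_predictions: the model's predicted class for each test instance.
    """
    covered = [p for hit, p in zip(rule_covers, test_predictions) if hit]
    if not covered:
        return False  # assumption: a rule that never fires on the test set is dropped
    precision = covered.count(target) / len(covered)
    baseline = test_predictions.count(target) / len(test_predictions)
    return precision >= baseline


# Hypothetical model predictions on four test instances.
test_predictions = ["a", "a", "b", "b"]
good_rule = passes_baseline([True, True, False, False], "a", test_predictions)
noisy_rule = passes_baseline([False, False, True, True], "a", test_predictions)
print(good_rule, noisy_rule)
```

The baseline is what a rule covering a random sample of test instances would achieve, so a rule below it carries no usable signal about the model.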

The corresponding procedure in Algorithm 1 is Proc(R).

5.5 Sorting rules by Mutual Information

The previous section gives us a combination of 'OR's (a subset of rules) among the 'AND's (combining clauses into rules) generated by the Genetic Algorithm. Once we have the optimal set of rules for a class, we want to sort them so as to provide the user with the most relevant rules at the top and less relevant ones further down the list. This is done as follows. First, the rules are sorted in descending order of RMI. Then, the rule with the highest RMI is selected and added to the top of the list. Next, the rule with the next highest value of RMI is selected and compared to the list of already added rules. If it is similar to any of the rules already added, we discard this rule. Otherwise, it is added to the list of rules. The similarity is computed using the Jaccard Similarity measure between the sets of instances covered by the two rules. If the Jaccard Similarity is greater than 50%, the rules are deemed to be similar. This process is repeated until we have the desired number of rules.

The corresponding procedure in Algorithm 1 is Sort(R).
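The ordering and de-duplication procedure of this section might look like the sketch below, assuming each rule carries a precomputed RMI value and a cover given as a set of instance indices. The field names and example values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of covered instance indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_rules(rules, max_rules, threshold=0.5):
    """Greedily pick rules by descending RMI, skipping any rule too similar to one already chosen."""
    selected = []
    for rule in sorted(rules, key=lambda r: r["rmi"], reverse=True):
        if len(selected) == max_rules:
            break
        if all(jaccard(rule["cover"], s["cover"]) <= threshold for s in selected):
            selected.append(rule)
    return selected


# Hypothetical rules for one class.
scored_rules = [
    {"name": "A", "rmi": 0.9, "cover": {1, 2, 3, 4}},
    {"name": "B", "rmi": 0.8, "cover": {1, 2, 3}},
    {"name": "C", "rmi": 0.5, "cover": {5, 6}},
]
ordered = [r["name"] for r in select_rules(scored_rules, max_rules=5)]
print(ordered)
```

Rule B shares three of four instances with rule A (Jaccard similarity 0.75), so it is treated as a near duplicate and skipped.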

6 Results

We demonstrate our approach on four publicly available data sets and explain the results across various measures and dimensions to draw conclusions. These are the 'Iris' data set (Fisher, 1936), the 'Wisconsin Breast Cancer data set' (Lichman, 2013), the 'Banknote authentication data set' (Lichman, 2013) and the 'Car Evaluation Dataset' (Lichman, 2013). We train a random forest classifier (with 500 trees, min-samples-split=2, min-samples-leaf=1) on each dataset and interpret the models as a decision set of independent rules using Algorithm 1. Table 3 shows a sample of the rules obtained.

| Dataset | Rule | Prec. (%) | Cov. (%) |
| --- | --- | --- | --- |
| Iris | petal-width <= 0.80 THEN Predict class: Iris-setosa | 100 | 100 |
| Iris | petal-width > 1.85 THEN Predict class: Iris-virginica | 100 | 62 |
| Iris | 2.60 < petal-length <= 4.45 THEN Predict class: Iris-versicolor | 100 | 62 |
| Breast Cancer | 0.50 < bare_nuclei <= 1.50 AND mitoses <= 1.50 THEN Predict class: 2 | 98 | 83 |
| Breast Cancer | bare_nuclei > 8.50 THEN Predict class: 4 | 97 | 56 |
| Breast Cancer | clump_thickness > 8.50 THEN Predict class: 4 | 100 | 33 |
| Banknote | variance > 2.39 THEN Predict class: 0 | 100 | 52 |
| Banknote | skewness > 9.62 THEN Predict class: 0 | 100 | 16 |
| Banknote | -2.80 < variance <= -0.40 AND -5.87 < entropy <= 1.00 THEN Predict class: 1 | 83 | 43 |
| Banknote | -6.98 < skewness <= -5.51 THEN Predict class: 1 | 84 | 11 |
| Cars | safety = low THEN Predict class: unacc | 100 | 49 |
| Cars | persons = more,4 AND safety = high AND buying = med THEN Predict class: acc | 70 | 17 |
| Cars | persons = 2 THEN Predict class: unacc | 100 | 46 |
| Cars | safety = high AND lug-boot = small AND persons = 4 AND buying = med,low AND maint = low,med THEN Predict class: good | 75 | 20 |
| Digital Marketing | CUMULATIVE-ACTION <= 0.50 AND <= 300.00 THEN Predict class: 1 | 100 | 87.4 |
| Digital Marketing | ENV-OperatingSystemVersion = Linux AND = 1 THEN Predict class: 2 | 51.1 | 46 |
| Digital Marketing | ENV-OperatingSystemVersion = Linux AND 597.50 < ENV-BrowserWidth <= 607.50 AND 567.00 < ENV-ScreenWidth <= 620.50 THEN Predict class: 2 | 84.6 | 22 |

Table 3: Sample of Results on Datasets

6.1 Iris Dataset

The data set consists of 3 different types of irises (Setosa, Versicolour, and Virginica), which are its three classes. It is a classification data set with four features: Sepal Length, Sepal Width, Petal Length and Petal Width. The Random Forest classifier accuracy was close to 98%.

Table 3 shows a sample of the rules extracted from a Random Forest model trained on this dataset using Algorithm 1. As an example consider the rule 'IF petal-width (cm) <= 0.80 THEN Predict class: Iris-setosa'. It has a precision of 100%. This means that whenever the classifier saw a flower having a petal width of at most 0.80 cm, it classified it into the class Setosa. A domain expert could now determine whether such a rule is reasonable in the real world, or is something that inadvertently crept into the data set and hence is not likely to hold for real world applications. Further, a data scientist could analyze such rules to evaluate the extent to which they hold in the data set.

6.2 Wisconsin Breast Cancer Dataset

This dataset was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg (Lichman, 2013). The Random Forest classifier accuracy was close to 98%.

The rules obtained for this data set offer some insight into both the workings of the classifier and the original data set. We discuss a couple of rules here. As we can see from Table 3, the rule ’IF clump-thickness greater than 8.50 THEN Predict class=4(malignant)’ is a high precision rule. At this point a domain expert could examine this and similar rules to gain an understanding of patterns in the training data set. Further, if such rules are due to spurious introductions into the training data and not due to genuine patterns, then the model can be retrained with a more diverse data set to increase its generalization power. By looking at only instance level justifications, it is not easy to understand the general patterns that the model derived from the data. Hence, the aggregated view of model behavior that our approach produces is useful to understand how the model works.

6.3 Banknote Authentication Dataset

This dataset was extracted from images taken from genuine and forged banknote-like specimens (Lichman, 2013). A Wavelet Transform tool was used to extract features from these images. The dataset has 1372 instances, described by 4 attributes. The Random Forest classifier accuracy was close to 99%. Table 3 shows a sample of the rules obtained from the classifier.

6.4 Car Evaluation Dataset

Each instance of this dataset describes a car (Lichman, 2013). This dataset has 1728 instances, described by 6 attributes: buying cost, maintenance cost, number of doors, number of persons, size of lug boot and safety indication. All of the attributes are categorical, taking values such as high, medium and low. The target variable indicates whether a particular instance, i.e. a car, is unacceptable, acceptable, good or very good. We trained a Random Forest classifier with 400 trees on the data set. The classifier accuracy was close to 96%. The rules obtained are shown in Table 3. For instance, the rule 'safety=low THEN Predict class: unacceptable' has 100% precision and makes sense intuitively.

6.5 Digital Marketing Dataset

We have also tested our approach on several real life marketing datasets from a personalization platform of a large digital marketing company. The platform learns from user behavior and shows the right content to the right user with the objective of maximizing the user's likelihood of conversion. Conversion could correspond to clicking an ad, buying a product, etc. The platform is driven by machine learning models such as Random Forests. With our approach, the decisioning logic of the black box models is made transparent, which leads to increased trust and provides insights to the marketers.

The data set from the platform consists of a set of users with their profiles. For each user, we have the piece of content that the platform decided to show that user. Further, we have a serialized form of the model used by the platform to perform the decisioning. Table 3 shows a sample of the rules obtained. A set of such rules would help the marketer better understand the behavior of the digital marketing platform.

Further, we have created another data set to help the marketer understand user behavior. The data set consists of a set of users with their profiles and whether they converted. It is important to state that we record whether the user converted regardless of the piece of content for which they converted. A model is then trained on this data set. Finally, we use MAGIX to interpret the model. The rules so obtained describe user segments that have either a high or a low propensity to convert. This could be useful for marketers to understand how their end users behave.

We have validated the usefulness of the approach with several customers. Marketers will be able to login to the platform and view both types of reports (model explanation report and user behavior report).

6.6 Validation

After completion of the rule processing step, we have a set of rules for each class. The rules have been selected in a manner that optimizes for coverage and precision simultaneously. These metrics are useful for evaluating how well a single rule expresses model behavior. However, we need an approach for quantifying how well a collection of rules represents the model. In order to achieve this, we used the rules discovered by our method to build a predictive model. The logic for the model is as follows. If an instance is covered by more than one rule, then it is classified according to the most precise rule. Ties are settled randomly. Then, we use both the original model and our rule based model to make predictions on the testing data. Finally, we compare the predictions made by our rule based model to those made by the original model and compute the fraction of predictions that match. We call this the Imitation@K metric. Here, K refers to the number of rules that the rule set contains. Fixing K ensures that a scheme does not obtain very high imitation accuracy by including a very large number of high precision, low coverage rules. Approaches that have a number of very high precision, low coverage rules would cover fewer instances and hence receive a low score on the metric. Similarly, approaches that have high coverage, low precision rules would make a large number of mistakes and hence receive a low score on the metric. To achieve a good score, the set of rules must cover a large fraction of instances with high precision.
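A minimal sketch of the Imitation@K computation follows. Two details are assumptions rather than statements from the text: an instance covered by no rule is counted as a mismatch, and ties in precision are broken by taking the first rule instead of randomly. The tuple layout and names are hypothetical.

```python
def imitation_at_k(rules, model_predictions, k):
    """Fraction of test instances on which the first k rules reproduce the model's prediction.

    rules: list of (covers, target, precision), where covers holds one boolean per test instance.
    model_predictions: the black box model's predicted class for each test instance.
    """
    top_rules = rules[:k]
    matches = 0
    for i, model_class in enumerate(model_predictions):
        applicable = [(precision, target) for covers, target, precision in top_rules if covers[i]]
        if not applicable:
            continue  # assumption: uncovered instances count as mismatches
        _, rule_class = max(applicable, key=lambda pair: pair[0])  # most precise rule wins
        if rule_class == model_class:
            matches += 1
    return matches / len(model_predictions)


# Hypothetical model predictions and two rules evaluated on four test instances.
model_predictions = ["a", "a", "b", "b"]
rules = [
    ([True, True, False, False], "a", 1.0),
    ([False, True, True, True], "b", 0.75),
]
im_at_1 = imitation_at_k(rules, model_predictions, k=1)
im_at_2 = imitation_at_k(rules, model_predictions, k=2)
print(im_at_1, im_at_2)
```

In the example, the second instance is covered by both rules and is assigned class a because the first rule is more precise.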

| Dataset | Im@1 | Im@2 | Im@5 | Im@10 | Im@20 |
| --- | --- | --- | --- | --- | --- |
| Iris | 33.33 | 66.66 | 86.66 | 96.66 | 96.66 |
| Breast Cancer | 56.42 | 58.57 | 63.57 | 93.57 | 95.71 |
| Banknote Authentication | 32.00 | 50.90 | 71.63 | 94.90 | 96.00 |
| Car Evaluation | 24.56 | 53.75 | 86.12 | 90.17 | 89.01 |

Table 4: Validation Results for different values of K

Table 4 shows the validation accuracies (in percentages) for the different data sets for different values of K. As can be seen from the table, our approach generates rules that are able to imitate model behavior for a large fraction of the data set. This indicates that the set of rules is able to capture to a large extent the behavior of the model. It is also important to note that as we increase the value of K, it is possible that imitation accuracy goes down. This is because the addition of rules could cause certain instances to be classified incorrectly by the rule based proxy model.

7 Conclusion

We have presented here an approach that explains a black box model globally. Our experiments on different data sets demonstrate the usefulness of our approach. The potential applications of our approach are diverse. These include applications in medicine, finance and digital marketing platform solutions. Further, by understanding models trained on large data sets, we can extract patterns inherent in the original data. This gives us a useful way to understand large and complex data sets. We have also introduced the Imitation@K metric, which could be useful for comparing decision rule sets generated through different approaches.

8 Limitations of MAGIX

Our approach only works with classification models. This is because the decision rule set may not be a good interpretable representation for a regression model, where we would like to present the output also as a function of features. Further, we have not applied the approach to image classification problems. While it is possible to use (Ribeiro et al., 2016a) to derive image regions that are important for classification at the instance level, it is not clear how to aggregate such regions to create global rules that are applicable for the entire training dataset.

9 Future Work

The current version of the approach is based on classification problems. One possible direction of future research is to extend our approach to explaining models that predict continuous variables. Further, it would be useful to extend our approach to explain models that work with images. As model interpretation approaches become more prevalent, another important direction of research is to devise metrics that can compare two model interpretation approaches. We have proposed the Imitation@K metric. However, this is only useful for comparing interpretation approaches that output a set of rules. A more general metric that can compare any two interpretation approaches would be useful.