Making Bayesian Predictive Models Interpretable: A Decision Theoretic Approach

10/21/2019 ∙ by Homayun Afrabandpey, et al.

A salient approach to interpretable machine learning is to restrict modeling to simple and hence understandable models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users' preferences, not the data-generation mechanism: it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model that compromises no accuracy, is constructed and fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model-agnostic: neither the interpretable model nor the reference model is restricted to a certain class of models, and the optimization problem can be solved using standard tools in the chosen model family. Through experiments on real-world data sets, using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the earlier alternative of restricting the prior. We also propose a systematic way to measure the stability of interpretable models constructed by different interpretability approaches, and show that our proposed approach generates more stable models.




1 Introduction and Background

Lack of interpretability remains a key barrier to the adoption of machine learning (ML) approaches in many applications. To bridge this gap, there is growing interest in the ML community in interpretability methods, i.e., methods to make ML models understandable. Despite the large body of literature on interpretable ML (see Doshi-Velez and Kim [5], Du et al. [6], Murdoch et al. [20], and Weld and Bansal [29]), there has been little work on interpretability in the context of the Bayesian framework. Wang et al. [28] presented two probabilistic models for interpretable classification by constructing rule sets in the form of disjunctive normal forms (DNFs). In that work, interpretability is obtained by tweaking the prior distributions to favor rule sets with a small number of short rules: the decision maker sets the parameters of the prior distributions over the number and length of rules to encourage the model to have a desired size and shape. Letham et al. [18] obtained an interpretable classifier by using decision lists consisting of a series of if … then … statements. The interpretability factors are (i) the number of rules in the list and (ii) the size of the rules (the number of statements in the left-hand side of a rule); a prior distribution over rule lists favors decision lists with a small number of short rules. Popkes et al. [23] proposed an interpretable Bayesian neural network architecture for clinical decision-making tasks, where interpretability is obtained by employing a sparsity-inducing prior over feature weights. In a different but related scenario, Kim et al. proposed an interactive approach whose goal is to obtain, from among a set of equally good clusterings, the one that best aligns with a user's preferences. User feedback affects the prior probability of prototypes being in a particular cluster (and therefore affects the clustering) either directly or indirectly, depending on the confidence level of the feedback. In [27], the authors present a multi-value rule set for interpretable classification that allows multiple values in a condition and therefore induces more concise rules than single-value rules. As in [28], interpretability is characterized by a prior distribution that favors a small number of short rules.

In summary, a common practice for making ML models interpretable in the Bayesian framework is to fuse interpretability into the prior distribution such that, in the inference, interpretable models become more favorable [30, 11, 12]. In the following, we call this approach the interpretability prior. We argue that this is not the best way of optimizing for interpretability, for the following reasons:

  • Interpretability is about users’ preferences, not about our assumptions about the data. The prior is meant for the latter. One should distinguish the data generation mechanism from the decision making process of interpretability optimization.

  • Optimizing interpretability naturally sacrifices some of the accuracy of the model. If interpretability is pursued by changing the prior, there is no reason why the trade-off between accuracy and interpretability would be optimal. This has been shown in a different but related scenario [22] where the authors showed that fitting a model using sparsity inducing priors that favor simpler models results in performance loss.

  • Formulating an interpretability prior for certain classes of models, such as neural networks, can be difficult.

To address these concerns, we propose to reserve the prior for assumptions about the data, and to include interpretability in the decision-making stage of how the model is used. This results in a two-step strategy for interpretability in the Bayesian framework. We first fit a highly accurate Bayesian predictive model, which we call the reference model, to the training data without constraining it to be interpretable. In the second phase, we construct an interpretable proxy model that best describes, locally and/or globally, the behavior of the reference model. The proxy model is constructed by optimizing a utility function, referred to as the interpretability utility, that consists of two terms: (I) a term that minimizes the discrepancy of the proxy model from the reference model, and (II) a term that penalizes the complexity of the proxy model to make it as interpretable as possible. Term (I) corresponds to the reference predictive model selection idea in the Bayesian framework [26, Section 3.3]. The proposed approach is model-agnostic, meaning that neither the reference model nor the interpretable proxy is constrained to a particular class of models. The approach is also feasible for non-Bayesian models, in which case point estimates of the model parameters are used instead of posterior distributions.

Through experiments on real-world data sets using decision trees as interpretable proxies and Bayesian additive regression tree (BART) models [4] as reference models, we show that the proposed approach results in regression trees that are more accurate and more interpretable than the alternative of fitting an a priori interpretable model to the data. We also show that the interpretability utility approach constructs more stable interpretable models.

1.1 Our Contributions

The main contributions of the paper are:

  • We show how Bayesian reference predictive model selection can be combined with interpretability utilities to produce more interpretable models in a decision-theoretically correct way. The proposed approach is model-agnostic and can be used with different notions of interpretability.

  • For the special case of classification and regression tree (CART) models [1] as interpretable models and the BART model as the black-box Bayesian predictive model, we derive the formulation of the proposed approach and show that it outperforms the interpretability prior approach in accuracy, explicating the trade-off between explanation fidelity and interpretability.

  • We propose a systematic approach to comparing the stability of interpretable models.

2 Motivation for Interpretability Utility

In this section, we discuss the motivation for formulating interpretability optimization in the Bayesian framework as a utility function. We also discuss how this formulation allows accounting for model uncertainty in the explanation. Both discussions are accompanied with illustrative examples.

2.1 Interpretability as a Decision Making Problem

Bayesian modelling allows encoding prior information into the prior probability distribution (similarly, one might use regularization in maximum-likelihood based inference). This may tempt one to change the prior distribution to favour models that are easier for humans to understand, according to some measure of interpretability. A simple example is using shrinkage priors in linear regression to find a smaller set of practically important covariates. We argue, however, that interpretability is not an inherent characteristic of the data-generating process: tweaking the prior in this way can be misleading and leaks user preferences about interpretability into the model of the data-generation process.

We suggest separating the construction of a model for the data-generating process from construction of an interpretable proxy model. In a prediction task, the former corresponds to building a model that predicts as well as possible, without considering its interpretability. Interpretability is introduced in the second stage by building an interpretable proxy to explain the behaviour of the predictive model. We consider the second step as a decision making problem, where the task is to choose a proxy model that trades off between human interpretability and fidelity (w.r.t. the original model).

2.2 The Issue with Interpretability in the Prior

Let us distinguish between assumptions about the data-generating process and preferences toward interpretability. Consider an observation model p(y | x, φ) for the data and two alternative prior distributions: p(φ), encoding only assumptions about the data-generating process, and p_I(φ), additionally encoding preferences toward interpretability. Here φ can, for example, be continuous model parameters (e.g., weights in a regression or classification model), or it can index a set of alternative models (e.g., each configuration of φ could correspond to using some subset of input variables in a predictive model). Clearly, the two posterior distributions (and their corresponding posterior predictive distributions) are in general different, and the latter includes a bias towards interpretable models. In particular, when the interpretability preference does not correspond to prior information about the data-generation process, there is no guarantee that the resulting posterior provides a reasonable quantification of our knowledge of φ given the observations, or that it provides good predictions. We give an example of this below. In the special case where the preference does describe the data-generation process, it can directly be included in the prior.

For example, Lage et al. [17] propose to find interpretable models in two steps: (1) fit a set of models to the data and keep those with high enough predictive accuracy; (2) build a prior over these models based on an indirect measure of user interpretability (a human interpretability score). It is not obvious that this leads to a good trade-off between accuracy and interpretability: in practice, it requires the set of models in step (1) to already contain interpretable models, mixing knowledge about the data-generation process with preferences for interpretability.

Figure 1: Illustrative example: The reference model (green) is a highly predictive non-interpretable model that approximates the true function (black) well. The interpretable model fitted to the reference model (magenta) approximates the reference model (and consequently the true function) well, while the interpretable model fitted to the training data (blue) fails to approximate the predictive behavior of the true function.

2.2.1 Illustrative Example

We give an illustrative example, in a case where the assumptions of the interpretable model do not match the data-generating process, to demonstrate the difference between (1) fitting an interpretable model directly to the training data (the interpretability prior approach) and (2) the two-stage process of first fitting a reference model and then approximating it with an interpretable model (the proposed interpretability utility approach). For simplicity of visualization, we use a one-dimensional smooth function as the data-generating process, with Gaussian noise added to the observations (Figure 1:left, black curve and red dots). As the interpretable model, a regression tree with a fixed depth is fitted (Figure 1:left, blue). Being a piece-wise constant function, it does not correspond to true prior knowledge about the ground-truth function. A Gaussian process with the MLP kernel function is used as the reference model (Figure 1:left, green).

The regression tree fitted directly to the data (blue line) overfits and does not give an accurate representation of the underlying data-generating process (black line). The interpretability utility approach, on the other hand, gives a clearly better representation of the smooth, increasing function: the reference model (green line) captures the smoothness of the underlying process, and this is transferred to the regression tree (magenta line). Choosing the complexity of the interpretable model is also easier, because the tree can only "overfit" to the reference model, meaning that it only becomes a more accurate (but possibly less easy to interpret) representation of the reference model. In particular, the trade-off is between interpretability and fidelity with regard to the reference model, not the original training data, which makes the choice of complexity of the interpretable model significantly easier. Figure 1:right shows the root mean squared errors with respect to the true underlying function as the tree depth is varied.
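The two fitting strategies in this example can be sketched as follows. This is an illustrative stand-in: a cubic polynomial fit plays the role of the smooth reference model instead of the Gaussian process used in the paper, and the noise level, depth, and sample size are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 40)
y_true = 0.5 * x                              # smooth, increasing ground truth
y = y_true + rng.normal(0.0, 1.0, x.size)     # noisy observations

# (1) Interpretability-prior analogue: fit the tree directly to the noisy data.
tree_data = DecisionTreeRegressor(max_depth=6, random_state=0)
tree_data.fit(x.reshape(-1, 1), y)

# (2) Interpretability-utility analogue: fit a smooth reference model first
# (a cubic polynomial as a stand-in for a GP), then fit the tree to it.
coefs = np.polyfit(x, y, deg=3)
y_ref = np.polyval(coefs, x)
tree_ref = DecisionTreeRegressor(max_depth=6, random_state=0)
tree_ref.fit(x.reshape(-1, 1), y_ref)

# Compare both trees against the true underlying function.
rmse_data = np.sqrt(np.mean((tree_data.predict(x.reshape(-1, 1)) - y_true) ** 2))
rmse_ref = np.sqrt(np.mean((tree_ref.predict(x.reshape(-1, 1)) - y_true) ** 2))
```

With high observation noise, the tree fitted directly to the data chases the noise, while the tree fitted to the smooth reference tracks the underlying trend; the size of the RMSE gap depends on the noise level and tree depth.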

2.3 Interpreting Uncertainty

In many applications, such as treatment effectiveness prediction, knowing the uncertainty in the prediction is important. In Bayesian modelling, quantifying uncertainty is fundamental. Any explanation of the predictive model should also provide insight about the uncertainties and their sources. We demonstrate with an example that the proposed method can provide useful information about model uncertainty.

2.3.1 Practical Example

Figure 2: Mean explanation, explanation variance, and three sample explanations for a convolutional neural network 3-vs-8 MNIST-digit classifier, early in training and fully trained. Colored pixels show linear explanation model weights, with red being positive for 3 and blue for 8.

We provide an example of visualizing uncertainty with our proposed method by locally explaining the predictions of a Bayesian deep convolutional neural network on the MNIST data set of digit images, with the task of classifying between 3s and 8s. We use the Bernoulli dropout method [10, 9], with a dropout probability of 0.2 and 20 Monte Carlo samples at test time, to approximate Bayesian neural network inference. Logistic regression is used as the interpretable model family.¹

¹The optimization of the interpretable model follows the general framework explained later, with logistic regression used as the interpretable model family instead of CART. No penalty for complexity was used here, since the logistic regression model weights are easy to visualize as pseudo-colored pixels.

Figure 2 visualizes the explanation model weights (mean, variance, and three samples out of the 20, explained using the linear model) for a digit, comparing the model in an early training phase (upper row) and fully trained (lower row). The mean explanations show that the fully trained model has spatially smooth contributions to the class probability, while the model in early training is noisy. Moreover, being able to look at the explanations of individual posterior predictive samples shows, for example, that the model in early training has not yet confidently assigned the upper loop to indicate either a 3 or an 8 (samples 1 and 2 have a reddish loop, while sample 3 has a bluish one). Indeed, the variance plot shows that the model variance spreads evenly over the digit. The fully trained model, on the other hand, has little uncertainty about which parts of the digit indicate a 3 or an 8; most of its uncertainty concerns the magnitude of the contributions.
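The per-sample explanation pipeline can be sketched as follows. This is a schematic stand-in: synthetic linear "posterior draws" replace the MC-dropout CNN, and a logistic-regression explanation is fitted to each draw before summarizing the weights by their mean and variance (as in Figure 2); all sizes and scales are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
d = 16                                  # number of input "pixels"
x_star = rng.normal(size=d)             # instance to be explained

# Stand-in for S Monte Carlo dropout draws: each draw is a slightly
# perturbed linear classifier (in the paper, each draw comes from the
# Bayesian CNN's posterior predictive).
w_true = rng.normal(size=d)
S = 20
weight_samples = []
for s in range(S):
    w_s = w_true + 0.3 * rng.normal(size=d)          # sth "posterior draw"
    Z = x_star + 0.5 * rng.normal(size=(200, d))     # perturb around x_star
    scores = Z @ w_s
    labels = (scores > np.median(scores)).astype(int)  # balanced binary labels
    # Fit a logistic-regression explanation to this single draw.
    expl = LogisticRegression(max_iter=1000).fit(Z, labels)
    weight_samples.append(expl.coef_.ravel())

weight_samples = np.array(weight_samples)            # shape (S, d)
mean_explanation = weight_samples.mean(axis=0)       # cf. Figure 2, "mean"
var_explanation = weight_samples.var(axis=0)         # cf. Figure 2, "variance"
```

Inspecting `var_explanation` pixel-by-pixel is what distinguishes this from a single point explanation: it shows where the posterior draws disagree.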

3 Method: Interpretability Utility for Bayesian Predictive Models

Let D = {(x_i, y_i)}_{i=1}^{n} denote a training set of size n, where x_i is a d-dimensional feature vector and y_i is the target variable. To construct an interpretable model in the Bayesian framework, we propose to separate model construction from interpretability optimization. The idea is to first fit a highly predictive (reference) model without concern for its interpretability. In the second phase, by optimizing a utility function which we call the interpretability utility, we find an interpretable model that best explains the behavior of the reference model, locally or globally.

Denote the likelihood of the reference model by p(y | x, φ) and its posterior distribution by p(φ | D). For optimizing interpretability, we introduce an interpretable model family with likelihood q(y | x, θ), belonging to a probabilistic model family with parameters θ. The best interpretable model is the one that is closest to the reference model prediction-wise while remaining easily interpretable. Assuming we want to locally interpret the reference model, such an explainable model can be found by optimizing the following utility function:

\hat{\theta} = \arg\min_{\theta} \; \mathbb{E}_{\phi \sim p(\phi \mid D)} \, \mathbb{E}_{x \sim \pi(x \mid x^*)} \Big[ \mathrm{KL}\big( p(y \mid x, \phi) \,\|\, q(y \mid x, \theta) \big) \Big] + \lambda \, \Omega(\theta) \qquad (1)


where KL denotes the Kullback-Leibler divergence, Ω(θ) is the penalty function for the complexity of the interpretable model, and π(x | x*) is a probability distribution defining the local neighborhood around x*, the data point whose prediction is to be explained. The minimization of the KL divergence ensures that the interpretable model has predictive behavior similar to that of the reference model, while the complexity penalty guarantees the interpretability of the model.

We compute the expectation in Eq. 1 with a Monte Carlo approximation, drawing samples x_1, …, x_J from π(x | x*):

\theta_s = \arg\min_{\theta} \; \frac{1}{J} \sum_{j=1}^{J} \mathrm{KL}\big( p(y \mid x_j, \phi_s) \,\|\, q(y \mid x_j, \theta) \big) + \lambda \, \Omega(\theta) \qquad (2)

for posterior draws \phi_s \sim p(\phi \mid D), s = 1, \dots, S. Eq. 2 can be solved by first drawing a sample from the posterior of the reference model, and then finding the corresponding sample from the posterior of the interpretable model. It has been shown [22] that the minimization of the KL divergence in Eq. 2 is equivalent to maximizing the expected log-likelihood of the interpretable model over the likelihood obtained from a posterior draw of the reference model:

\theta_s = \arg\max_{\theta} \; \frac{1}{J} \sum_{j=1}^{J} \mathbb{E}_{\tilde{y} \sim p(y \mid x_j, \phi_s)} \big[ \log q(\tilde{y} \mid x_j, \theta) \big]. \qquad (3)

Using this equivalent form and adding the complexity penalty term, the interpretability utility becomes:

\theta_s = \arg\max_{\theta} \; \frac{1}{J} \sum_{j=1}^{J} \mathbb{E}_{\tilde{y} \sim p(y \mid x_j, \phi_s)} \big[ \log q(\tilde{y} \mid x_j, \theta) \big] - \lambda \, \Omega(\theta). \qquad (4)
The choice of the form of the complexity penalty depends on the class of interpretable models; possible options are the number of leaf nodes for decision trees, the number and/or size of rules for rule-list models, the number of non-zero weights for linear regression models, etc. Although the proposed approach is general and can be used with any family of interpretable models, in the following we use the class of classification and regression tree (CART) models [1], with the tree size (the number of leaf nodes) as the measure of interpretability. With this assumption, the interpretability prior is over the model space; it could also be defined over the parameter space of a particular model, such as the tree-shape parameters of Bayesian CART models [3].

A CART model is described by two main components (T, Θ): a binary tree T with B terminal nodes, and a parameter set Θ = (θ_1, …, θ_B) that associates the parameter value θ_b with the bth terminal node. If x lies in the region corresponding to the bth terminal node, then y | x has distribution f(y | θ_b), where f denotes a parametric probability distribution with parameter θ_b. For CART models, it is typically assumed that, conditionally on (T, Θ), values within a terminal node are independently and identically distributed, and values across terminal nodes are independent. In this case, the corresponding likelihood of the interpretable model for the sth draw from the posterior of the reference model has the form

q(\tilde{Y}_s \mid X, T, \Theta) = \prod_{b=1}^{B} \prod_{i \in A_b} f(\tilde{y}_{s,i} \mid \theta_b), \qquad (5)

where A_b denotes the set of observations assigned to the partition generated by the bth terminal node with parameter θ_b, and X is the matrix of all x_j. For regression problems, assuming a mean-shift normal model f(y | θ_b) = N(y | μ_b, σ²) (for a mean-variance shift model, each terminal node would have its own variance parameter, increasing the number of parameters accordingly), we have the following likelihood for the interpretable model (for classification problems, the likelihood follows a categorical distribution instead):

q(\tilde{Y}_s \mid X, T, \Theta) = \prod_{b=1}^{B} \prod_{i \in A_b} N(\tilde{y}_{s,i} \mid \mu_b, \sigma^2). \qquad (6)
With this formulation, the task of making a reference model interpretable becomes finding a tree structure T with parameters Θ such that its predictive performance is as close as possible to that of the reference model, while being as interpretable as possible as measured by the complexity term Ω.
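Looking ahead to the projection step, it helps to note a standard Gaussian identity (not specific to this paper). Writing μ*, σ*² for the reference model's predictive mean and variance at a sample, and μ_b, σ² for a terminal node's parameters, the expected node log-likelihood has a closed form:

```latex
\mathbb{E}_{\tilde{y} \sim N(\mu^{*}, \sigma^{*2})}\big[\log N(\tilde{y} \mid \mu_b, \sigma^2)\big]
  = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{\mathbb{E}\big[(\tilde{y}-\mu_b)^2\big]}{2\sigma^2}
  = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{\sigma^{*2} + (\mu^{*} - \mu_b)^2}{2\sigma^2},
```

using E[(ỹ − μ_b)²] = Var(ỹ) + (E[ỹ] − μ_b)². This is why the reference model's predictive variance appears alongside the squared mean difference in the final utility.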

With the normal likelihood defined for the terminal nodes, the log-likelihood of the partition generated by the bth terminal node is (for simplicity, for the rest of the paper we drop the draw index s from the parameters, with the understanding that for each draw \phi_s a corresponding \theta_s is computed)

l_b = \sum_{i \in A_b} \log N(\tilde{y}_i \mid \mu_b, \sigma^2),

and the log-likelihood of the tree for the samples drawn from the neighborhood of x* is

\log q(\tilde{Y} \mid X, T, \Theta) = \sum_{b=1}^{B} \sum_{i \in A_b} \log N(\tilde{y}_i \mid \mu_b, \sigma^2). \qquad (7)
Projecting this into Eq. 4, the interpretability utility takes the following form:

(\hat{T}, \hat{\Theta}) = \arg\min_{T, \Theta} \; \sum_{b=1}^{B} \sum_{i \in A_b} \left[ \log \sigma + \frac{\sigma_i^{*2} + (\mu_i^{*} - \mu_b)^2}{2\sigma^2} \right] + \lambda \, \Omega(T), \qquad (8)

where μ_i* and σ_i*² are, respectively, the mean and variance of the reference model's predictive distribution for the ith sample in the bth terminal node. Ω(T) is a function of the interpretability of the CART model; here we set Ω(T) = |T|, the number of leaf nodes, with λ acting as a regularization parameter. The pseudocode of the proposed approach is shown in Algorithm 1.

Input: training data D = {(x_i, y_i)}_{i=1}^{n}
Output: a decision tree explaining the prediction of a new sample x*
fit the Bayesian predictive (reference) model to the training data without an interpretability prior;
for each sample x* in the test set do
       draw x_1, …, x_J from the neighborhood of x* defined by π(x | x*);
       for each draw x_j do
             get the mean μ_j* and variance σ_j*² of the Bayesian predictive distribution;
       end for
       fit a CART model to {(x_j, μ_j*, σ_j*²)}_{j=1}^{J} by optimizing Eq. 8
end for
Algorithm 1 Decision-theoretic approach for ML interpretability in the Bayesian framework
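Algorithm 1 can be sketched as follows. This is an illustrative stand-in: a random forest's per-tree predictions play the role of posterior draws from the Bayesian reference model, and a depth-limited scikit-learn CART replaces the exact Eq. 8 optimization (the depth limit acting as a crude complexity penalty); the data, neighborhood scale, and depths are all assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy training data (stand-in for a real data set).
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# Step 1: fit a flexible reference model; its per-tree predictions
# play the role of posterior predictive draws.
ref = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def explain_locally(x_star, n_samples=300, scale=0.3, max_depth=3):
    """Fit a shallow CART to the reference model's predictive mean
    in a Gaussian neighborhood pi(x | x_star) around x_star."""
    Z = x_star + scale * rng.normal(size=(n_samples, x_star.size))
    per_tree = np.stack([t.predict(Z) for t in ref.estimators_])  # "draws"
    mu_star = per_tree.mean(axis=0)       # predictive mean at each z_j
    # per_tree.var(axis=0) gives the predictive variance used by Eq. 8;
    # the plain squared-error fit below ignores it for simplicity.
    proxy = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    proxy.fit(Z, mu_star)
    return proxy

proxy = explain_locally(X[0])
```

The returned `proxy` tree can then be plotted or printed as the local explanation for `X[0]`.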

When fitting a global interpretable model to the reference model, instead of drawing samples from π(x | x*), we use the training inputs with their corresponding outputs computed by the reference model as the target values.

The next subsection explains how to solve Eq. 8 for the CART model to obtain an interpretable proxy for the reference model.

3.1 Optimization Approach

We optimize Eq. 8 using a backward-fitting approach: first grow a large tree, then prune it back to obtain a smaller tree with better generalization. For this, we use the formulation of the maximum likelihood regression tree (MLRT) [25].

3.1.1 Growing a large tree

Given the training data (for local interpretation, this refers to the samples drawn from the neighborhood distribution, together with their corresponding predictions made by the reference model), MLRT automatically decides on the splitting variable x^(k) and split point s (a.k.a. pivot) using a greedy search algorithm that aims to maximize the log-likelihood of the tree by splitting the data in the current node into two parts: a left child node satisfying x^(k) ≤ s and a right child node satisfying x^(k) > s. The procedure for growing the tree is as follows:

  • For each node t, determine the maximum likelihood estimate \hat{\mu}_t of its mean parameter as the average of the observations associated with the node, and then compute the variance parameter of the tree, \hat{\sigma}^2, from the pooled within-node residuals:

    \hat{\mu}_t = \frac{1}{n_t} \sum_{i \in t} y_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{t} \sum_{i \in t} (y_i - \hat{\mu}_t)^2.

    The log-likelihood score of the node is then given, up to a constant, by l(t) = -n_t \log \hat{\sigma} - \frac{1}{2\hat{\sigma}^2} \sum_{i \in t} (y_i - \hat{\mu}_t)^2.

  • For each variable x^(k), determine the increase in the log-likelihood of the node caused by a split as

    \Delta l_k(s) = l(t_R) + l(t_L) - l(t),

    where l(t_R) and l(t_L) are the log-likelihood scores of the right and left child nodes of the parent node t generated by the split on variable x^(k) at pivot s, respectively.

  • For each variable x^(k), select the best split, i.e., the pivot with the largest increase in the log-likelihood.

  • Among these per-variable best splits, the one that causes the global maximum increase in the log-likelihood score is selected as the global best split for the current node, i.e. (k*, s*) = \arg\max_{k, s} \Delta l_k(s).

  • Iterate steps 1 to 4 until the stopping criterion is reached.

In our implementation, we used the minimum size of a terminal node (the number of samples lying in the region generated by the terminal node) as the stopping condition.
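Since the terminal nodes share a single Gaussian variance, maximizing the log-likelihood gain of a split is equivalent to minimizing the summed squared error of the two children. A minimal single-feature sketch of the greedy pivot search (feature names and data are illustrative):

```python
import numpy as np

def best_split(x, y):
    """Greedy MLRT-style split for one feature: the pivot maximizing the
    log-likelihood gain equals, for a shared Gaussian variance, the pivot
    minimizing the children's total squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]

    def sse(v):
        return float(np.sum((v - v.mean()) ** 2)) if v.size else 0.0

    best_pivot, best_score = None, np.inf
    for i in range(1, x_sorted.size):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid pivot between equal values
        score = sse(y_sorted[:i]) + sse(y_sorted[i:])
        if score < best_score:
            best_score = score
            best_pivot = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_pivot, best_score

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
pivot, score = best_split(x, y)   # splits cleanly between x=3 and x=4
```

For d features, the same search is run per feature and the globally best pivot is kept, matching step 4 above.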

3.1.2 Pruning

We adopt cost-complexity pruning with the following cost function:

C_\alpha(T) = -\hat{l}(T) + \alpha \, |T|,

where \hat{l}(T) is the maximized log-likelihood of the tree T and |T| is its number of terminal nodes. Pruning is done iteratively; in each iteration m, the internal node t that minimizes

g(t) = \frac{C(t) - C(T_t)}{|T_t| - 1}

is selected for pruning, where C(t) refers to the cost of the decision tree with t collapsed to a terminal node, C(T_t) denotes the cost of the full decision tree in iteration m, and T_t denotes the subtree with t as its root. The output of this procedure is a sequence of decision trees and a sequence of \alpha values. The best \alpha and its corresponding subtree are selected using K-fold cross-validation.
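This pruning step mirrors the minimal cost-complexity pruning available in standard CART implementations; a sketch with scikit-learn, using squared error in place of the negative log-likelihood (equivalent up to constants for a fixed Gaussian variance), on illustrative toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] > 0, 2.0, -2.0) + 0.2 * rng.normal(size=300)

# Weakest-link pruning path: increasing alpha collapses internal nodes.
full = DecisionTreeRegressor(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas          # nondecreasing sequence of alpha values

# Select alpha (and hence the subtree) by K-fold cross-validation.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {"ccp_alpha": alphas},
    cv=5,
)
search.fit(X, y)
pruned = search.best_estimator_
```

Each `ccp_alpha` on the path corresponds to one subtree in the pruning sequence, so cross-validating over the path is exactly the "sequence of trees and alphas" selection described above.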

3.2 Connection With Local Interpretable Model-agnostic Explanation (LIME)

LIME [24] is a local interpretation approach that fits a sparse linear model to the predictive model's response by sampling around the point being explained. Our proposed approach extends LIME to KL-divergence-based interpretation of Bayesian predictive models (and it can also be used for non-Bayesian probabilistic models) by combining the idea of LIME with the projection predictive variable selection approach [22]. The approach can handle different types of predictions (continuous values, class labels, counts, censored and truncated data, etc.) and explanations (local or global), as long as we can compute the KL divergence between the predictive distributions of the original model and the explanation model. For a more detailed explanation of the connection, see the preliminary work [21].

4 Experiments

In this section, we compare the performance of the proposed method with the interpretability prior approach in three scenarios. First, we evaluate the ability of the two approaches to find a good trade-off between accuracy and interpretability when constructing global interpretations, using CART models as the interpretable model family. Then, the stability of the models constructed by each approach is investigated. Finally, the approaches are compared in providing local explanations for individual predictions. Our code and data are available online at www.anonymous.com (the link will be available upon acceptance).

4.1 Global Interpretation

4.1.1 Data

We test our proposed approach on three data sets: body fat [14], baseball players [13], and bike rental [8]. Each data set is divided into a training and a test set. As the black-box reference model, we fit a Bayesian additive regression tree (BART) model [4] to the training data using the BART package in R, with all parameters set to their default values except the number of burn-in iterations (nskip) and the number of posterior draws (ndpost) in the Markov chain Monte Carlo (MCMC) sampling. As the prediction of the BART model, we use the mean of the predictions of the posterior draws. CART models are used as the interpretable model family. The interpretability prior approach fits a CART model directly to the training data, where the interpretability prior is on the fitted model, i.e., the CART model is kept simple to interpret. Our approach, on the other hand, fits the CART model to the reference model (the BART model) through interpretability utility optimization. The process was repeated for 50 runs, with the data set divided randomly into training and test sets in each run.

(a) Body Fat
(b) Baseball Players
(c) Bike Rental
Figure 3: Top row: comparison of the interpretability prior and interpretability utility approaches in trading off between accuracy and interpretability, using CART as the explainable model family and BART as the reference model. Bottom row: results of the Bayesian t-test, showing the mean and highest density interval of the distribution of the difference of means.

4.1.2 Performance Analysis

Lundberg and Lee [19] suggested viewing an explanation model as a model in its own right. With this perspective, we quantitatively evaluate the explanation models as if they were predictive models. In Figure 3, the top row compares the performance of the two approaches in trading off between accuracy and interpretability on the different data sets. Using the number of leaf nodes in the CART models as the measure of interpretability, the most accurate models at any level of complexity (interpretability) are obtained with our proposed approach.

To test the significance of the differences in the results, we perform a Bayesian t-test [16]. The approach builds complete distributional information for the means and standard deviations of each group (for each tree size, we have two groups of RMSE values, one for the interpretability prior approach and one for the interpretability utility approach) and constructs a probability distribution over their differences using MCMC estimation. From the distributions of the differences of the means, the mean and the highest density interval (HDI), i.e., the range within which the actual difference between the two groups lies with the specified credibility, are shown for each data set in the bottom row of Figure 3. When the HDI does not include zero, there is a credible difference between the two groups. As shown in the figure, for all three data sets, for highly interpretable (and hence highly inaccurate) models, the difference between the two approaches is not significant (the HDI contains zero). This is expected, since increasing interpretability sharply decreases the ability of the interpretable model to explain the variability of the data or of the reference model, and both approaches perform poorly. However, once the complexity increases to a reasonable level, the differences between the two approaches become significant.

4.1.3 Stability Analysis

Table 1: Bootstrap instability values (mean ± standard deviation) for the Body Fat, Baseball, and Bike data sets. Best values are bolded.

The goal of interpretable ML is to provide a comprehensive explanation of the prediction logic to the decision maker. However, perturbations in the data or new samples may affect the learned interpretable model and lead to a very different explanation. This instability can cause real problems for decision makers who need to take actions in critical situations. It is therefore important to evaluate the stability of different interpretable ML approaches.

For this goal, using a bootstrapping procedure, we compute pairwise dissimilarities of the interpretable models obtained by each approach and report the mean and standard deviation of the dissimilarity values as their instability measure (smaller is better). We use the dissimilarity measure proposed in [2]. Given two regression trees T_1 and T_2, for each internal node ν the similarity of the trees at ν is computed from (i) an indicator of whether the feature used to grow node ν in T_1 is identical to the one used in T_2, and (ii) the distance between the pivots used to grow node ν in T_1 and T_2, normalized by the range of values of that feature. The dissimilarity of the two decision trees is then computed as one minus a weighted sum of the node similarities, with user-specified weights that we set uniformly over the internal nodes. The reported values are averaged over 45 values (10 bootstrap iterations result in 45 pairs of explainable models).
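The bootstrap instability computation can be sketched as follows; note the dissimilarity used here is a deliberately simplified stand-in (comparing only the root split feature and its normalized pivot), not the full node-by-node measure of [2], and the data are illustrative:

```python
import itertools
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=200)
ranges = X.max(axis=0) - X.min(axis=0)

def root_dissimilarity(t1, t2):
    """Simplified tree dissimilarity: 1 if the roots split on different
    features; otherwise the pivot gap normalized by the feature range."""
    f1, f2 = t1.tree_.feature[0], t2.tree_.feature[0]
    if f1 != f2:
        return 1.0
    return abs(t1.tree_.threshold[0] - t2.tree_.threshold[0]) / ranges[f1]

# Fit one interpretable tree per bootstrap resample.
trees = []
for b in range(10):
    idx = rng.integers(0, len(X), len(X))
    trees.append(DecisionTreeRegressor(max_depth=3, random_state=0)
                 .fit(X[idx], y[idx]))

# Instability = mean pairwise dissimilarity over the C(10, 2) = 45 pairs.
pairs = list(itertools.combinations(trees, 2))
instability = float(np.mean([root_dissimilarity(a, b) for a, b in pairs]))
```

Running the same loop once with trees fitted to the raw resamples and once with trees fitted to a reference model's predictions yields the two columns compared in Table 1.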

Table 1 compares the two approaches on the three data sets used in this subsection. The explanation models constructed using our proposed approach are more stable on two data sets, and on the remaining one both approaches perform equally well (the differences on the Body Fat and Baseball data sets are not statistically significant).

4.2 Local Interpretation

Dataset   LIME   Interpretability Utility
Boston housing
Automobile
Table 2: Comparison of the local fidelity of LIME and the interpretability utility approach when used to explain predictions of BART. Best values are bolded.

We next demonstrate the ability of the proposed approach to locally interpret the predictions of a Bayesian predictive model. As before, we used the BART model as the black-box model (in this experiment, the number of trees and the nskip and ndpost parameters were reduced for faster runs) and CART as the interpretable model family. For the CART model, we limited the maximum depth of the decision trees to obtain more interpretable local explanations. We compare with LIME (implemented with the 'lime' package in R), emphasizing that LIME is not an interpretability prior approach; it is, however, a commonly used baseline for local interpretation approaches. The comparison is done over two data sets from the UCI repository [7]: Boston housing and Automobile. The decision trees obtained by our approach to locally explain predictions of the BART model used, on average, only a subset of the features for each data set. Therefore, to have a fair comparison, we set the feature selection approach of LIME to ridge regression and select the same number of features, those with the highest absolute weights, to be used in the explanation.

We use the standard quantitative metric for local fidelity, the locally weighted square loss

$$ L(f, g, \pi_x) = \sum_{z} \pi_x(z) \, \big(f(z) - g(z)\big)^2, $$

where, given a test point $x$, $g(z)$ is the prediction of the local interpretable model (fitted to the neighborhood of $x$) for a neighborhood sample $z$, $f(z)$ is the prediction of the black-box model for $z$, and the weight is given by the proximity kernel $\pi_x(z) = \exp(-D(x, z)^2 / \sigma^2)$, with $D$ the Euclidean distance.
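The locally weighted fidelity metric can be sketched as follows; the function names and the Euclidean form of the distance are assumptions matching the description above:

```python
import numpy as np

def proximity(x, Z, sigma=1.0):
    """Proximity kernel pi_x(z) = exp(-D(x, z)^2 / sigma^2), D Euclidean."""
    d2 = np.sum((Z - x) ** 2, axis=1)  # squared distances to each neighbor
    return np.exp(-d2 / sigma ** 2)

def local_fidelity(f, g, x, Z, sigma=1.0):
    """Locally weighted squared loss between the black-box predictions f(Z)
    and the local surrogate's predictions g(Z) (lower is better)."""
    pi = proximity(x, Z, sigma)
    return float(np.sum(pi * (f(Z) - g(Z)) ** 2))

# Toy usage: a linear "black box" and a constant surrogate around x = 0.
x = np.zeros(2)
Z = np.array([[0.0, 0.0], [1.0, 0.0]])
f = lambda Z: Z[:, 0] + 1.0            # stand-in for the BART prediction
g = lambda Z: np.full(len(Z), 1.0)     # stand-in for the local tree
loss = local_fidelity(f, g, x, Z)
```

A perfect surrogate (g identical to f on the neighborhood) gives zero loss; errors on distant neighbors are down-weighted by the kernel.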

Each data set is divided into a 90%/10% training/test split. For each test point, we draw samples from the neighborhood distribution. Table 2 shows the results: our approach produces more accurate local explanations for both data sets. As an example, Figure 4 shows a decision tree constructed by our proposed approach to locally explain the prediction of the BART model for a particular test point from the Boston housing data set. Using only two features, our proposed approach obtains good local fidelity while maintaining interpretability with a decision tree with only a few leaf nodes.
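As a sketch of the neighborhood sampling step, perturbed samples around a test point can be drawn from a Gaussian perturbation distribution; this is a common choice in local explanation methods, and the scale parameter here is an illustrative assumption rather than the paper's exact neighborhood distribution:

```python
import numpy as np

def sample_neighborhood(x, n_samples, scale, rng):
    """Draw n_samples perturbed points around test point x by adding
    independent Gaussian noise to each feature (an assumed scheme)."""
    return x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))

rng = np.random.default_rng(0)
x = np.array([0.2, -1.0, 3.5])  # a hypothetical 3-feature test point
Z = sample_neighborhood(x, n_samples=500, scale=0.3, rng=rng)
```

The black-box model is then evaluated on `Z`, and the interpretable tree is fitted to those predictions with the proximity kernel as sample weights.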

Figure 4: Example of a decision tree obtained by the interpretability utility approach to locally explain the prediction of the BART model (the prediction being the mean over the posterior draws) for a particular test point. Using only two features, our approach predicts the output; the predictions of LIME with the same and with a larger number of features are reported for comparison.

5 Conclusion

We presented a novel approach to constructing interpretable explanations in the Bayesian framework by formulating the task as optimization of a utility function instead of modification of the priors. We first fit a Bayesian predictive model that compromises no accuracy, and then project the information in the predictive distribution of that model to an interpretable probabilistic model. This also allows accounting for model uncertainty in the explanations. We showed that the proposed approach outperforms the alternative of restricting the prior in terms of accuracy, interpretability, and stability.

6 Acknowledgments

This work was financially supported by the Academy of Finland (grants 294238, 319264 and 313195), by the Vilho, Yrjö and Kalle Väisälä Foundation of the Finnish Academy of Science and Letters, by the Foundation for Aalto University Science and Technology, and by the Finnish Foundation for Technology Promotion (Tekniikan Edistämissäätiö). We acknowledge the computational resources provided by the Aalto Science-IT Project.