Explaining a black-box using Deep Variational Information Bottleneck Approach

02/19/2019 ∙ by Seojin Bang, et al. ∙ Carnegie Mellon University

Briefness and comprehensiveness are both necessary to convey a large amount of information concisely when explaining a black-box decision system. However, existing interpretable machine learning methods fail to consider briefness and comprehensiveness simultaneously, which may lead to redundant explanations. We propose VIBI, a system-agnostic interpretable method that provides a brief but comprehensive explanation by adopting an inspiring information theoretic principle, the information bottleneck principle. Using an information theoretic objective, VIBI selects instance-wise key features that are maximally compressed about an input (briefness) and informative about the decision made by a black-box on that input (comprehensiveness). The selected key features act as an information bottleneck that serves as a concise explanation for each black-box decision. We show that VIBI outperforms other interpretable machine learning methods in terms of both interpretability and fidelity, as evaluated by human and quantitative metrics.


1 Introduction

Interpretability is crucial for implementing and utilizing black-box decision systems such as deep learning models. Interpretation of a black-box system provides a justification to accept or reject decisions suggested by the system, or explains the logic behind the system. In recent years, the extensive use of deep learning black-box systems, particularly in critical areas such as self-driving cars and medical decision-making, has given rise to interpretable machine learning approaches proposed to explain black-box systems. In order to effectively justify a black-box decision, both briefness and comprehensiveness must be taken into account, providing sufficient information while avoiding redundancy. However, existing approaches lack an in-depth consideration of this trade-off and fail to find explanations that are both brief and comprehensive. In this paper, we quantify this idea from an information theoretic perspective: the explanation should be maximally informative about the black-box decision while maximally compressive about the given input. Thus we propose a systematic framework for providing a brief but comprehensive explanation by adopting an inspiring information theoretic principle, the information bottleneck principle (Tishby et al., 2000). Using a variational approximation to the information bottleneck objective, we learn a stochastic Boolean encoding that indicates the instance-specific key features serving as a brief but comprehensive explanation for each black-box decision.

The information bottleneck principle views a supervised learning task as an optimization problem that squeezes the information an input provides about the output through an information bottleneck. The information bottleneck is an optimal intermediate representation that maximally compresses the mutual information between the input and the representation while preserving as much mutual information as possible between the representation and the output. Recently, Tishby & Zaslavsky (2015) and Shwartz-Ziv & Tishby (2017) illustrated that the layered architecture of deep neural networks fits nicely with the information bottleneck principle and that each layer of a deep neural network can work as an information bottleneck.

In this paper, we adopt the information bottleneck principle as a criterion for finding a brief but comprehensive explanation. We call the resulting method VIBI (variational information bottleneck for interpretation). Using this principle, we learn an explainer that favors brief explanations while enforcing that the explanations alone suffice for an accurate approximation of the black-box. For each instance, the explainer returns the probability that each chunk of features, such as a word, phrase, sentence or group of pixels, will be selected as an explanation. The selected chunks act as an information bottleneck that is maximally compressed about the input and informative about the decision made by the black-box system on that input. Hence they provide a brief but comprehensive explanation of the black-box decision.

1.1 Contribution

Our main contribution is a new framework that systematically defines and generates a ‘good’ explanation (brief but comprehensive) using the information bottleneck principle. To the best of our knowledge, this is the first study that adopts the information bottleneck principle for explaining a black-box model. Based on this principle, we develop the VIBI objective for learning an explainer that favors brief but comprehensive explanations. In order to make the VIBI objective tractable, we derive a variational approximation to the objective and use a continuous reparameterization of the sampling distribution.

Compared to existing interpretable machine learning methods, the benefits of VIBI are as follows. System-agnostic: VIBI can be applied to explain any black-box system. Post-hoc learning: VIBI learns in a post-hoc manner, hence there is no trade-off between task accuracy and interpretability. Cognitive chunk: VIBI groups non-cognitive raw features such as individual pixels and letters into a cognitive chunk (e.g. a group of pixels, a word, a phrase, or a sentence) and uses it as the unit to be explained. Separated explainer and approximator: the explainer and approximator are designed for separate tasks, so we do not need to limit the approximator to a simple structure, which could reduce the fidelity of the approximator.

2 Related Work

We introduce recent interpretable machine learning methods providing a local interpretation, which aims to find the reasons why a system makes a specific decision at a very local point of interest. Existing methods can be categorized based on whether they are designed to explain a specific type of black-box (system-specific) or any type of black-box (system-agnostic).

System-specific methods. Measuring the change of the output with respect to changes of the input is an intuitive and natural way to obtain feature attributions for the output. Baehrens et al. (2010); Simonyan et al. (2013); Smilkov et al. (2017) are similar in that the change of the output is calculated by propagating contributions through the layers of a deep neural network toward the input, whereas the methods differ in the propagation logic used. However, these approaches fail to detect changes of the output when the prediction function is flat at the instance (Shrikumar et al., 2017), which leads to interpretations that focus on irrelevant features. In order to solve this problem, layer-wise relevance propagation (LRP, Bach et al. (2015); Binder et al. (2016)), DeepLIFT (Shrikumar et al., 2017), and Integrated Gradients (Sundararajan et al., 2017) compare the change of the output to a reference output.

Unlike the methods above, the following methods form and explain a human-understandable cognitive chunk with a group of raw features instead of a single raw feature. Murdoch et al. (2018); Singh et al. (2018) provide an attribution for a cognitive chunk such as a word, phrase, sentence or a group of pixels using layer-wise contextual decomposition of a deep neural network. Yang et al. (2016); Mullenbach et al. (2018) also provide attributions for cognitive chunks by embedding a label-wise attention mechanism over each word and sentence. Lei et al. (2016) jointly learn a task-performing encoder and a generator specifying a distribution over cognitive chunks such as words, phrases, and sentences so as to minimize a prediction loss.

System-agnostic methods. The great advantage of system-agnostic interpretable machine learning methods over system-specific methods is that their usage is not restricted to a specific black-box system. One of the most well-known methods is LIME (Ribeiro et al., 2016). It explains each instance by approximating the black-box decision boundary locally around the instance with an inherently interpretable model such as a sparse linear model or a decision tree. LIME differs from other additive feature attribution methods such as Saliency Map, DeepLIFT, and LRP in that it does not require any specific deep network architecture or learning algorithm for the black-box system.

Lundberg & Lee (2017) proposed SHAP values as a unified measure defined over additive feature attribution scores in order to achieve local accuracy, missingness, and consistency. L2X (Chen et al., 2018) learns a stochastic map that selects the instance-wise features that are most informative for black-box decisions. Unlike LIME and SHAP, which approximate local behaviors of a black-box system with a simple (linear) proxy, L2X does not restrict the structure of the approximator and hence avoids losing fidelity of the proxy to the black-box system.

Among the system-agnostic methods, VIBI is similar to L2X (Chen et al., 2018) in that both learn a stochastic explainer that returns a distribution over subsets of features given the input and perform instance-wise feature selection based on it. However, the L2X explainer favors comprehensiveness of the explanation and does not account for briefness, while our explainer favors both briefness and comprehensiveness, thus adding a new dimension to the stochastic explainer. In fact, L2X is a special case of VIBI with β = 0 in the information bottleneck objective (2).

3 Method

Figure 1: Illustration of VIBI. (A) VIBI is composed of two parts: the explainer and the approximator, each of which is modeled by a deep neural network. The explainer selects a group of key cognitive chunks given an instance, while the approximator mimics the behaviour of the black-box system using the selected keys as the input. (B) For each instance, the explainer returns a stochastic k-hot random vector z, which indicates whether each cognitive chunk is selected as an explanation or not. The instance-specific explanation is defined as t = z ⊙ x.

3.1 Information bottleneck principle

The information bottleneck principle (Tishby et al., 2000) provides an appealing information theoretic view of learning a supervised model by defining what we mean by a good representation. In this view, the optimal model squeezes as much information as possible from the input x to the output y through a compressed representation t (called the information bottleneck). The representation is stochastically defined, and the optimal stochastic mapping p(t | x) is obtained by solving the following problem under the Markov chain assumption y ↔ x ↔ t:

    max_{p(t | x)}  I(t; y) − β I(t; x),    (1)

where I(·; ·) denotes mutual information and β is a Lagrange multiplier representing a trade-off between the compressiveness (i.e. I(t; x)) and informativeness (i.e. I(t; y)) of the representation t. This information bottleneck trade-off favors the most compressive representation that still conveys sufficient information about the output, which works as a criterion for a ‘good’ representation in the information bottleneck model.

3.2 Perspective from information bottleneck principle

We adopt the information bottleneck principle as a criterion for finding brief but comprehensive explanations: the explanation should maximally compress the mutual information with the input while preserving as much mutual information with the output as possible. We formulate an information bottleneck objective for explanation by introducing a stochastic Boolean encoding z. It indicates the instance-specific key cognitive chunks that serve as a brief but comprehensive explanation for a black-box decision. Our goal is to learn an explainer p(z | x) that generates the encoding z given the input x. In order to achieve this, we solve the following optimization problem using the information bottleneck objective for explaining a black-box:

    max_{p(z | x)}  I(t; y) − β I(t; x),  where t = z ⊙ x,    (2)

where β is a Lagrange multiplier representing a trade-off between the briefness of the explanation (i.e. I(t; x)) and the sufficiency of the information retained for explaining the black-box output (i.e. I(t; y)).

The primary difference between our information bottleneck objective (2) and that of Tishby et al. (2000) is that Tishby et al. (2000) aim to identify a stochastic map to a representation t that itself works as the information bottleneck, whereas we aim to identify a stochastic map to an encoding z performing instance-wise selection of cognitive chunks, and we define the information bottleneck t = z ⊙ x as a function of z and the input x.

3.3 Proposed approach

As illustrated in Figure 1A, the information bottleneck model for explaining the black-box is composed of two parts: the explainer and approximator. The explainer selects a group of key cognitive chunks given an instance while the approximator mimics the behaviour of the black-box system using the selected keys as the input.

In detail, the explainer is modeled by a deep neural network that maps an input x to a vector of attribution scores p(x) = (p_1(x), …, p_d(x)), where p_j(x) corresponds to the j-th cognitive chunk. The attribution score indicates the probability that each cognitive chunk is selected. In order to select the top k cognitive chunks as an explanation, a k-hot vector z is sampled from a categorical distribution with class probabilities p(x), and the j-th cognitive chunk is selected if z_j = 1. In more mathematical terms, the explanation is defined as follows:

    t = z ⊙ x,

where each entry of z switches a cognitive chunk on or off, and each cognitive chunk corresponds to multiple raw features of x. The approximator is modeled by another deep neural network and outputs an approximation of the black-box decision made for the instance x. In both cases, the weights or parameters defining each corresponding neural network are learned from data.

The explainer and approximator are trained jointly to minimize a cost function that favors short, concise explanations while enforcing that the explanations alone suffice for accurate prediction. In more technical terms, the selected cognitive chunks are maximally compressed about an input and informative about a decision made by a black-box on that input.
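To make the two-component design concrete, the following is a minimal PyTorch sketch of an explainer and an approximator. The multilayer-perceptron layers, the hidden width, and the assumption that each cognitive chunk is a single input dimension are illustrative choices of ours; the actual experiments parameterize both networks with LSTMs and CNNs as described in Section 4.

```python
# Minimal sketch of the two VIBI components (not the authors' exact architectures):
# an explainer that scores d cognitive chunks and an approximator that mimics the
# black-box from the masked input t = z * x. Dimensions here are illustrative.
import torch
import torch.nn as nn

class Explainer(nn.Module):
    """Maps an input x to log-probabilities over d cognitive chunks."""
    def __init__(self, input_dim, num_chunks):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, num_chunks))

    def forward(self, x):
        return torch.log_softmax(self.net(x), dim=-1)  # chunk attribution scores

class Approximator(nn.Module):
    """Mimics the black-box decision given the masked input t = z * x."""
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, t):
        return torch.log_softmax(self.net(t), dim=-1)  # log q(y | t)
```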

3.3.1 The variational bound

The mutual information terms I(t; x) and I(t; y) are computationally expensive (Tishby et al., 2000; Chechik et al., 2005). In order to reduce the computational burden, we use a variational approximation to our information bottleneck objective (2). Our variational approximation is similar to that of Alemi et al. (2017), who first developed a variational lower bound on the information bottleneck objective (1) for deep neural networks. However, they apply the variational technique to the information bottleneck objective (1), in which the stochastic encoding of the input is the information bottleneck itself, whereas we apply it to the information bottleneck objective (2), in which the information bottleneck t = z ⊙ x is the pairwise product of the stochastic encoding z and the input x, and z is a Boolean random vector performing instance-wise selection of cognitive chunks. We therefore take a few additional steps, described in the following paragraph, to bound I(t; x) through the encoding z. We now examine each of the terms in the information bottleneck objective in turn.

Variational bound for I(t; x): For the variational approximation to I(t; x), we first relate I(t; x) to I(z; x) and then bound I(z; x) using a variational distribution r(z) that approximates the marginal p(z).

First, using the chain rule for mutual information we obtain:

    I(x; z, t) = I(x; z) + I(x; t | z) = I(x; t) + I(x; z | t).

Note that p(t | x, z) = 1 when t = z ⊙ x and 0 otherwise, because t is deterministic when z and x are given. That is,

    p(t | x, z) = 1[t = z ⊙ x].    (3)

Since conditional mutual information is always non-negative, the chain rule above together with (3) allows us to control I(t; x) through I(z; x). Now, we use r(z) to approximate the marginal p(z). Using the fact that the Kullback-Leibler divergence is always non-negative, i.e. KL(p(z) || r(z)) ≥ 0, we have:

    I(z; x) ≤ ∫ p(x) p(z | x) log [ p(z | x) / r(z) ] dz dx = E_x [ KL( p(z | x) || r(z) ) ].    (4)

Combining (3) and (4) yields a variational bound on I(t; x) in terms of E_x [ KL( p(z | x) || r(z) ) ].

Variational bound for I(t; y). Starting from the definition of mutual information, we have:

    I(t; y) = ∫ p(y, t) log [ p(y | t) / p(y) ] dy dt = E_{p(y, t)} [ log p(y | t) ] + H(y).

Note that the entropy H(y) can be ignored because it is independent of the optimization procedure.

Now, we use q(y | t) to approximate p(y | t); this distribution works as an approximator to the black-box system. Using the fact that the Kullback-Leibler divergence is always non-negative, KL( p(y | t) || q(y | t) ) ≥ 0, we have:

    E_{p(y, t)} [ log p(y | t) ] ≥ E_{p(y, t)} [ log q(y | t) ].

This gives the following lower bound on I(t; y):

    I(t; y) ≥ ∫ p(x) p(z | x) p(y | x) log q(y | z ⊙ x) dx dz dy + H(y),

where p(y | x, z) = p(y | x) by the Markov chain assumption y ↔ x ↔ t.

In summary, combining the variational bounds for the two terms, we obtain the following variational objective, which we optimize in place of (2):

    E_{p(x) p(y | x) p(z | x)} [ log q(y | z ⊙ x) ] − β E_x [ KL( p(z | x) || r(z) ) ].    (5)

With proper choices of p(z | x) and r(z), we assume that the Kullback-Leibler divergence can be integrated analytically. We use the empirical data distribution to approximate p(x) and p(y | x).
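As a rough illustration of how (5) can be turned into a training loss, the sketch below treats the explainer output as a single categorical distribution over d chunks and assumes a uniform prior r(z), so that the Kullback-Leibler term has a closed form. The function and argument names (vibi_loss, chunk_logp, and so on) are ours, not part of the paper.

```python
# Hedged sketch of the variational objective (5) as a loss to minimize, assuming a
# uniform prior r(z) over the d chunks. `approx_log_probs` is log q(y|t) from the
# approximator, `blackbox_label` is the black-box decision, `chunk_logp` is the
# explainer output (log-probabilities over chunks).
import torch
import torch.nn.functional as F

def vibi_loss(approx_log_probs, blackbox_label, chunk_logp, beta):
    # -E[log q(y | t)]: how well the approximator mimics the black-box decision
    ce = F.nll_loss(approx_log_probs, blackbox_label)
    # KL(p(z|x) || r(z)) with a uniform r(z): sum_j p_j log p_j + log d
    d = chunk_logp.size(-1)
    kl = (chunk_logp.exp() * chunk_logp).sum(dim=-1) + torch.log(torch.tensor(float(d)))
    # minimizing this loss corresponds to maximizing the variational objective (5)
    return ce + beta * kl.mean()
```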

3.3.2 Continuous relaxation and reparameterization

Note that we aim to sample the top k out of d cognitive chunks, where each chunk is assumed to be drawn from a categorical distribution with class probabilities p_1(x), …, p_d(x) representing the attribution of each chunk to the black-box decision. Evaluating the objective derived in Section 3.3.1 therefore imposes the computational burden of summing over all C(d, k) combinations of feature subsets. In order to avoid this, we use the generalized Gumbel-softmax trick (Jang et al., 2017; Chen et al., 2018), which approximates the non-differentiable categorical subset sampling with differentiable Gumbel-softmax samples. This trick allows standard backpropagation to compute the gradients of the parameters via reparameterization.

First, we independently sample a cognitive chunk k times. For each draw, a random perturbation g_j drawn from a standard Gumbel distribution is added to the log probability of each cognitive chunk, log p_j(x). Then a concrete random vector c = (c_1, …, c_d), which works as a continuous, differentiable approximation to argmax, is defined as follows:

    c_j = exp( (log p_j(x) + g_j) / τ ) / Σ_{i=1}^{d} exp( (log p_i(x) + g_i) / τ ),

where τ is a tuning parameter for the temperature of the Gumbel-softmax distribution. Next, we define a continuous-relaxed random vector z* as the elementwise maximum of the k independently sampled concrete vectors c^(1), …, c^(k):

    z*_j = max_{l = 1, …, k} c_j^(l).

With this sampling scheme, we approximate the k-hot random vector z and obtain a continuous approximation to the variational objective (5). Putting everything together, we optimize

    E_{p(x) p(y | x)} E_g [ log q( y | z* ⊙ x ) ] − β E_x [ KL( p(z | x) || r(z) ) ],

where the inner expectation is over the Gumbel perturbations g, q(y | z* ⊙ x) is the approximator to the black-box system, and the Kullback-Leibler term represents the compactness of the explanation. Once the model is learned, the attribution score p_j(x) of each cognitive chunk is used to select the top k key cognitive chunks that are maximally compressed about the input and informative about the black-box decision on that input. Therefore, the selected cognitive chunks serve as a brief but comprehensive explanation for the black-box decision.
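The following is a small PyTorch sketch of the relaxed top-k sampling described above: k independent Concrete vectors are drawn from the chunk log-probabilities and combined by an elementwise maximum. The function name and the way the mask is applied to the input are our own illustrative choices.

```python
# Sketch of the generalized Gumbel-softmax (top-k) relaxation: draw k independent
# Concrete vectors from the chunk log-probabilities and take their elementwise
# maximum as a continuous surrogate for a k-hot selection.
import torch

def relaxed_k_hot(chunk_logp, k, tau):
    """chunk_logp: (batch, d) log-probabilities from the explainer; tau: temperature."""
    batch, d = chunk_logp.shape
    # Gumbel(0, 1) perturbations for k independent Concrete samples
    gumbel = -torch.log(-torch.log(torch.rand(batch, k, d) + 1e-20) + 1e-20)
    # Concrete (Gumbel-softmax) vectors, shape (batch, k, d)
    concrete = torch.softmax((chunk_logp.unsqueeze(1) + gumbel) / tau, dim=-1)
    # Elementwise maximum over the k samples approximates a k-hot selection vector
    return concrete.max(dim=1).values

# Usage sketch: z_star = relaxed_k_hot(explainer(x), k=5, tau=0.5)
#               t = z_star * x  # after expanding z_star from chunks to the raw features they cover
```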

4 Experiments

We applied VIBI to explain deep learning black-box models on text and image datasets: an LSTM movie sentiment prediction model using the IMDB text dataset and a CNN digit recognition model using the MNIST image dataset. We evaluated VIBI from two perspectives: interpretability and fidelity. Interpretability refers to the ability to explain a black-box system in human-understandable terms, and fidelity refers to how accurately our approximator mimics the black-box. Based on these criteria, we compared VIBI to two strong existing system-agnostic methods (LIME (Ribeiro et al., 2016) and L2X (Chen et al., 2018)) and a commonly used system-specific method, Saliency Map. For Saliency Map, we used the SmoothGrad technique (Smilkov et al., 2017) to obtain visually sharper gradient-based sensitivity maps than the basic gradient saliency map.

We examined how VIBI performs across different experimental settings, varying the number of selected chunks (the amount of explanation), the chunk size (the unit of explanation), and the trade-off parameter β (the trade-off between the compressiveness of the explanation and the information preserved about the output). We optimized the models over the following search space (bold indicates the choice for our final model): the temperature τ for the Gumbel-softmax approximation, the learning rate, and β. We use the Adam algorithm (Kingma & Ba, 2014) with batch size 100 for MNIST and 50 for IMDB, along with the coefficients used for computing running averages of the gradient and its square, and epsilon. All implementation is done in PyTorch, an open source deep learning platform (Paszke et al., 2017). The code is publicly available on GitHub at https://github.com/SeojinBang/VIBI.

4.1 Explanation for the LSTM movie sentiment prediction model using IMDB

Figure 2: Movie reviews and explanations provided by VIBI, randomly selected from the validation set. The selected words are colored red. Each word is used as a cognitive chunk, and the k selected words are provided for each review.

The IMDB dataset (Maas et al., 2011) is a large text dataset containing movie reviews labeled by sentiment (positive/negative). We grouped the reviews into a training set of 25,000 reviews, a validation set of 12,500 reviews, and a test set of 12,500 reviews. We then trained a hierarchical LSTM for sentiment prediction, which achieved 87% test accuracy. Each review is padded or cut to contain 15 sentences, and each sentence to contain 50 words. The architecture is a word-embedding layer of size 50 for each word, followed by two bidirectional LSTMs encoding the word vectors and sentence vectors respectively, a fully connected layer with two units, and a soft-max layer. The first LSTM layer encodes the word embedding vectors to generate a size-100 word representation vector for each word. Within each sentence, the word representation vectors are averaged elementwise to form a size-100 sentence representation vector. The second LSTM layer encodes the sentence representation vectors to generate a size-60 review embedding vector.
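A sketch of such a hierarchical LSTM in PyTorch is given below. The layer sizes follow the description above; the vocabulary handling and the use of mean pooling to obtain the size-60 review embedding from the second LSTM are assumptions on our part.

```python
# Hedged sketch of the hierarchical LSTM black-box: word biLSTM -> averaged
# sentence vectors -> sentence biLSTM -> 2-way sentiment classifier.
import torch
import torch.nn as nn

class HierLSTM(nn.Module):
    def __init__(self, vocab_size, emb=50, word_hid=50, sent_hid=30, n_sent=15, n_word=50):
        super().__init__()
        self.n_sent, self.n_word = n_sent, n_word
        self.embed = nn.Embedding(vocab_size, emb)
        self.word_lstm = nn.LSTM(emb, word_hid, bidirectional=True, batch_first=True)
        self.sent_lstm = nn.LSTM(2 * word_hid, sent_hid, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * sent_hid, 2)  # two sentiment classes

    def forward(self, tokens):  # tokens: (batch, 15 sentences, 50 words), integer word ids
        b = tokens.size(0)
        w = self.embed(tokens.reshape(b * self.n_sent, self.n_word))   # size-50 word embeddings
        w, _ = self.word_lstm(w)                                       # size-100 word representations
        s = w.mean(dim=1).reshape(b, self.n_sent, -1)                  # averaged size-100 sentence vectors
        s, _ = self.sent_lstm(s)                                       # size-60 sentence encodings
        return torch.log_softmax(self.fc(s.mean(dim=1)), dim=-1)      # review-level sentiment
```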

VIBI explains why the LSTM predicts each movie review to be positive or negative and provides instance-wise key words that are the most important attributes of the sentiment prediction. In order to explain the LSTM black-box model, we applied VIBI with the explainer and approximator parameterized by deep neural networks. For example, when a word is used as a cognitive chunk, we used a bidirectional LSTM to parameterize the explainer. The output vector from each LSTM cell is averaged and followed by a log-softmax calculation, so the final layer returns log-probabilities indicating which words should be taken as the input to the approximator. We parameterized the approximator using a convolutional layer followed by a ReLU activation function and a max-pooling layer, and a fully connected layer returning a size-2 vector followed by a log-softmax calculation, so that the final layer returns a vector of log-probabilities for the two sentiments (positive/negative).

As seen in the top-right and top-left of Figure 2, VIBI shows that the positive (or negative) words pass through the bottleneck and make a correct prediction. The bottom of Figure 2 shows that the LSTM sentiment prediction model makes a wrong prediction for a negative review because the review includes several positive words such as ‘enjoyable’ and ‘exciting’.

4.2 Explanation for the CNN digit recognition model using MNIST

Figure 3: Hand-written digits and explanations provided by VIBI, randomly selected from the validation set. The selected patches are colored red if the pixel is activated (i.e. white) and yellow otherwise (i.e. black). A patch composed of a group of pixels is used as a cognitive chunk, and k patches are identified for each image.

The MNIST dataset (LeCun et al., 1998) is a large dataset containing 28 × 28 sized images of handwritten digits (0 to 9). We grouped the images into a training set of 50,000 images, a validation set of 10,000 images, and a test set of 10,000 images, and trained a simple 2D CNN for digit recognition, which achieved 97% test accuracy. The architecture is two convolutional layers with kernel size 5 followed by max-pooling with pool size 2, two fully connected layers, and a soft-max layer. The two convolutional layers contain 10 and 20 filters respectively, and the two fully connected layers are composed of 50 and 10 units respectively.
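A PyTorch sketch of a CNN matching this description is shown below; whether each convolutional layer is followed by its own pooling step is an assumption we make so that the dimensions work out.

```python
# Hedged sketch of the CNN black-box: two 5x5 conv layers (10 and 20 filters),
# 2x2 max-pooling, then 50- and 10-unit fully connected layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DigitCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(20 * 4 * 4, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):                            # x: (batch, 1, 28, 28)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> (batch, 10, 12, 12)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> (batch, 20, 4, 4)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return torch.log_softmax(self.fc2(x), dim=-1)  # log-probabilities over ten digits
```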

VIBI explains how the CNN characterizes a digit and recognizes differences between digits. In order to explain the CNN black-box model, we applied VIBI and parameterized the explainer using a 2D CNN. For example, when a patch-level cognitive chunk is used, the structure is as follows: two convolutional layers with kernel size 5, each followed by a ReLU activation function and a max-pooling layer with pool size 2, and one convolutional layer with kernel size 1 returning a 2D matrix, followed by a log-softmax calculation so that the final layer returns a vector of log-probabilities for the 49 chunks. The three convolutional layers contain 8, 16, and 1 filters respectively. The output of the explainer is used as an attribution score indicating which cognitive chunks should be taken as the input to the approximator. We parameterized the approximator using two convolutional layers with kernel size 5 followed by a ReLU activation function and a max-pooling layer with pool size 2, with 32 and 64 filters respectively, and one fully connected layer returning a size-10 vector followed by a log-softmax calculation, so that the final layer returns a vector of log-probabilities for the ten digits (0-9).
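Below is a sketch of such a patch-level explainer in PyTorch. The padding and pooling choices are assumptions made so that the final layer yields one score per patch in a 7 × 7 grid (49 chunks); the paper does not spell out these details.

```python
# Hedged sketch of the MNIST explainer: three conv layers (8, 16, 1 filters)
# producing log-probabilities over 49 patch-level cognitive chunks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchExplainer(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=5, padding=2)
        self.conv3 = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, x):                             # x: (batch, 1, 28, 28)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # -> (batch, 8, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # -> (batch, 16, 7, 7)
        scores = self.conv3(x).flatten(1)             # one score per patch, 49 in total
        return torch.log_softmax(scores, dim=-1)      # log-probabilities over the 49 chunks
```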

The first two examples in Figure 3 show that the CNN recognizes digits using both shapes and angles. In the first example, the CNN characterizes ‘1’s by straightly aligned patches along the activated regions, although the ‘1’s in the left and right panels are written at different angles. In contrast to the first example, the second example shows that the CNN recognizes the difference between ‘9’ and ‘6’ by their difference in angle. The last two examples in Figure 3 show that the CNN distinguishes ‘7’s from ‘1’s by patches located on the activated horizontal line of the ‘7’ (see the cyan circle) and recognizes ‘8’s by two patches on the top of the digit and another two patches at the bottom circle.

4.3 Interpretability evaluated by humans

We evaluated the interpretability of the methods on the LSTM movie sentiment prediction model using IMDB. We assume that a better explanation allows a human to better infer the black-box output given the explanation. Therefore, we asked humans to infer the output of the black-box system (Positive/Negative/Neutral) given five key words generated as an explanation by each method. Each method was evaluated by workers on Amazon Mechanical Turk (MTurk, https://www.mturk.com/) who hold the Masters Qualification (i.e. high-performance workers who have demonstrated excellence across a wide range of tasks). 200 instances for VIBI and 100 instances for the other methods were randomly selected and evaluated, with five workers assigned per instance.

We also evaluated interpretability for the CNN digit recognition model using MNIST. We asked humans to directly score the explanation on a 0 to 5 scale (0 for no explanation, 1-4 for insufficient or redundant explanation, and 5 for concise explanation). Each method was evaluated by 16 graduate students at the School of Computer Science, Carnegie Mellon University who have taken at least one graduate-level machine learning class. For each method, 100 instances were randomly selected and evaluated, and cognitive chunks of a fixed size were provided as an explanation for each instance. On average, 4.26 students were assigned per instance. For further details about the experiments, please see Supplementary Material 6.

VIBI better explains the black-box models, as shown in Table 1. For explaining the LSTM movie sentiment prediction model using the IMDB dataset, humans better infer the black-box output given the five key words when they are provided by VIBI. It therefore better captures the key words contributing most to the LSTM decision and better explains why the LSTM predicted each movie review as it did. For explaining the CNN digit recognition model using the MNIST dataset, VIBI also highlights the most concise chunks for explaining key characteristics of the handwritten digits, and thus better explains how the CNN model recognized each handwritten digit.

Saliency LIME L2X VIBI (Ours)
IMDB 34.2% 33.8% 35.6% 44.7%
MNIST 3.448 1.369 1.936 3.526
  • For IMDB, the percentage indicates how well the MTurk workers’ answers match the black-box output. For MNIST, the score indicates how well the highlighted chunks capture key characteristics of the handwritten digits; the average score over all samples is shown on a 0 to 5 scale. See the survey examples and detailed results in Supplementary Material Tables 4 and 5.

Table 1: Evaluation of interpretability. VIBI better explains the black-box models: it provides the most contributing key words for explaining the LSTM movie sentiment prediction model using the IMDB dataset and the most concise cognitive chunks of pixels for explaining the CNN digit recognition model using the MNIST dataset.

4.4 Evaluation of the fidelity

We assessed fidelity by the prediction performance of the approximator with respect to the black-box output. We introduce two formalized metrics to quantitatively evaluate fidelity: approximator fidelity and rationale fidelity. Approximator fidelity measures the ability of the approximator to imitate the behaviour of the black-box, and rationale fidelity measures how much the selected chunks contribute to the approximator fidelity. In detail, approximator fidelity is quantified by the prediction performance of the approximator that takes the continuous relaxation z* ⊙ x as input and the black-box output as the target label. Rationale fidelity is quantified by the prediction performance of the approximator that takes the hard selection z ⊙ x as input and the black-box output as the target label. (Note that z ⊙ x keeps only the raw features corresponding to the top k selected cognitive chunks and sets the others to zero.)
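A minimal sketch of how the two metrics can be computed is given below, assuming for simplicity that each cognitive chunk corresponds to a single raw feature and that blackbox_pred holds the black-box's predicted labels; the function names are ours.

```python
# Hedged sketch of the two fidelity metrics: approximator fidelity feeds the
# continuous relaxation z* ⊙ x to the approximator, while rationale fidelity keeps
# only the raw features under the top-k selected chunks and zeroes out the rest.
import torch

def approximator_fidelity(approximator, z_relaxed, x, blackbox_pred):
    pred = approximator(z_relaxed * x).argmax(dim=-1)
    return (pred == blackbox_pred).float().mean().item()

def rationale_fidelity(approximator, chunk_logp, x, blackbox_pred, k):
    topk = chunk_logp.topk(k, dim=-1).indices
    z_hard = torch.zeros_like(chunk_logp).scatter_(-1, topk, 1.0)  # k-hot mask
    pred = approximator(z_hard * x).argmax(dim=-1)
    return (pred == blackbox_pred).float().mean().item()
```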

As shown in Table 2, VIBI outperforms Saliency and LIME in most cases while performing similarly to L2X in approximator fidelity. However, this does not mean that both approximators achieve the same fidelity. As shown in Table 3, the selected chunks of VIBI account for more of the approximator fidelity than those of L2X. Recall that L2X is a special case of VIBI with the information bottleneck trade-off parameter β = 0 (i.e. not using the compressiveness constraint I(t; x)). Therefore, compressing information through the explainer achieves not only conciseness of the explanation but also better fidelity of the explanation to the black-box.

chunk size k Saliency LIME L2X VIBI (Ours)
IMDB sentence 1 0.387 0.727 0.876 0.877
word 5 0.419 0.756 0.738 0.744
5 words 1 0.424 0.290 0.759 0.764
5 words 3 0.414 0.679 0.833 0.835
MNIST 16 0.912 0.770 0.934 0.948
24 0.938 0.807 0.951 0.953
40 0.957 0.859 0.967 0.962
4 0.863 0.609 0.953 0.948
6 0.906 0.637 0.957 0.956
10 0.949 0.705 0.965 0.967
  • A fixed β is used for VIBI. Accuracy is shown. See evaluations using the F1-score and further results from different parameter settings in Supplementary Material Tables 7 and 9.

Table 2: Evaluation of approximator fidelity. VIBI and L2X outperform the others in approximating the black-box models. However, this does not mean that both approximators have the same fidelity; please see Table 3 for further discussion.

chunk size k L2X VIBI (Ours)
IMDB sentence 1 0.727 0.731
word 5 0.638 0.657
5 words 1 0.601 0.632
5 words 3 0.694 0.660
MNIST 16 0.735 0.771
24 0.776 0.856
40 0.811 0.915
4 0.650 0.775
6 0.511 0.701
10 0.835 0.933
  • A fixed β is used for VIBI. Accuracy is shown. See evaluations using the F1-score and further results from different parameter settings in Supplementary Material Tables 6 and 8.

Table 3: Evaluation of the rationale fidelity. Compressing information achieves not only conciseness of explanation but also better fidelity of explanation to a black-box.

5 Conclusion

We employ the information bottleneck principle as a criterion for learning ‘good’ explanations. Instance-wise selected cognitive chunks work as an information bottleneck and hence provide concise yet comprehensive explanations for each decision made by a black-box system.

References

Supplementary Materials

6 Interpretability evaluated by humans

6.1 LSTM movie sentiment prediction model using IMDB dataset

Black-Box Output   Recognized by MTurk worker   Saliency   LIME   L2X   VIBI (Ours)
Positive Positive 19.8 17.4 17.6 24.3
Positive Negative 12.6 6.8 7.2 16.9
Positive Neutral 25.6 18.8 24.2 11.1
Negative Positive 11.0 10.6 11.6 16.2
Negative Negative 14.4 16.4 18.0 20.4
Negative Neutral 16.6 30.0 21.4 11.2
  • The interpretable machine learning methods were evaluated by workers on Amazon Mechanical Turk (https://www.mturk.com/) who hold the Masters Qualification (i.e. high-performance workers who have demonstrated excellence across a wide range of tasks). Randomly selected instances (200 for VIBI and 100 for the others) were evaluated for each method, with five workers assigned per instance. (See the survey example below for further details.)

Table 4: Evaluation of interpretability on the LSTM movie sentiment prediction model using IMDB. For each method, the percentage of samples belonging to each combination of the black-box output and the sentiment recognized by the workers is shown. VIBI has the highest percentage of samples belonging to Positive/Positive or Negative/Negative and the lowest percentage belonging to Positive/Neutral or Negative/Neutral. Although LIME has the lowest percentage of samples belonging to Positive/Negative or Negative/Positive, this is because LIME tends to select words such as ‘that’, ‘the’, and ‘is’, so that most of its samples are recognized as Neutral.

Survey example for IMDB

Title: Label sentiment given a few words.
Description: Recognize the primary sentiment of the movie review given a few words only.

Figure 4: A survey example of the MTurk evaluation of interpretability on the LSTM movie sentiment prediction model.

6.2 CNN digit recognition model using MNIST

MNIST Digit Saliency LIME L2X VIBI (Ours)
0 3.200 2.000 2.333 3.000
1 4.393 0.795 1.263 3.913
2 3.125 1.200 1.400 3.200
3 3.286 1.833 2.429 3.625
4 3.333 1.000 1.857 3.857
5 3.167 1.381 2.000 2.875
6 3.333 1.000 1.889 3.625
7 3.667 2.000 1.667 4.000
8 3.750 1.333 2.667 3.500
9 3.222 1.143 1.857 3.667
Ave. over digits 3.448 1.369 1.936 3.526
  • The interpretable machine learning methods were evaluated by 16 graduate students at the School of Computer Science, Carnegie Mellon University who have taken at least one graduate-level machine learning class. 100 randomly selected instances were evaluated for each method. On average, 4.26 students were assigned per instance. (See the survey example below for further details.)

Table 5: Evaluation of interpretability on a 2D CNN digit recognition model using MNIST. The average score per digit is shown on a 0 to 5 scale. VIBI outperforms L2X and LIME and slightly outperforms Saliency in terms of the average score over digits. VIBI performs best on digits 2, 3, 4, 6, 7, and 9, and performs comparably to Saliency on 0, 1, 5, and 8.

Survey example for MNIST

MNIST is a large dataset containing 28 x 28 sized images of handwritten digits (0 to 9). Here, a 2D convolutional neural network (CNN) is used for digit recognition on MNIST. Several interpretable machine learning methods are learned to explain the model by highlighting key pixels that play an important role in the CNN digit recognition. The highlighted pixels provide an explanation of why the CNN model recognized the handwritten image as it did. Your task is to evaluate the explanation for each instance on a scale of 0 to 5. Please score each instance based on the following criteria.

  • 0 - No explanation: the pixels are randomly highlighted

  • 1 to 4 - Insufficient or redundant explanation: some redundant pixels are highlighted, or only some of the key characteristics of the digit are highlighted

  • 5 - Concise explanation: the highlighted pixels concisely capture the key characteristics of the digit



7 Fidelity


chunk size   K   L2X   VIBI (Ours)
(β)               0     0.001   0.01   0.1   1   10   100
Accuracy sentence 1 0.727 0.693 0.711 0.731 0.729 0.734 0.734
word 5 0.638 0.657 0.666 0.657 0.648 0.640 0.654
5 words 1 0.601 0.630 0.624 0.632 0.628 0.623 0.628
5 words 3 0.694 0.660 0.662 0.660 0.662 0.660 0.660
F1-score sentence 1 0.581 0.547 0.562 0.567 0.585 0.586 0.586
word 5 0.486 0.521 0.551 0.512 0.516 0.508 0.526
5 words 1 0.478 0.506 0.500 0.540 0.501 0.498 0.504
5 words 3 0.551 0.528 0.529 0.522 0.529 0.528 0.525
  • Rationale fidelity quantifies the ability of the selected chunks to infer the black-box output. A large rationale fidelity implies that the selected chunks account for a large portion of the approximator fidelity. Prediction accuracy and F1-score of the approximator for the LSTM model are shown.

Table 6: Evaluation of rationale fidelity on LSTM movie sentiment prediction model using IMDB.
chunk size   K   Saliency   LIME   L2X   VIBI (Ours)
(β)                                 0     0.001   0.01   0.1   1   10   100
Accuracy sentence 1 0.387 0.727 0.876 0.877 0.869 0.877 0.879 0.879 0.884
word 5 0.419 0.756 0.738 0.766 0.772 0.744 0.773 0.763 0.767
5 words 1 0.424 0.290 0.759 0.784 0.780 0.764 0.774 0.778 0.774
5 words 3 0.414 0.679 0.833 0.836 0.831 0.835 0.834 0.830 0.833
F1-score sentence 1 0.331 0.564 0.721 0.693 0.707 0.730 0.730 0.727 0.734
word 5 0.350 0.585 0.565 0.607 0.616 0.594 0.620 0.609 0.612
5 words 1 0.360 0.302 0.621 0.641 0.622 0.624 0.615 0.622 0.616
5 words 3 0.352 0.523 0.680 0.683 0.674 0.681 0.677 0.669 0.682
  • Approximator fidelity quantifies the ability of the approximator to imitate the behaviour of a black-box. Prediction accuracy and F1-score of the approximator for the LSTM model are shown.

Table 7: Evaluation of approximator fidelity on LSTM movie sentiment prediction model using IMDB.
Measure   chunk size   K   L2X   VIBI (Ours)
(β)                          0     0.001   0.01   0.1   1   10   100
Accuracy 64 0.694 0.690 0.726 0.689 0.742 0.729 0.766
96 0.814 0.831 0.780 0.806 0.859 0.765 0.826
160 0.903 0.907 0.905 0.917 0.917 0.928 0.902
16 0.735 0.795 0.750 0.771 0.732 0.753 0.769
24 0.776 0.855 0.834 0.856 0.868 0.854 0.847
40 0.811 0.914 0.914 0.915 0.903 0.918 0.935
80 0.905 0.949 0.940 0.939 0.962 0.941 0.923
4 0.650 0.655 0.650 0.775 0.717 0.682 0.681
6 0.511 0.858 0.706 0.701 0.708 0.690 0.730
10 0.835 0.835 0.824 0.933 0.875 0.854 0.782
20 0.954 0.962 0.815 0.934 0.929 0.946 0.943
F1-score 64 0.684 0.679 0.716 0.670 0.734 0.710 0.755
96 0.808 0.825 0.750 0.803 0.854 0.750 0.820
160 0.898 0.902 0.899 0.912 0.913 0.924 0.897
16 0.720 0.786 0.738 0.761 0.723 0.744 0.769
24 0.766 0.848 0.836 0.851 0.858 0.859 0.840
40 0.798 0.914 0.910 0.910 0.898 0.914 0.931
80 0.901 0.946 0.936 0.930 0.959 0.938 0.918
4 0.634 0.658 0.637 0.763 0.704 0.671 0.669
6 0.493 0.852 0.693 0.687 0.692 0.675 0.720
10 0.828 0.827 0.816 0.928 0.869 0.849 0.773
20 0.950 0.959 0.806 0.931 0.926 0.942 0.940
  • Rationale fidelity quantifies the ability of the selected chunks to infer the black-box output. A large rationale fidelity implies that the selected chunks account for a large portion of the approximator fidelity. Prediction accuracy and F1-score of the approximator for the CNN model are shown.

Table 8: Evaluation of the rationale fidelity on CNN digit recognition model using MNIST.
chunk size   K   Saliency   LIME   L2X   VIBI (Ours)
(β)                                 0     0.001   0.01   0.1   1   10   100
Accuracy 64 0.944 0.982 0.933 0.959 0.962 0.959 0.960 0.952 0.953
96 0.956 0.986 0.963 0.963 0.951 0.967 0.968 0.953 0.962
160 0.964 0.989 0.970 0.967 0.973 0.974 0.974 0.974 0.967
16 0.912 0.770 0.934 0.945 0.941 0.948 0.938 0.939 0.940
24 0.938 0.807 0.951 0.956 0.955 0.953 0.953 0.953 0.960
40 0.957 0.859 0.967 0.965 0.966 0.962 0.967 0.965 0.967
80 0.966 0.897 0.976 0.977 0.974 0.972 0.977 0.971 0.973
4 0.863 0.609 0.953 0.922 0.928 0.948 0.942 0.942 0.953
6 0.906 0.637 0.957 0.963 0.954 0.956 0.953 0.963 0.962
10 0.949 0.705 0.965 0.971 0.959 0.967 0.961 0.969 0.964
20 0.963 0.771 0.974 0.977 0.975 0.975 0.973 0.975 0.974
F1-score 64 0.938 0.981 0.930 0.956 0.960 0.956 0.957 0.950 0.950
96 0.950 0.985 0.961 0.961 0.954 0.965 0.966 0.951 0.960
160 0.959 0.989 0.969 0.965 0.971 0.973 0.972 0.972 0.966
16 0.902 0.755 0.930 0.942 0.936 0.944 0.934 0.936 0.936
24 0.932 0.795 0.949 0.954 0.952 0.950 0.951 0.950 0.958
40 0.952 0.853 0.965 0.963 0.964 0.961 0.965 0.963 0.965
80 0.962 0.892 0.974 0.976 0.973 0.972 0.975 0.969 0.971
4 0.849 0.588 0.951 0.917 0.923 0.944 0.938 0.939 0.950
6 0.895 0.621 0.954 0.961 0.951 0.953 0.950 0.960 0.959
10 0.943 0.689 0.963 0.969 0.956 0.965 0.958 0.968 0.961
20 0.959 0.763 0.972 0.976 0.972 0.974 0.972 0.973 0.972
  • Approximator fidelity quantifies the ability of the approximator to imitate the behaviour of a black-box. Prediction accuracy and F1-score of the approximator for the CNN model are shown.

Table 9: Evaluation of the approximator fidelity on CNN digit recognition model using MNIST.