Interpretability is crucial in implementing and utilizing black-box decision systems such as deep learning models. Interpreting a black-box system provides a justification for accepting or rejecting the decisions it suggests, or explains the logic behind the system. In recent years, the extensive use of deep learning black-box systems, particularly in critical areas such as self-driving cars and medical decision-making, has given rise to interpretable machine learning approaches that aim to explain black-box systems. To effectively justify a black-box decision, both briefness and comprehensiveness must be taken into account, so that the explanation provides sufficient information while avoiding redundancy. However, existing approaches do not consider this trade-off in depth and fail to find explanations that are both brief and comprehensive. In this paper, we quantify this idea from an information theoretic perspective: the explanation should be maximally informative about a black-box decision while being maximally compressive about the given input. We therefore propose a systematic framework for providing brief but comprehensive explanations by adopting an inspiring information theoretic principle, the information bottleneck principle (Tishby et al., 2000). Using a variational approximation to the information bottleneck objective, we learn a stochastic boolean encoding that indicates instance-specific key features, which serve as brief but comprehensive explanations for each black-box decision.
The information bottleneck principle views a supervised learning task as an optimization problem that squeezes the information an input provides about the output through an information bottleneck. The information bottleneck is an optimal intermediate representation that maximally compresses the mutual information between the input and the representation while preserving as much mutual information as possible between the representation and the output. Recently, Tishby & Zaslavsky (2015) and Shwartz-Ziv & Tishby (2017) illustrated that the layered architectures of deep neural networks fit naturally into the information bottleneck principle and that each layer of a deep neural network can work as an information bottleneck.
In this paper, we adopt the information bottleneck principle as a criterion for finding a brief but comprehensive explanation. We call the resulting method VIBI (variational information bottleneck for interpretation). Using this principle, we learn an explainer that favors brief explanations while enforcing that the explanations alone suffice for accurate approximations to a black-box. For each instance, the explainer returns the probability that a chunk of features, such as a word, phrase, sentence or group of pixels, is selected as an explanation. The selected chunks act as an information bottleneck that is maximally compressed about the input and informative about the decision made by the black-box system on that input. Hence, they provide a brief but comprehensive explanation of the black-box decision.
Our main contribution is a new framework that systematically defines and generates a ‘good’ explanation (brief but comprehensive) using the information bottleneck principle. To the best of our knowledge, this is the first study that adopts the information bottleneck principle for explaining a black-box model. Based on this principle, we develop the VIBI objective for learning an explainer that favors brief but comprehensive explanations. To make the VIBI objective tractable, we derive a variational approximation to the objective and use a continuous reparameterization of the sampling distribution.
Compared to existing interpretable machine learning methods, VIBI has the following benefits:
- System-agnostic: VIBI can be applied to explain any black-box system.
- Post-hoc learning: VIBI learns a model in a post-hoc manner, hence there is no trade-off between task accuracy and interpretability.
- Cognitive chunk: VIBI groups non-cognitive raw features, such as pixels and letters, into cognitive chunks (e.g. a group of pixels, a word, a phrase, or a sentence) and uses each chunk as a unit of explanation.
- Separated explainer and approximator: The explainer and approximator are designed for separate tasks, so we do not need to restrict the approximator to a simple structure, which could reduce its fidelity.
2 Related Work
We introduce recent interpretable machine learning methods that provide a local interpretation, i.e. that aim to find the reasons why a system makes a specific decision at a particular point of interest. Existing methods can be categorized by whether they are designed to explain specific types of black-box systems (system-specific) or any type of black-box system (system-agnostic).
System-specific methods. Measuring the change of the output with respect to changes of the input is an intuitive and natural way to attribute the output to input features. Baehrens et al. (2010), Simonyan et al. (2013) and Smilkov et al. (2017) share the idea that the change of the output is calculated by propagating contributions through the layers of a deep neural network towards the input, whereas they differ in the propagation logic used. However, these approaches fail to detect changes of the output when the prediction function is flat at the instance (Shrikumar et al., 2017), which leads to interpretations that focus on irrelevant features. To solve this problem, layer-wise relevance propagation (LRP, Bach et al. (2015); Binder et al. (2016)), DeepLIFT (Shrikumar et al., 2017), and Integrated Gradients (Sundararajan et al., 2017) compare the change of the output to a reference output.
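To make the saturation problem concrete, here is a minimal sketch on a hypothetical toy function $f(x) = \min(1, x_1 + x_2)$, which is flat once $x_1 + x_2 \ge 1$. At such an input the plain gradient (estimated here by central finite differences) is zero for both features, whereas an Integrated-Gradients-style attribution that averages gradients along the straight path from an all-zero reference recovers non-zero contributions. The toy function, the zero baseline and the number of Riemann steps are illustrative assumptions, not taken from the cited papers.

```python
def f(x):
    """Hypothetical toy model that saturates (is flat) once x[0] + x[1] >= 1."""
    return min(1.0, x[0] + x[1])


def grad(fn, x, eps=1e-4):
    """Central finite-difference gradient of fn at x."""
    g = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        g.append((fn(up) - fn(down)) / (2 * eps))
    return g


def integrated_gradients(fn, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients from baseline to x."""
    avg_grad = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, gi in enumerate(grad(fn, point)):
            avg_grad[i] += gi / steps
    # Scale the path-averaged gradient by the distance from the reference input.
    return [(xi - b) * ag for xi, b, ag in zip(x, baseline, avg_grad)]


x = [1.0, 1.0]                                   # f is flat here: f(x) = 1
print(grad(f, x))                                # [0.0, 0.0]: the gradient sees nothing
print(integrated_gradients(f, x, [0.0, 0.0]))    # approximately [0.5, 0.5]
```

The reference-based attributions sum to $f(x) - f(\text{reference}) = 1$, which is the kind of property the reference-comparing methods are designed to provide.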
Unlike the methods above, the following methods form and explain human-understandable cognitive chunks made of groups of raw features instead of single raw features. Murdoch et al. (2018) and Singh et al. (2018) provide attributions of cognitive chunks such as words, phrases, sentences or groups of pixels using a layer-wise contextual decomposition of a deep neural network. Yang et al. (2016) and Mullenbach et al. (2018) also attribute to cognitive chunks by embedding a label-wise attention mechanism over each word and sentence. Lei et al. (2016) jointly learn a task-performing encoder and a generator that specifies a distribution over cognitive chunks such as words, phrases, and sentences, so as to minimize a prediction loss.
System-agnostic methods. The great advantage of system-agnostic interpretable machine learning methods over system-specific ones is that their usage is not restricted to a specific black-box system. One of the most well-known methods is LIME (Ribeiro et al., 2016). It explains each instance by locally approximating the black-box decision boundary around the instance with an inherently interpretable model, such as a sparse linear model or a decision tree. LIME differs from other additive feature attribution methods such as Saliency Map, DeepLIFT, and LRP in that it does not require any specific deep network architecture or learning algorithm for the black-box system. Lundberg & Lee (2017) proposed SHAP values as a unified measure defined over additive feature attribution scores that achieves local accuracy, missingness, and consistency. L2X (Chen et al., 2018) learns a stochastic map that selects the instance-wise features that are most informative about black-box decisions. Unlike LIME and SHAP, which approximate the local behavior of a black-box system with a simple (linear) proxy, L2X does not limit the structure of the approximator, and hence it avoids losing fidelity of the proxy to the black-box system.
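SHAP values are Shapley values of a feature-coalition game. The sketch below computes them exactly by enumerating all coalitions, which is only feasible for a handful of features (Lundberg & Lee (2017) propose estimators such as Kernel SHAP instead). Treating an "absent" feature as taking a baseline value is one common convention and an assumption here; the model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values; a feature outside a coalition is set to its baseline value."""
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    """Hypothetical black box: an additive term plus an interaction."""
    return 2 * z[0] + z[1] * z[2]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print(phi)  # approximately [2.0, 3.0, 3.0]; the interaction is split evenly
```

The attributions add up to $f(x) - f(\text{baseline}) = 8$, which is the local accuracy property mentioned above.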
Among the system-agnostic methods, VIBI is similar to L2X (Chen et al., 2018) in that both learn a stochastic explainer that returns a distribution over subsets of features given the input and performs instance-wise feature selection based on it. However, the L2X explainer favors the comprehensiveness of the explanation and does not account for briefness, while our explainer favors both briefness and comprehensiveness, thus adding a new dimension to the stochastic explainer. In fact, L2X is a special case of VIBI with $\beta = 0$ in the information bottleneck objective (2).
3.1 Information bottleneck principle
The information bottleneck principle (Tishby et al., 2000) provides an appealing information theoretic view of learning a supervised model by defining what we mean by a good representation. In this view, the optimal model squeezes as much information as possible from the input to the output through a compressed representation (called the information bottleneck). The representation $Z$ is stochastically defined, and the optimal stochastic mapping $p(z \mid x)$ is obtained by solving the following problem under the Markov chain assumption $Y \to X \to Z$:

$$p^{*}(z \mid x) = \arg\max_{p(z \mid x)} \; I(Z;Y) - \beta\, I(X;Z), \qquad (1)$$

where $I(\cdot\,;\cdot)$ denotes mutual information and $\beta$ is a Lagrange multiplier representing the trade-off between the compressiveness (i.e. $I(X;Z)$) and the informativeness (i.e. $I(Z;Y)$) of the representation $Z$. This information bottleneck trade-off favors the most compressive representation that still conveys sufficient information about the output, which serves as the criterion for a ‘good’ representation in the information bottleneck model.
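For discrete variables, objective (1) can be evaluated directly. The sketch below, a toy example not taken from the paper, compares two deterministic encoders when $X$ is uniform on $\{0,1,2,3\}$ and $Y = \lfloor X/2 \rfloor$: one keeps all of $X$, the other keeps only what is needed to predict $Y$. Both are equally informative about $Y$, but the compressed one scores higher once $\beta > 0$.

```python
from math import log


def mutual_information(joint):
    """I(A;B) in nats for a discrete joint distribution {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def ib_objective(p_xy, encoder, beta):
    """I(Z;Y) - beta * I(X;Z) for an encoder p(z|x), under the chain Y -> X -> Z."""
    joint_xz, joint_zy = {}, {}
    for (x, y), pxy in p_xy.items():
        for z, pz in encoder[x].items():
            # Markov chain: p(x, y, z) = p(x, y) p(z | x).
            joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + pxy * pz
            joint_zy[(z, y)] = joint_zy.get((z, y), 0.0) + pxy * pz
    return mutual_information(joint_zy) - beta * mutual_information(joint_xz)


p_xy = {(x, x // 2): 0.25 for x in range(4)}      # X uniform, Y = X // 2
keep_all = {x: {x: 1.0} for x in range(4)}        # Z = X: no compression
compress = {x: {x // 2: 1.0} for x in range(4)}   # Z = X // 2: just enough for Y
print(ib_objective(p_xy, keep_all, beta=0.5))     # ~0.0  (log 2 - 0.5 * log 4)
print(ib_objective(p_xy, compress, beta=0.5))     # ~0.347 (= 0.5 * log 2)
```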
3.2 Perspective from information bottleneck principle
We adopt the information bottleneck principle as a criterion for finding brief but comprehensive explanations: the explanation should maximally compress the mutual information about the input while preserving as much mutual information as possible about the output. We formulate an information bottleneck objective for explanation by introducing a stochastic boolean encoding $Z$ that indicates instance-specific key cognitive chunks, which serve as brief but comprehensive explanations for a black-box decision. Our goal is to learn an explainer $p(z \mid x)$ that generates the encoding $z$ given the input $x$. To achieve this, we formulate the following information bottleneck objective for explaining a black-box:

$$p^{*}(z \mid x) = \arg\max_{p(z \mid x)} \; I(T;Y) - \beta\, I(X;T), \qquad T = Z \odot X, \qquad (2)$$

where $\beta$ is a Lagrange multiplier representing the trade-off between the briefness of the explanation (i.e. $I(X;T)$) and the sufficiency of the information retained for explaining the black-box output (i.e. $I(T;Y)$).
The primary difference between our information bottleneck objective (2) and that of Tishby et al. (2000) is that the latter aims to identify a stochastic map of the representation $Z$ that itself works as an information bottleneck, whereas we aim to identify a stochastic map of $Z$ performing instance-wise selection of cognitive chunks, and define the information bottleneck as a function of $Z$ and the input $X$, namely $T = Z \odot X$.
3.3 Proposed approach
As illustrated in Figure 1A, the information bottleneck model for explaining the black-box is composed of two parts: the explainer and approximator. The explainer selects a group of key cognitive chunks given an instance while the approximator mimics the behaviour of the black-box system using the selected keys as the input.
In detail, the explainer $p_\theta(z \mid x)$ is modeled by a deep neural network that maps an input $x$ to attribution scores $p_j(x)$, where $j = 1, \dots, d$ indexes the $j$-th of $d$ cognitive chunks. The attribution score $p_j(x)$ indicates the probability that the $j$-th cognitive chunk is selected. To select the top $k$ cognitive chunks as an explanation, a $k$-hot vector $z$ is sampled from a categorical distribution with class probabilities $p(x) = (p_1(x), \dots, p_d(x))$, and the $j$-th cognitive chunk is selected if $z_j = 1$. More formally, the explanation is defined as follows:

$$t = z \odot x, \qquad \text{i.e.} \quad t_j = z_j\, x_j \quad (j = 1, \dots, d),$$

where $x_j$ denotes a cognitive chunk, each of which corresponds to multiple raw features. The approximator $q_\phi(y \mid t)$ is modeled by another deep neural network and outputs a prediction that mimics the black-box decision made for the instance $x$. In both cases, $\theta$ and $\phi$ represent the weights or parameters of the corresponding neural network.
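A minimal sketch of the chunk-level explanation $t = z \odot x$: every raw feature in a selected chunk is kept and every other chunk is zeroed out. Here $z$ is chosen deterministically as the top-$k$ attribution scores, which corresponds to how the learned explainer is used after training (during training $z$ is sampled and relaxed). The chunks, scores and function names are hypothetical.

```python
def select_top_k(scores, k):
    """Return a k-hot list marking the k chunks with the largest attribution scores."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return [1 if j in top else 0 for j in range(len(scores))]


def explanation(chunks, z):
    """t = z * x at the chunk level: keep a chunk's raw features if z_j = 1, else zero them."""
    return [list(chunk) if zj else [0.0] * len(chunk) for chunk, zj in zip(chunks, z)]


# Hypothetical input split into d = 4 chunks of 2 raw features each.
chunks = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.4], [0.7, 0.6]]
scores = [0.05, 0.50, 0.15, 0.30]    # made-up explainer scores p_j(x)
z = select_top_k(scores, k=2)
print(z)                             # [0, 1, 0, 1]
print(explanation(chunks, z))        # [[0.0, 0.0], [0.9, 0.8], [0.0, 0.0], [0.7, 0.6]]
```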
The explainer and approximator are trained jointly to minimize a cost function that favors short, concise explanations while enforcing that the explanations alone suffice for accurate prediction. In more technical terms, the selected cognitive chunks are maximally compressed about an input and informative about a decision made by a black-box on that input.
3.3.1 The variational bound
The mutual informations $I(X;T)$ and $I(T;Y)$ are computationally expensive to evaluate (Tishby et al., 2000; Chechik et al., 2005). To reduce the computational burden, we use a variational approximation to our information bottleneck objective (2). Our variational approximation is similar to the work of Alemi et al. (2017), who first developed a variational lower bound on the information bottleneck objective (1) for deep neural networks. However, they apply the variational technique to objective (1), in which the stochastic encoding of the input is itself the information bottleneck, whereas we apply it to objective (2), in which the information bottleneck is the elementwise product $t = z \odot x$ of the stochastic encoding $z$ and the input $x$, where $z$ is a boolean random vector performing instance-wise selection of cognitive chunks. We therefore take an additional step, showing $I(X;T) \le I(X;Z)$ so that $I(X;T)$ can be approximated using the upper bound of $I(X;Z)$ described below. We now examine each term of the information bottleneck objective in turn.
Variational bound for $I(X;T)$: For the variational approximation to $I(X;T)$, we first show that $I(X;T) \le I(X;Z)$ and then use an upper bound on $I(X;Z)$ as an upper bound on $I(X;T)$.
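First, using the chain rule for mutual information, $I(X; Z, T) = I(X;Z) + I(X;T \mid Z) = I(X;T) + I(X;Z \mid T)$, we obtain:

$$I(X;Z) = I(X;T) + I(X;Z \mid T) - I(X;T \mid Z).$$

Now, we will show $I(X;T) \le I(X;Z)$ by showing that $I(X;T \mid Z) = 0$ in our probabilistic framework.
Note that $p(t \mid x, z) = 1$ when $t = z \odot x$ and $0$ otherwise, because $t$ is deterministic when $z$ and $x$ are given. That is,

$$p(t \mid x, z) = \mathbb{1}\,[\,t = z \odot x\,].$$

Since the conditional mutual information $I(X;Z \mid T)$ is always non-negative, we conclude that $I(X;Z) \ge I(X;T)$. Therefore, any upper bound on $I(X;Z)$ is also an upper bound on $I(X;T)$. Now, we use $r(z)$ to approximate the marginal $p(z)$. Using the fact that the Kullback-Leibler divergence is always non-negative, we have:

$$I(X;Z) \le \int p(x)\, p(z \mid x) \log \frac{p(z \mid x)}{r(z)}\, dx\, dz = \mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(p(z \mid x)\,\|\,r(z)\big).$$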
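The variational upper bound $I(X;Z) \le \mathbb{E}_{x} D_{\mathrm{KL}}(p(z \mid x) \,\|\, r(z))$ holds for any choice of $r(z)$, and the gap is exactly $D_{\mathrm{KL}}(p(z) \,\|\, r(z))$. The sketch below checks this numerically for a small discrete example; treating $Z$ as a single categorical variable over three values and choosing a uniform $r(z)$ are simplifying assumptions for illustration.

```python
from math import log


def kl(p, r):
    """KL(p || r) in nats for categorical distributions given as lists."""
    return sum(pi * log(pi / ri) for pi, ri in zip(p, r) if pi > 0)


def marginal(p_x, p_z_given_x):
    """p(z) = sum_x p(x) p(z | x)."""
    d = len(p_z_given_x[0])
    return [sum(px * row[j] for px, row in zip(p_x, p_z_given_x)) for j in range(d)]


def mutual_info_xz(p_x, p_z_given_x):
    """I(X;Z) = E_x KL(p(z|x) || p(z))."""
    p_z = marginal(p_x, p_z_given_x)
    return sum(px * kl(row, p_z) for px, row in zip(p_x, p_z_given_x))


def upper_bound(p_x, p_z_given_x, r_z):
    """E_x KL(p(z|x) || r(z)) for an arbitrary variational marginal r(z)."""
    return sum(px * kl(row, r_z) for px, row in zip(p_x, p_z_given_x))


p_x = [0.5, 0.5]
p_z_given_x = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
r_z = [1 / 3, 1 / 3, 1 / 3]          # assumed uniform variational marginal
mi, ub = mutual_info_xz(p_x, p_z_given_x), upper_bound(p_x, p_z_given_x, r_z)
print(mi <= ub)                      # True; the gap equals KL(p(z) || r(z))
```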
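Variational bound for $I(T;Y)$. Starting with $I(T;Y) = \int p(y, t) \log \frac{p(y \mid t)}{p(y)}\, dy\, dt$, we have:

$$I(T;Y) = \int p(y, t) \log p(y \mid t)\, dy\, dt + H(Y).$$

Note that the entropy $H(Y)$ can be ignored because it is independent of the optimization procedure.
Now, we use $q(y \mid t)$ to approximate $p(y \mid t)$; $q(y \mid t)$ works as an approximator to the black-box system. Using the fact that the Kullback-Leibler divergence is always non-negative, we have:

$$\int p(y, t) \log p(y \mid t)\, dy\, dt \ge \int p(y, t) \log q(y \mid t)\, dy\, dt.$$

This gives the following lower bound on $I(T;Y)$:

$$I(T;Y) \ge \int p(x)\, p(y \mid x)\, p(t \mid x) \log q(y \mid t)\, dx\, dt\, dy + H(Y),$$

where $p(y, t) = \int p(x)\, p(y \mid x)\, p(t \mid x)\, dx$ by the Markov chain assumption $Y \leftrightarrow X \leftrightarrow T$.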
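The lower bound $I(T;Y) \ge \mathbb{E}_{p(t,y)}[\log q(y \mid t)] + H(Y)$ is tight exactly when the approximator $q(y \mid t)$ equals the true $p(y \mid t)$. The following sketch checks both facts on a hypothetical two-valued $T$ and binary $Y$; the distributions are made up for illustration.

```python
from math import log


def mi_and_lower_bound(p_ty, q):
    """Return I(T;Y) and the bound E_{p(t,y)}[log q(y|t)] + H(Y) for a discrete joint p(t, y)."""
    p_t, p_y = {}, {}
    for (t, y), p in p_ty.items():
        p_t[t] = p_t.get(t, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    mi = sum(p * log(p / (p_t[t] * p_y[y])) for (t, y), p in p_ty.items() if p > 0)
    h_y = -sum(p * log(p) for p in p_y.values() if p > 0)
    bound = sum(p * log(q[t][y]) for (t, y), p in p_ty.items() if p > 0) + h_y
    return mi, bound


p_ty = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}
q_rough = {"a": {0: 0.6, 1: 0.4}, "b": {0: 0.3, 1: 0.7}}   # an imperfect approximator
q_exact = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.2, 1: 0.8}}   # the true p(y | t)
mi, rough = mi_and_lower_bound(p_ty, q_rough)
_, exact = mi_and_lower_bound(p_ty, q_exact)
print(rough <= mi, abs(exact - mi) < 1e-12)  # True True
```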
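In summary, we have the following variational bounds for each term:

$$I(X;T) \le \mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(p(z \mid x)\,\|\,r(z)\big), \qquad I(T;Y) \ge \int p(x)\, p(y \mid x)\, p(t \mid x) \log q(y \mid t)\, dx\, dt\, dy + H(Y),$$

which result in:

$$I(T;Y) - \beta\, I(X;T) \ge \int p(x)\, p(y \mid x)\, p(t \mid x) \log q(y \mid t)\, dx\, dt\, dy - \beta\, \mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(p(z \mid x)\,\|\,r(z)\big) + H(Y).$$

With proper choices of $r(z)$ and $p(z \mid x)$, we assume that the Kullback-Leibler divergence can be computed analytically. We use the empirical data distribution to approximate $p(x)$ and $p(y \mid x)$.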
3.3.2 Continuous relaxation and reparameterization
Note that we aim to sample the top $k$ out of $d$ cognitive chunks, where each chunk is assumed to be drawn from a categorical distribution with class probabilities $p(x) = (p_1(x), \dots, p_d(x))$, and $p_j(x)$ represents the attribution of the $j$-th chunk to the black-box decision. Evaluating the variational bound above exactly would therefore require summing over all $\binom{d}{k}$ combinations of chunk subsets. To avoid this computational burden, we use the generalized Gumbel-softmax trick (Jang et al., 2017; Chen et al., 2018), which approximates the non-differentiable categorical subset sampling with differentiable Gumbel-softmax samples. This trick allows us to use standard backpropagation to compute the gradients of the parameters via reparameterization.
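First, we independently sample a cognitive chunk $k$ times. Each time, a random perturbation $g_j = -\log(-\log u_j)$ with $u_j \sim \mathrm{Uniform}(0, 1)$ is added to the log probability $\log p_j(x)$ of each cognitive chunk. Then, a Concrete random vector $c = (c_1, \dots, c_d)$, which works as a continuous, differentiable approximation to argmax, is defined as follows:

$$c_j = \frac{\exp\big((\log p_j(x) + g_j)/\tau\big)}{\sum_{i=1}^{d} \exp\big((\log p_i(x) + g_i)/\tau\big)}, \qquad j = 1, \dots, d,$$

where $\tau$ is a tuning parameter for the temperature of the Gumbel-softmax distribution. Next, we define a continuous-relaxed random vector $\tilde{z} = (\tilde{z}_1, \dots, \tilde{z}_d)$ as the elementwise maximum of the independently sampled Concrete vectors $c^{(l)}$, where $l = 1, \dots, k$:

$$\tilde{z}_j = \max_{l \in \{1, \dots, k\}} c^{(l)}_j, \qquad j = 1, \dots, d.$$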
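The sampling scheme above can be sketched in plain Python as follows. It draws $k$ Concrete vectors with Gumbel noise and takes their elementwise maximum; the attribution scores and the temperature $\tau = 0.5$ are hypothetical values, and a real implementation would use a differentiable tensor library rather than Python lists.

```python
import math
import random


def sample_gumbel(rng):
    """Draw g = -log(-log u) with u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def relaxed_top_k(log_probs, k, tau, rng=random):
    """Continuous relaxation z~ of a k-hot sample: elementwise max of k Concrete vectors."""
    z_tilde = [0.0] * len(log_probs)
    for _ in range(k):
        # One Concrete (Gumbel-softmax) sample over the d chunks.
        logits = [(lp + sample_gumbel(rng)) / tau for lp in log_probs]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        concrete = [e / total for e in exps]
        z_tilde = [max(a, b) for a, b in zip(z_tilde, concrete)]
    return z_tilde


probs = [0.1, 0.4, 0.2, 0.3]  # hypothetical attribution scores p_j(x)
z_tilde = relaxed_top_k([math.log(p) for p in probs], k=2, tau=0.5, rng=random.Random(0))
print(z_tilde)  # four values in [0, 1] summing to at most k = 2
```

Each Concrete vector sums to one, so the elementwise maximum of $k$ of them has entries in $[0, 1]$ and sums to at most $k$; as $\tau \to 0$ each Concrete vector approaches a one-hot sample.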
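With this sampling scheme, we approximate the $k$-hot random vector $z$ by $\tilde{z}$ and obtain a continuous approximation to the variational bound. Putting everything together and dropping the constant $H(Y)$, we have:

$$\max_{\theta, \phi} \; \mathbb{E}_{x, y \sim p(x, y)}\, \mathbb{E}_{g}\Big[\log q_\phi\big(y \mid \tilde{z}(\theta, g) \odot x\big)\Big] - \beta\, \mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(p_\theta(z \mid x)\,\|\,r(z)\big),$$

where $q_\phi$ is the approximator to the black-box system and the Kullback-Leibler term represents the compactness of the explanation. Once the model is learned, the attribution score $p_j(x)$ of each cognitive chunk is used to select the top $k$ key cognitive chunks, which are maximally compressive about the input and informative about the black-box decision on that input. Therefore, the selected cognitive chunks serve as brief but comprehensive explanations for the black-box decision.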
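As a per-instance sketch of the quantity being optimized, the loss below is the negative of the bound for one example: the approximator's negative log-likelihood of the black-box label plus $\beta$ times a KL penalty. It assumes, as one possible "proper choice" not specified in the text, that $r(z)$ is uniform over the $d$ chunks and that the KL is evaluated on the explainer's categorical scores $p(x)$, which gives the closed form $\log d - H(p(x))$. Averaging this loss over a minibatch plays the role of the empirical data distribution.

```python
import math


def vibi_loss(approx_log_probs, black_box_label, chunk_probs, beta):
    """Per-instance loss: -log q(y | t) + beta * KL(p(x) || Uniform(d)) (assumed prior)."""
    nll = -approx_log_probs[black_box_label]
    d = len(chunk_probs)
    # Closed-form KL between the explainer's categorical scores and a uniform prior.
    kl = sum(p * math.log(p * d) for p in chunk_probs if p > 0)
    return nll + beta * kl


log_q = [math.log(0.8), math.log(0.2)]   # approximator output on the relaxed explanation
print(vibi_loss(log_q, 0, [0.25] * 4, beta=0.1))             # KL is 0 for uniform scores
print(vibi_loss(log_q, 0, [1.0, 0.0, 0.0, 0.0], beta=0.0))   # beta = 0 drops the KL term
```

Setting `beta=0.0` removes the compressiveness penalty, which mirrors the observation that L2X corresponds to $\beta = 0$.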
We applied VIBI to explain deep learning black-box models on text and image datasets: an LSTM movie sentiment prediction model trained on the IMDB text dataset and a CNN digit recognition model trained on the MNIST image dataset. We evaluated VIBI from two perspectives: interpretability and fidelity. Interpretability refers to the ability to explain a black-box system in human-understandable terms. Fidelity refers to how accurately our approximator mimics the black-box. Based on these criteria, we compared VIBI to two strong existing system-agnostic methods, LIME (Ribeiro et al., 2016) and L2X (Chen et al., 2018), and to a commonly used system-specific method, Saliency Map. For Saliency Map, we used the SmoothGrad technique (Smilkov et al., 2017), which produces visually sharper gradient-based sensitivity maps than the basic gradient saliency map.
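SmoothGrad averages gradients over noisy copies of the input, which suppresses high-frequency fluctuations in the raw gradient. The sketch below uses a hypothetical one-line model whose gradient contains a rapidly oscillating term; the noise level `sigma` and the sample counts are illustrative choices, not the settings used in the experiments.

```python
import math
import random


def smoothgrad(grad_fn, x, sigma=0.1, n_samples=50, rng=random):
    """SmoothGrad: average grad_fn over n_samples Gaussian-perturbed copies of x."""
    avg = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, gi in enumerate(grad_fn(noisy)):
            avg[i] += gi / n_samples
    return avg


def grad_wiggly(x):
    """Analytic gradient of a hypothetical f(x) = x0 + 0.05 * sin(200 * x0) + 2 * x1."""
    return [1.0 + 10.0 * math.cos(200.0 * x[0]), 2.0]


x = [0.3, 0.0]
print(grad_wiggly(x))  # first entry is dominated by the high-frequency term
print(smoothgrad(grad_wiggly, x, n_samples=5000, rng=random.Random(0)))  # close to [1.0, 2.0]
```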
We examined how VIBI performs across different experimental settings, varying the number of selected chunks $k$ (the amount of explanation), the size of each chunk (the unit of explanation), and the trade-off parameter $\beta$ (the trade-off between the compressiveness of the explanation and the information preserved about the output). We tuned the models over a search space for the temperature $\tau$ of the Gumbel-softmax approximation, the learning rate, and $\beta$. We use the Adam algorithm (Kingma & Ba, 2014) with batch size 100 for MNIST and 50 for IMDB. All models are implemented in PyTorch, an open-source deep learning platform (Paszke et al., 2017). The code is publicly available on GitHub: https://github.com/SeojinBang/VIBI.
4.1 Explanation for the LSTM movie sentiment prediction model using IMDB
The IMDB dataset (Maas et al., 2011) is a large text dataset containing movie reviews labeled by sentiment (positive/negative). We split the reviews into a training set of 25,000 reviews, a validation set of 12,500 reviews and a test set of 12,500 reviews. We then trained a hierarchical LSTM for sentiment prediction, which achieved 87% test accuracy. Each review is padded or cut to 15 sentences, and each sentence to 50 words. The architecture consists of a word-embedding layer of size 50, followed by two bidirectional LSTMs encoding the word vectors and the sentence vectors respectively, a fully connected layer with two units, and a softmax layer. The first LSTM layer encodes the word embedding vectors to generate a word representation vector of size 100 for each word. Within each sentence, the word representation vectors are averaged elementwise to form a sentence representation vector of size 100. The second LSTM layer encodes the sentence representation vectors to generate a review embedding vector of size 60.
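The input preprocessing and sentence pooling described above can be sketched as follows. Padding at the end of each sentence and review with a hypothetical `pad_id` of 0 is an assumption; the text only states that reviews are padded or cut to 15 sentences of 50 words.

```python
def pad_or_cut(review, n_sentences=15, n_words=50, pad_id=0):
    """Fit a tokenized review (list of sentences of word ids) into an n_sentences x n_words grid."""
    grid = []
    for sentence in review[:n_sentences]:
        words = sentence[:n_words]
        grid.append(words + [pad_id] * (n_words - len(words)))
    while len(grid) < n_sentences:
        grid.append([pad_id] * n_words)  # fresh list per padded sentence (no aliasing)
    return grid


def sentence_vector(word_vectors):
    """Elementwise average of a sentence's word representation vectors."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


grid = pad_or_cut([[5, 6, 7], list(range(1, 61))])
print(len(grid), {len(s) for s in grid})          # 15 {50}
print(sentence_vector([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```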
VIBI explains why the LSTM predicts each movie review to be positive or negative, and provides instance-wise key words that are the most important attributes for the sentiment prediction. To explain the LSTM black-box model, we parameterized the explainer and the approximator of VIBI with deep neural networks. For example, when a word is used as a cognitive chunk, we used a bidirectional LSTM to parameterize the explainer. The output vector from each LSTM cell is averaged and followed by a log-softmax calculation, so that the final layer returns log-probabilities indicating which words should be taken as the input for the approximator. We parameterized the approximator using a convolutional layer followed by a ReLU activation function and a max-pooling layer, and a fully connected layer returning a size-2 vector followed by a log-softmax calculation, so that the final layer returns a vector of log-probabilities for the two sentiments (positive/negative).
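One reading of the explainer head described above is sketched below: each word's LSTM output vector is averaged into a single score, and a log-softmax over the words of the review turns these scores into log-probabilities of selection. Interpreting "averaged" as averaging within each vector is an assumption, and the per-word outputs are hypothetical.

```python
import math


def word_log_probs(cell_outputs):
    """Average each word's LSTM output vector into one score, then log-softmax over words."""
    scores = [sum(v) / len(v) for v in cell_outputs]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return [s - log_z for s in scores]


outputs = [[0.1, 0.3], [1.0, 2.0], [-0.5, 0.5]]   # hypothetical per-word outputs
log_p = word_log_probs(outputs)
print(max(range(len(log_p)), key=log_p.__getitem__))  # 1: the word with the largest mean output
```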
As seen in the top-left and top-right of Figure 2, VIBI shows that positive (or negative) words pass through the bottleneck and lead to a correct prediction. The bottom of Figure 2 shows that the LSTM sentiment prediction model makes a wrong prediction for a negative review because the review includes several positive words such as ‘enjoyable’ and ‘exciting’.
4.2 Explanation for the CNN digit recognition model using MNIST
The MNIST dataset (LeCun et al., 1998) is a large dataset containing 28 × 28 images of handwritten digits (0 to 9). We split the images into a training set of 50,000 images, a validation set of 10,000 images and a test set of 10,000 images, and trained a simple 2D CNN for digit recognition, which achieved 97% test accuracy. The architecture consists of two convolutional layers with kernel size 5 followed by a max-pooling layer with pool size 2, two fully connected layers and a softmax layer. The two convolutional layers contain 10 and 20 filters respectively, and the two fully connected layers are composed of 50 and 10 units respectively.
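A quick shape check helps when re-implementing the black-box CNN. The sketch assumes unpadded, stride-1 convolutions with each convolution followed by a 2 × 2 max-pooling, as in the widely used PyTorch MNIST example with the same filter and unit counts; the text above does not state padding or exactly where pooling is applied.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a square 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def black_box_flat_features(size=28):
    """Trace the assumed black-box CNN and return the flattened feature count."""
    size = conv_out(size, 5) // 2   # conv1 (10 filters): 28 -> 24; max-pool(2): 24 -> 12
    size = conv_out(size, 5) // 2   # conv2 (20 filters): 12 -> 8;  max-pool(2): 8 -> 4
    return 20 * size * size         # 20 channels * 4 * 4 inputs to the 50-unit layer


print(black_box_flat_features())  # 320
```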
VIBI explains how the CNN characterizes a digit and recognizes differences between digits. To explain the CNN black-box model, we parameterized the explainer using a 2D CNN. For example, when a 4 × 4 cognitive chunk is used, the structure is as follows: two convolutional layers with kernel size 5, each followed by a ReLU activation function and a max-pooling layer with pool size 2, and one convolutional layer with kernel size 1 returning a 2D matrix followed by a log-softmax calculation, so that the final layer returns a vector of log-probabilities for the 49 chunks. The three convolutional layers contain 8, 16, and 1 filters respectively. The output of the explainer indicates which cognitive chunks should be taken as the input for the approximator. We parameterized the approximator using two convolutional layers with kernel size 5 and 32 and 64 filters respectively, each followed by a ReLU activation function and a max-pooling layer with pool size 2, and one fully connected layer returning a size-10 vector followed by a log-softmax calculation, so that the final layer returns a vector of log-probabilities for the ten digits (0-9).
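For the explainer's output to be a 7 × 7 grid (49 chunks of 4 × 4 pixels on a 28 × 28 image), its 5 × 5 convolutions must preserve spatial size, which this sketch assumes is done with padding 2. It also shows how a chunk-level selection expands back to a pixel mask; the helper names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a square 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def explainer_grid(size=28):
    """Side length of the explainer's score map under the padding-2 assumption."""
    size = conv_out(size, 5, padding=2) // 2  # conv (8 filters), pool: 28 -> 28 -> 14
    size = conv_out(size, 5, padding=2) // 2  # conv (16 filters), pool: 14 -> 14 -> 7
    return conv_out(size, 1)                  # 1x1 conv (1 filter): 7 -> 7


def chunk_mask_to_pixels(chunk_mask, chunk=4):
    """Expand a chunk-level mask to pixels; each chunk covers chunk x chunk pixels."""
    rows, cols = len(chunk_mask), len(chunk_mask[0])
    return [[chunk_mask[r // chunk][c // chunk] for c in range(cols * chunk)]
            for r in range(rows * chunk)]


side = explainer_grid()
mask = [[0] * side for _ in range(side)]
mask[0][1] = 1                       # select the chunk in row 0, column 1
pixels = chunk_mask_to_pixels(mask)
print(side * side, len(pixels), sum(map(sum, pixels)))  # 49 28 16
```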
The first two examples in Figure 3 show that the CNN recognizes digits using both shapes and angles. In the first example, the CNN characterizes ‘1’s by straightly aligned patches along the activated regions, although the ‘1’s in the left and right panels are written at different angles. In contrast, the second example shows that the CNN recognizes the difference between ‘9’ and ‘6’ by their difference in angle. The last two examples in Figure 3 show that the CNN distinguishes ‘7’s from ‘1’s by patches located on the activated horizontal line of the ‘7’ (see the cyan circle), and recognizes ‘8’s by two patches at the top of the digits and another two patches on the bottom circle.
4.3 Interpretability evaluated by humans
We evaluated the interpretability of the methods on the LSTM movie sentiment prediction model using IMDB. We assume that a better explanation allows humans to better infer the black-box output from the explanation. Therefore, we asked humans to infer the output of the black-box system (Positive/Negative/Neutral) given five key words generated by each method as the explanation. Each method was evaluated by workers on Amazon Mechanical Turk (MTurk, https://www.mturk.com/) who have been awarded the Masters Qualification (i.e. high-performance workers who have demonstrated excellence across a wide range of tasks). We randomly selected and evaluated 200 instances for VIBI and 100 instances for each of the other methods. Five workers were assigned per instance.
We also evaluated the interpretability for the CNN digit recognition model using MNIST. We asked humans to directly score the explanation on a 0 to 5 scale (0 for no explanation, 1-4 for an insufficient or redundant explanation, and 5 for a concise explanation). Each method was evaluated by 16 graduate students at the School of Computer Science, Carnegie Mellon University, who have taken at least one graduate-level machine learning class. For each method, 100 instances were randomly selected and evaluated. The cognitive chunks with the size are provided as an explanation for each instance ( for VIBI). On average, 4.26 students were assigned per instance. For further details about the experiments, please see Supplementary Material Section 6.
As shown in Table 1, VIBI better explains the black-box models. For the LSTM movie sentiment prediction model on the IMDB dataset, humans better infer the black-box output from the five key words when those words are provided by VIBI. VIBI therefore better captures the key words that contribute most to the LSTM decision, and better explains why the LSTM made its prediction for each movie review. For the CNN digit recognition model on the MNIST dataset, VIBI also highlights the most concise chunks that explain the key characteristics of the handwritten digit, and thus better explains how the CNN model recognized each handwritten digit.
For IMDB, the percentage indicates how well the MTurk workers’ answers match the black-box output. For MNIST, the score indicates how well the highlighted chunks capture the key characteristics of the handwritten digits; the average score over all samples is shown on a 0 to 5 scale. See Supplementary Material Tables 4 and 5 for the survey examples and detailed results.
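The IMDB percentage can be sketched as the share of worker answers that agree with the black-box output. Counting every individual answer (rather than, say, a per-instance majority vote) is an assumption, since the exact aggregation is not stated; the answers below are made up.

```python
def agreement_rate(worker_answers, black_box_outputs):
    """Percentage of individual worker answers that equal the black-box output."""
    matches = total = 0
    for answers, target in zip(worker_answers, black_box_outputs):
        matches += sum(answer == target for answer in answers)
        total += len(answers)
    return 100.0 * matches / total


answers = [["Positive", "Positive", "Neutral", "Positive", "Negative"],   # five workers per instance
           ["Negative", "Negative", "Negative", "Positive", "Negative"]]
print(agreement_rate(answers, ["Positive", "Negative"]))  # 70.0
```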
4.4 Evaluation of the fidelity
We assessed fidelity by the prediction performance of the approximator with respect to the black-box output. We introduce two formalized metrics to quantitatively evaluate fidelity: approximator fidelity and rationale fidelity. Approximator fidelity refers to the ability of the approximator to imitate the behaviour of a black-box, and rationale fidelity refers to how much the selected chunks contribute to the approximator fidelity. In detail, approximator fidelity is quantified by the prediction performance of the approximator that takes $\tilde{z} \odot x$, the continuous relaxation of $t = z \odot x$, as input and the black-box output as the target label. Rationale fidelity is quantified by the prediction performance of the approximator that takes $t = z \odot x$ as input, with $z$ the $k$-hot vector of the top $k$ chunks, and the black-box output as the target label. (Note that $t$ keeps only the raw features corresponding to the top $k$ selected cognitive chunks, and the others are set to zero.)
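Both fidelity metrics reduce to scoring approximator predictions against black-box outputs; they differ only in whether the approximator sees the relaxed input $\tilde{z} \odot x$ or the hard top-$k$ input $t$. The sketch computes accuracy and F1; macro-averaging F1 over classes is an assumption, since the averaging scheme is not stated, and the predictions are hypothetical.

```python
def accuracy_and_macro_f1(predictions, targets):
    """Accuracy and macro-averaged F1 of approximator predictions w.r.t. black-box outputs."""
    pairs = list(zip(predictions, targets))
    accuracy = sum(p == t for p, t in pairs) / len(pairs)
    f1_scores = []
    for c in sorted(set(predictions) | set(targets)):
        tp = sum(p == c and t == c for p, t in pairs)
        fp = sum(p == c and t != c for p, t in pairs)
        fn = sum(p != c and t == c for p, t in pairs)
        f1_scores.append(2 * tp / (2 * tp + fp + fn))  # denominator > 0 for every seen class
    return accuracy, sum(f1_scores) / len(f1_scores)


black_box = [0, 1, 0, 0]      # black-box outputs used as target labels
approx = [0, 1, 1, 0]         # approximator predictions (relaxed or hard top-k input)
print(accuracy_and_macro_f1(approx, black_box))  # (0.75, 0.733...)
```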
As shown in Table 2, VIBI outperforms Saliency and LIME in most cases, and performs similarly to L2X in approximator fidelity. However, this does not mean that both approximators achieve the same fidelity. As shown in Table 3, the chunks selected by VIBI account for more of the approximator fidelity than those selected by L2X. Recall that L2X is a special case of VIBI with the information bottleneck trade-off parameter $\beta = 0$ (i.e. without the compressiveness constraint $I(X;T)$). Therefore, compressing information through the explainer achieves not only conciseness of the explanation but also better fidelity of the explanation to the black-box.
| chunk size | k | Saliency | LIME | L2X | VIBI (Ours) |
|---|---|---|---|---|---|

| chunk size | k | L2X | VIBI (Ours) |
|---|---|---|---|
We employ the information bottleneck principle as a criterion for learning ‘good’ explanations. Instance-wise selected cognitive chunks work as an information bottleneck and hence provide concise yet comprehensive explanations for each decision made by a black-box system.
- Alemi et al. (2017) Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K. Deep variational information bottleneck. International Conference on Learning Representations, 2017.
- Bach et al. (2015) Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS One, 10(7):e0130140, 2015.
- Baehrens et al. (2010) Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., and Müller, K.-R. How to explain individual classification decisions. Journal of Machine Learning Research, 11(Jun):1803–1831, 2010.
- Binder et al. (2016) Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R., and Samek, W. Layer-wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, pp. 63–71. Springer, 2016.
- Chechik et al. (2005) Chechik, G., Globerson, A., Tishby, N., and Weiss, Y. Information bottleneck for gaussian variables. Journal of machine learning research, 6(Jan):165–188, 2005.
- Chen et al. (2018) Chen, J., Song, L., Wainwright, M. J., and Jordan, M. I. Learning to explain: An information-theoretic perspective on model interpretation. International Conference on Machine Learning (ICML), 2018.
- Jang et al. (2017) Jang, E., Gu, S., and Poole, B. Categorical reparameterization with gumbel-softmax. International Conference on Learning Representations, 2017.
- Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Lei et al. (2016) Lei, T., Barzilay, R., and Jaakkola, T. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
- Lundberg & Lee (2017) Lundberg, S. M. and Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774, 2017.
- Maas et al. (2011) Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P11-1015.
- Mullenbach et al. (2018) Mullenbach, J., Wiegreffe, S., Duke, J., Sun, J., and Eisenstein, J. Explainable prediction of medical codes from clinical text. arXiv preprint arXiv:1802.05695, 2018.
- Murdoch et al. (2018) Murdoch, W. J., Liu, P. J., and Yu, B. Beyond word importance: Contextual decomposition to extract interactions from lstms. arXiv preprint arXiv:1801.05453, 2018.
- Paszke et al. (2017) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in pytorch. In NIPS-W, 2017.
- Ribeiro et al. (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144. ACM, 2016.
- Shrikumar et al. (2017) Shrikumar, A., Greenside, P., and Kundaje, A. Learning important features through propagating activation differences. arXiv preprint arXiv:1704.02685, 2017.
- Shwartz-Ziv & Tishby (2017) Shwartz-Ziv, R. and Tishby, N. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.
- Simonyan et al. (2013) Simonyan, K., Vedaldi, A., and Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Singh et al. (2018) Singh, C., Murdoch, W. J., and Yu, B. Hierarchical interpretations for neural network predictions. arXiv preprint arXiv:1806.05337, 2018.
- Smilkov et al. (2017) Smilkov, D., Thorat, N., Kim, B., Viégas, F., and Wattenberg, M. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Sundararajan et al. (2017) Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365, 2017.
- Tishby & Zaslavsky (2015) Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In Information Theory Workshop (ITW), 2015 IEEE, pp. 1–5. IEEE, 2015.
- Tishby et al. (2000) Tishby, N., Pereira, F. C., and Bialek, W. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- Yang et al. (2016) Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489, 2016.
6 Interpretability evaluated by humans
6.1 LSTM movie sentiment prediction model using IMDB dataset
The interpretable machine learning methods were evaluated by workers on Amazon Mechanical Turk (https://www.mturk.com/) who have been awarded the Masters Qualification (i.e. high-performance workers who have demonstrated excellence across a wide range of tasks). Randomly selected instances (200 for VIBI and 100 for each of the others) were evaluated for each method. Five workers were assigned per instance. (See the survey example below for further details.)
Survey example for IMDB
Title: Label sentiment given a few words.
Description: Recognize the primary sentiment of the movie review given a few words only.
6.2 CNN digit recognition model using MNIST
| MNIST Digit | Saliency | LIME | L2X | VIBI (Ours) |
|---|---|---|---|---|
| Ave. over digits | 3.448 | 1.369 | 1.936 | 3.526 |
The interpretable machine learning methods were evaluated by 16 graduate students at the School of Computer Science, Carnegie Mellon University, who have taken at least one graduate-level machine learning class. For each method, 100 randomly selected instances were evaluated. On average, 4.26 students were assigned per instance. (See the survey example below for further details.)
Survey example for MNIST
MNIST is a large dataset containing 28 x 28 images of handwritten digits (0 to 9). Here, a 2D convolutional neural network (CNN) is used for digit recognition on MNIST. Several interpretable machine learning methods are learned to explain the model by highlighting key pixels that play an important role in the CNN digit recognition. For each handwritten image, the highlighted pixels explain why the CNN model recognized the handwriting as it did. Your task is to evaluate the explanation for each instance on a scale of 0 to 5. Please score each instance based on the following criteria.
0 - No explanation: the pixels are randomly highlighted
1 to 4 - Insufficient or redundant explanation: some redundant pixels are highlighted, or only some of the key characteristics of each digit are highlighted
5 - Concise explanation: the highlighted pixels concisely capture the key characteristics of the digit
Rationale fidelity quantifies the ability of the selected chunks to infer the black-box output. A large rationale fidelity implies that the selected chunks account for a large portion of the approximator fidelity. The prediction accuracy and F1-score of the approximator for the CNN model are shown.
Approximator fidelity quantifies the ability of the approximator to imitate the behaviour of a black-box. The prediction accuracy and F1-score of the approximator for the LSTM model are shown.
Approximator fidelity quantifies the ability of the approximator to imitate the behaviour of a black-box. The prediction accuracy and F1-score of the approximator for the CNN model are shown.