With the rapid development of sophisticated machine learning algorithms, artificial intelligence has been penetrating a number of new fields with unprecedented speed. One of the outstanding problems hampering further progress is the interpretability challenge. This challenge arises when the models built by machine learning algorithms are to be used by humans in their decision making, particularly when such decisions are subject to legal consequences and/or administrative audits. For human decision makers operating in those circumstances to accept the professional and legal responsibility ensuing from decisions assisted by machine learning, it is critical that they comprehend the models. This is generally true for areas such as criminal justice, health care, terrorism detection, education and financial markets.
To trust a model, decision makers first need to understand its behavior, and then evaluate and refine it using their domain knowledge. Even in areas such as book or movie recommendations  and automated aids , explanations for a recommendation or for an error could increase trust in and reliance on these systems. Furthermore, the European General Data Protection Regulation, which takes effect in May 2018, stipulates the explainability of all automatically made decisions concerning individuals, and that includes decisions made with or assisted by machine learning models. Hence, there is a growing demand for interpretability of machine learning algorithms.
In this paper, we define interpretability of a model as the ability to provide visual or textual presentation of the connections between input features and the output predictions.
There are usually two approaches to achieving interpretability. One is to design an algorithm that is inherently interpretable while achieving accuracy competitive with that of a complex model. Examples are Decision Trees, Decision Lists and Decision Sets. The disadvantage of this approach is the trade-off between interpretability and accuracy: it is not easy to learn an interpretable (and so presumably simple) model that expresses a complex process with very high accuracy. The other approach, which does not sacrifice accuracy, works in the opposite direction: it first builds a highly accurate model without worrying about interpretability, and subsequently uses a separate set of re-representation techniques to assist the user in understanding the behavior of the algorithm. One such technique is to use the aforementioned relatively simple and interpretable algorithms to explain the behavior of a complex model and the reasons why a given classifier, treated as a black box, classifies a given instance in a particular way, e.g. LIME , BETA , TREPAN.
Deep learning methods have lately been very successful in image processing and natural language processing. Deep learning can be categorized as a representation learning approach, which learns refined features that can improve a model’s generalization ability. Deep learning models, however, are very hard to interpret.
In this paper we report work in progress in which we try to interpret the inner mechanisms of deep learning. Our method, CNN-INTE, is inspired by . We design and implement a tool that helps the user understand how the hidden layers in a deep CNN model work to classify examples. The results are expressed as graphs that indicate the sequential separation of the true class and the hypothesis. The main contributions of our method are as follows:
Compared to LIME , which only provides local interpretation in specific regions of the feature space, our method provides global interpretation for any test instance in the whole feature space.
Compared to models which apply inherently interpretable algorithms, e.g. , our method has the advantage of not compromising the accuracy of the model to be interpreted. This produces more reliable interpretation.
II Related Work
To address the problems of “trusting a prediction” and “trusting a model”, two methods are proposed in  to explain individual predictions and to understand a model’s behavior respectively: Local Interpretable Model-agnostic Explanations (LIME) and Submodular Pick LIME (SP-LIME). The main idea of LIME is to use inherently interpretable models g to interpret complex models f locally. The authors designed an objective function that minimizes both the unfaithfulness of g (when g approximates f in a local area) and the complexity of g. Although the paper states that g in the objective function could be any interpretable model, the authors chose sparse linear models for g. Based on the individual explanations generated by LIME, they designed a submodular pick algorithm, SP-LIME, to explain the model as a whole by picking a number of representative and non-redundant instances.
It was suggested in  that coverage, precision and effort should be used to evaluate the results of model interpretation. Although LIME achieves high precision and low effort, its coverage is not clear. In other words, LIME is able to explain why a specific prediction is made using the weights of the local model g, but cannot indicate the local region in which the explanation is faithful. To solve this problem, the Anchor Local Interpretable Model-Agnostic Explanations method (aLIME) was introduced in . In aLIME, if-then rules are used to explain a specific prediction, instead of the weights of a linear model (as in LIME). The idea is based on the Decision Sets algorithm from . These if-then rules are easy to comprehend and have good coverage.
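For reference, the objective described above is usually written as follows in the LIME paper. The symbols not introduced in the text are taken from that paper's notation: $G$ is the class of interpretable models, $\pi_x$ is a proximity measure that defines the locality around the instance $x$, $\mathcal{L}$ measures how unfaithful $g$ is to $f$ in that locality, and $\Omega(g)$ is the complexity of $g$.

```latex
\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
```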
It was pointed out that there is a trade-off between interpretability and accuracy for machine learning algorithms. Among inherently interpretable models, rule-based models such as Decision Trees and Decision Lists are often preferred, as they can strike a balance between these two factors. Decision lists are usually considered more interpretable than decision trees, as they use if-then-else statements with a hierarchical structure. However, this structure reduces interpretability to some extent, because interpreting an additional rule requires reasoning about all the previous rules. Moreover, rules further down the list apply to much narrower regions of the feature space, which makes multi-class classification difficult when the minority classes deserve equally good rules. This motivates the proposal of the Decision Sets algorithm in , which produces isolated if-then rules, where each rule can serve as an independent prediction. To realize this, an objective function takes into account both interpretability (expressed by size, length, cover and overlap) and accuracy (expressed by precision and recall of the rules). The authors showed that optimizing the objective function is an NP-hard problem and find near-optimal solutions to it. However, the accuracy of Decision Sets only approaches that of random forests, and their expressive power merely matches that of decision trees.
Another model-agnostic explanation approach is Black Box Explanations through Transparent Approximations (BETA), introduced in . Unlike LIME, which aims for local interpretation, BETA is a framework that attempts to produce a global interpretation for any classifier treated as a black box. Based on their previous work on Decision Sets, the authors designed a framework with two-level decision sets that takes into account fidelity (faithfulness to the black box model), unambiguity (single and deterministic explanations for each instance), interpretability (minimized complexity) and interactivity (user-specified exploration of feature subspaces). In this two-level structure, the outer if-then rules are the “neighborhood descriptors” and the inner if-then rules are the “decision logic rules” (how the black box model labels an instance under the outer if-then rules). Similar to , an objective function is built and near-optimal solutions are found.
Our methodology can be classified as post-hoc interpretation , where a trained model is given and the main task is to interpret it. This method is close to the second approach mentioned in the fourth paragraph of the introduction section, but also differs from it in several ways. First, the model to be interpreted is not treated as a black box, as we directly interpret the hidden layers of a deep CNN. Second, compared to LIME, which only has local interpretability, our method achieves global interpretability. Similar to LIME, we also provide qualitative interpretation, with graphs that visualize the results. As our method interprets a deep CNN via meta-learning, we first briefly introduce deep CNNs and meta-learning, and then discuss our framework in detail.
III-A Deep Convolutional Neural Network
This section introduces the deep CNN model we are going to interpret. As we implement our program using TensorFlow, we use its TensorBoard function to draw the structure of the deep CNN we construct in Fig. 1. Deep CNNs are currently among the most successful machine learning algorithms for image classification. A deep CNN takes advantage of the two-dimensional structure of the input images. It applies a set of filters to the pixels of the raw input images to generate higher-level representations, which the model learns from in order to improve its performance.
There are three major components of a deep CNN: the convolutional layer, the pooling layer and the fully connected layer (the same as in regular neural networks). A deep CNN model is usually a stack of these layers. In the convolutional layer, a filter is used to compute the dot product between the pixels of the input image at a specific position and the values of the filter, producing a single value in the output feature map. The convolution operation is complete once the filter has been slid across the width and height of the input image. Following the convolutional layer, an activation function, often a rectified linear unit (ReLU)
, is applied to inject nonlinearities into the model and speed up the training process. Following ReLU is the pooling layer, which is a non-linear down-sampling layer. A common pooling algorithm is max pooling, in which each sub-region of the previous feature map is reduced to the single maximum value in that region. Max pooling reduces computation and controls overfitting. In order to calculate the predicted class, after max pooling the feature map needs to be flattened and fed into a fully connected layer. In the last layer, the output layer, a softmax classifier is applied for prediction.
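The three operations described above can be sketched in a few lines of plain Python. This is only an illustration, not the TensorFlow implementation used in this work: the function names, the 5×5 input and the 2×2 filter are hypothetical, no padding is applied, and the filter is not flipped (as is usual in CNN libraries).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # One dot product yields one value of the output feature map.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def relu(fmap):
    """Rectified linear unit: negative values become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Replace each size x size sub-region by its maximum value."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out


# Hypothetical 5x5 "image" and 2x2 filter, chosen only for illustration.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 1, 0],
    [2, 1, 0, 0, 1],
    [1, 0, 1, 2, 0],
    [0, 2, 1, 1, 1],
]
kernel = [[1, -1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool(relu(feature_map))        # 2x2 after ReLU and max pooling
flattened = [v for row in pooled for v in row]  # input to a fully connected layer
```

Here the 5×5 input shrinks to a 4×4 feature map and then to a 2×2 map after pooling, which shows how pooling reduces the amount of computation in later layers.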
The structure of the deep CNN model we designed is illustrated in Fig. 1. “Placeholder” represents the interface for feeding in the training data. “Reshape” is needed first to convert the one-dimensional input image data into two-dimensional data. In our experiment, we use the MNIST dataset, whose 784 input features are converted into a two-dimensional image. Our model has two blocks, each consisting of a convolutional layer followed by a pooling layer: “conv1”-“pool1”-“conv2”-“pool2”, which are followed by one fully connected layer “fc1”. As a fully connected network is susceptible to overfitting, the “dropout” operation applied after “fc1” aims to reduce it. In this operation, a probability parameter p is set so that a specific neuron is kept with probability p (or dropped with probability 1-p). The “Adam optimizer”, rather than a standard Stochastic Gradient Descent optimizer, is used to train the model by modifying the variables to reduce the loss. “fc2” is the output layer with 10 neurons, each representing one of the classes 0-9.
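The dropout operation with keep probability p can be illustrated as follows. This is a minimal sketch rather than the code used in this work: the function name `dropout` and the toy activations are hypothetical, and the rescaling of kept activations by 1/p (so that the expected activation is unchanged) is an assumption that the text above does not state.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each activation with probability keep_prob, otherwise set it to 0.

    Kept values are divided by keep_prob (assumed "inverted dropout") so that
    the expected value of each activation stays the same.
    """
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)
        else:
            out.append(0.0)
    return out


rng = random.Random(0)          # fixed seed for reproducibility
fc1_output = [1.0] * 10000      # hypothetical activations of "fc1"
dropped = dropout(fc1_output, keep_prob=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
```

With p = 0.5, roughly half of the activations are zeroed while the mean activation stays close to its original value. Dropout is applied only during training; at test time all neurons are kept.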
Meta-learning is an ensemble learning method that learns from the results of the base classifiers. It has a two-level structure, where the algorithms used in the first level are called base-learners and the algorithm in the second level is called the meta-learner. The base-learners are trained on the original training data. The meta-learner is trained on the predictions of the base classifiers and the true class of the original training data. When training the meta-learner, the “Class-combiner” strategy  is applied, where the predictions include just the predicted class (instead of the predictions for all classes, as in the “Binary-class-combiner”).
To give an intuitive understanding of the meta-learning algorithm, Fig. 2 illustrates a simplified meta-learning training process . The numbers 1, 2, 3, 4 represent the four steps of training. In the 1st step, the base learning algorithms 1 to m are trained on the training data. In the 2nd step, a validation dataset is used to test the trained classifiers 1 to m. In the 3rd step, the predictions generated in step 2 and the true labels of the validation dataset are used to train a meta-learner. Finally, in the 4th step, a meta-classifier is produced and the whole meta-learning training process is complete.
Once the training process is complete, the test process is much easier to execute. Fig. 3 presents a simplified test process . In the 1st step, the test data is applied to the base classifiers to generate predictions, which, combined with the true labels of the test data, make up the meta-level test data in the 2nd step. In the 3rd step, the final predictions are generated by testing the meta-level classifier on the predictions from the 2nd step, and the accuracy can then be calculated.
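The training and test processes above can be sketched with deliberately simple stand-ins. Everything here is hypothetical and chosen for brevity: the base learners are one-feature threshold classifiers, the meta-learner is a lookup table holding the majority label for each combination of base predictions, and the three tiny datasets are invented. Only the predicted class is passed to the meta-level, as in the “Class-combiner” strategy.

```python
from collections import Counter, defaultdict


def train_stump(X, y, feature):
    """Base learner: threshold one feature halfway between the two class means."""
    pos = [x[feature] for x, label in zip(X, y) if label == 1]
    neg = [x[feature] for x, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[feature] >= threshold else 0


def meta_features(base_classifiers, X):
    """One meta-level row per instance: the predicted class of each base classifier."""
    return [tuple(clf(x) for clf in base_classifiers) for x in X]


def train_meta(meta_X, y):
    """Meta-learner: majority true label for each tuple of base predictions."""
    votes = defaultdict(Counter)
    for row, label in zip(meta_X, y):
        votes[row][label] += 1
    table = {row: counts.most_common(1)[0][0] for row, counts in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    return lambda row: table.get(row, default)


# Hypothetical data: the label is 1 only when both features are large.
X_train = [(1, 1), (1, 9), (9, 1), (9, 9), (2, 8), (8, 2), (8, 8), (2, 2)]
y_train = [0, 0, 0, 1, 0, 0, 1, 0]
X_val = [(1, 2), (2, 9), (9, 2), (8, 9), (9, 8), (1, 8)]
y_val = [0, 0, 0, 1, 1, 0]
X_test = [(2, 1), (2, 7), (7, 2), (7, 7), (9, 9)]
y_test = [0, 0, 0, 1, 1]

# Training, step 1: train the base learners on the training data.
base = [train_stump(X_train, y_train, f) for f in (0, 1)]
# Training, step 2: test the base classifiers on the validation data.
meta_X_val = meta_features(base, X_val)
# Training, steps 3-4: train the meta-learner on predictions plus true labels.
meta_clf = train_meta(meta_X_val, y_val)

# Test process: base predictions form the meta-level test data.
meta_X_test = meta_features(base, X_test)
predictions = [meta_clf(row) for row in meta_X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

In this toy setting each base classifier alone misclassifies one test instance, while the meta-classifier, which sees both predictions together, classifies all five correctly.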
Our framework is named CNN-INTE, which stands for Convolutional Neural Network Interpretation. It is similar to meta-learning, but differs from it in a few ways. In this work, we interpret the first fully connected layer “fc1” of the deep CNN model illustrated in Fig. 1.
The training process is shown in Fig. 4. In the 1st step, the original training data is used to train a CNN model. In the 2nd step, the parameters obtained in the 1st step are used to calculate the activations of the first fully connected layer, fc1. In the 3rd step, a clustering algorithm is used to cluster the data generated in step 2 into a number of groups, which we henceforth call factors. In the 4th step, the data belonging to each of the factors are clustered again, generating a number of clusters, each of which is assigned a unique ID. In the 5th step, these IDs are grouped together as the training features at the meta-level, with the labels of the original training data used as the labels for the meta-learner. In the 6th step, the features of the original training data and the IDs from step 4 (used as labels) are used to train a number of random forests .
Assume the training data has numbers of instances and layer “fc1” has neurons. The labels of the training data are . Once the deep CNN model is trained, for each training instance , we calculate the activations at each hidden neuron on this layer. Hence, we obtain a matrix with size . To construct the meta-level training data, we use a clustering algorithm to cluster the matrix along the hidden layer axis into several factors . Then, within each of the factors, we cluster the data again, this time along the axis of the instances. The clustering results are the IDs of the clusters each instance belongs to. For example, if there are 10 clusters, after the second-level clustering each instance will have an ID between 0 and 9. All the IDs, combined with the true labels of the training data, make up the meta-level training data.
To present the technical details of the CNN-INTE training process, we provide the pseudo code in Algorithm 1. Lines 1-3 are the initialization of the algorithm. In line 4, the activations are clustered into factors, where is the number of clusters set in the clustering algorithm . In lines 5-7 the same clustering algorithm is applied to all the factors to generate sets of ID numbers. Lines 8-9 use the generated ID numbers and the true labels of the original training data to train the meta-learner: . At this point the training process is not yet complete: we still need to generate the base models to be used in the test process. Lines 10-12 use the features of the original training data and the ID numbers to train base models. The output of the training process is the meta-level classifier: and base models: .
Fig. 5 is a toy example that illustrates the above process. In this example, there are 5 hidden neurons and 6 training instances. We set the number of clusters for both the first- and second-level clustering to 3. Hence, the matrix with size is first clustered into 3 factors horizontally. For each factor, the activations are again clustered into three clusters vertically, e.g. is clustered into . If we set the ID numbers for these clusters as , then the corresponding ID numbers for to in factor according to Fig. 5 are . Hence, the meta-level training features are expressed as
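The two-level clustering described above can be sketched as follows. This is an illustration under several assumptions: a small hand-written k-means with a deterministic farthest-point initialization stands in for the clustering algorithm, the function names are hypothetical, and the 6×4 activation matrix (6 instances, 4 neurons, 2 factors, 2 clusters per factor) is invented and smaller than the toy example in Fig. 5.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_level_ids(activations, n_factors, n_clusters):
    """Cluster neurons into factors, then cluster instances within each factor."""
    columns = [list(col) for col in zip(*activations)]  # one vector per neuron
    factor_of_neuron = kmeans(columns, n_factors)       # first level
    ids_per_factor = []
    for f in range(n_factors):
        neurons = [j for j, g in enumerate(factor_of_neuron) if g == f]
        sub_rows = [[row[j] for j in neurons] for row in activations]
        ids_per_factor.append(kmeans(sub_rows, n_clusters))  # second level
    # One meta-level row per instance: its cluster ID in every factor.
    meta_X = [tuple(ids[i] for ids in ids_per_factor)
              for i in range(len(activations))]
    return factor_of_neuron, meta_X


# Hypothetical activations: rows are instances, columns are "fc1" neurons.
activations = [
    [5.0, 5.2, 9.0, 8.8],
    [5.1, 4.9, 0.1, 0.0],
    [4.9, 5.0, 9.1, 9.0],
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 8.9, 9.1],
    [0.0, 0.2, 0.1, 0.0],
]
factor_of_neuron, meta_X = two_level_ids(activations, n_factors=2, n_clusters=2)
```

Neurons 0 and 1 behave alike and end up in one factor, neurons 2 and 3 in the other. Each row of `meta_X`, paired with the true label of the corresponding training instance, would form one meta-level training example.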
This data combined with the corresponding training labels of the original training data is used to train the meta-learner. Here the meta-learner we used is the Decision Tree , an inherently interpretable algorithm. Its tree structure provides an excellent visual explanation of the predictions.
The test process of the meta-model is exactly the same as the meta-learning test process, which is shown in Fig. 6. In the test process, we use the original test data to test the base classifiers generated in the meta-level training process to obtain the meta-level test data’s features. The base-learner we applied is random forest. The number of base models is equal to the number of factors. Hence, we have base models: . In the toy example, there are three factors which lead to three base models. The training data for the first base model corresponding to would be
Here represents the features of each original training instance. Once we obtain the base models, we can use the original test data to test them and so produce the meta-level test data. These data are then fed into the trained decision tree model to interpret individual test predictions.
The dataset we use is the MNIST database of handwritten digits from 0 to 9. We extracted 55,000 examples (the original dataset has 60,000 examples for training) as the training data and 10,000 examples as the test data. Each example is an image of 28×28 pixels, flattened into 784 features. The experiments are performed on the TensorFlow platform.
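How the base models turn original test instances into meta-level test data can be sketched as below. Note the swapped-in technique: a 1-nearest-neighbour classifier is used here in place of the random forest purely to keep the example short and self-contained. The two-dimensional features and the cluster IDs are hypothetical; three factors are used, as in the toy example.

```python
def train_1nn(X, ids):
    """Stand-in base model (1-nearest-neighbour, NOT a random forest).

    It maps the features of an original instance to a cluster ID of one factor.
    """
    data = list(zip(X, ids))

    def predict(x):
        nearest = min(data,
                      key=lambda item: sum((a - b) ** 2
                                           for a, b in zip(item[0], x)))
        return nearest[1]

    return predict


# Hypothetical original training features and their cluster IDs in each factor.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ids_per_factor = [
    [0, 0, 1, 1],  # IDs of the four training instances in factor 0
    [0, 1, 0, 1],  # IDs in factor 1
    [0, 1, 1, 0],  # IDs in factor 2
]

# One base model per factor, trained on (original features, IDs as labels).
base_models = [train_1nn(X_train, ids) for ids in ids_per_factor]

# Meta-level test data: one predicted ID per factor for every test instance.
X_test = [(0.1, 0.9), (0.8, 0.2)]
meta_X_test = [tuple(model(x) for model in base_models) for x in X_test]
```

Each row of `meta_X_test` has one entry per factor and is what would be passed to the trained meta-learner.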
IV-A Experimental Setup
First of all, we need to train a deep CNN model. We first reshape the input training data into 55,000 images, each of size 28×28. Training on all the data at once in every epoch is expensive: it requires a lot of computing resources and may cause the program to terminate. We therefore apply stochastic training: within an epoch, we repeatedly select a mini-batch of the training data and perform optimization on that batch; once we have looped through all the batches, we randomize the training data and start a new epoch. In our experiment, we set the epoch, batch size . Stochastic training is cheap and achieves performance similar to using the whole training data in every epoch. For each mini-batch, in the first convolutional layer, we apply 32 filters (or kernels) each with size , which generates 32 feature maps. In the first pooling layer we apply filters with size . The stride size is set to 2. The second convolutional layer uses 64 filters of the same size as those in the first convolutional layer. The second pooling layer has the same parameters as the first one. Immediately after this pooling layer is the first fully connected layer, for which we set the number of neurons to 128. To reduce overfitting we also set the dropout parameter p = 0.5, which means a neuron’s output has a 50% probability of being dropped. The last layer is the second fully connected layer (or the “readout layer”), which has 10 neurons, each of which outputs the probability of the corresponding digit 0-9. The test accuracy of this trained CNN model on the test data is 93.9%.
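The stochastic training scheme described above (loop over mini-batches, then reshuffle before the next epoch) can be sketched as a small generator. The function name `minibatches` and the tiny sizes used here are hypothetical and are not the settings of the experiment.

```python
import random


def minibatches(n_examples, batch_size, n_epochs, seed=0):
    """Yield (epoch, indices) pairs; the data order is reshuffled after each epoch."""
    rng = random.Random(seed)
    order = list(range(n_examples))
    for epoch in range(n_epochs):
        for start in range(0, n_examples, batch_size):
            # One optimization step would be performed on this batch.
            yield epoch, order[start:start + batch_size]
        rng.shuffle(order)  # randomize the training data before the next epoch


batches = list(minibatches(n_examples=10, batch_size=4, n_epochs=2))
batch_sizes = [len(indices) for _, indices in batches]
second_epoch = [i for epoch, indices in batches if epoch == 1 for i in indices]
```

Every example is visited exactly once per epoch, and the last batch of an epoch may be smaller than the others when the batch size does not divide the number of examples.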
Now comes the key part: setting up the interpretation. Recall that we define interpretability of a model as the ability to provide visual or textual presentation of the connections between input features and the output predictions. We first feed the original training data to the trained fully connected layer, which produces activation data of size 55000×128 (one row per training instance and one column per neuron). We then cluster these activations into several factors. The clustering algorithm we applied is the k-means algorithm. The number of factors is equal to the number of clusters, which we set to 8 at this level. Hence, the activation data is now turned into a list of 8 entries, each holding the data belonging to one factor. In the second-level clustering, for each factor we use the k-means algorithm again to cluster its data into a number of clusters. We set this number to 10 in our experiment, as the original training data has 10 classes. Hence each cluster is assigned a unique ID number between 0 and 9. Then we use the IDs belonging to each training instance and the true labels of the original training data to train a decision tree. Due to space limitations, we are unable to show the structure of the trained decision tree here. We set the maximum depth of the decision tree to 5. Although a deeper decision tree would yield better accuracy, too many tree levels make it harder to interpret.
To obtain the test data for the decision tree, we first use the features of the original training data as features and the IDs for each factor as labels to train the corresponding random forest, generating 8 base models. For the random forests, we set the number of trees to 20 and the maximum number of nodes to 2000. Finally we use the original test data to test the 8 trained base models. The generated predictions become the features of the meta-level test data, of size 10000×8. Using the meta-level test data on the trained decision tree produces an accuracy of 92.8% with a tree depth of 5. This value is comparable to the test accuracy of the trained deep CNN model, 93.9%. It should be noted that the decision tree’s accuracy could be further improved by increasing the depth of the tree and tuning other related parameters.
IV-B Experimental Results
To interpret the deep CNN model’s behavior on the test data, we use diagrams generated by our tool, CNN-INTE, to examine individual predictions on the test data. Hence, we provide qualitative interpretations visually. We arbitrarily selected two test instances that were correctly classified by the decision tree and one test instance that was wrongly classified. It should be noted that this tool can be used on any test instance globally and is not limited to the three cases we present. The details of the selected test instances are shown in Table I. Here “-” represents the features of the meta-level test data, “label” is the test label in the original test data, and “pred” is the prediction generated by the decision tree on the meta-level test data. “True1” and “True2” represent the two correctly classified instances and “Wrong1” is the wrongly classified instance.
Table I: Features and labels of the selected test instances.
In order to examine the classification process visually, we check each feature value according to the structure of the trained decision tree and plot graphs of the activations corresponding to the true label and the hypothesis. The interpretation result for instance “True1” is shown in Fig. 7. As the true label for this instance is 3, all other classes can be regarded as hypotheses, which is why there are no graphs for “Hypothesis: 3” in Fig. 7. Each row represents the examination of the feature values corresponding to different factors at different levels of the trained decision tree, e.g. the first row represents the root level of the decision tree. Since we set the depth of the decision tree to 5, there are 5 rows in all. Each column stands for the query of whether the test instance belongs to the corresponding hypothesis over the nodes visited.
Take the column “Hypothesis:0” as an example: the goal is to find out whether the label of the test instance is 0. In the 1st row we extract the activations corresponding to “” which satisfy the condition that (this is determined by the trained decision tree) and draw a graph of the activations that belong to label=0 (hypothesis) and label=3 (true). We then check the graph to evaluate whether the data corresponding to the true class can be separated from the hypothesis. The answer is no, because the hypothesis, represented as blue points, overlaps with the true class, shown as red points. Hence, we need to query the trained decision tree further. The values of the factors we need to check are: for 2nd row; for 3rd row; for 4th row; for 5th row. In this process, we noticed that in the 4th row the true class and the hypothesis class are successfully separated, as only the red points corresponding to the true label are left.
Therefore, we do not need to examine further, which is why the graph for the 5th row is not displayed. We highlight a graph with a green rectangle if the final results are separable and with a red rectangle otherwise. The same idea is applied to the other hypotheses. We also draw the graphs for instances “True2” and “Wrong1” in Fig. 8 and Fig. 9 respectively.
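The row-by-row check described above can be sketched as a small function. This is one possible reading of the procedure, not the tool's actual code: it assumes that the node conditions accumulate along the decision path, and that separation means no instance of the hypothesis class remains while instances of the true class do. The function name, the two-factor meta-level data and the path conditions are all hypothetical.

```python
def first_separating_level(meta_X, labels, path, true_label, hypothesis):
    """Return the 1-based tree level at which only true-class instances remain.

    `path` lists the visited nodes as (factor_index, predicate) pairs.
    Returns None if the hypothesis class is never separated from the true class.
    """
    remaining = list(range(len(meta_X)))
    for level, (factor, predicate) in enumerate(path, start=1):
        # Keep only the instances that satisfy this node's condition.
        remaining = [i for i in remaining if predicate(meta_X[i][factor])]
        has_true = any(labels[i] == true_label for i in remaining)
        has_hypothesis = any(labels[i] == hypothesis for i in remaining)
        if has_true and not has_hypothesis:
            return level  # separated: no need to examine deeper levels
    return None


# Hypothetical meta-level data: (ID in factor 0, ID in factor 1) per instance.
meta_X = [(2, 5), (2, 7), (4, 5), (2, 5), (6, 1)]
labels = [3, 0, 0, 3, 0]

# Hypothetical decision path: factor 0 <= 3 at the root, then factor 1 <= 6.
path = [(0, lambda v: v <= 3), (1, lambda v: v <= 6)]

level = first_separating_level(meta_X, labels, path, true_label=3, hypothesis=0)
root_only = first_separating_level(meta_X, labels, path[:1],
                                   true_label=3, hypothesis=0)
```

In this toy case the two classes still overlap after the root condition and become separable at the second level, which corresponds to stopping early and omitting the graphs for deeper rows.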
V Conclusion and Future Work
In this work, we present an interpretation tool, CNN-INTE, which interprets a hidden layer of a deep CNN model to find out how the learned hidden layer classifies new test instances. Although we only show the results for the first fully connected layer before the read-out layer, the approach could be applied to any hidden layer. The interpretation is realized by finding the relationships between the original training data and the trained hidden layer “fc1” via meta-learning. We used a two-level k-means clustering algorithm to construct the meta-level training data, and random forests as base models for generating the meta-level test data. The visual results generated by our program indicate why a test instance is correctly or wrongly classified, by checking whether there are any overlaps between the corresponding activations. For future work, we plan to quantify the interpretation results. In our experiments, one of the things we found tricky was setting the number of clusters for the k-means algorithm. In the future, we plan to replace the k-means algorithm with DBSCAN , which does not require the number of clusters to be specified. As stated in , “decision sets” seem to be a better option than decision trees as an inherently interpretable algorithm, so we also plan to replace the decision tree with decision sets. Last but not least, it would be meaningful to apply this tool to real-world applications where interpretations are demanded either between the training data and the hidden layer or between the hidden layer and the predictions.
The authors acknowledge the support of the Province of Nova Scotia, of Dalhousie University, and of the Natural Sciences and Engineering Research Council of Canada under the CREATE program grant.
-  J. L. Herlocker, J. A. Konstan, and J. Riedl, “Explaining collaborative filtering recommendations,” In Proceedings of the 2000 ACM conference on Computer supported cooperative work, pp. 241–250, December 2000.
-  M. T. Dzindolet, S. A. Peterson, R. A. Pomranky, L. G. Pierce, and H. P. Beck, “The role of trust in automation reliance,” Int. J. Hum.-Comput. Stud., vol. 58, no. 6, pp. 697–718, 2003.
-  J. R. Quinlan, C4.5: Programs for machine learning, Elsevier, 2014.
-  R. L. Rivest, “Learning decision lists,” Machine learning, vol. 2, no. 3, pp. 229–246, 1987.
-  H. Lakkaraju, S. H. Bach, and J. Leskovec, “Interpretable decision sets: A joint framework for description and prediction,” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1675–1684, ACM, August, 2016.
-  M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should I trust you?: Explaining the predictions of any classifier,” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, ACM, August, 2016.
-  H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, “Interpretable & Explorable Approximations of Black Box Models,” KDD’17 workshop, 2017.
-  J. J. Thiagarajan, B. Kailkhura, P. Sattigeri, and K. N. Ramamurthy, “TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning,” 30th Conference on Neural Information Processing Systems (NIPS), 2016.
-  M. Abadi, et al., “TensorFlow: A System for Large-Scale Machine Learning,” In OSDI, Vol. 16, pp. 265–283, 2016.
-  M. T. Ribeiro, S. Singh, and C. Guestrin, “Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance,” 30th Conference on Neural Information Processing Systems (NIPS), 2016.
-  M. Craven and J. W. Shavlik, “Extracting tree-structured representations of trained networks,” In Advances in neural information processing systems, pp. 24–30, 1996.
-  I. Goodfellow, Y. Bengio and A. Courville, Deep learning, MIT press, 2016.
-  P. K. Chan and S. J. Stolfo, “Experiments on multistrategy learning by meta-learning,” In Proceedings of the second international conference on information and knowledge management, pp. 314–323, ACM, December, 1993.
-  G. Montavon, W. Samek, and K. R. Müller, “Methods for interpreting and understanding deep neural networks,” Digital Signal Processing, 2017.
-  V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” In Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814, 2010.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, 86(11), pp. 2278–2324, November, 1998.
-  N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, 15(1), pp. 1929–1958, 2014.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  X. Liu, X. Wang, S. Matwin, and N. Japkowicz, “Meta-learning for large scale machine learning with MapReduce,” In Big Data, 2013 IEEE International Conference on, pp. 105–110, IEEE, October, 2013.
-  J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1), pp. 100–108, 1979.
-  A. Liaw and M. Wiener, “Classification and regression by randomForest,” R news, 2(3), pp. 18–22, 2002.
-  M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” In Kdd, Vol. 96, No. 34, pp. 226–231, August, 1996.