Interpretability of Blackbox Machine Learning Models through Dataview Extraction and Shadow Model creation

by Rupam Patir, et al.
IIIT Delhi

Deep learning models trained using massive amounts of data tend to capture one view of the data and its associated mapping. Different deep learning models built on the same training data may capture different views of the data, depending on the underlying techniques used. To explain the decisions arrived at by blackbox deep learning models, we argue that it is essential to faithfully reproduce that model's view of the training data. This faithful reproduction can then be used for explanation generation. We investigate two methods for data view extraction: a hill-climbing approach and a GAN-driven approach. We then use this synthesized data to create shadow models for explanation generation: a Decision-Tree model and a Formal Concept Analysis based model. We evaluate these approaches on Blackbox models trained on public datasets and show their usefulness in explanation generation.




1 Introduction

Machine Learning (ML) models are becoming more useful and significant by the day, as they are used to make critical decisions in a wide range of industries from health to business. Inevitably, the interpretability of these models has become a major focus, as it is important to understand how a model arrives at its decisions. With sophisticated models such as Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN), the problem of interpretability becomes especially relevant when the user of the system is interested not only in the final decision but also in how and why the model came to that decision. Interpretability is envisaged as a promising approach for addressing challenges such as user trust issues due to algorithm aversion, data quality issues related to fairness, and understanding the reasoning behind decisions.

An example of such a challenge is in the health care sector, where it is crucial to understand why an algorithm predicted a disease for an individual patient. Not only does this give the user a modicum of confidence that the system is looking at the right aspects of the patient's record when coming to its decision, but it also gives the user a chance to identify new indicators for a disease. Another example is in the business sector, where a marketeer would like to predict repeat buyers for a product. Knowing how a system came to its conclusion not only increases the marketeer's confidence in the system but also gives valuable insight for critical business decisions.

Suresh and Guttag [ref14] have shown that building an ML system M can be viewed as a series of data transformations, wherein $X$ and $Y$ are the real-world underlying features and labels, respectively, that M should ideally model. However, since the real-world data may be huge and may not always be completely available, a large sample $X'$ and $Y'$ is taken to train the model M. During the construction of M itself, the features and labels are inevitably projected to some $X''$ and $Y''$, respectively, due to the way the model parameters and the representations of the features and labels are chosen and trained. Thus, instead of the original function $f: X \rightarrow Y$, what is learned is the function $g: X'' \rightarrow Y''$. We call this pair $X''$ and $Y''$ the data view captured by the model.

In our paper, we focus on blackbox ML models, wherein the model's details, such as its parameters, its mapping and its representation details, are not available. In such a scenario, we propose that it is important to faithfully extract the ‘data view’ captured in the model M in order to build interpretable models for that blackbox ML model M. We define a notion of the data view of the target blackbox model using the set of data objects correctly classified by the model. Our method extracts the data view via data synthesis, creating a close approximation of the target model's data view.

We propose two different techniques for this step. The first technique, inspired by Hill-climbing, is a query synthesis technique that generates a dataset whose output probability vectors, as given by the target model, have low entropy. The second technique, inspired by Generative Adversarial Networks (GANs), learns a model for data generation such that the blackbox model classifies the generated records into their intended target labels with high accuracy. This is especially useful for synthesizing data for high-dimensional datasets, where the query synthesis technique becomes computationally expensive.

Once the data view is extracted, our approach creates an interpretable Shadow model on that data view, through which Blackbox interpretability is achieved. We build a Decision Tree model based on this Data View. We also use Formal Concept Analysis techniques to dive deeper into the interpretation of the target model. These shadow models then serve as the interpretable models for the blackbox model.

Our main contributions in this paper are as follows:

  • We show that the ‘data view’ extracted from a Blackbox model is a better reflection of the model’s behaviour than its original training data, and that the ‘data view’ can be synthesized from the model.

  • We study two synthesis methods for data view extraction – a Hill-climbing approach, and a new approach based on Generative Adversarial Networks (GAN).

  • Using this synthesized dataset, we train interpretable models to generate explanations and interpret the original target model. The two interpretable approaches we focus on are Decision Trees and interpretation via Formal Concept Analysis.

2 State-of-the-art

Various approaches for interpretability of blackbox models have been proposed [ref18, ref16, ref17, ref19]. Broadly, work on explainability can be classified into three types: a) Model-inspection methods, b) Shadow-model methods, and c) Data-based methods.

2.0.1 Model-inspection methods

Class Activation Maps (CAM) and Gradient-weighted Class Activation Mapping (GradCAM) [ref20] inspect the deep network and compute a feature-importance map by associating the feature maps in the final convolutional layer with particular classes. In GradCAM, the gradients of each class score with respect to each feature map are used to weight the activations of that feature map, and the weighted activations indicate which inputs are most important.
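For reference, the standard GradCAM weighting can be written as follows. The notation is ours, not this paper's: $A^k$ is the $k$-th feature map of the final convolutional layer, $y^c$ is the score for class $c$ before the softmax, and $Z$ is the number of spatial positions $(i, j)$ in the map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{GradCAM}} = \mathrm{ReLU}\Big( \sum_{k} \alpha_k^c \, A^k \Big)
```

The ReLU keeps only the regions whose activation increases the score of class $c$.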

2.0.2 Shadow-model methods

LIME [ref1] relies on building a locally linear shadow model for interpretability. Broadly, there are two types of interpretability approaches: local and global. Local interpretability involves building shadow models that reflect the model's view on a localised region of the input space. The LIME (Local Interpretable Model-agnostic Explanations) method of Ribeiro et al. [ref1] assumes that the complex learned function can be approximated by a set of locally linear models. Global interpretability works on the entire input space, giving the target model's global view of the data. Bastani et al. [ref2] propose a shadow-model approach at a global scale, wherein they approximate the target model with an interpretable shadow model that is generated after fitting a Gaussian distribution to the training data.
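For reference, the local explanation that LIME produces for an instance $x$ can be written as the following optimisation (our notation, using $M$ for the blackbox model): $G$ is a family of interpretable models such as sparse linear models, $\pi_x$ is a proximity kernel that weights perturbed samples by their closeness to $x$, $\mathcal{L}$ measures how unfaithfully $g$ mimics $M$ in that neighbourhood, and $\Omega(g)$ penalises the complexity of $g$.

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(M, g, \pi_x) + \Omega(g)
```

A global shadow model, by contrast, drops the locality kernel and fits $g$ over the whole input space.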

2.0.3 Data-based methods

Lakkaraju et al. [ref13] explain the global view using Decision Sets. Sangroya et al. [ref12] use a Formal Concept Analysis based method to provide data-based explainability.

The above approaches require either the availability of training or validation data, or the details of the target model. However, for blackbox models, neither the availability of training data nor access to the model's details can be assumed. Our approach therefore differs in that we extract the data view captured in the model and then build interpretable shadow models for the blackbox model.

3 Methodology

Our method takes a machine learning model M for which we have oracle access, i.e. black box access, where we can pose inputs from the input space f and observe the output vector of probabilities p. However, no information is available regarding the original data distribution or the model parameters. We call M the target model. To explain the target model, we create an interpretable shadow model that simulates the functionality of the target model in terms of decision making over input data instances. We use the following steps:

  1. Synthesize a data view to be used for training a shadow model.

  2. Train an interpretable shadow model on the synthesized data view.

  3. Transform the model into a set of rules.

3.1 Data View generation through data synthesis

As discussed earlier, a blackbox ML model M captures a particular ‘Data View’ of the original training data and models the mapping from input features to labels/predictions through some complex non-linear function. For a multi-class classification problem, this can be viewed as classifying clearly positive instances into their appropriate classes, with some instances falling in the boundary regions between the classes. If we take only the clearly positive instances classified into each class with a confidence of 0.7 or above, and leave out all less confidently classified instances that lie very close to the class borders, then a simpler interpretable shadow model can be constructed for the blackbox model. We argue that it is better to synthesize data for the ML model M than to take the original training data, since M has captured a ‘data view’ from the original data. It is thus more appropriate to extract that data view and use it to train an interpretable shadow model S. We perform this ‘data synthesis’ by generating data instances, posing them to M for classification/prediction, and keeping only those instances that M classifies/predicts positively with a confidence above 70%. The generated dataset then reproduces the data view of the model M.
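The confidence filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: `predict_proba` is a hypothetical callable standing in for a query to the black box M, and `toy_model` is a made-up two-class model used only to show the behaviour. Each candidate is labelled with M's most probable class and kept only if that probability reaches the 0.7 threshold.

```python
import math
from typing import Callable, List, Sequence, Tuple

Record = List[float]


def extract_data_view(candidates: Sequence[Record],
                      predict_proba: Callable[[Record], Sequence[float]],
                      threshold: float = 0.7) -> List[Tuple[Record, int]]:
    """Keep candidates that M assigns to some class with probability >= threshold,
    labelled with that class; low-confidence (boundary) candidates are dropped."""
    view: List[Tuple[Record, int]] = []
    for x in candidates:
        p = predict_proba(x)                        # one query to the black box
        c = max(range(len(p)), key=lambda k: p[k])  # most probable class
        if p[c] >= threshold:
            view.append((list(x), c))
    return view


def toy_model(x: Record) -> List[float]:
    """Hypothetical two-class stand-in for M: logistic in the first feature."""
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x[0]))
    return [1.0 - p1, p1]
```

With `toy_model`, the candidate `[0.0]` sits exactly on the decision boundary (probability 0.5 for each class) and is discarded, while `[1.0]` and `[-1.0]` are kept with labels 1 and 0.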

We explore two different techniques for data synthesis. The first technique generates a dataset whose output probability vectors, as given by the target model M, have low entropy. The second technique instead learns a model for data generation.

3.1.1 Synthesis using Hill Climbing Method.

The first technique uses a hill climbing algorithm to synthesize data D, which we will use to train our interpretable model S.

To build D, we use a query synthesis algorithm proposed in the membership inference attack paper by Shokri et al. [ref4]. Here, a function Synthesize-Record generates one record for a class label c. It first generates a record by assigning random values to each feature in the model's input space. We then feed this record to the model M and obtain its class probability vector p. The algorithm accepts a record into our dataset D only if the model is confident beyond a threshold (conf) that the record belongs to class c. If the record does not meet this threshold, the algorithm randomly reassigns k features and repeats the process. Each time the record is rejected, the algorithm reduces the value of k and repeats the process. If there are no more features left to reassign, the algorithm discards the record and starts again from the beginning. We repeat this process until we have a sufficient number of records per class label to train our interpretable model S. The algorithm bounds the number of features it reassigns by $k_{max}$ and $k_{min}$; these limits speed up the process by focusing the search on records that are likely to be admitted to the dataset. The algorithm reduces the number of features k in each revision so that the perturbations stay local to the record being considered. The randomize function is specific to each dataset: we randomize with respect to whatever knowledge we have of the input space.
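The Synthesize-Record loop described above can be sketched as below. This is a simplified sketch under stated assumptions, not the authors' implementation: `predict_proba` stands in for querying M, `randomize_feature` is a hypothetical per-dataset sampler, and the rule "keep a re-assignment only if it does not lower the class-c probability" is our assumption of the hill-climbing step (in the spirit of Shokri et al.), since the text does not spell it out. `rand_bit` and `toy_model` are made-up examples.

```python
import random
from typing import Callable, List, Optional, Sequence

ProbaFn = Callable[[List[float]], Sequence[float]]   # black-box query: record -> class probabilities
RandomizeFn = Callable[[int, random.Random], float]   # hypothetical per-dataset feature sampler


def synthesize_record(predict_proba: ProbaFn, c: int, n_features: int,
                      randomize_feature: RandomizeFn, conf: float = 0.7,
                      k_max: Optional[int] = None, k_min: int = 1,
                      rng: Optional[random.Random] = None) -> Optional[List[float]]:
    """Hill-climb towards a record that M assigns to class c with probability >= conf.

    Returns None when the record is discarded (k has dropped below k_min).
    """
    rng = rng or random.Random()
    k = min(k_max if k_max is not None else n_features, n_features)
    x = [randomize_feature(i, rng) for i in range(n_features)]  # fully random start
    best = predict_proba(x)[c]
    while best < conf:
        if k < k_min:
            return None  # no features left to re-assign: discard this record
        candidate = list(x)
        for i in rng.sample(range(n_features), k):  # re-assign k random features
            candidate[i] = randomize_feature(i, rng)
        p_c = predict_proba(candidate)[c]
        if p_c >= best:  # assumed hill-climbing rule: keep non-worsening moves
            x, best = candidate, p_c
        k -= 1  # shrink the perturbation after every rejection
    return x


def synthesize_class(predict_proba: ProbaFn, c: int, n_records: int, n_features: int,
                     randomize_feature: RandomizeFn, conf: float = 0.7,
                     max_attempts: int = 10_000,
                     rng: Optional[random.Random] = None) -> List[List[float]]:
    """Restart Synthesize-Record from scratch until n_records are accepted."""
    rng = rng or random.Random()
    records: List[List[float]] = []
    for _ in range(max_attempts):
        if len(records) == n_records:
            break
        x = synthesize_record(predict_proba, c, n_features, randomize_feature, conf, rng=rng)
        if x is not None:
            records.append(x)
    return records


# Toy, hypothetical example: binary features; P(class 1) = fraction of ones.
def rand_bit(i: int, rng: random.Random) -> float:
    return float(rng.randint(0, 1))


def toy_model(x: List[float]) -> List[float]:
    s = sum(x) / len(x)
    return [1.0 - s, s]
```

For example, `synthesize_class(toy_model, 1, 20, 4, rand_bit)` returns 20 four-feature records, each with at least three ones, since only those reach the 0.7 threshold for class 1.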

3.1.2 Synthesis with GAN.

The Hill Climbing Method is costly when synthesizing high-dimensional datasets. We therefore propose a GAN-based method to generate synthetic data, which can be used for both low- and high-dimensional datasets. We first design a neural network architecture to be used as a generator for each of the datasets listed in Table 1. The goal of this generator is to generate a data point from any given input noise. The numbers of input and output neurons of the generator are the size of the input noise vector and the number of features in the dataset, respectively.

The steps for Synthesizing the data are as follows:

  1. For each class c, we create a separate generator that takes a random noise vector (whose size depends on the dataset) as input and generates a data sample as output. Steps 2-4, as shown in Algorithm 1, are repeated for each generator.

  2. For training the generator we replace what would normally be an input image and discriminator in a standard GAN model with our Black box model M as shown in the figure 1.

  3. In the forward pass, we generate random noise, use it as input to generate a synthesized record, and then pass this record to the Black box model M to measure how close it is to a real record of class c.

  4. In the backward pass, we first freeze the weights of the Black box model M and then back-propagate the error generated at the output of M. The error obtained at the input layer of M is used as the error at the output layer of the generator, and back-propagating it updates the weights of the Generator.

  5. After training, the generator can produce new data from any random noise vector of the same size.

Algorithm 1: Training Generator for generating data of class c. Result: trained Generator for class c.

Figure 1: Flow Chart of the GAN Approach.
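The training loop in steps 2-4 can be sketched in NumPy as follows. This is a minimal sketch, not the authors' architecture: step 4 needs the gradient of M's loss with respect to its input, so we assume a hypothetical differentiable stand-in (`FrozenLinearBlackBox`, a fixed softmax-linear classifier) whose weights are never updated. The one-hidden-layer `tanh` generator, the cross-entropy loss on class c, and all hyperparameters are our assumptions; the `tanh` output matches the data scaled to [-1, 1] used for the GAN approach.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


class FrozenLinearBlackBox:
    """Hypothetical differentiable stand-in for M; its weights stay frozen."""

    def __init__(self, W: np.ndarray, b: np.ndarray):
        self.W, self.b = W, b

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        return softmax(x @ self.W + self.b)

    def input_grad(self, x: np.ndarray, c: int) -> np.ndarray:
        """Gradient of the mean cross-entropy -log p_c with respect to the input x."""
        d_logits = self.predict_proba(x)
        d_logits[:, c] -= 1.0
        d_logits /= x.shape[0]
        return d_logits @ self.W.T


def train_generator(bb, c, noise_dim, n_features, hidden=32, steps=1000,
                    batch=64, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (noise_dim, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_features))
    b2 = np.zeros(n_features)
    for _ in range(steps):
        # Forward pass: noise -> hidden layer -> synthetic record in [-1, 1] -> M.
        z = rng.normal(size=(batch, noise_dim))
        h = np.tanh(z @ W1 + b1)
        x = np.tanh(h @ W2 + b2)
        # Backward pass: the error at M's input layer becomes the error at the
        # generator's output layer and is back-propagated through the generator.
        d_pre2 = bb.input_grad(x, c) * (1.0 - x ** 2)
        d_pre1 = (d_pre2 @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ d_pre2)
        b2 -= lr * d_pre2.sum(axis=0)
        W1 -= lr * (z.T @ d_pre1)
        b1 -= lr * d_pre1.sum(axis=0)

    def generate(n: int, gen_seed=None) -> np.ndarray:
        """Step 5: map fresh random noise to synthetic records of class c."""
        z = np.random.default_rng(gen_seed).normal(size=(n, noise_dim))
        return np.tanh(np.tanh(z @ W1 + b1) @ W2 + b2)

    return generate
```

With a real black box that exposes only probabilities, the input gradient would have to be estimated (for example by finite differences); the sketch sidesteps this by assuming gradient access, as step 4 implies.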

3.2 Shadow Model and its Fidelity

Now that we have our synthesized dataset, we use it to train our Interpretable model S which can be a Decision tree or any other easy-to-understand model.
To check the fidelity (similarity) of our shadow model, i.e. our Interpretable model, we calculate the fraction of records that are predicted the same by both the Black Box model M and the Interpretable model S:

$$\text{Fidelity} = \frac{1}{n} \sum_{i=1}^{n} \delta_i$$

where $\delta_i = 0$ if $M(x_i) \neq S(x_i)$, $\delta_i = 1$ if $M(x_i) = S(x_i)$, and $n$ is the total number of records.
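The fidelity formula above translates directly into code. In this sketch, `target_predict` and `shadow_predict` are hypothetical callables returning the labels assigned by M and S; the result is multiplied by 100 because the tables report fidelity as a percentage.

```python
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def fidelity(records: Sequence[T],
             target_predict: Callable[[T], Hashable],
             shadow_predict: Callable[[T], Hashable]) -> float:
    """Percentage of records on which M and S predict the same label (100 * mean of delta_i)."""
    if not records:
        raise ValueError("fidelity is undefined for an empty record set")
    agree = sum(1 for x in records if target_predict(x) == shadow_predict(x))  # sum of delta_i
    return 100.0 * agree / len(records)
```

For instance, if M predicts the parity of an integer and S always predicts 0, the two agree on the even half of `[0, 1, 2, 3]`, giving a fidelity of 50.0.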

3.3 Interpretability

If our shadow model and target model have a high level of fidelity we can be confident enough to assert that the easy-to-interpret shadow model is a close approximation of our hard-to-interpret target model. As such, with our shadow model we can now run commonly known interpretation techniques such as viewing it as a decision tree. We can also get an understanding of feature importance within the model. For example, with Decision trees, we can interpret the importance of features depending on how high up the tree the feature causes a split. Therefore, depending on the choice of the shadow model, we can run different methods to better interpret our target model thereby achieving our goal.
For feature importance, we can use Permutation Importance, a method for finding the important features in a classification model. It involves permuting the data to see which features have the largest effect on the accuracy of our model. For example, if permuting a feature decreases our accuracy or changes the classification of a record, we can infer that the feature is important in the classification of that record's class label.
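The idea of Permutation Importance can be sketched with the standard library alone. This re-implements the general technique and is not the API of the eli5 library used later in the paper; `predict` is any hypothetical classifier callable. Shuffling one column at a time breaks that feature's link to the labels, and the average accuracy drop over several shuffles is the feature's importance. Passing the labels assigned by M as `y` would instead measure the loss of agreement with the target model.

```python
import random
from typing import Callable, List, Sequence

Predict = Callable[[List[float]], int]


def accuracy(predict: Predict, X: Sequence[List[float]], y: Sequence[int]) -> float:
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict: Predict, X: Sequence[List[float]], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean accuracy drop when each feature column is shuffled across records."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, col)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A model that ignores a feature gets an importance of exactly 0 for it, since shuffling that column cannot change any prediction.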
We also use Formal Concept Analysis (FCA), with the techniques proposed by Sangroya et al. [ref12], to carry out different analytical procedures. Specifically, we use FCA to produce implication rules and as another metric to evaluate feature importance in the target model's view.
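The core FCA operations behind such implication rules can be sketched with the two derivation operators. This stdlib-only sketch is not the API of the Concepts.py library or ConExp used in the paper. A formal context maps each object (a binarized record, with its class as an attribute) to its attributes; an implication A → B holds exactly when B is contained in the closure A''. The four records are made up for illustration and are not data from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Context = Dict[str, FrozenSet[str]]  # object id -> binary attributes it has


def extent(ctx: Context, attrs: Iterable[str]) -> Set[str]:
    """A': the objects that have every attribute in attrs."""
    a = set(attrs)
    return {g for g, m in ctx.items() if a <= m}


def intent(ctx: Context, objs: Iterable[str]) -> Set[str]:
    """B': the attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set().union(*ctx.values())
    for g in objs:
        shared &= ctx[g]
    return shared


def closure(ctx: Context, attrs: Iterable[str]) -> Set[str]:
    return intent(ctx, extent(ctx, attrs))


def implication_holds(ctx: Context, premise: Iterable[str], conclusion: Iterable[str]) -> bool:
    """premise -> conclusion holds iff every object with the premise also has the conclusion."""
    return set(conclusion) <= closure(ctx, premise)


def concepts(ctx: Context) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Naive enumeration of all formal concepts (extent, intent); exponential, toy-sized only."""
    attrs = sorted(set().union(*ctx.values()))
    found = set()
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            b = frozenset(closure(ctx, combo))
            found.add((frozenset(extent(ctx, b)), b))
    return found


# Toy context of made-up binarized records (not data from the paper).
ctx: Context = {
    "r1": frozenset({"Insulin2", "Age2", "Class0"}),
    "r2": frozenset({"Insulin2", "BP2", "Class0"}),
    "r3": frozenset({"Insulin3", "Glucose2", "Age3", "Class1"}),
    "r4": frozenset({"Insulin3", "Glucose2", "Class1"}),
}
```

In this toy context, Insulin2 → Class0 holds because both records with Insulin2 are in Class0, whereas Insulin3 → Age3 does not, because record r4 lacks Age3.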

4 Experimental Results

We consider Neural Networks as target models and train them using Python's Keras library for each of the datasets given in Table 1. For the GAN approach, the target model is trained on data scaled to values between -1 and 1. Our goal is to investigate the effect of using a target model's data view on the fidelity of a shadow model: does using the data view of the target model result in better fidelity of a shadow model? To demonstrate this, we create two shadow models. The first shadow model, OShadow, uses the original data that was used to train the target model; the second, SShadow, uses the data view of the target model for its training. We demonstrate through experiments that the SShadow model outperforms OShadow in terms of fidelity in most cases. The experimental results on fidelity and accuracy are given in Table 2. The table also shows the accuracy of the SShadow model on the test data, to give an idea of how closely the shadow model matches the target model in terms of generalizability. We also study the effect of different parameters on the fidelity of a shadow model for the Purchase dataset, as explained further below.

Figure 2: Fidelity of SShadow vs. number of records synthesized.

Dataset Name   #objects   #attributes     Type of Prediction
Animal         100        16              7-class classification
Diabetes       768        8               Probability score for Diabetes
Mobile         2000       20              Price range
Income         32561      107             2-class classification
Purchase       9600       [10,20,40,75]   Labeled classification by KNN

Table 1: Datasets used for experiments.

4.1 Datasets and Target Model Learning

To analyse our algorithms we use the datasets described in Table 1.

The Income dataset originally had 14 features, which increased to 107 after binarization of the categorical features for training the target Neural Network. We use the Purchase dataset to study the effect of the aforementioned parameters because of its flexibility in defining the number of classes and dimensions. The Purchase dataset is a user-product table, which we cluster to obtain the labels; the number of clusters defines the number of class labels. Similarly, the number of dimensions is defined by randomly choosing the number of products in the database.

We synthesize 10000 records for each class label for each of our datasets using two approaches: Hill Climbing and GAN. A good data view should have two main properties: i) it should span the input domain of the application as much as possible, and ii) it should include the core objects of each class. We find that, for the studied datasets, 10000 records satisfy both of these properties. Furthermore, Figure 2 shows that increasing the number of records does not negatively impact fidelity. By default, we use a confidence threshold of 70 percent for each of our synthesized datasets, as this yields records that the target model is fairly to very confident belong to their class. High threshold values may limit the records to only those the model is very confident about, resulting in underfitting, whereas low threshold values may admit records that carry no valuable information about the model's view.

4.2 Results

We summarise our fidelity results for all the datasets in Table 2. With the Hill Climbing algorithm, we see that fidelity is higher when a shadow model is trained on the target model's data view. The only exception is the Purchase (30F & 2C) dataset, which, in contrast to Purchase (20F & 5C), has lower SShadow fidelity. This may be due to its relatively higher number of features, which affects the fidelity of our SShadow model. Table 2 also shows that the GAN algorithm does not work well for the Mobile and Purchase datasets. This is because the generator is not able to generate data with much varying confidence, which reduces the variance in the generated data. Also, in the Purchase dataset the feature values are either 0 or 1, which leads to sparsity within the data, so the generator is not able to learn the data view correctly.

Dataset Accuracy of Target Model Fidelity OShadow Fidelity SShadow Accuracy of SShadow
Hill GAN Hill GAN
Animal 95.23
95.23 95.23 90.47 90.47
Diabetes 69.48
84.41 81.81 71.42 74.67
Mobile 94 84.75 92.25 62.74 90.25 64.25
Income 84.24
98.66 93.59 74.5 76.4
Purchase (30F & 2C) 96.35 95.36 85.17 71.07 93.70 70.97
Purchase (20F & 5C) 95.31 90.94 93.96 58.63 93.65 58.68
Table 2: Evaluating the accuracy and fidelity of our target and shadow models on different datasets. S stands for the model trained on the scaled data used for the GAN approach.
Classes   Fidelity (Hill)   Fidelity (GAN)
2         93.28             74.71
5         99.01             59.625
10        99.47             29.032
15        99.73             27.99

Table 3: Effect of the number of class labels

Features   Fidelity (Hill)   Fidelity (GAN)
10         100               100
20         85.01             72.94
30         81.11             71
40         75.91             73.08
50         67.58             72
75         52.13             71.59

Table 4: Effect of the number of features

Dataset            Hill (s)   GAN (s)
Diabetes           0.007      0.000075
Animal             0.05       0.000127
Purchase (30F2C)   0.02       0.00021
Mobile             0.01       0.0001
Income             0.03       0.00004

Table 5: Time taken for synthesis per record (in seconds)

4.2.1 Effect of the number of Classes:

To check the effect of varying the number of classes, we use the Purchase dataset with 15 features and generate 1000 records with a 70% confidence threshold. From Table 3, we see that, with the hill climbing method, an increase in the number of classes does not negatively affect fidelity. This is because the amount of information leaked by the target model does not decrease with an increase in the number of classes: even when there are more classes, each feature retains, and simultaneously leaks, information about the decision boundaries of those classes. For the GAN approach, however, fidelity drops as the number of classes increases, because the decision boundary becomes more complex and the generator is not able to generate data with high confidence.

4.2.2 Effect of the number of Dimensions:

To check the effect of varying the number of dimensions, we use the Purchase dataset with 2 classes and generate 1000 records with a 70% confidence threshold. From Table 4, we see that with the hill climbing method of data synthesis, increasing the dimensionality has a negative effect on fidelity. This is because the synthesis process does not correctly capture the importance of features to the model's view. For example, we may have more records varying in the unimportant features than in the important ones, giving us an unrepresentative view of the target model. On the other hand, the number of dimensions has little effect on the GAN algorithm, as our generator's architecture also changes with the number of dimensions: for more dimensions, we use a more complex generator.

4.2.3 Effect of choice of synthesis process:

Although the GAN process of synthesizing data produces better synthesized fidelity than our hill climbing algorithm (see Table 2), the synthesized data covers less of the input space. To induce better coverage within the synthesized data, we train multiple generators, as each generator gives a different view captured by our target model; the data view we get depends on how the weights of the generator are initialized. In the hill climbing approach, by contrast, each generated data point is randomly initialized, so we get much more coverage within the generated data.

As seen from Table 5, the hill climbing approach takes much more time per record than the GAN approach. We can also generate many more records with the GAN approach, since once a generator is fully trained we can obtain a new data point simply by inputting a random noise sample.

4.3 Interpreting our Shadow Models

We can use a variety of techniques to carry out explanations and visualisations with our shadow models. To show that a variety of shadow models can be used, we use the Diabetes Dataset.

4.3.1 Visualisation Using Decision Trees

With a target model trained on the diabetes dataset, we use our approach to train a shadow model in the form of a decision tree, from which we can derive global or individual-record decision rules. A snippet of the decision tree is shown in Figure 3. From our decision tree, we see that Glucose is the root node and is therefore the most important feature.

Figure 3: Snippet of Shadow Model in the form Decision Tree trained on Synthesized Data generated from diabetes dataset

4.3.2 Permutation Importance

We use the eli5 library of Python to test Permutation Importance. On the Diabetes Dataset, we found that the important features are the same in both our Target Model trained on the original data and our Shadow Model trained on the synthesized data. The most important features are Glucose, BloodPressure, Insulin and Age.

4.3.3 Formal Concept Analysis

To carry out our analysis, we binarize the features of our synthesized dataset as described in the paper by Sangroya et al. [ref12]. For example, a binary feature Insulin1 corresponds to an insulin level below 16 mIU/L. We use the Concepts.py library in Python to create formal concepts and the corresponding lattices for each class label using this data. From these lattices, we carry out predictions based on the intents of the class lattices and the intents of the individual records in our synthesized dataset. We present the feature importances in the model's view, as calculated using the FCA approach, in Table 6. We also use ConExp to derive the implication rules that are considered during a classification, some of which are shown in Table 7.
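The binarization step can be sketched as interval binning. This is an illustration, not the procedure of Sangroya et al.: the cut points are read off the interval descriptions in the implication rules of Table 7, while the half-open bin boundaries, the feature keys (`Insulin`, `Glucose`, `Age`, `BP`) and the bins outside those ranges are our assumptions. Each numeric value becomes exactly one attribute such as `Insulin2`, which can then serve as an attribute in a formal context.

```python
from bisect import bisect_right
from typing import Dict, FrozenSet, List

# Assumed cut points; bins are half-open, e.g. Insulin1 < 16 <= Insulin2 < 166 <= Insulin3.
CUTS: Dict[str, List[float]] = {
    "Insulin": [16, 166],   # mIU/L
    "Glucose": [140, 200],  # mg/dL; Glucose2 covers 140-200
    "Age": [20, 60],        # Age2 covers 20-60, Age3 is above 60
    "BP": [60, 90],         # BP2 covers 60-90
}


def binarize(record: Dict[str, float]) -> FrozenSet[str]:
    """Map each numeric feature to one interval attribute '<Feature><bin>' (bins numbered from 1)."""
    return frozenset(
        f"{name}{bisect_right(cuts, record[name]) + 1}"
        for name, cuts in CUTS.items()
        if name in record
    )
```

A record with insulin 100, glucose 150, age 65 and blood pressure 70 becomes the attribute set {Insulin2, Glucose2, Age3, BP2}.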

Feature          Diabetes   No Diabetes   Average
Glucose          92.19      35.29         63.74
Blood Pressure   78.13      27.21         52.67
Insulin          81.25      21.32         51.285
Age              70.31      22.79         46.55
Skin Thickness   46.88      19.85         33.365
BMI              40.63      20.59         30.61
Pregnancies      32.81      22.79         27.8
DPF              29.69      18.38         24.035

Table 6: Feature Importance using Formal Concept Analysis

Rule                                 Translation
Insulin2 → Class0                    An insulin level between 16 and 166 mIU/L implies no diabetes
Insulin3, Glucose2, Age3 → Class1    An insulin level above 166 mIU/L, a glucose level of 140 to 200 mg/dL and an age above 60 imply diabetes
BP2, Age2 → Class0                   A blood pressure between 60 and 90 and an age between 20 and 60 imply no diabetes

Table 7: Implication rules derived via Formal Concept Analysis

5 Analysis and Evaluation of Results

From our shadow models, we found that in most cases, whether we used Decision Trees, Random Forest Regressors or Formal Concept Analysis [ref12], the fidelity between the target model and the shadow model was better when the shadow model was trained on the synthesized data rather than on the target model's training data. This means that a shadow model is able to capture the view of a black box target model, even without access to its training data, by using the target model itself to generate the data for training the shadow model. However, certain datasets were relatively resistant to our synthesis processes, such as datasets with a high number of features for the Hill Climbing approach and datasets with a high number of classes for the GAN approach.

With our shadow models, we also saw the different methods of interpretation that can be carried out on the shadow model and, correspondingly, on the target black box model. For datasets with a low number of features, a Decision Tree approach can be used, where we can see the effect of each feature in the classification process. The problem of interpretability becomes harder for datasets with a large number of features. But given a well-approximated shadow model, other forms of interpretation, such as feature importance and implication rules, can be used to provide interpretable explanations. Our approach can therefore be used with a multitude of choices for the interpretable shadow model, and the method of interpretation can vary depending on the dataset. With our approach, the view captured by the shadow model tends to be more faithful to the target model's view and can therefore be used to explain and interpret the target model's view of the data.

6 Conclusion

Our experiments show that by synthesizing a dataset using the target black box model's predictions, we can create an approximation of the target model's view of the data to interpret and explain its classification process. We also presented a GAN-based synthesis process that is useful for high-dimensional data synthesis. We presented the different factors that affect the fidelity of our shadow models with each of our synthesis processes. Finally, we presented how a variety of shadow model choices can be used to interpret and understand our target model.