Local Interpretation Methods to Machine Learning Using the Domain of the Feature Space

by   Tiago Botari, et al.
Universidade de São Paulo

As machine learning becomes an important part of many real world applications affecting human lives, new requirements, besides high predictive accuracy, become important. One important requirement is transparency, which has been associated with model interpretability. Many machine learning algorithms induce models that are difficult to interpret, named black boxes. Moreover, people have difficulty trusting models that cannot be explained. In particular for machine learning, many groups are investigating new methods able to explain black box models. These methods usually look inside the black box models to explain their inner workings. By doing so, they allow the interpretation of the decision making process used by black box models. Among the recently proposed model interpretation methods, there is a group, named local estimators, which are designed to explain how the label of a particular instance is predicted. For such, they induce interpretable models on the neighborhood of the instance to be explained. Local estimators have been successfully used to explain specific predictions. Although they provide some degree of model interpretability, it is still not clear what is the best way to implement and apply them. Open questions include: how to best define the neighborhood of an instance? How to control the trade-off between the accuracy of the interpretation method and its interpretability? How to make the obtained solution robust to small variations of the instance to be explained? To answer these questions, we propose and investigate two strategies: (i) using data instance properties to provide improved explanations, and (ii) making sure that the neighborhood of an instance is properly defined by taking the geometry of the domain of the feature space into account. We evaluate these strategies in a regression task and present experimental results showing that they can improve local explanations.





1 Introduction

Machine learning (ML) algorithms have shown high predictive capacity for model inference in several application domains. This is mainly due to recent technological advances, the increasing number and size of public dataset repositories, and the development of powerful frameworks for ML experiments [1, 2, 3, 4, 5, 6]. Application domains where ML algorithms have been successfully used include image recognition [7], natural language processing [8], and speech recognition [9]. In many of these applications, the safe use of machine learning models and the users' right to know how decisions affect their lives make the interpretability of the models a very important issue. Many currently used machine learning algorithms induce models in which it is difficult to interpret and understand how decisions are made, named black boxes.

This occurs because several algorithms produce highly complex models in order to better describe the patterns in a dataset.

Most ML algorithms with high predictive performance induce black box models, leading to inexplicable decision making processes. Black box models reduce the confidence of practitioners in the model predictions, which can be an obstacle in many real world applications, such as medical diagnosis [10], science, autonomous driving [11], and other sensitive domains. In these applications, it is therefore important that predictive models are easy to interpret.

To overcome these problems, many methods able to improve model interpretation have recently been proposed; see e.g. [12, 13] for details. These methods aim at providing further information regarding the predictions obtained from predictive models. Interpretability can occur at different levels: (i) on the dataset; (ii) after the model is induced; and (iii) before the model is induced [14]. We will focus our discussion on methods for model interpretability that can be applied after the induction of a predictive model by an ML algorithm; these are known as agnostic methods.

Figure 1: (a) An example where a linear regression of the original features would provide little information regarding the model prediction. The blue continuous line represents the predictive model output as a function of the input, and the red circles represent two critical points of the curve. A local linear regression on the original feature space will produce a limited explanation in the neighborhood of the two critical points. (b) Representation of the domain of a two-dimensional feature problem where the plane defined by the two features is not fully covered. A local sampling can be used to create explanations on the neighborhood of the instance (red circle) restricted to the correct task domain (i.e., the intersection of the orange circle with the blue region) rather than on the whole orange circle.

Model-agnostic interpretation methods are a very promising approach to solve the problem of trust and to uncover the full potential of ML algorithms. These methods can be applied to explain predictions made by models induced by any ML algorithm. Some well known model-agnostic interpretation methods are described in [15, 16, 17, 18, 19]. Perhaps the most well known interpretation method is LIME [17], which allows local explanations for classification and regression models. LIME has been shown to present a very good capability to create local explanations. As a result, LIME has been used to interpret models induced by ML algorithms in different application domains. However, it is still not clear how to make some decisions when implementing and applying LIME and related methods. Some questions that arise are:

  1. How to best define the neighborhood of an instance?

  2. How to control the trade-off between the accuracy of the interpretation model and its interpretability?

  3. How to make the obtained solution robust to small variations on the instance to be explained?

A good local explanation for a given instance x* needs to have high fidelity to the model induced by an ML algorithm in the neighborhood of x*. Although this neighborhood is typically defined in terms of Euclidean distances, ideally it should be supported by the dataset. Thus, the sub-domain used to fit the local explanation model (i.e., the model used to explain the black box model) should reflect the domain the black box model was induced from. For instance, high-dimensional datasets often lie on a low-dimensional submanifold of the feature space, in which case defining neighborhoods in terms of the Euclidean distance is not appropriate [20, 21, 22, 23]. To deal with this deficiency, we address issue (i) by creating a technique that samples training points for the explanation model along the submanifold where the dataset lies (as opposed to Euclidean neighborhoods). We experimentally show that this technique also provides a solution to (iii).

In order to address (ii), we observe that a good local explanation is not necessarily a direct map of the feature space. In some cases, the appropriate local description lies in specific properties of the instance, which can be obtained through a transformation of the feature space. Thus, we address issue (ii) by creating local explanations on a transformed space of the feature space. The set of candidate properties (questions) should be elaborated by specialists in the specific application domain.

In this work, we focus on performing these modifications for regression tasks; however, they can be easily adapted for classification tasks. In Section 2.1, we discuss the use of instance properties, how to deal with the trade-off between accuracy and explanation complexity, and the importance of employing a robust method as the explanatory model. In Section 2.2, we describe how to improve the local explanation method using an estimate of the domain of the feature space. In Section 3, we apply our methodology to a toy example. Finally, Section 4 presents the main conclusions of our work and describes possible future directions.

2 Model Interpretation Methods

2.1 Local Explanation Through Instance Properties

A crucial aspect of providing explanations for predictive models induced by ML algorithms is presenting information that is relevant to the specific knowledge domain. In some cases, a direct representation of the original set of features of an instance does not reflect the best local behavior of the prediction process. Hence, other instance properties can be used to create clear decision explanations. These properties can be generated through a map of the original feature space, i.e., a function of the input x. Moreover, these instance properties can increase the local fidelity of the explanation to the predictive model. This is easily verified when the original feature space is highly limited and provides poor information in the neighborhood of a specific point. This case is illustrated by Figure 1 (a).

In order to provide a richer environment for obtaining a good explanation, the interpretable model should be flexible with respect to the possible questions a user may want to ask the ML model. Given that the possible explanations are mapped using specific functions of the feature space, we can create an interpretable model of the form

    g(x) = Σ_j β_j φ_j(x),    (1)

where x represents the original vector of features, the β_j are the coefficients of the linear regression that will be used as an explanation, and the φ_j are known functions that map x to the properties (that is, questions) that have a meaningful value for explaining a prediction, or that are necessary to obtain an accurate explanation.
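As a concrete illustration of such property maps, the sketch below builds the matrix of properties φ_j(x) on which the linear explanation is fit. The specific choices (raw features plus radius and angle of a two-dimensional point) are our own hypothetical examples, not the paper's:

```python
import numpy as np

# Hypothetical candidate property maps phi_j: besides the raw features,
# we include the radius and the angle of a 2-d point, natural "instance
# properties" for spiral-like data.
PHI = [
    lambda x: x[0],                    # phi_1: raw feature x1
    lambda x: x[1],                    # phi_2: raw feature x2
    lambda x: np.hypot(x[0], x[1]),    # phi_3: radius sqrt(x1^2 + x2^2)
    lambda x: np.arctan2(x[1], x[0]),  # phi_4: angle of the point
]

def design_matrix(X, phi=PHI):
    """Map each instance to its vector of properties (phi_1(x), ..., phi_p(x))."""
    return np.array([[f(x) for f in phi] for x in X])

X = np.array([[1.0, 0.0], [0.0, 2.0]])
Z = design_matrix(X)  # shape (2, 4); the linear explanation is fit on Z
```

The linear regression of Equation 1 is then fit on the columns of `Z` instead of the raw features.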

Once the φ_j's are created, the explanation method should choose which of these functions best represent the predictions made by the original model locally. This can be achieved by introducing an ℓ1 regularization in the squared error loss function. More precisely, let f be a black-box model induced by an ML algorithm and consider the task of explaining the prediction made by f at a new instance x*. Let x_1, ..., x_n be a sample generated in a neighborhood of x*. The local explanation can be found by minimizing (in β)

    L(β) = Σ_{i=1}^{n} ( f(x_i) − Σ_j β_j φ_j(x_i) )² + Σ_j λ_j |β_j|,    (2)

where the first term is the standard squared error between the induced model and the explanatory model, and the second term is the penalization over the explanatory terms. The values of the λ_j can be set to control the trade-off among the explanatory terms. For instance, if some explanatory term φ_j is more difficult to interpret, then a larger value can be assigned to λ_j.
In order to minimize the objective function (Equation 2), one must be able to sample in a neighborhood of x*. To keep consistency over random sampling variations in the neighborhood of x*, we decided to use a robust linear method that implements the ℓ1 regularization (see [24]). This robust linear regression solves some of the instability problems of local explanations [25].
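A minimal numpy sketch of the weighted-ℓ1 objective of Equation 2, via coordinate descent with soft-thresholding. It uses a plain squared error rather than the robust scikit-learn estimator the authors employed, and the black box, the instance x*, and the penalty values are hypothetical stand-ins:

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in l1 coordinate descent."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def weighted_lasso(Z, y, lam, n_iter=200):
    """Minimize 1/2 * ||y - Z @ beta||^2 + sum_j lam[j] * |beta[j]|.

    Z holds the explanatory terms phi_j(x_i) for the neighborhood sample;
    per-term penalties lam[j] let harder-to-interpret terms be penalized more.
    """
    n, p = Z.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with term j removed
            r_j = y - Z @ beta + Z[:, j] * beta[j]
            rho = Z[:, j] @ r_j
            beta[j] = soft_threshold(rho, lam[j]) / (Z[:, j] @ Z[:, j])
    return beta

# Hypothetical black box evaluated on a sample around an instance x_star.
rng = np.random.default_rng(0)
x_star = np.array([1.0, 1.0])
Z = x_star + 0.5 * rng.standard_normal((200, 2))      # here phi_j(x) = x_j
y = 3.0 * Z[:, 0] - 2.0 * Z[:, 1]                     # f on the sample
beta = weighted_lasso(Z, y, lam=np.array([0.1, 0.1]))  # approx (3, -2)
```

With small penalties the recovered coefficients approach the local linear behavior of f; raising `lam[j]` shrinks (and eventually zeroes) the j-th explanatory term.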

Additionally, a relevant question is how to define a meaningful neighborhood around x*. In the next section, we discuss how this question can be answered in an effective way.

Figure 2: A graphical two-dimensional representation of the spiral toy model described by Equation 3.1. (a) Original data, where the colors represent the target value (the length of the spiral). (b) The domain of the feature space (manifold): the blue points represent the original data, the pink polygon is the estimate of the manifold obtained with the α-shape, the black crosses represent the instances to be explained (details in Section 3.1.1), the gray points represent a sample from a normal distribution around an instance, and the red points correspond to the subsample that belongs to the estimated domain.

2.2 Defining meaningful neighborhoods

2.2.1 Feature Space

The training data used by an ML algorithm defines the domain of the feature space. In order to obtain a more reliable explanation model, we can use the estimated domain of the feature space when sampling the data needed to fit Equation 2. This approach improves fidelity and accuracy with respect to the model when compared to the standard Euclidean neighborhoods used by other methods [17]. The estimation of the feature domain is closely related to the manifold estimation problem [26]. Here, we show how this strategy works by using the α-shape technique [27, 28] to estimate the domain of the feature space.

2.2.2 α-shape

The α-shape is a formal mathematical definition of a polytope enclosing a set of points in Euclidean space. Given a set of points S and a real value α, it is possible to uniquely define the polytope that encloses S. The value α defines an open hypersphere b of radius α. For α → 0, b is a point, while for α → ∞, b is an open half-space. An α-shape is then composed of all k-simplices (0 ≤ k ≤ d) whose vertices are points of S for which there exists an empty open hypersphere b (b ∩ S = ∅) whose boundary touches those vertices. In this way, the value of α controls the level of detail of the polytope: for α → 0, the α-shape recovered is the set of points S itself, while for α → ∞, the convex hull of S is recovered [27, 28]. We define the neighborhood of an instance x* as the intersection of a Euclidean ball around x* with the polytope obtained from the α-shape. In practice, we obtain the instances used in Equation 2 by sampling new points around x* and keeping those that belong to the polytope obtained from the α-shape.
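The following sketch conveys the idea of sampling restricted to the estimated domain. As a simple stand-in for the α-shape polytope, it approximates the domain by a union of small balls around the training points and rejection-samples the neighborhood of x*; the circle data, the radius `eps`, and the helper names are our own illustrative choices:

```python
import numpy as np

def sample_in_domain(x_star, X_train, n=100, scale=1.0, eps=0.3, max_tries=10000):
    """Rejection-sample points near x_star that stay on the data manifold.

    A candidate is accepted if its nearest training point is closer than eps,
    i.e. the domain is approximated by a union of eps-balls around the data
    (a simple stand-in for the alpha-shape polytope).
    """
    rng = np.random.default_rng(0)
    accepted = []
    for _ in range(max_tries):
        z = x_star + scale * rng.standard_normal(x_star.shape)
        if np.min(np.linalg.norm(X_train - z, axis=1)) < eps:
            accepted.append(z)
            if len(accepted) == n:
                break
    return np.array(accepted)

# Training data on a circle (a 1-d manifold embedded in the plane).
t = np.linspace(0.0, 2 * np.pi, 500)
X_train = np.column_stack([np.cos(t), np.sin(t)])
S = sample_in_domain(np.array([1.0, 0.0]), X_train, n=50)
# Every accepted point lies near the circle, not in the full Euclidean ball.
```

The accepted sample `S` plays the role of the red points in Figure 2 (b): a normal sample around the instance, filtered down to the estimated domain.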

Figure 3: Comparison of the prediction performed by the explanation model and the true value of the spiral length, using a data set not used during the induction of the model by the ML algorithm. The explanation model was generated for the first investigated instance. Figures (a) and (c) show the true label versus the explanation model prediction; the black line represents a perfect match between the two values. Figures (b) and (d) show the importance of the features obtained by the explanation model. Normal sampling strategy: (a) and (b). Selected sampling: (c) and (d).
Figure 4: Comparison of the prediction performed by the explanation model and the true value of the spiral length, using a data set not used during training of the ML model. The explanation model was generated for the second investigated instance. Figures (a) and (c) show the true label versus the explanation model prediction; the black line represents a perfect match between the two values. Figures (b) and (d) show the importance of the features obtained by the explanation model. Normal sampling strategy: (a) and (b). Selected sampling: (c) and (d).

3 Results for a Toy Model: Length of a Spiral

In this section, we present an application of our proposed methodology to a toy model in which the data is generated along a spiral. We use the Cartesian coordinates of the spiral on the plane as features.

3.1 Definition

We explore the toy model described by

    x1 = t cos(t) + ε1,
    x2 = t sin(t) + ε2,    (3.1)

where x1 and x2 are the values that form the feature vector x, t ≥ 0 is an independent variable, ε1 and ε2 are random noise terms, and the target value is given by y = L(t), the length of the spiral up to t. This toy model presents some interesting features for our analysis, such as a feature domain concentrated along the spiral and a substantial variation of the target value when one of the feature coordinates is varied while the other is kept fixed.
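A possible generator for such data, assuming the Archimedean spiral x1 = t cos(t), x2 = t sin(t); the closed-form arc length follows from integrating √(1 + t²), while the range of t and the noise level below are placeholder choices, not the paper's values:

```python
import numpy as np

def spiral_length(t):
    """Arc length of the spiral (t*cos(t), t*sin(t)) from 0 to t:
    integral of sqrt(1 + u^2) du = (t*sqrt(1+t^2) + arcsinh(t)) / 2."""
    return 0.5 * (t * np.sqrt(1.0 + t**2) + np.arcsinh(t))

def make_spiral(n, t_max=10.0, noise=0.05, seed=0):
    """Generate n noisy spiral points (features) and their arc lengths (target)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, t_max, n)
    x1 = t * np.cos(t) + rng.normal(0.0, noise, n)
    x2 = t * np.sin(t) + rng.normal(0.0, noise, n)
    return np.column_stack([x1, x2]), spiral_length(t)
```

Note that the target depends on the features only through t, so the data lies along a one-dimensional manifold embedded in the plane, which is exactly the situation where Euclidean neighborhoods become misleading.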

3.1.1 Instances for Investigation

We investigate the explanations for 3 specific instances of our toy model, which we denote p1, p2, and p3. For the first point, p1, the target value (the length of the spiral) locally depends on the value of x1, and thus explanation methods should indicate that the most important feature is x1. For the second point, p2, the features x1 and x2 contribute equally to explaining the target. Finally, for the third point, p3, the second feature, x2, should be the most important feature for explaining the target.

3.1.2 Data Generation:

Using the model described in Equation 3.1, we generated a dataset of several thousand points, with t drawn from a uniform distribution and the noise values ε1 and ε2 drawn from a normal distribution with mean zero and a small standard deviation. The feature space and the target value are shown in Figure 2 (a). The generated data was split into a training set and a test set. Additionally, we test the explanation methods by sampling three sets of data in the neighborhoods of p1, p2, and p3.

3.1.3 Model induction using a ML algorithm:

We used a decision tree induction algorithm (DT) in the experiments, namely the Classification and Regression Trees (CART) implementation provided by the scikit-learn library [5]. The predictive performance of the model induced by this algorithm on the previously described dataset was assessed by the MSE and R² on the test set.
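A hedged sketch of this step with scikit-learn's CART implementation; the synthetic spiral data, the 70/30 split, and all hyperparameters below are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical spiral data in the spirit of Section 3.1.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 10.0, 5000)
X = np.column_stack([t * np.cos(t), t * np.sin(t)]) + rng.normal(0.0, 0.05, (5000, 2))
y = 0.5 * (t * np.sqrt(1.0 + t**2) + np.arcsinh(t))  # spiral arc length

# Train CART (scikit-learn's DecisionTreeRegressor) and evaluate on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
mse = mean_squared_error(y_te, model.predict(X_te))
r2 = r2_score(y_te, model.predict(X_te))
```

The fitted `model` then plays the role of the black box f that the local explanation methods of Section 2 are asked to explain.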

3.1.4 Determining the α-shape of the data:

For this example, we applied the α-shape technique with a fixed value of α. The value of α can be optimized for the specific dataset at hand; see [28] for details. The estimation of the domain using the α-shape is illustrated in Figure 2 (b).

Figure 5: Comparison of the prediction obtained by the explanation model and the true value of the spiral length, using a data set not used during training of the ML model. The explanation model was generated for the third investigated instance. Figures (a) and (c) show the true label versus the explanation model prediction; the black line represents a perfect match between the two values. Figures (b) and (d) show the importance of the features obtained by the explanation model. Normal sampling strategy: (a) and (b). Selected sampling: (c) and (d).

3.2 Local Explanation

The local explanation was generated through a linear regression fitted to data generated over the neighborhood of the point for which the explanation was requested (x*). We used the robust linear method available in the scikit-learn package [5].

3.2.1 Explanation for instance p1:

The explanation obtained using the standard sampling approach (hereafter normal sampling) presents low agreement with the true value of the spiral length (Figure 3(a)). We also noticed that this explanation is unstable with respect to sampling variations (even though we use a robust method to create the interpretation) and that it indicates x2 as the best feature to explain the ML model locally (Figure 3(b)). This description is inaccurate (see the discussion in Section 3.1.1). On the other hand, when the sampling strategy is performed over the correct domain of the feature space (hereafter selected sampling), we obtain an explanation model with high predictive accuracy, i.e., one that accurately reproduces the true target value (Figure 3(c)). Moreover, the feature that best explains the prediction is x1 (Figure 3(d)), which is in agreement with our expectation.

3.2.2 Explanation for instances p2 and p3:

We also analyzed the other two points to demonstrate the capability of the selected sampling strategy to capture the correct feature importance. For the instance p2, the importance is almost equally divided between the two features (Figure 4). For the instance p3, the most important feature is x2 (Figure 5). In the case of p3, the normal sampling strategy also produced a good explanation (Figure 5(b)); however, we noticed that this result is unstable under random variation of the sampling. All results presented here are in agreement with the discussion in Section 3.1.1.

3.3 Robustness of Explanations

Good explanation models for an instance x* should be stable under small perturbations around x*. To illustrate the stability of our method, we generated explanations for instances in the neighborhood of p1 (listed in Table 1). Table 1 shows that the explanations created for these points using selected sampling are compatible with those for p1. On the other hand, the normal sampling strategy is unstable. These results demonstrate that using the domain defined by the feature space can improve the robustness of a local explanation of an instance.

Normal Sampling

point           Importance x1   Importance x2   MSE     R²
(0.0, 14.5)     -0.92           2.46            1.18    0.72
(-2.0, 14.5)    -1.07           1.87            6.19    0.64
(1.0, 14.0)     -0.89           3.91            8.99    0.46
(0.5, 13.7)     -0.95           1.47            1.09    0.93

Selected Sampling

point           Importance x1   Importance x2   MSE     R²
(0.0, 14.5)     -0.96           0.33            0.19    0.95
(-2.0, 14.5)    -0.98           0.31            0.30    0.98
(1.0, 14.0)     -0.97           0.07            0.21    0.99
(0.5, 13.7)     -0.96           0.39            0.39    0.99

Table 1: Local explanations generated for instances around p1, for the normal and selected sampling strategies. MSE and R² are measured between the true values and the predictions of the local explanation model.

4 Conclusion

In order to increase trust and confidence in black box models induced by ML algorithms, explanation methods must be reliable, reproducible, and flexible with respect to the nature of the questions asked. Local model-agnostic explanation methods have many advantages aligned with these requirements, and they can be applied to models induced by any ML algorithm. However, existing agnostic methods have problems producing reproducible explanations while maintaining fidelity to the original model. We developed new strategies to overcome these limitations, addressing the following issues: (i) estimation of the domain of the feature space in order to provide meaningful neighborhoods; (ii) use of different penalization levels on the explanatory terms; and (iii) employment of robust techniques for fitting the explanatory model.

The estimation of the domain of the feature space should be performed and used during the sampling step of local interpretation methods; this strategy increases the accuracy of the local explanation. Additionally, using robust regression methods to create the explainable models helps obtain stable solutions. However, our experiments show that robust methods alone are not enough: the data must be sampled taking the domain of the feature space into account, otherwise the generated explanations can be meaningless.

Future work includes testing other manifold estimation methods, such as diffusion maps [29] and Isomap [30], extending these ideas to classification problems, and investigating the performance of our approach on real datasets.


The authors would like to thank CAPES and CNPq (Brazilian Agencies) for their financial support. T.B. acknowledges support by Grant 2017/06161-7, São Paulo Research Foundation (FAPESP). R. I. acknowledges support by Grant 2017/03363 (FAPESP) and Grant 306943/2017-4 (CNPq). The authors acknowledge Grant 2013/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry from São Paulo Research Foundation (FAPESP). T.B. thanks Rafael Amatte Bizao for review and comments.


  • 1 Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
  • 2 J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
  • 3 Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. Openml: Networked science in machine learning. SIGKDD Explorations, 15(2):49–60, 2013.
  • 4 Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • 5 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • 6 Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  • 7 Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
  • 8 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning, pages 2048–2057, 2015.
  • 9 Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
  • 10 Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1721–1730. ACM, 2015.
  • 11 Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
  • 12 Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 80–89. IEEE, 2018.
  • 13 Christoph Molnar. Interpretable Machine Learning. 2019. https://christophm.github.io/interpretable-ml-book/.
  • 14 Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
  • 15 Jerome H Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
  • 16 Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong but many are useful: Variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance. arXiv preprint arXiv:1801.01489, 2018.
  • 17 Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016.
  • 18 Scott Lundberg and Su-In Lee. An unexpected unity among methods for interpreting model predictions. arXiv preprint arXiv:1611.07478, 2016.
  • 19 Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and information systems, 41(3):647–665, 2014.
  • 20 Anil Aswani, Peter Bickel, Claire Tomlin, et al. Regression on manifolds: Estimation of the exterior derivative. The Annals of Statistics, 39(1):48–81, 2011.
  • 21 Ann B Lee and Rafael Izbicki. A spectral series approach to high-dimensional nonparametric regression. Electronic Journal of Statistics, 10(1):423–463, 2016.
  • 22 Rafael Izbicki and Ann B Lee. Nonparametric conditional density estimation in a high-dimensional regression setting. Journal of Computational and Graphical Statistics, 25(4):1297–1316, 2016.
  • 23 Rafael Izbicki and Ann B Lee. Converting high-dimensional regression to high-dimensional conditional density estimation. Electronic Journal of Statistics, 11(2):2800–2831, 2017.
  • 24 Art B Owen. A robust hybrid of lasso and ridge regression. Contemporary Mathematics, 443(7):59–72, 2007.
  • 25 David Alvarez-Melis and Tommi S Jaakkola. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049, 2018.
  • 26 Larry Wasserman. Topological data analysis. Annual Review of Statistics and Its Application, 5:501–532, 2018.
  • 27 Herbert Edelsbrunner, David Kirkpatrick, and Raimund Seidel. On the shape of a set of points in the plane. IEEE Transactions on information theory, 29(4):551–559, 1983.
  • 28 Herbert Edelsbrunner. Alpha shapes—a survey. Tessellations in the Sciences, 27:1–25, 2010.
  • 29 Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
  • 30 Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.