1 Introduction
Machine learning (ML) algorithms have shown high predictive capacity for model inference in several application domains. This is mainly due to recent technological advances, the increasing number and size of public dataset repositories, and the development of powerful frameworks for ML experiments ^{1, 2, 3, 4, 5, 6}. Application domains where ML algorithms have been successfully used include image recognition ^{7, 8} and speech recognition ^{9}. In many of these applications, the safe use of machine learning models and the users’ right to know how decisions affect their lives make the interpretability of the models a very important issue. Many currently used machine learning algorithms induce models whose decisions are difficult to interpret and understand; such models are called black boxes. This occurs because several algorithms produce highly complex models in order to better describe the patterns in a dataset.
Most ML algorithms with high predictive performance induce black box models, leading to inexplicable decision making processes. Black box models reduce the confidence of practitioners in the model predictions, which can be an obstacle in many real world applications, such as medical diagnostics ^{10}, science, autonomous driving ^{11}, and other sensitive domains. In these applications, it is therefore important that predictive models are easy to interpret.
To overcome these problems, many methods that improve model interpretation have recently been proposed; see e.g. ^{12, 13} for details. These methods aim at providing further information regarding the predictions obtained from predictive models. In these methods, interpretability can occur at different levels: (i) on the dataset; (ii) after the model is induced; and (iii) before the model is induced ^{14}. We will focus our discussion on methods for model interpretability that can be applied after the induction of a predictive model by an ML algorithm; these are known as model-agnostic methods.
Model-agnostic interpretation methods are a very promising approach to solve the problem of trust and to uncover the full potential of ML algorithms. These methods can be applied to explain predictions made by models induced by any ML algorithm. Some well known model-agnostic interpretation methods are described in ^{15, 16, 17, 18, 19}. Perhaps the best known interpretation method is LIME ^{17}, which allows local explanations for classification and regression models. LIME has been shown to be very capable of creating local explanations. As a result, LIME has been used to interpret models induced by ML algorithms in different application domains. However, it is still not clear how to make some decisions when implementing and applying LIME and related methods. Some questions that arise are:

(i) How to best define the neighborhood of an instance?
(ii) How to control the tradeoff between the accuracy of the interpretation model and its interpretability?
(iii) How to make the obtained solution robust to small variations on the instance to be explained?
A good local explanation for a given instance needs to have high fidelity to the model induced by an ML algorithm in the neighborhood of that instance. Although this neighborhood is typically defined in terms of Euclidean distances, ideally it should be supported by the dataset. Thus, the subdomain used to fit the local explanation model (i.e., a model used to explain the black box model) should reflect the domain where the black box model was induced from. For instance, high-dimensional datasets often lie on a submanifold of the feature space, in which case defining neighborhoods in terms of the Euclidean distance is not appropriate ^{20, 21, 22, 23}. To deal with this deficiency, we address issue (i) by creating a technique that samples training points for the explanation model along the submanifold on which the dataset lies (as opposed to Euclidean neighborhoods). We experimentally show that this technique also provides a solution to (iii).
In order to address (ii), we observe that a good local explanation is not necessarily a direct map of the feature space. In some cases, the appropriate local description of the explanation relies on specific properties of the instance. These instance properties can be obtained through a transformation of the feature space. Thus, we address issue (ii) by creating local explanations on a transformation of the feature space. The set of properties (questions) to be used should be elaborated by the specialists of the specific application domain.
In this work, we focus on performing these modifications for regression tasks. However, these modifications can be easily adapted for classification tasks. In Section 2.1, we discuss the use of instance properties, how to deal with the tradeoff between explanation accuracy and complexity, and the importance of employing a robust method as an explanatory model. In Section 2.2, we describe how to improve the local explanation method using the estimation of the domain of the feature space. In Section 3, we apply our methodology to a toy example. Finally, Section 4 presents the main conclusions from our work and describes possible future directions.
2 Model Interpretation Methods
2.1 Local Explanation Through Instance Properties
A crucial aspect of providing explanations for predictive models induced by ML algorithms is presenting the information that is relevant to the specific knowledge domain. In some cases, a direct representation of the original set of features of an instance does not reflect the best local behavior of a prediction process. Hence, other instance properties can be used to create clear decision explanations. These properties can be generated through a map of the original feature space, i.e., a function of the input. Moreover, these instance properties can increase the local fidelity of the explanation to the predictive model. This can be easily verified when the original feature space is highly limited and provides poor information on the neighborhood of a specific point. This case is illustrated by Figure 1 (a).
In order to provide a richer environment to obtain a good explanation, the interpretable model should be flexible with respect to the questions that a user may want to ask about the ML model. Given that the possible explanations are mapped using specific functions of the feature space, we can create an interpretable model using
$$g(\mathbf{x}) = \alpha_0 + \sum_{i=1}^{d} \alpha_i h_i(\mathbf{x}), \qquad (1)$$
where $\mathbf{x}$ represents the original vector of features, $\alpha_i$ are the coefficients of the linear regression that will be used as an explanation, and $h_i$ are known functions that map $\mathbf{x}$ to the properties (that is, questions) that have a meaningful value for explaining a prediction, or that are necessary to obtain an accurate explanation.
Once the $h_i$'s are created, the explanation method should choose which of these functions better represent the predictions made by the original model locally. This can be achieved by introducing an $\ell_1$ regularization in the square error loss function. More precisely, let $f$ be a black box model induced by an ML algorithm and consider the task of explaining the prediction made by $f$ at a new instance $\mathbf{x}^*$. Let $\mathbf{x}_1, \ldots, \mathbf{x}_M$ be a sample generated on a neighborhood of $\mathbf{x}^*$. The local explanation can be found by minimizing (in $\alpha$)
$$\sum_{k=1}^{M} \left( f(\mathbf{x}_k) - g(\mathbf{x}_k) \right)^2 + \sum_{i=1}^{d} \lambda_i |\alpha_i|, \qquad (2)$$
where the first term is the standard square error between the induced model and the explanatory model and the second term is the penalization over the explanatory terms. The value of $\lambda_i$ can be set to control the tradeoff among the explanatory terms. For instance, if some explanatory terms ($h_i$) are more difficult to interpret, then a larger value can be assigned to $\lambda_i$.
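The minimization described above can be sketched in a few lines. The example below is only an illustration, assuming the penalty is a weighted absolute-value (lasso-type) term with one weight per property and an unpenalized intercept; it uses plain cyclic coordinate descent rather than the robust solver adopted in this work, and the names `fit_local_explanation`, `black_box`, `properties`, and `penalties` are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the l1 coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_local_explanation(points, targets, properties, penalties, n_sweeps=200):
    """Minimize sum_k (target_k - g(x_k))^2 + sum_i penalty_i * |coef_i|
    by cyclic coordinate descent; the intercept is not penalized."""
    # Evaluate every property function on every sampled point.
    design = [[h(x) for h in properties] for x in points]
    n, p = len(points), len(properties)
    intercept, coef = 0.0, [0.0] * p
    for _ in range(n_sweeps):
        # Intercept update: mean of the residual without the intercept.
        intercept = sum(
            targets[k] - sum(coef[i] * design[k][i] for i in range(p))
            for k in range(n)
        ) / n
        for j in range(p):
            rho, norm = 0.0, 0.0
            for k in range(n):
                # Prediction that ignores the contribution of property j.
                partial = intercept + sum(
                    coef[i] * design[k][i] for i in range(p) if i != j
                )
                rho += design[k][j] * (targets[k] - partial)
                norm += design[k][j] ** 2
            # Closed-form minimizer of the one-dimensional subproblem.
            coef[j] = soft_threshold(rho, penalties[j] / 2.0) / norm if norm > 0 else 0.0
    return intercept, coef


random.seed(0)
black_box = lambda x: 2.0 * x[0] + 1.0  # hypothetical stand-in for the induced model
sample = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
properties = [lambda x: x[0], lambda x: x[1], lambda x: x[0] * x[1]]
penalties = [1.0, 1.0, 5.0]  # the product term is penalized more heavily
intercept, coef = fit_local_explanation(
    sample, [black_box(x) for x in sample], properties, penalties
)
print(intercept, coef)
```

Assigning a larger penalty to the product term mirrors the idea that harder-to-interpret properties should only enter the explanation when they clearly improve the fit.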
In order to set the objective function (Equation 2), one must be able to sample in a neighborhood of the instance being explained. To keep the explanation consistent under random sampling variations in this neighborhood, we decided to use a robust linear method that implements the regularization (see ^{24}). This robust linear regression solves some of the instability problems of local explanations ^{25}.
Additionally, a relevant question is how to define a meaningful neighborhood around the instance being explained. In the next section we discuss how this question can be answered in an effective way.
2.2 Defining meaningful neighborhoods
2.2.1 Feature Space
The training data used by an ML algorithm defines the domain of the feature space. In order to obtain a more reliable explanation model, we can use the estimated domain of the feature space for sampling the data needed to obtain this model via Equation 2. This approach improves the fidelity and accuracy of the explanation with respect to the model when compared to the standard Euclidean neighborhoods used by other methods ^{17}. The estimation of the feature domain is closely related to the manifold estimation problem ^{26}. Here, we show how this strategy works by using the $\alpha$-shape technique ^{27, 28} to estimate the domain of the feature space.
2.2.2 $\alpha$-shape
The $\alpha$-shape is a formal mathematical definition of the polytope concept of a set of points in Euclidean space. Given a set of points $S \subset \mathbb{R}^d$ and a real value $\alpha \in [0, \infty)$, it is possible to uniquely define this polytope that encloses $S$. The value $\alpha$ defines an open hypersphere $b_\alpha$ of radius $\alpha$. For $\alpha \to 0$, $b_\alpha$ is a point, while for $\alpha \to \infty$, $b_\alpha$ is an open half-space. Thus, an $\alpha$-shape is defined by all $k$-simplices, $\sigma_T$, defined by a set of points $T \subseteq S$ with $|T| = k + 1$, for which there exists an open hypersphere $b_\alpha$ that is empty, $b_\alpha \cap S = \emptyset$, and $\partial b_\alpha \supseteq T$. In this way, the value $\alpha$ controls the polytope details. For $\alpha \to 0$, the $\alpha$-shape recovered is the set of points itself, and for $\alpha \to \infty$, the convex hull of the set is recovered ^{27, 28}. We define the neighborhood of an instance to be the intersection of a Euclidean ball around the instance and the space defined by the polytope obtained from the $\alpha$-shape. In practice, we obtain the instances used in Equation 2 by sampling new points around the instance that belong to the space defined by the polytope obtained from the $\alpha$-shape.
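The sampling step can be illustrated with a small rejection sampler. The sketch below does not compute the polytope described above; as a simplifying assumption it replaces the estimated domain by a union of small balls centered on the data points (a cruder proxy for the support of the data) and keeps only candidates that fall inside both the Euclidean ball around the instance and this proxy domain. It is restricted to two features, and the function names, radii, and the example spiral are all hypothetical.

```python
import math
import random


def in_support(point, data, radius):
    """Proxy domain test: is `point` within `radius` of some data point?"""
    return any(math.dist(point, d) <= radius for d in data)


def sample_neighborhood(x, data, ball_radius, support_radius, n_samples, rng,
                        max_tries=100000):
    """Rejection-sample points in the Euclidean ball around x that also
    fall inside the estimated domain of the feature space (2-D only)."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_samples:
            break
        # Uniform draw in a disc: random angle, radius R * sqrt(u).
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = ball_radius * math.sqrt(rng.random())
        candidate = (x[0] + r * math.cos(angle), x[1] + r * math.sin(angle))
        if in_support(candidate, data, support_radius):
            accepted.append(candidate)
    return accepted


rng = random.Random(0)
# Hypothetical training set: noise-free points along the spiral r = theta.
data = [(t * math.cos(t), t * math.sin(t)) for t in (0.02 * i for i in range(1000))]
x = (0.0, 4.5 * math.pi)  # a point on that spiral (theta = 4.5 * pi)
neighbors = sample_neighborhood(
    x, data, ball_radius=2.0, support_radius=0.3, n_samples=100, rng=rng
)
print(len(neighbors))
```

Most candidates drawn from the plain Euclidean ball are rejected here, which shows how much of that ball lies outside the region actually covered by the data.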
3 Results for a Toy Model: Length of a Spiral
In this section, we present an application of our proposed methodology to a toy model in which the data is generated along a spiral. To this end, we use the Cartesian coordinates of the spiral on the plane as features.
3.1 Definition
We explore the toy model described by
(3)  
where and are the values that form the feature vector , is an independent variable, , , is a random noise, and the target value is given by
, the length of the spiral. This toy model presents some interesting features for our analysis, such as the feature domain lying along the spiral and the substantial variance of the target value when varying one of the feature coordinates while keeping the other one fixed.
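A generator for data of this kind can be sketched as follows. This is a minimal illustration, assuming an Archimedean spiral (radius equal to the angle) with Gaussian noise added to both coordinates and the noise-free arc length as target; the spiral type, angle range, noise scale, and sample size are assumptions made for the example and are not values taken from the text.

```python
import math
import random


def spiral_length(theta):
    """Arc length of the Archimedean spiral r = theta from 0 to theta."""
    return 0.5 * (theta * math.sqrt(1.0 + theta * theta) + math.asinh(theta))


def generate_spiral_data(n_points, theta_max, noise_std, rng):
    """Return (features, targets) sampled along a noisy spiral."""
    features, targets = [], []
    for _ in range(n_points):
        theta = rng.uniform(0.0, theta_max)  # independent variable
        x1 = theta * math.cos(theta) + rng.gauss(0.0, noise_std)
        x2 = theta * math.sin(theta) + rng.gauss(0.0, noise_std)
        features.append((x1, x2))
        targets.append(spiral_length(theta))  # target: length of the spiral
    return features, targets


rng = random.Random(42)
X, y = generate_spiral_data(
    n_points=1000, theta_max=6.0 * math.pi, noise_std=0.1, rng=rng
)
print(X[0], y[0])
```

With this parametrization, two points that share almost the same first coordinate but sit on different arms of the spiral have very different targets, which is the behavior that makes Euclidean neighborhoods misleading.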
3.1.1 Instances for Investigation
We investigate the explanation for 3 specific instances of our toy model: , and . For the first point, , we have that the target value (the length of the spiral) will locally depend on the value of , and thus explanation methods should indicate that the most important feature is . For the second value, , the features and have the same contribution for explaining such target. Finally, for the third point, , the second feature should be the most important feature to explain the target.
3.1.2 Data Generation:
Using the model described in Equation 3, we generated thousand data points. These data were generated according to
, a uniform distribution. The values of random noise were selected from
and , where is a normal distribution with mean . The feature space and the target value are shown in Figure 2 (a). The generated data was split into two sets in which used for training and for testing. Additionally, we test the explanation methods by sampling three sets of data in the neighborhoods of , , and .
3.1.3 Model induction using an ML algorithm:
We used a decision tree (DT) induction algorithm in the experiments, namely the Classification and Regression Trees (CART) implementation provided by the scikit-learn ^{5} library. The model induced by this algorithm using the previously described dataset had as predictive performance and .
3.1.4 Determining the $\alpha$-shape of the data:
For this example, we applied the $\alpha$-shape technique with a fixed value of $\alpha$. This value can be optimized for the specific dataset at hand; see ^{28} for details. The estimation of the domain using the $\alpha$-shape is illustrated by Figure 2 (b).
3.2 Local Explanation
The local explanation was generated through a linear regression fitted to data generated over the neighborhood of the point for which the explanation was requested. We use the robust linear method available in the scikit-learn package ^{5}.
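To make the notion of a robust linear fit concrete, the sketch below fits a linear model under the Huber loss using iteratively reweighted least squares (IRLS). It is a dependency-free stand-in chosen for illustration, not the scikit-learn estimator used in the experiments; the threshold `delta`, the helper names, and the synthetic data are assumptions.

```python
import random


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * solution[c] for c in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def huber_fit(features, targets, delta=1.0, n_iter=50):
    """Linear fit (intercept first) under the Huber loss via IRLS."""
    rows = [[1.0] + list(x) for x in features]
    p = len(rows[0])
    weights = [1.0] * len(rows)
    coef = [0.0] * p
    for _ in range(n_iter):
        # Weighted normal equations: (Z^T W Z) coef = Z^T W y.
        gram = [[sum(w * z[a] * z[b] for w, z in zip(weights, rows))
                 for b in range(p)] for a in range(p)]
        moment = [sum(w * z[a] * t for w, z, t in zip(weights, rows, targets))
                  for a in range(p)]
        coef = solve_linear_system(gram, moment)
        # Huber weights: 1 for small residuals, delta / |r| for large ones.
        residuals = [t - sum(c * v for c, v in zip(coef, z))
                     for z, t in zip(rows, targets)]
        weights = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return coef


rng = random.Random(1)
features = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
targets = [1.0 + 2.0 * a - 1.0 * b for a, b in features]
for k in range(5):  # corrupt five targets with gross outliers
    targets[k] += 50.0
robust = huber_fit(features, targets)
plain = huber_fit(features, targets, n_iter=1)  # unit weights only: ordinary least squares
print(robust, plain)
```

Comparing `robust` with `plain` shows how a handful of badly placed sample points can distort an ordinary least-squares explanation while leaving the Huber fit close to the underlying linear trend.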
3.2.1 Explanation for instance :
The explanation obtained using the standard sampling approach (hereafter normal sampling) presents low agreement with the true value of the spiral length (Figure 3(a)). We also noticed that this explanation is unstable with respect to sampling variations (even though we use a robust method to create the interpretation), and indicates that the best feature to explain the ML algorithm locally is (Figure 3(b)). This description is inaccurate (see the discussion in Section 3.1.1). On the other hand, when the sampling strategy is performed over the correct domain of the feature space (hereafter selected sampling), we obtain an explanation with high predictive accuracy (i.e., one that accurately reproduces the true target value; Figure 3(c)). Moreover, the feature that best explains such prediction is (Figure 3(d)), which is in agreement with our expectation.
3.2.2 Explanation for instances and :
We also analyzed the other two points to demonstrate the capability of the selected sampling to capture the correct feature importance. For the instance , the feature importance is almost equally divided between the two features (Figure 4). For the instance , the most important feature is , with importance of (Figure 5). In the case of , the normal sampling strategy produced a good explanation (Figure 5(b)). However, we noticed that this result is unstable due to random variation in the sampling. All results presented here are in agreement with our discussion in Section 3.1.1.
3.3 Robustness of Explanations
Good explanation models for should be stable to small perturbations around . To illustrate the stability of our method, we generated explanations for instances in the neighborhood of : , and . Table 1 shows that the explanations created for these points using selected sampling are compatible with those for . On the other hand, the normal sampling strategy is unstable. These results demonstrate that using the domain defined by the feature space can improve the robustness of a local explanation of an instance.
Table 1
Normal Sampling
point          Importance (feature 1)   Importance (feature 2)   MSE    R²
(0.0, 14.5)    0.92                     2.46                     1.18   0.72
(2.0, 14.5)    1.07                     1.87                     6.19   0.64
(1.0, 14.0)    0.89                     3.91                     8.99   0.46
(0.5, 13.7)    0.95                     1.47                     1.09   0.93
Selected Sampling
point          Importance (feature 1)   Importance (feature 2)   MSE    R²
(0.0, 14.5)    0.96                     0.33                     0.19   0.95
(2.0, 14.5)    0.98                     0.31                     0.30   0.98
(1.0, 14.0)    0.97                     0.07                     0.21   0.99
(0.5, 13.7)    0.96                     0.39                     0.39   0.99
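Agreement scores of the kind reported in the table can be computed as in the sketch below, assuming the usual definitions of mean squared error and coefficient of determination. Whether the reference vector holds the true targets or the black box predictions is a choice left to the user, and the function names are illustrative.

```python
def mean_squared_error(reference, predicted):
    """Average squared difference between reference and predicted values."""
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: reference targets and the explanation's predictions.
reference = [10.0, 12.0, 14.0, 16.0]
explained = [10.5, 11.5, 14.5, 15.5]
print(mean_squared_error(reference, explained), r_squared(reference, explained))
```

A low error together with a score close to one indicates that the explanation reproduces the reference values well in the sampled neighborhood.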
4 Conclusion
In order to increase trust and confidence in black box models induced by ML algorithms, explanation methods must be reliable, reproducible and flexible with respect to the nature of the questions asked. Local model-agnostic explanation methods have many advantages that are aligned with these points. Besides, they can be applied to any ML algorithm. However, standard model-agnostic methods have problems producing reproducible explanations while maintaining accuracy with respect to the original model. To overcome these limitations, we developed new strategies that address the following issues: (i) estimation of the domain of the feature space in order to provide meaningful neighborhoods; (ii) use of different penalization levels on explanatory terms; and (iii) employment of robust techniques for fitting the explanatory method.
The estimation of the domain of the feature space should be performed and used during the sampling step of local interpretation methods. This strategy increases the accuracy of the local explanation. Additionally, using robust regression methods to create the explainable models helps to obtain stable solutions. However, our experiments show that robust methods are not enough; the data must be sampled taking the domain of the feature space into account, otherwise the generated explanations can be meaningless.
Future work includes testing other methods for estimating manifolds such as diffusion maps ^{29} and isomaps ^{30}, extending these ideas to classification problems, and investigating the performance of our approach on real datasets.
Acknowledgments
The authors would like to thank CAPES and CNPq (Brazilian Agencies) for their financial support. T.B. acknowledges support by Grant 2017/061617, São Paulo Research Foundation (FAPESP). R. I. acknowledges support by Grant 2017/03363 (FAPESP) and Grant 306943/20174 (CNPq). The authors acknowledge Grant 2013/073750  CeMEAI  Center for Mathematical Sciences Applied to Industry from São Paulo Research Foundation (FAPESP). T.B. thanks Rafael Amatte Bizao for review and comments.
References
 1 Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
 2 J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
 3 Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. Openml: Networked science in machine learning. SIGKDD Explorations, 15(2):49–60, 2013.
 4 Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
 5 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
 6 Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W, 2017.
 7 Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
 8 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning, pages 2048–2057, 2015.
 9 Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
 10 Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1721–1730. ACM, 2015.
 11 Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
 12 Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 80–89. IEEE, 2018.
 13 Christoph Molnar. Interpretable Machine Learning. 2019. https://christophm.github.io/interpretable-ml-book/.
 14 Zachary C Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
 15 Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
 16 Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong but many are useful: Variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance. arXiv preprint arXiv:1801.01489, 2018.
 17 Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016.
 18 Scott Lundberg and Su-In Lee. An unexpected unity among methods for interpreting model predictions. arXiv preprint arXiv:1611.07478, 2016.
 19 Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and information systems, 41(3):647–665, 2014.
 20 Anil Aswani, Peter Bickel, Claire Tomlin, et al. Regression on manifolds: Estimation of the exterior derivative. The Annals of Statistics, 39(1):48–81, 2011.
 21 Ann B Lee and Rafael Izbicki. A spectral series approach to high-dimensional nonparametric regression. Electronic Journal of Statistics, 10(1):423–463, 2016.
 22 Rafael Izbicki and Ann B Lee. Nonparametric conditional density estimation in a high-dimensional regression setting. Journal of Computational and Graphical Statistics, 25(4):1297–1316, 2016.
 23 Rafael Izbicki and Ann B Lee. Converting high-dimensional regression to high-dimensional conditional density estimation. Electronic Journal of Statistics, 11(2):2800–2831, 2017.
 24 Art B Owen. A robust hybrid of lasso and ridge regression. Contemporary Mathematics, 443(7):59–72, 2007.
 25 David Alvarez-Melis and Tommi S Jaakkola. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049, 2018.
 26 Larry Wasserman. Topological data analysis. Annual Review of Statistics and Its Application, 5:501–532, 2018.
 27 Herbert Edelsbrunner, David Kirkpatrick, and Raimund Seidel. On the shape of a set of points in the plane. IEEE Transactions on information theory, 29(4):551–559, 1983.
 28 Herbert Edelsbrunner. Alpha shapes—a survey. Tessellations in the Sciences, 27:1–25, 2010.
 29 Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
 30 Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.