Deep learning models are increasingly used in critical applications that demand not only human oversight but also explanations of their predictions, if they are to be considered trustworthy. Explainability tools have been developed to make black-box classifiers interpretable ([3, 26]). Surrogate explainers, such as Local Interpretable Model-agnostic Explanations (LIME), explain the prediction for an instance by fitting an interpretable surrogate model around it.
However, explanations produced by LIME can vary with the hyperparameters of the procedure. Several papers have looked into the shortcomings of LIME and proposed more robust versions ([27, 13]). In particular, the main sources of uncertainty affecting LIME explanations are studied in . Their work analyses the uncertainty due to LIME’s hyperparameters, as well as the stochasticity in the process of generating the explanation. Similarly,  presents both a theoretical and an empirical analysis of the variability of explanations produced for a single image. These results suggest that the inherent stochasticity of LIME induces diversity among multiple explanations produced for the same instance. The idea of applying LIME multiple times to an instance is proposed in , while the robustness of LIME with regard to changes in the input data is explored in .
In this paper, we take a step back and aim to enrich the explanations by incorporating an estimate of their uncertainty. This allows for a more meaningful interaction, potentially enabling the user to either trust or reject the explanation. Our contributions are as follows:
- We provide uncertainty estimates for explanations using bootstrapping and ordinal consensus metrics. We showcase these using tailored visualisations that convey this information to the practitioner.
- Beyond the uncertainty within LIME and the uncertainty induced by the input data, we also consider the predictive uncertainty. We do this by considering the model of interest to be an ensemble of black-box models, rather than a single black-box.
- We highlight the number of surrogates and the number of instances the surrogates are fitted to as key parameters that help the user fine-tune and adjust our proposed procedure depending on the use case.
2 Related Work
The process of deriving surrogate explainers is complex and driven by several interconnected factors and objectives ([22, 21]). In general, this type of explainer can be unstable and lead to varying surrogate coefficients and, in consequence, diverse explanations ([1, 30, 31]). The variability of the surrogate coefficients can be seen as uncertainty inherent in surrogate explanations. While  and  highlight the sampling space where the surrogate is fitted as a source of uncertainty,  motivates the need to also consider the predictive uncertainty of the black-box to be explained.
In this work we address the quantification of surrogate explanation uncertainty by aggregating multiple surrogate coefficients. The use of a consensus mechanism to obtain explanations that are less sensitive to sampling variance (further discussed in Section 3) has been proposed in [4, 24]. Specifically,  and  consider aggregating surrogate coefficients in the form of simple ranking schemes inspired by the social sciences and economics.
BayesLIME was proposed in  to generate surrogate explanations with a measure of uncertainty. The uncertainty is quantified by evaluating the probability that surrogate coefficients lie within their 95% credible intervals. The work suggests sampling the perturbations that yield the most information about the model's behaviour, thus reducing the computational complexity. The practitioner is informed about the uncertainty of the feature attribution for each explainable component.
The idea of using uncertainty-aware black-box models for interpretability has been investigated in  and  and demonstrated on various saliency mapping methods. However, little work has been done on combining surrogate explanations with model uncertainty. In this paper, we introduce a framework that uses the aggregation of multiple diverse surrogate explainers in combination with uncertainty-aware deep learning ensembles. Similarly to , we motivate the number of perturbations sampled as well as the number of surrogates derived as key parameters the practitioner needs to fine-tune in order to derive explanations that satisfy the desired certainty of the surrogate derivation process.
In this section, we briefly introduce local-surrogate explanations, with a particular focus on LIME (). We highlight different sources of surrogate uncertainty and discuss two in particular which, as further described in Section 4, are used to naturally induce diversity among multiple bootstrapped surrogates.
3.1 Surrogate Explanations
Local-surrogate explanations belong to a category of post-hoc model-agnostic explanation approaches first introduced in . One such approach is LIME, which is an instantiation of the following formulation:
$$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$
The surrogate explainer $g$ is from an interpretable model class $G$. The locality around the data point $x$, for which the prediction is to be explained, is controlled by the similarity kernel $\pi_x$. The loss $\mathcal{L}$ characterises how close $g$ is to the black-box model $f$. The penalty term $\Omega(g)$ represents a complexity measure of $g$. In practice, $G$ is the class of linear models: $g(z) = w_g \cdot z$. Model fitting is performed on a set of points drawn from a Gaussian distribution centred on $x$, and then the weights $\pi_x(z)$ are computed using the radial basis function kernel.
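The fitting step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the LIME library's implementation: the function names, the kernel width, the Gaussian sampling scale and the toy black-box are all hypothetical, and a small ridge penalty stands in for the complexity term of the surrogate.

```python
import numpy as np

def rbf_weights(Z, x, width):
    """Similarity kernel pi_x(z) = exp(-||z - x||^2 / width^2)."""
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    return np.exp(-sq_dist / width ** 2)

def fit_linear_surrogate(f, x, n_points=500, scale=1.0, width=0.75,
                         ridge=1e-3, seed=0):
    """Fit a weighted linear surrogate g(z) = w . z + b to f around x.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Draw points from a Gaussian centred on the instance x.
    Z = x + scale * rng.standard_normal((n_points, x.shape[0]))
    y = f(Z)                              # black-box outputs at the sampled points
    pi = rbf_weights(Z, x, width)         # locality weights
    A = np.hstack([np.ones((n_points, 1)), Z])  # intercept column + features
    AtW = A.T * pi                        # scales each sample by its weight
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0                   # the intercept is not penalised
    # Solve the weighted ridge normal equations (A^T W A + ridge I) p = A^T W y.
    params = np.linalg.solve(AtW @ A + penalty, AtW @ y)
    return params[1:], params[0]

def toy_black_box(Z):
    """Hypothetical classifier: probability driven by features 0 and 1 only."""
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 0] - Z[:, 1])))

w, b = fit_linear_surrogate(toy_black_box, x=np.zeros(3))
print(np.round(w, 3), round(float(b), 3))
```

In this toy setting the surrogate coefficients should recover the signs of the black-box dependence (positive for feature 0, negative for feature 1) and assign a comparatively small coefficient to the irrelevant feature 2.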
3.2 Uncertainty in Surrogate Explanations
Previous works have identified the following sources of uncertainty in surrogate explanations:
1. Sampling variance of the points on which the surrogate is fitted.
2. Implementation of explanation procedure (highlighted in ).
3. Choice of surrogate structure (introduced in ).
In this paper, assuming sources 2 and 3 are fixed, we focus on the sampling variance as well as the predictive uncertainty of the model to be explained. We argue that the predictive uncertainty of the model can be seen as an extra source of variability, adding to the uncertainty of the explanations. To show this, in the examples below we use ensemble models because of their favourable properties for uncertainty estimation. Details on the architecture of the ensemble are given in Sec. 5.
Variability of surrogate coefficients due to sampling variance
Following the works of  and , Fig. 0(a) shows the variability of surrogate coefficients due to sampling variance in an image classification task. For images, explanations are usually based on superpixels, obtained from a segmentation of the image into semantically meaningful regions. Here, LIME is run 100 times with the default configuration (by first drawing 100 distinct sets of points), resulting in 100 surrogates. We generate the predictions by averaging the predictions of the individual ensemble members. Since the sets of image perturbations are generated randomly, the values of the surrogate coefficients are not deterministic.
We see that one superpixel has the highest mean coefficient value, so it can be clearly identified as, on average, the most relevant region of the image for the classification. Similarly, one superpixel can be identified as the least important. For the rest, the ordering is not clear. This matters if the user is interested in tuning LIME such that the distributions of the coefficients do not overlap, so that the order of importance of the coefficients can be clearly identified.
Variability of explanations due to predictive uncertainty
The uncertainty of LIME due to the predictive uncertainty of the black-box classifier has not been addressed in previous works. However, this is something that we can study when using ensemble models. In Fig. 0(b) LIME is run 100 times on a fixed set of image perturbations (in contrast to the previous experiment, which uses 100 sets of points, here we use only one). Here, a single prediction is obtained from a randomly chosen member of the ensemble. Therefore, unlike in the experiment presented in Fig. 0(a), the variability of the surrogate coefficients is now induced solely by sampling the predictions for the image perturbations randomly from the individual models contained in the ensemble. Again, for this particular image, the top and bottom coefficients remain the same as before, corroborating the message from the previous example. In Sec. 5 we will further explore this relationship empirically. In Sec. 4, we present a method for deriving multiple diverse surrogates and aggregating their coefficient values through a rating scheme, which allows us to estimate the uncertainty of the aggregated explanation through measures of consensus.
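The two experiments described in this section can be mimicked on synthetic data. The sketch below is only an illustration under assumptions: the "ensemble" is a hypothetical set of five logistic models with slightly different weights, the perturbations are Gaussian rather than superpixel-based, and all names and constants are invented for the example. Setting (a) redraws the perturbations in every run and uses the ensemble-mean prediction; setting (b) keeps one fixed set of perturbations and takes each prediction from a randomly chosen ensemble member.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_members, n_points, n_runs = 4, 5, 200, 100
x = np.zeros(d)  # instance to be explained

# Hypothetical ensemble: logistic models whose weights differ slightly.
base_w = np.array([2.0, 1.0, -0.5, -2.0])
member_w = base_w + 0.3 * rng.standard_normal((n_members, d))

def member_predictions(Z):
    """Return an (n_members, n_points) array of class probabilities."""
    return 1.0 / (1.0 + np.exp(-(Z @ member_w.T).T))

def fit_surrogate(Z, y):
    """Weighted least-squares linear surrogate; returns the feature coefficients."""
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    pi = np.exp(-sq_dist / (0.75 ** 2 * d))   # assumed RBF kernel width
    A = np.hstack([np.ones((len(Z), 1)), Z])
    sw = np.sqrt(pi)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]

def draw_points():
    return x + rng.standard_normal((n_points, d))

# (a) Sampling variance: new perturbations per run, ensemble-mean prediction.
coefs_sampling = np.array([
    fit_surrogate(Z, member_predictions(Z).mean(axis=0))
    for Z in (draw_points() for _ in range(n_runs))
])

# (b) Predictive uncertainty: fixed perturbations, random member per point.
Z_fixed = draw_points()
P_fixed = member_predictions(Z_fixed)
coefs_predictive = np.array([
    fit_surrogate(
        Z_fixed,
        P_fixed[rng.integers(n_members, size=n_points), np.arange(n_points)],
    )
    for _ in range(n_runs)
])

print("std (sampling):  ", np.round(coefs_sampling.std(axis=0), 4))
print("std (predictive):", np.round(coefs_predictive.std(axis=0), 4))
```

Both settings yield a distribution of coefficients per feature; comparing the per-feature spread and the overlap between features is the kind of inspection the figures above are used for.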
4 Uncertainty Quantification via Ordinal Consensus
The coefficients of the surrogate are representative of the local behaviour of the black-box model. A method for estimating the distribution of surrogate coefficients is the bootstrap (), as the sampling variance of data points around the instance naturally induces diversity among bootstrapped surrogates. Additionally, we propose the use of ensemble techniques to account for the stochasticity of the prediction behaviour of the black-box model, reinforcing the diversity of the bootstrapped surrogates. We propose ordinal metrics to aggregate the surrogate coefficients and quantify uncertainty.
4.1 Bootstrapping LIME
In our approach, which we refer to as Bootstrapping LIME (BLIME), multiple surrogate models are fitted by bootstrapping the perturbation dataset. Since an ensemble model can be treated as a probabilistic classifier, the output for a perturbation can also be sampled from the ensemble classifier by sampling a base model from the set of models. From every surrogate model, we obtain a coefficient vector. Then, for every coefficient vector, we obtain a ranking, ordering the coefficients from smallest to largest in value. In this manner, and continuing with the image classification example, if the procedure is repeated several times, we can compactly represent these ranking vectors as the rows of a ranking matrix, with one row per surrogate and one column per superpixel. This matrix is then interpreted as a rating scheme, where superpixels are being rated by surrogates.
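The procedure described above (bootstrap the perturbation dataset, sample each output from a random ensemble member, fit a surrogate, rank its coefficients) can be sketched as follows. This is a hedged illustration, not the authors' algorithm listing: the function names, the binary superpixel masks, the locality kernel and the synthetic ensemble are assumptions made for the example.

```python
import numpy as np

def rank_coefficients(coefs):
    """Rank 1 = smallest coefficient, rank d = largest (ties are not handled)."""
    return np.argsort(np.argsort(coefs)) + 1

def blime_rankings(masks, member_preds, n_surrogates=50, seed=0):
    """Return a ranking matrix with one row per surrogate, one column per superpixel.

    masks:        (n_points, d) binary array, 1 = superpixel kept in the perturbation
    member_preds: (n_members, n_points) predictions of each ensemble member
    """
    rng = np.random.default_rng(seed)
    n_points, d = masks.shape
    n_members = member_preds.shape[0]
    # Assumed locality kernel: RBF on the fraction of superpixels switched off.
    dist = 1.0 - masks.mean(axis=1)
    sw = np.sqrt(np.exp(-dist ** 2 / 0.25 ** 2))
    A = np.hstack([np.ones((n_points, 1)), masks])
    R = np.empty((n_surrogates, d), dtype=int)
    for b in range(n_surrogates):
        idx = rng.integers(n_points, size=n_points)      # bootstrap resample
        member = rng.integers(n_members, size=n_points)  # random member per point
        y = member_preds[member, idx]
        coef, *_ = np.linalg.lstsq(A[idx] * sw[idx][:, None], y * sw[idx],
                                   rcond=None)
        R[b] = rank_coefficients(coef[1:])               # drop the intercept
    return R

# Synthetic example with 8 superpixels and a hypothetical 5-member ensemble.
rng = np.random.default_rng(1)
d, n_points, n_members = 8, 300, 5
masks = rng.integers(0, 2, size=(n_points, d)).astype(float)
true_w = np.linspace(-1.5, 1.5, d)
member_w = true_w + 0.2 * rng.standard_normal((n_members, d))
member_preds = 1.0 / (1.0 + np.exp(-(masks @ member_w.T).T))

R = blime_rankings(masks, member_preds, n_surrogates=50)
print(R.mean(axis=0))  # mean rank per superpixel
```

Each row of `R` is a permutation of the ranks 1 to 8, so the matrix can be passed directly to rank-based agreement measures.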
4.2 Ordinal Metrics
The reduction of surrogate coefficients to a ranking can be regarded as a normalisation step that makes multiple surrogates comparable. Through ordinal statistics, we can quantify the level of consensus amongst the surrogates to gain further insights for a given explanation. We report the following metrics as proxies for the underlying uncertainty.
The mean rank of a superpixel, compared to those of all the other superpixels, indicates its relative importance.
The ordinal consensus of the ranking of a superpixel, as defined in , can be used to evaluate whether there is high agreement among the raters, indicated by a consensus value close to 1. A value closer to 0.5 suggests a rather uniform distribution of the rankings assigned to the superpixel, which means that the importance assigned to it varies widely among the surrogates. For values closer to zero, the ordinal dispersion indicates high polarisation among raters when assigning a rank to the superpixel.
Inter-rater reliability measures
Here we consider reliability measures to address the overall uncertainty among all surrogates regarding all interpretable components, namely Fleiss’ Kappa  and Kendall’s coefficient of concordance . While Kendall’s coefficient measures the agreement among raters specifically for rankings, Fleiss’ Kappa estimates the agreement regardless of the similarity of the assigned ranks.
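The metrics listed above can be computed directly from a ranking matrix. The sketch below uses the textbook forms of these statistics as assumptions: ordinal consensus is taken as one minus Leik's ordinal dispersion, Kendall's coefficient of concordance is computed without tie correction, and Fleiss' Kappa treats each rank as a nominal category. The function names are hypothetical, and the paper's exact definitions may differ in detail.

```python
from collections import Counter

def ordinal_consensus(ranks, n_categories):
    """1 - Leik's ordinal dispersion for the ranks given to one superpixel."""
    n = len(ranks)
    counts = Counter(ranks)
    cumulative, total = 0.0, 0.0
    for category in range(1, n_categories + 1):
        cumulative += counts.get(category, 0) / n
        # Distance of the cumulative frequency from the nearest extreme (0 or 1).
        total += cumulative if cumulative <= 0.5 else 1.0 - cumulative
    dispersion = 2.0 * total / (n_categories - 1)
    return 1.0 - dispersion

def kendalls_w(R):
    """Kendall's coefficient of concordance; R has one row per rater (no ties)."""
    m, n = len(R), len(R[0])
    rank_sums = [sum(row[j] for row in R) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

def fleiss_kappa(R):
    """Fleiss' Kappa with ranks as categories; requires at least two raters."""
    m, n = len(R), len(R[0])
    # counts[j][k]: number of raters that gave item j the rank k + 1
    counts = [[sum(1 for row in R if row[j] == k + 1) for k in range(n)]
              for j in range(n)]
    p_bar = sum((sum(c * c for c in item) - m) / (m * (m - 1))
                for item in counts) / n
    p_e = sum((sum(counts[j][k] for j in range(n)) / (n * m)) ** 2
              for k in range(n))
    return (p_bar - p_e) / (1 - p_e)

# Two raters whose rankings differ only by swaps of neighbouring ranks.
R_close = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(kendalls_w(R_close), fleiss_kappa(R_close))

# Ranks assigned to a single superpixel by ten surrogates (8 possible ranks).
print(ordinal_consensus([8] * 10, 8))           # full agreement
print(ordinal_consensus([1] * 5 + [8] * 5, 8))  # polarised raters
```

The two-rater example illustrates the difference stated in the text: the rankings are similar, so the concordance is high (0.8), whereas Fleiss' Kappa is negative because no superpixel receives exactly the same rank from both raters. For the single-superpixel examples, full agreement gives a consensus of 1 and a fully polarised set of ranks gives 0.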
As described in Section 4, we propose using the sampling variance of the perturbations and an ensemble classifier to induce diversity among bootstrapped surrogates. We identify the number of surrogates aggregated and the size of the perturbation sets as key parameters the user can fine-tune in the explanation derivation process. For the experiments we consider image classification and prediction from text data.
For the image classification task we use the CIFAR-10 dataset. For this work, the dataset is split into a training set and a validation set with 50000 and 10000 images respectively. For sentiment classification we use the movie review dataset IMDB. The task is to classify a movie review as positive or negative based on the given text. The dataset consists of 50000 labelled reviews.
For our black-box image classifier we use an ensemble of 5 CNNs with ResNet architecture as in . The ensemble is created by training all CNNs individually, using random weight initialisation and data shuffling during training to induce diversity. For the text analysis we use an ensemble of fully connected neural networks in combination with GloVe embeddings, again using random weight initialisation and data shuffling during training to induce diversity among the ensemble members. The default configuration of LIME is used with linear regression as surrogates.
In Fig. 1(a) we see that by increasing the number of perturbations for each bootstrap sample, the mean ranks of the superpixels spread out over the full ranking interval from 1 to 8, whereas for small numbers of perturbations, the mean ranks are squashed into a rather small interval (top plot). The bottom plot shows that the level of agreement on the ranking of the superpixels increases as the number of image perturbations increases. This is expected, as the surrogates are trained on datasets that are more similar to each other. Therefore, the explanation derived by aggregating multiple surrogates fitted on more perturbations can be considered more certain with regard to the ranking of the individual superpixels. Examining both plots depicted in Fig. 1(a), the agreement among the surrogates is highest for the highest- and lowest-ranked superpixels. In Fig. 1(b), the same experiment is run for different numbers of bootstrap samples and a fixed number of perturbations. We see that increasing the number of surrogates does not increase the agreement of the raters assigning ranks to the superpixels (bottom plot). In contrast to Fig. 1(a), the ranks do not converge to their absolute ranking. The agreement measured by the consensus estimates, however, changes, showing either an increasing or a decreasing trend.
In Figs. 2(a) and 2(b) we examine the effects of the number of perturbations drawn and the number of surrogates derived on the uncertainty estimates. As shown in Fig. 2(a), increasing the number of perturbations drawn shifts the distributions of the consensus estimates towards higher values, which matches the findings from Fig. 1(a). Fig. 2(b), however, indicates that the distribution of the derived uncertainty estimates becomes narrower, resulting in more reliable estimates. Figures 4 and 5 show examples where our method provides additional information to the practitioner about the explanation. In Fig. 4 our method allows the user to compare different image segmentations for training surrogates. The original image is segmented into 8 superpixels using different segmentation algorithms. The absolute ranking of the mean ranks of the superpixels is shown in the second column. The mean ranks of the superpixels are depicted in the third column. The image in the rightmost column shows the level of agreement amongst the surrogates regarding the ranking of the individual superpixels, measured using the ordinal consensus. By inspecting the uncertainty estimates, the practitioner can conclude that the bottom segmentation results in an overall more certain explanation, as both estimates are higher than for the segmentation in the top row. The example depicted in Fig. 5 shows our method on a text dataset (IMDB). Here, we highlight how the higher variance of surrogate coefficients is also reflected in the ordinal consensus.
In this paper we make the case for reporting an uncertainty estimate for an explanation together with the explanation itself when explaining a prediction. This provides the user with the option of rejecting an explanation for being too uncertain. To this end, we proposed a procedure where we first bootstrap LIME, and then aggregate the outputs using ordinal statistics, obtaining both an explanation and a measure of its uncertainty.
[1] (2018) On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049.
[2] (2020) Getting a clue: a method for explaining uncertainty estimates. arXiv preprint arXiv:2006.06848.
[3] (2019) One explanation does not fit all: a toolkit and taxonomy of AI explainability techniques. arXiv preprint arXiv:1909.03012.
[4] (2019) Towards aggregating weighted feature attributions. arXiv preprint arXiv:1901.10040.
[5] Building human-machine trust via interpretability. Proceedings of the AAAI conference on artificial intelligence, Vol. 33, pp. 9919–9920.
[6] (2020) Evaluating and aggregating feature-based model explanations. arXiv preprint arXiv:2005.00631.
[7] (1992) Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics, pp. 569–593.
[8] (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and psychological measurement 33 (3), pp. 613–619.
[9] (2020-08) Explaining the explainer: a first theoretical analysis of LIME. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, Vol. 108, pp. 1287–1296.
[10] (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256.
[11] (2019) IBreakDown: uncertainty of model explanations for nonadditive predictive models. arXiv preprint arXiv:1903.11420.
[12] (2016) Deep residual learning for image recognition. pp. 770–778.
[13] (2021) Explainers in the wild: making surrogate explainers robust to distortions through perception. In 2021 IEEE International Conference on Image Processing (ICIP), pp. 3717–3721.
[14] (1939) The problem of m rankings. The annals of mathematical statistics 10 (3), pp. 275–287.
[15] (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto.
[16] (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems, pp. 6402–6413.
[17] (2019) Developing the sensitivity of LIME for better machine learning explanation. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Vol. 11006.
[18] (1966) A measure of ordinal consensus. Pacific Sociological Review 9 (2), pp. 85–90.
[19] Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, pp. 142–150.
[20] GloVe: global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543.
[21] (2021) On the overlooked issue of defining explanation objectives for local-surrogate explainers. arXiv preprint arXiv:2106.05810.
[22] (2021) Understanding surrogate explanations: the interplay between complexity, fidelity and coverage. arXiv preprint arXiv:2107.04309.
[23] (2016) "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144.
[24] (2019) Aggregating explanation methods for stable and robust explainability. arXiv preprint arXiv:1903.00519.
[25] (2020) Reliable post hoc explanations: modeling uncertainty in explainability. arXiv preprint arXiv:2008.05030.
[26] FAT forensics: a Python toolbox for implementing and deploying fairness, accountability and transparency algorithms in predictive systems. Journal of Open Source Software 5 (49), pp. 1904.
[27] (2019) BLIMEy: surrogate prediction explanations beyond LIME. arXiv preprint arXiv:1910.13016.
[28] (2020) Uncertainty and interpretability in convolutional neural networks for semantic segmentation of colorectal polyps. Medical image analysis 60.
[29] (2020) Uncertainty-aware deep ensembles for reliable and explainable predictions of clinical time series. IEEE Journal of Biomedical and Health Informatics.
[30] (2019) "Why should you trust my explanation?" Understanding uncertainty in LIME explanations. arXiv preprint arXiv:1904.12991.
[31] (2021) S-LIME: stabilized-LIME for model explanation. arXiv preprint arXiv:2106.07875.