As information retrieval (IR) systems become more prevalent, it is increasingly important to understand how an IR system produces a particular prediction and what exactly drives it to do so. Understanding how “black-box” models arrive at their predictions has sparked significant interest both within and outside the IR community. This can be in the context of rankings (ter Hoeve et al., 2017), recommendations (Tintarev, 2007), or digital assistants that engage in interactive question answering (Qu et al., 2019).
Explanations of an IR system can be provided for the system as a whole or for individual decisions produced by the system. Explanations based on interpreting the model in all regions of the input space are called global explanations, while those based on interpreting individual predictions are called local explanations (Guidotti et al., 2018). The explainability problem is often cast in terms of supervised prediction models: IR systems usually involve a prediction at some point in the pipeline (i.e., predicting whether a document is relevant or not).
Given how often we use complex models to help us make difficult decisions, it is important to be able to understand what happens during the training phase of the model. We propose doing this by generating local explanations about individual predictions. Recent work on local explanations is usually conducted in either a model-agnostic or model-specific way (Guidotti et al., 2018). Model-agnostic explanations typically involve approximating the original “black-box” model locally in the neighborhood of the instance in question (Ribeiro et al., 2016b), while model-specific explanations use the inner workings of the original “black-box” to explain the prediction of the given instance (Tolomei et al., 2017). The obvious advantage of model-agnostic explanations is that they can be applied to any type of model (Ribeiro et al., 2016a), but since the explanation is based on a local approximation of the original model, there exists some inherent degree of error between the original model and the local approximation. Indeed, since the local model is an approximation, there is no guarantee that it is appropriately representative of the original model, especially in other parts of the input space (Alvarez-Melis and Jaakkola, 2018). In our work, we focus on generating model-specific explanations for boosting ensembles since they are widely used in industry and have demonstrated superior performance in a wide range of tasks.
This gives rise to our leading research question:
How can we automatically generate actionable explanations for individual predictions of tree-based boosting ensembles?
In order to address our leading research question, we propose a three-part research agenda for using explanations to understand individual predictions.
(1) Generate explanations for individual “black-box” predictions in terms of (i) why a particular prediction was classified as a certain class, (ii) what it would have taken for the prediction to be classified as the alternative class, and (iii) how to perturb the model in order to change the prediction.
(2) Develop a mechanism that allows the user to change the prediction based on the explanation.
(3) Evaluate the effectiveness of such explanations on users’ confidence in and trust of the original “black-box” model. This also involves determining appropriate baselines and metrics, and a sensible experimental environment in terms of the people involved and the questions asked.
In this work, we outline ideas along with a case study about items (1) and (2) above. The work of Tolomei et al. (2017) has the potential to solve this problem, but we argue that it (i) does not apply to boosting ensemble methods, and (ii) has scalability issues. In order to come up with a satisfactory solution to our problem, we take the method from Tolomei et al. (2017), explain it, and articulate how it can be extended to accommodate tree-based boosting ensembles. In this extended abstract, we focus on adaptive boosting (Hastie et al., 2009) first in order to disentangle the sequential training nature of boosting methods.
3. A Case Study in Explaining Individual Predictions – Work in Progress
We focus on explaining predictions from tree-based boosting ensemble methods (or simply boosting methods). Boosting methods are based on sequentially training (weak) models that, in each iteration, focus more on correcting the mistakes of the previous model. We train a boosting ensemble $T = \{t_1, \ldots, t_m\}$ using an input set $X = \{x_1, \ldots, x_n\}$ to predict a target variable $y \in \{-1, +1\}$, where $h = \{h_1, \ldots, h_m\}$ is the set of base classifiers for the ensemble and $\hat{y}_j = h_j(x)$ are the corresponding predictions of each base classifier.
In adaptive boosting (Hastie et al., 2009), each iteration $j$ improves over iteration $j-1$ by upweighting misclassified instances (and downweighting correctly classified instances) by a factor of $e^{\alpha_j}$, where $\alpha_j$ is the weight assigned to $h_j$ in the ensemble and is defined as
$$\alpha_j = \log\left(\frac{1 - \mathit{err}_j}{\mathit{err}_j}\right),$$
where $\mathit{err}_j$ is the classification error of the $j$-th base classifier $h_j$.
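The reweighting step above can be sketched in a few lines of Python. This is our own illustrative implementation, not code from the paper (names such as `adaboost_round` and `sample_weights` are ours), assuming binary labels in {-1, +1} and renormalized weights:

```python
import math

def adaboost_round(sample_weights, y_true, y_pred):
    """One adaptive boosting reweighting step (illustrative sketch).

    sample_weights plays the role of the per-instance weights w_i; the
    returned alpha is the weight of the current base classifier in the
    ensemble.
    """
    # err_j: weighted classification error of the current base classifier
    total = sum(sample_weights)
    err = sum(w for w, yt, yp in zip(sample_weights, y_true, y_pred)
              if yt != yp) / total
    # alpha_j = log((1 - err_j) / err_j): tree weight in the ensemble
    alpha = math.log((1 - err) / err)
    # Upweight misclassified instances by e^{alpha}; renormalization then
    # downweights the correctly classified ones in relative terms.
    new_weights = [w * math.exp(alpha) if yt != yp else w
                   for w, yt, yp in zip(sample_weights, y_true, y_pred)]
    norm = sum(new_weights)
    return alpha, [w / norm for w in new_weights]
```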
3.1. Problem definition
Tolomei et al. (2017) investigate the interpretability of random forests (RFs) by determining what drives a model to produce a certain output for a given instance in a binary classification task. They frame the problem in terms of actionable recommendations for transforming negatively labeled instances into positively labeled ones in a binary classification task. Our objective is to extend this method to work for boosting methods and later use these explanations to transform misclassified instances into correctly classified ones. This involves accounting for some components of boosting that do not apply to RFs: (i) the sequential dependency between trees, and (ii) training on the negative gradients instead of the original labels (in the case of gradient boosting decision trees (GBDTs)). We break the task up into two stages:
(1) We extend the method of Tolomei et al. (2017) to adaptive boosting (Hastie et al., 2009), accounting for the sequential dependency between trees.
(2) We extend our new method for adaptive boosting to gradient boosting (Friedman et al., 2000), where we not only train in sequence but also train on the negative gradients of the previous tree.
This leads to the following research questions:
RQ1: Given an instance, how can we perturb the instance such that the prediction for this instance flips from one class to another?
RQ2: Given an instance, how can we perturb the model such that the prediction for this instance flips from one class to another?
3.2. Related work
The method in Tolomei et al. (2017) is defined as follows: let $x$ be an observation in the set $X$ such that $x$ is a true negative instance (i.e., $\hat{f}(x) = y = -1$, where $\hat{f}(x)$ is the overall prediction of the ensemble and $y$ is the true label). The objective is to create a new instance, $x'$, that is an $\epsilon$-transformation of $x$ into a positively predicted instance.
The trees in the ensemble (an RF in this case) can be partitioned into two sets depending on whether the prediction resulting from each tree is positive or negative (the base classifier $h_j$ corresponding to tree $t_j$ outputs either $+1$ or $-1$). We are interested in the set of trees that result in negative predictions, since we want to determine the criteria for turning these into positive predictions.
Therefore, for every positive path $p^+$ (i.e., a path that results in a positive prediction) in every negative tree $t_j$ (i.e., $h_j(x) = -1$), we want to generate an instance $x'_j(p^+)$ that satisfies this positive path (i.e., $h_j(x'_j(p^+)) = +1$), based on our original instance $x$.
We create $x'_j(p^+)$ by examining the feature values of $x$ and the corresponding splitting thresholds in $p^+$. For each feature in $x$, if its value satisfies the splitting threshold for that feature in $p^+$, then we leave the value alone. If not, then we tweak the value such that it is $\epsilon$-away from the splitting threshold and satisfies $p^+$.
We construct an $x'_j(p^+)$ based on $x$ for every positive path in every negative tree and evaluate the output of the entire ensemble on it. If $\hat{f}(x'_j(p^+)) = +1$, then $x'_j(p^+)$ is a candidate transformation of $x$.
We greedily choose the candidate transformation that is closest to the original instance $x$; this is returned as the minimal perturbation of $x$ such that the prediction flips from negative to positive. We call this $x^*$.
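The search procedure described above can be sketched as follows. This is our own minimal sketch, not Tolomei et al.'s actual implementation: we assume a positive path is represented as a list of `(feature_index, threshold, op)` conditions, candidates are compared by squared L2 distance, and all function names are ours.

```python
def tweak_to_path(x, path, eps=0.1):
    """Build an epsilon-transformation of x satisfying one positive path.

    path: list of (feature_index, threshold, op) with op in {"<=", ">"}.
    Features already satisfying a condition are left untouched; violated
    conditions are fixed by moving the value epsilon past the threshold.
    """
    x_new = list(x)
    for feat, thresh, op in path:
        if op == "<=" and not x_new[feat] <= thresh:
            x_new[feat] = thresh - eps   # epsilon below the threshold
        elif op == ">" and not x_new[feat] > thresh:
            x_new[feat] = thresh + eps   # epsilon above the threshold
    return x_new

def best_transformation(x, positive_paths, ensemble_predict, eps=0.1):
    """Greedily pick the candidate closest to x (squared L2 distance)
    whose overall ensemble prediction is +1; None if nothing flips."""
    candidates = [tweak_to_path(x, p, eps) for p in positive_paths]
    flipped = [c for c in candidates if ensemble_predict(c) == +1]
    if not flipped:
        return None
    return min(flipped, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, x)))
```

A toy usage: with `x = [0.0, 0.0]` and two positive paths requiring `x[0] > 0.5` or `x[1] > 2.0`, the first candidate is the smaller perturbation and would be returned as $x^*$.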
Since this $\epsilon$-perturbation allows us to discriminate between the two classes, it can be viewed as a contrastive explanation (Miller, 2019) for why $\hat{f}(x) = -1$ as opposed to $+1$.
This work relies heavily on being able to enumerate the positive paths in each negative tree $t_j$, which is not possible when training on the negative gradients instead of the original labels. It is also computationally intensive, since we compute an $\epsilon$-transformation for every positive path in every negative tree. In our work, we want to use the sequential training nature of boosting methods to narrow the search space as early as possible.
3.3. Method outline
Given an instance $x$, we are interested in reducing the search space for $x^*$ in order to make the method by Tolomei et al. (2017) more scalable. To this end, we look for a subset of the original ensemble, $T' \subseteq T$, such that the rest of the ensemble can safely be ignored. That is, we want to select the most important trees in the overall model without omitting trees that were particularly influential for this prediction.
We pursue two directions to determine whether such a $T'$ might be found. The first idea is to consider how much each tree contributes to the prediction by examining the corresponding weights $\alpha_j$. We want to determine whether or not they decrease with each iteration in the training of the ensemble, and if so, how quickly this happens. The hypothesis is that if the weights drop quickly and to small quantities, then we can narrow the search space by only examining the trees at the beginning of the ensemble. We choose two binary classification datasets, Adult (adu, 1996) and home equity line of credit (HELOC) (FICO, [n. d.]), and train an adaptive boosting model with 100 iterations, each with maximum depth 4, on the two datasets. Figure 1 shows the weights $\alpha_j$ for each iteration in the model. Indeed, we see that the trees at the beginning of the ensemble seem to be more important to the overall prediction, as they have higher weights, than the trees towards the end. Therefore, if we want to reduce the search space, a sensible starting point would be to identify a cutoff $k$ based on the distribution of $\alpha_j$ and examine only the first $k$ trees in the ensemble. The potential error resulting from only considering the first $k$ trees is sufficiently small given that the weights of the remaining trees are small, and therefore their impact on the overall prediction is minimal in comparison to the first $k$ trees. In addition to giving us a way to reduce the search space, this can also provide some insight into how difficult it was for the model to classify this instance: the larger the $k$, the more difficult it was.
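One simple rule for picking such a cutoff $k$ from the tree weights could look like the sketch below. The cumulative-coverage criterion is our own assumption for illustration; the selection rule itself is left open here.

```python
def choose_k(alphas, coverage=0.9):
    """Smallest prefix length k such that the first k tree weights account
    for at least `coverage` of the total weight mass.

    alphas: per-iteration tree weights alpha_j, in training order.
    The coverage threshold (0.9 here) is an illustrative choice.
    """
    total = sum(alphas)
    running = 0.0
    for k, a in enumerate(alphas, start=1):
        running += a
        if running >= coverage * total:
            return k
    return len(alphas)
```

If the weights drop quickly, $k$ is small and the search space shrinks accordingly; a large $k$ would signal a hard-to-classify region of the input space.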
Another option for determining the subset of trees that would allow us to reduce the search space is to look for structure in how the sample weights $w_{i,j}$ change as an instance $x_i$ goes through each iteration of the model, and to identify trees of interest based on this distribution. If the prediction of iteration $j$, $h_j(x_i)$, is correct, then $w_{i,j+1} < w_{i,j}$; the opposite is true if $h_j(x_i)$ is incorrect. Figure 2 shows the evolution of these sample weights for two random instances in each of the datasets, Adult and HELOC: one that is correctly classified (depicted in green) and one that is incorrectly classified (depicted in red). We see that in both datasets, the weights for the correct instances decrease substantially within the first 15 iterations, implying that the model is continuously classifying these instances correctly. In contrast, the weights for the incorrect instances increase substantially within this same period, implying that the model is continuously misclassifying them. When the weights flatten out (e.g., for the correct instance in the Adult dataset), this implies that $h_j(x_i)$ is oscillating between $+1$ and $-1$, or analogously, between being correct and incorrect. The structure in the weight evolution of a particular instance gives us some insight into how the model learns to classify this point and how the prediction fluctuates with each iteration. This can help us determine which trees should be included in the subset $T'$ of the original ensemble we want to examine further; we outline some further ideas for this in Section 3.4.
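A per-instance weight trajectory of this kind could be traced as in the sketch below, assuming access to the per-iteration predictions for the instance and the tree weights $\alpha_j$. All names are illustrative, and the multiplicative up/down update mirrors the reweighting rule stated earlier.

```python
import math

def weight_trajectory(y_true_i, staged_preds_i, alphas):
    """Trace the boosting weight of one instance across iterations.

    y_true_i: true label of the instance (+1 or -1).
    staged_preds_i: prediction h_j(x_i) for each iteration j.
    alphas: tree weight alpha_j for each iteration j.
    Misclassified -> upweight by e^{alpha_j}; correct -> downweight by
    e^{-alpha_j}.  Flat stretches in the returned curve correspond to the
    prediction oscillating between correct and incorrect.
    """
    w, traj = 1.0, [1.0]
    for alpha, pred in zip(alphas, staged_preds_i):
        w *= math.exp(alpha if pred != y_true_i else -alpha)
        traj.append(w)
    return traj
```

For an instance the model keeps getting right, the trajectory decays monotonically; alternating correct/incorrect predictions with similar $\alpha_j$ produce the flat (oscillating) stretches described above.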
3.4. Next steps
We have provided some initial ideas for generating explanations for tree-based boosting predictions. We plan to investigate learning the subset $T'$ for a given instance $x$, perhaps based on the distribution of the training sample weights $w_{i,j}$ along with the weights $\alpha_j$ of each iteration. We also plan to investigate how this method could be extended to account for training on the negative gradients, as is done in GBDTs.
We have sketched a research agenda for explaining predictions from boosting methods and presented a case study to illustrate how such explanations can be generated.
In our case study, we examined how we can use the sequential training nature of boosting methods to narrow the search space for alternative examples when generating explanations. We will also explore how training on the negative gradient can be used to generate explanations for GBDT predictions and will evaluate the impact these types of explanations have on users who interact with the system. Finally, we invite the community to join the discussion on how we can automatically and transparently fix algorithmic errors, in ways that are meaningful for IR system experts as well as those outside the community.
Acknowledgements. This research was partially supported by Ahold Delhaize, the Association of Universities in the Netherlands (VSNU), the Innovation Center for Artificial Intelligence (ICAI), and the Netherlands Organisation for Scientific Research (NWO) under project nr 652.001.003. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
- adu (1996) UCI Machine Learning Repository. 1996. Adult Data Set. https://archive.ics.uci.edu/ml/datasets/Adult
- Alvarez-Melis and Jaakkola (2018) David Alvarez-Melis and Tommi S. Jaakkola. 2018. On the Robustness of Interpretability Methods. arXiv:1806.08049 [cs, stat] (June 2018). arXiv: 1806.08049.
- FICO ([n. d.]) FICO. [n. d.]. Explainable Machine Learning Challenge. ([n. d.]). https://community.fico.com/s/explainable-machine-learning-challenge?tabset-3158a=2
- Friedman et al. (2000) Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2000. Additive Logistic Regression: A Statistical View of Boosting. The Annals of Statistics 28, 2 (2000), 337–407.
- Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. 2018. A survey of methods for explaining black box models. arXiv preprint arXiv:1802.01933 (2018).
- Hastie et al. (2009) Trevor Hastie, Saharon Rosset, Ji Zhu, and Hui Zou. 2009. Multi-class AdaBoost. Statistics and Its Interface 2, 3 (2009), 349–360.
- Miller (2019) Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (February 2019), 1–38.
- Qu et al. (2019) Chen Qu, Liu Yang, Bruce Croft, Falk Scholer, and Yongfeng Zhang. 2019. Answer Interaction in Non-factoid Question Answering Systems. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval - CHIIR ’19 (2019), 249–253. https://doi.org/10.1145/3295750.3298946 arXiv: 1901.03491.
- Ribeiro et al. (2016a) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016a. Model-Agnostic Interpretability of Machine Learning. ICML Workshop on Human Interpretability in Machine Learning (2016).
- Ribeiro et al. (2016b) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016b. Why should I trust you?: Explaining the predictions of any classifier. In KDD. ACM, 1135–1144.
- ter Hoeve et al. (2017) Maartje ter Hoeve, Mathieu Heruer, Daan Odijk, Anne Schuth, Martijn Spitters, and Maarten de Rijke. 2017. Do news consumers want explanations for personalized news rankings?. In FATREC Workshop on Responsible Recommendation.
- Tintarev (2007) Nava Tintarev. 2007. Explaining Recommendations. In User Modeling 2007, Cristina Conati, Kathleen McCoy, and Georgios Paliouras (Eds.). Vol. 4511. Springer Berlin Heidelberg, Berlin, Heidelberg, 470–474.
- Tolomei et al. (2017) Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. 2017. Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’17 (2017), 465–474.