Explaining predictions of black box models is of utmost importance in the domain of credit risk assessment Bruckner (2018). The problem is even more prominent given the recent right to explanation introduced by the European General Data Protection Regulation Goodman and Flaxman (2016), and explanations are a must due to regulation in the financial domain. A common approach to explaining black box predictions focuses on generating local approximations of decisions. If $f: X \rightarrow Y$ is a machine learning model taking the features $X$ and mapping them to targets $Y$, then the goal is to find a subdomain of the feature variables and over that domain approximate $f \approx g$, where $g$ is an interpretable and easy to understand function. There has been recent interest in model-agnostic methods of explainability Ribeiro et al. (2016a). These methods aim to create an explainer that can explain any model by treating the underlying model as a black box.
This paper focuses on Counterfactual Explanations Wachter et al. (2017), one of these model-agnostic methods. A counterfactual explanation may justify a rejected loan application as follows:
Your application was denied because your annual income is $30,000 and your current balance is $200. If your income had instead been $35,000 and your current balance had been $400 and all other values remained constant, your application would have been approved.
The explanation describes the required minimum change in inputs to flip the decision of the black box classifier. Note that the latter remains a black box: it is only through changing inputs and outputs that an explanation is obtained.
Despite their effectiveness, two problems arise. First, counterfactuals are inherently designed to describe what it takes to flip the decision of a classifier, hence they poorly address the case in which the decision was satisfactory from an end user perspective (e.g. loan approved). Second, problems arise when assessing the interpretability of counterfactuals: the generated counterfactuals often suggest changing a high number of features, therefore leading to less intelligible explanations. For example, counterfactual generation strategies do not take into account the importance of the dataset features, thus underestimating or overestimating certain dimensions. This problem is of particular importance given that it has been shown that human short-term memory is unable to retain a large number of information units Vogel et al. (2001); Alvarez and Cavanagh (2004). This remains problematic in the finance domain of loan approval, where data scientists and regulators are in the loop.
We predict loan applications with off-the-shelf, interchangeable black-box estimators, and we explain their predictions with counterfactual explanations. To overcome the aforementioned problems, we present the following contributions:
Positive Counterfactuals: in case of a desired outcome, we interpret counterfactuals as a safety margin, i.e. a tolerance from the decision boundary. Such counterfactual explanations for positive predictions answer the question "How much was I accepted by?".
Weighted Counterfactuals: inspired by Huysmans et al. (2011), we use the size of explanations as a proxy to measure their interpretability. To obtain more compact (and hence more intelligible) counterfactuals we introduce weights in their generation strategy. We propose two weighting strategies: one based on global feature importance, the other based on nearest neighbours.
We experiment on a credit application dataset and show that our weighted counterfactual generation strategies lead to smaller counterfactuals (i.e. counterfactuals that suggest changing a smaller number of features), thus delivering more interpretable explanations.
2 Related Work
Local Interpretability. A number of works focus on explaining single predictions of machine learning models, rather than the model as a whole. This task is also known as local interpretability. White box models come with local explanations by design: traditional transparent design approaches include decision trees and rule extraction Guidotti et al. (2018); Molnar (2018). In that respect, some works such as Craven and Shavlik (1995) built surrogate models by interfacing complex models such as deep neural networks with more interpretable models such as decision trees. The authors aim at mimicking the behaviour of a complex model with a much simpler model for interpretability purposes. Other approaches are instead model-agnostic, and also address explanations of predictions of black box models. LIME generates local explanations from randomly generated neighbours of a record, with features weighted according to their distance from the record Ribeiro et al. (2016b). SHAP is another approach based on feature importance for each record Lundberg and Lee (2017). Other recent lines of research rely on example-based explanations: Prototype Bien and Tibshirani (2011) and Criticism Kim et al. (2016) Selection are two recent examples of this. Prototypes are tuples representative of the dataset, whereas criticisms are examples which are not well explained by prototypes. Adversarial examples Kurakin et al. (2016) are another example-based approach, but they are designed to flip the decision of a black-box predictor rather than to explain it. Counterfactual explanations are also an example-based strategy, but unlike adversarial examples, they inform on how the features of a record must change to radically influence the outcome of the prediction.
Interpretable Credit Risk Prediction. Providing a comprehensive review of more than 20 years of research in credit risk prediction models is out of the scope of this paper. The survey by Thomas et al. (2017) gives a comprehensive and up-to-date overview. Huang et al. (2004) briefly mention an explanation of predictive models for credit rating, but their survey is limited to ranking features by importance with variance analysis. A more recent survey by Louzada et al. (2016) focuses on predictive power only. Instead, we list works that consider interpretability as a first-class citizen. A number of works rely on white box machine learning pipelines, mostly using decision trees and rule inference: Khandani et al. (2010) propose a pipeline to predict consumer credit risk with manual feature engineering and decision trees. The latter being an explainable model, we can consider this work as a rather interpretable approach. Florez-Lopez et al. adopt an ensemble of decision trees and explain predictions with rules López and Ramon-Jeronimo (2015). Predictive power is encouraging, but there is no comparison against neural architectures. Martens et al. (2007) combine rule extraction with SVMs. Obermann et al. compare decision tree performance to grey and black box approaches on an insolvency prediction scenario Obermann and Waack (2015, 2016). Other white-box approaches include Markov models for discrimination Volkov et al. (2017) and rule inference Xu et al. (2017). Black box approaches show the most promising predictive power, to the detriment of interpretability. Danenas et al. adopt SVM classifiers Danenas and Garsva (2015). Addo et al. (2018) leverage a number of black-box models, including gradient boosting and deep neural architectures. Although they evaluate the predictive power of their models, they do not attempt to explain their predictions, either locally or globally. To the best of our knowledge, no work in the literature focuses on local interpretability for black box models applied to credit risk prediction.
3 Preliminaries: Counterfactual Explanations
A counterfactual explanation describes a generic causal situation in the form:
Score $y$ was returned because variables $V$ had values $(v_1, v_2, \ldots)$ associated with them. If $V$ instead had values $(v_1', v_2', \ldots)$, and all other variables had remained constant, score $y'$ would have been returned.
We treat the model $f$ as a black box that takes the feature vector $x$ and generates the outcome $y$, and we determine what is the closest $x'$ to $x$ that would change the outcome of the model from $y$ to the desired $y'$ (Figure 1). When generating counterfactuals it is assumed that the model $f$, the feature vector $x$ and the desired output $y'$ are provided. The challenge is finding $x'$, i.e. a hypothetical input vector which falls close to $x$ but also for which $f(x')$ falls sufficiently close to $y'$.
Generating Counterfactuals. We generate counterfactual explanations by calculating the smallest possible change that can be made to the input $x$, such that the outcome flips from $y$ to $y'$. We generate counterfactuals by optimizing the following loss function, as proposed by Wachter et al. (2017):

$$L(x, x', y', \lambda) = \lambda \cdot (f(x') - y')^2 + d(x, x') \tag{1}$$

$$\arg\min_{x'} \max_{\lambda} L(x, x', y', \lambda) \tag{2}$$

where $x$ is the actual input vector, $x'$ is the counterfactual vector, $y'$ is the desired output state, $f$ is the trained model, and $\lambda$ is the balance weight. $\lambda$ balances the counterfactual between obtaining the exact desired output and making the smallest possible changes to the input vector $x$. Larger values for $\lambda$ favor counterfactuals $x'$ which result in an $f(x')$ that comes close to the desired output $y'$, while smaller values lead to counterfactuals $x'$ that are very similar to $x$. The distance metric $d(x, x')$ measures the amount of change between $x$ and $x'$. We use the Manhattan distance weighted feature-wise with the inverse median absolute deviation (MAD). Such a metric is robust to outliers, and induces sparse solutions where most entries are zero Wachter et al. (2017). Indeed, the ideal counterfactual is one in which only a small number of features change and the majority of them remain constant. The distance metric can be written as:

$$d(x, x') = \sum_{j=1}^{p} \frac{|x_j - x'_j|}{\mathrm{MAD}_j} \tag{3}$$

where $p$ is the number of features and $\mathrm{MAD}_j$ is the median absolute deviation of feature $j$ over the dataset.
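As an illustration of the MAD-scaled Manhattan distance described above, the following is a minimal sketch in Python with NumPy. The function names, the toy data, and the small `eps` guard for constant features are assumptions made for this example and do not come from the original pipeline.

```python
import numpy as np

def median_absolute_deviation(X):
    """Per-feature MAD computed over the rows of a 2-D data matrix."""
    med = np.median(X, axis=0)
    return np.median(np.abs(X - med), axis=0)

def mad_weighted_l1(x, x_cf, mad, eps=1e-12):
    """Manhattan distance with each feature scaled by 1 / MAD_j."""
    # eps avoids division by zero for constant features (an assumption).
    return float(np.sum(np.abs(x - x_cf) / np.maximum(mad, eps)))

# Hypothetical toy data: rows are applicants, columns are (income, balance).
X = np.array([[30000.0, 200.0],
              [35000.0, 400.0],
              [40000.0, 300.0],
              [50000.0, 900.0],
              [32000.0, 250.0]])
mad = median_absolute_deviation(X)       # [5000., 100.]
x = X[0]
x_cf = np.array([35000.0, 400.0])        # candidate counterfactual
dist = mad_weighted_l1(x, x_cf, mad)     # 5000/5000 + 200/100 = 3.0
```

Scaling by the MAD puts features with very different units (here dollars of income and dollars of balance) on a comparable footing, so that a change is measured relative to how much that feature typically varies.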
To generate counterfactuals we adopt the iterative approach described in Algorithm 1. We optimize with the Nelder-Mead algorithm, as suggested in Molnar (2018). We constrain the optimisation with a tolerance $\epsilon$ s.t. $|f(x') - y'| \le \epsilon$. The value for $\epsilon$ depends on the problem space and is determined by the range and scale of $y$. Step 3 iterates over $\lambda$ until the constraint is satisfied. A check is performed for a value greater than $\epsilon$, as increasing $\lambda$ will place more weight on obtaining an $f(x')$ closer to the given desired output $y'$. Once an acceptable value for $\lambda$ is obtained for the given $x$ and $y'$, a set of counterfactuals can be obtained by repeating steps 1 and 2 with the calculated $\lambda$. Note that we constrain the features manually, since the heuristic in Algorithm 1 and the adopted optimization algorithm are designed for unconstrained optimization.
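The iterative search described above can be sketched as follows. This is one possible reading of the procedure, not a reproduction of Algorithm 1: it assumes SciPy's `minimize` with `method="Nelder-Mead"`, starts the search at the original record, and doubles the balance weight each round. The logistic scorer standing in for the black box, the initial weight, the growth factor and the round limit are all hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize

def find_counterfactual(f, x, y_target, mad, eps=0.05,
                        lam=0.1, lam_factor=2.0, max_rounds=30):
    """Search for x' near x such that |f(x') - y_target| <= eps."""
    x = np.asarray(x, dtype=float)

    def loss(x_cf, lam):
        # Prediction term plus MAD-scaled Manhattan distance to x.
        return lam * (f(x_cf) - y_target) ** 2 + np.sum(np.abs(x - x_cf) / mad)

    x_cf = x.copy()
    for _ in range(max_rounds):
        x_cf = minimize(loss, x_cf, args=(lam,), method="Nelder-Mead").x
        if abs(f(x_cf) - y_target) <= eps:
            break              # tolerance met: keep this balance weight
        lam *= lam_factor      # otherwise weight the prediction term more
    # If max_rounds is exhausted, the last candidate is returned as-is.
    return x_cf, lam

# Hypothetical stand-in for a black box: a fixed logistic scorer.
w, b = np.array([1.0, 0.5]), -2.0

def model(v):
    return 1.0 / (1.0 + np.exp(-(w @ v + b)))

x = np.array([0.5, 0.5])       # model(x) is about 0.22
mad = np.array([1.0, 1.0])
x_cf, lam = find_counterfactual(model, x, y_target=0.7, mad=mad)
```

Because Nelder-Mead is unconstrained, any feature bounds (for example non-negative balances) would have to be enforced separately, for instance by clipping the candidate inside `loss`.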
In this section we describe our two main contributions made towards the explainability of black box machine learning pipelines that predict credit decisions. Using positive counterfactuals we explain why a loan was accepted and provide details that help inform an individual when making future financial decisions. Next we present weighted counterfactuals that aim at personalizing the counterfactual recommendations that are provided to individuals that received an undesirable outcome (i.e. their loan was denied).
4.1 Counterfactuals for positive predictions
In order to explain why applications were accepted, we apply counterfactuals to the scenario where the individual received the desired outcome, i.e. positive counterfactuals. Here, instead of answering the question "Why wasn't I accepted?", we focus on the question "How much was I accepted by?". Such an approach informs the individual about the features and value ranges that were important for their specific application, thus favouring more informed decisions about potential future financial activities. For example, if the individual is considering an action that may temporarily increase their number of delinquencies then, armed with positive counterfactuals, they will have a better understanding of the impact on future loan applications.
In the binary classification case we achieve positive counterfactuals by setting the target output to the decision boundary of the classifier. This allows us to identify the locally important features that would push the individual to the threshold of being accepted. Another way of viewing this is that these are the features that locally contribute to the desired outcome.
We present this information to the individual and display it as a tolerance. In Figure 4 this is illustrated by a dashed line. Provided that future actions do not reduce the indicated features below the dashed line, and all other features remain constant, the individual's application should remain likely to be approved.
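To make the notion of tolerance concrete, the sketch below works through a deliberately simplified special case: a linear logistic scorer whose decision boundary is assumed to lie at probability 0.5, i.e. where the linear score is zero. In that case the closest boundary point under a MAD-scaled Manhattan distance can be written in closed form, because the cheapest move changes only the feature that yields the most score per unit of distance. The coefficients, the applicant and the function name are hypothetical; for a genuine black box the boundary point would come from an iterative search instead.

```python
import numpy as np

def boundary_counterfactual_linear(w, b, x, mad):
    """Closest x' to x (MAD-scaled L1 distance) with w.x' + b = 0."""
    # Moving feature j by d changes the score by w_j * d at cost |d| / MAD_j,
    # so the most efficient feature maximises |w_j| * MAD_j.
    j = int(np.argmax(np.abs(w) * mad))
    x_cf = np.asarray(x, dtype=float).copy()
    x_cf[j] -= (w @ x + b) / w[j]
    return x_cf

w = np.array([0.0002, 0.004])    # hypothetical coefficients (income, balance)
b = -8.0
mad = np.array([5000.0, 100.0])
x = np.array([45000.0, 500.0])   # score = 9 + 2 - 8 = 3 > 0, i.e. accepted

x_cf = boundary_counterfactual_linear(w, b, x, mad)
margin = x - x_cf                # per-feature tolerance before the decision flips
```

Here the applicant could lose up to 15,000 of annual income, with the balance unchanged, before reaching the threshold. This is the kind of quantity the dashed line is meant to convey.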
4.2 Weighted Counterfactuals
The general implementation of counterfactuals described in Section 3 assumes all features are equally important and changing each feature is equally viable. This, however, is not necessarily the case. For each feature, its ability to change and the magnitude of the change may vary on a case-by-case basis. In order to capture this information and create more interpretable, actionable recommendations, the generated counterfactuals need to take this into consideration. For example, some individuals may be able to increase their savings, while others may instead find it easier to reduce their current expenses. There are also cases where some features may be fixed or immutable. A feature such as the number of delinquencies in the last six months is historical and fixed. Recommending changes to these types of features would be of little use. Our intuition is that promoting highly discriminative features during the generation of counterfactuals leads to more compact, hence more interpretable, explanations Huysmans et al. (2011).
We address these issues by introducing a weight vector to the distance metric defined in Equation 3. This vector promotes highly discriminative features.
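One way a weight vector might enter the distance metric is sketched below. The multiplicative form, in which a smaller weight makes a feature cheaper to change and therefore promotes it, is an assumption made for illustration. The name `theta` and the example values are hypothetical.

```python
import numpy as np

def weighted_mad_l1(x, x_cf, mad, theta):
    """MAD-scaled Manhattan distance with a per-feature weight theta_j."""
    # Assumed convention: theta_j < 1 promotes feature j (cheaper to change),
    # theta_j > 1 discourages changing it.
    return float(np.sum(theta * np.abs(x - x_cf) / mad))

mad = np.array([5000.0, 100.0])
x = np.array([30000.0, 200.0])
x_cf = np.array([35000.0, 400.0])

unweighted = weighted_mad_l1(x, x_cf, mad, np.ones(2))            # 1 + 2 = 3.0
weighted = weighted_mad_l1(x, x_cf, mad, np.array([0.5, 2.0]))    # 0.5 + 4 = 4.5
```

With these weights the same counterfactual becomes more expensive because most of its change sits on the discouraged second feature, so an optimiser would be nudged towards alternatives that move the first feature instead.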
We propose two different strategies to generate these weight vectors. The first relies on the global feature importance, the second relies on a Nearest Neighbors approach. The goal is obtaining counterfactuals that suggest a smaller number of changes or focus on values that are relevant to the individual and have historically been shown to vary.
Global feature importance. We compute global feature importance using analysis of variance (ANOVA F-values) between each feature and the target, and we create a weight vector that promotes highly discriminative features. Our goal is obtaining a smaller set of features in the resulting counterfactual recommendation, thus obtaining more compact explanations.
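The global strategy can be sketched as follows. The one-way ANOVA F-statistic is computed directly with NumPy (scikit-learn's `f_classif` computes the same statistic). The text does not specify how F-values are turned into weights, so the mapping used here, weights inversely proportional to the F-value and normalised to mean 1 for use as multiplicative distance weights, is an assumption. The toy data is hypothetical.

```python
import numpy as np

def anova_f_values(X, y):
    """One-way ANOVA F-statistic of each column of X across the classes in y."""
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += Xc.shape[0] * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def importance_weights(f_values):
    """Assumed mapping: weights inversely proportional to F, normalised to mean 1."""
    inv = 1.0 / f_values
    return inv / inv.mean()

# Hypothetical data: feature 0 separates the classes well, feature 1 barely.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0],
              [7.0, 5.0], [8.0, 6.0], [9.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

f_values = anova_f_values(X, y)          # [54.0, 1.5]
theta = importance_weights(f_values)     # feature 0 gets the smaller weight
```

Under the multiplicative convention, the highly discriminative feature 0 becomes the cheapest to change, which is one way of "promoting" it during counterfactual generation.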
K-Nearest Neighbors. The second approach uses K-Nearest Neighbors to find cases that are close to the individual but have achieved the desired result. Looking at the nearest neighbors and aggregating over the relative changes, we build a weight vector that captures the locally important features for this individual that have historically been shown to change. Here we aim to find counterfactuals containing features that are more actionable by the individual. With the K-Nearest Neighbors approach, these weights can be learned automatically when the method is applied to new problem spaces.
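A minimal sketch of the nearest-neighbour strategy is given below. The choice of a MAD-scaled Manhattan distance for the neighbour search, the use of the mean absolute scaled difference as the "relative change", and the mapping `1 / (1 + change)` normalised to mean 1 are all assumptions, as are the function name and the toy records.

```python
import numpy as np

def knn_change_weights(x, X_desired, mad, k=3):
    """Weights from the k nearest records that obtained the desired outcome."""
    scaled_diff = np.abs(X_desired - x) / mad           # per-feature relative change
    nearest = np.argsort(scaled_diff.sum(axis=1))[:k]   # k closest records
    mean_change = scaled_diff[nearest].mean(axis=0)     # how much each feature moved
    # Assumed mapping: features that vary more among neighbours get smaller
    # (cheaper) multiplicative weights; normalise to mean 1.
    raw = 1.0 / (1.0 + mean_change)
    return raw / raw.mean()

x = np.array([30000.0, 200.0])                 # rejected applicant
mad = np.array([5000.0, 100.0])
X_desired = np.array([[31000.0, 600.0],        # accepted applicants
                      [30000.0, 500.0],
                      [30500.0, 700.0],
                      [60000.0, 200.0]])

theta = knn_change_weights(x, X_desired, mad, k=3)
```

In this toy example the three nearest accepted applicants differ from the individual mainly in their balance, so the balance receives the smaller weight and is favoured when generating the counterfactual.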
We perform a binary classification task on a credit application dataset. We train a range of black box models and we explain their predictions with counterfactuals, the goal being to explain the classifier's decision to reject or accept a loan application. We perform two separate experiments. First, we carry out a preliminary evaluation of the predictive power of our pipeline. This is not the primary focus of this paper, but it is a required step to gauge the quality of predictions. In a second experiment, we assess the size of the counterfactuals generated by our weighted counterfactual generation strategies.
5.1 Experimental Settings
Dataset. We experiment with the HELOC (Home Equity Line of Credit) credit application dataset. Used in the FICO 2018 xML Challenge (https://community.fico.com/s/explainable-machine-learning-challenge), it includes anonymized credit applications made by real homeowners. We drop highly correlated features and filter duplicate records. After pre-processing we obtain 9,870 records (of which 5,000 are positives, i.e. accepted credit applications), and 22 distinct features.
Implementation Details. Our machine learning pipeline is written in Python 3.6. This includes preprocessing, training, counterfactual generation, and performance evaluation. We use scikit-learn 0.20 (http://scikit-learn.org/) for the black box classifiers. All experiments were run under Ubuntu 16.04 on an Intel Xeon E5-2620 v4 2.10 GHz workstation with 32 GB of system memory.
As a preliminary experiment, we assess the predictive power of four classifiers: logistic regression (LogReg), gradient boosting (GradBoost), support vector machine with linear kernel (SVC), and multi-layer perceptron (MLP). Logistic regression apart, the others fall within the black box category. We perform 3-fold, cross-validated grid search model selection over a number of hyperparameters. We adopt balanced class weights for logistic regression and exponential loss for gradient boosting. The SVM uses balanced class weights. The neural network uses one hidden layer with 22 units and the logistic activation function. Where not specified, we rely on scikit-learn defaults. Results in Table 1 show the predictive power of the best models. Metrics are 3-fold cross-validated.
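The model selection step could be set up along the following lines with scikit-learn. The fixed settings (balanced class weights, exponential loss, linear kernel, one hidden layer of 22 logistic units, 3-fold grid search) follow the description above. The synthetic data, the hyperparameter grids, the iteration limits and the default accuracy scoring are placeholders chosen for this sketch, not the values used in the experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the pre-processed dataset (22 features).
X, y = make_classification(n_samples=300, n_features=22, random_state=0)

# Each entry: (estimator with the fixed settings, hypothetical search grid).
candidates = {
    "LogReg": (LogisticRegression(class_weight="balanced", max_iter=1000),
               {"C": [0.1, 1.0, 10.0]}),
    "GradBoost": (GradientBoostingClassifier(loss="exponential", random_state=0),
                  {"n_estimators": [50, 100]}),
    "SVC": (SVC(kernel="linear", class_weight="balanced"),
            {"C": [0.1, 1.0]}),
    "MLP": (MLPClassifier(hidden_layer_sizes=(22,), activation="logistic",
                          max_iter=500, random_state=0),
            {"alpha": [1e-4, 1e-2]}),
}

best = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=3)   # 3-fold cross-validation
    search.fit(X, y)
    best[name] = (search.best_params_, search.best_score_)
```

Keeping the estimators behind a common interface is what makes them interchangeable: the counterfactual generator only needs a function that maps a feature vector to a score.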
We experiment with 5,000 loan applications in the dataset: we generate a counterfactual explanation for each of them, and compute the average counterfactual size. Results show that both weighted strategies yield counterfactuals whose size has a smaller mean and standard deviation. We also observed that, in general, the average size of the counterfactual recommendations can vary dramatically for the same data depending on the underlying model.
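The size of a counterfactual, i.e. the number of features it suggests changing, can be computed as in the sketch below. Since a numerical optimiser rarely returns values exactly equal to the original ones, a small absolute tolerance decides whether a feature counts as changed. That tolerance, the function name and the toy records are assumptions of this example.

```python
import numpy as np

def counterfactual_sizes(X, X_cf, atol=1e-6):
    """Number of features that differ between each record and its counterfactual."""
    changed = ~np.isclose(X, X_cf, rtol=0.0, atol=atol)
    return changed.sum(axis=1)

# Hypothetical records and their counterfactuals.
X = np.array([[30000.0, 200.0, 2.0],
              [45000.0, 150.0, 0.0],
              [28000.0, 900.0, 1.0]])
X_cf = np.array([[35000.0, 400.0, 2.0],    # 2 features changed
                 [45000.0, 300.0, 0.0],    # 1 feature changed
                 [33000.0, 950.0, 0.0]])   # 3 features changed

sizes = counterfactual_sizes(X, X_cf)
mean_size, std_size = sizes.mean(), sizes.std()
```

Comparing `mean_size` and `std_size` across generation strategies, for the same model and the same records, gives the kind of summary reported in this experiment.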
In general, the global feature importance strategy results in counterfactuals with a lower mean size and standard deviation. We obtain explanations which are smaller on average using the global feature importance strategy than with the baseline. This is to be expected, as we promote more discriminative features and, as a consequence, fewer ancillary features are required. The benefit of the KNN approach is that the counterfactuals are weighted on the features that are locally important. Here we see that, while it may not be the best approach, it never performs worse than the baseline. We conjecture that the benefit of the weighting strategies comes from helping the optimization process converge on a local optimum when the underlying space is complex. We look to investigate this claim in future work.
We explain credit application predictions obtained with black box models using counterfactuals. In the case of a positive prediction, we show how counterfactuals can be interpreted as a safety margin from the decision boundary. We propose two weighted strategies to generate counterfactuals: one derives weights from feature importance, the other relies on nearest neighbours. Experiments on the HELOC loan applications dataset show that weights generated from feature importance lead to more compact counterfactuals, therefore offering more intelligible explanations for end users. Future work will focus on validating the effectiveness of our counterfactual explanations against human-grounded and application-grounded evaluation protocols (including the claim that smaller counterfactuals are indeed more interpretable). We will also experiment with weighting strategies that rely on model-specific feature importance, i.e. the effect of feature perturbation on the entropy of changes in predictions.
- Bruckner [2018] Matthew A Bruckner. Regulating fintech lending. Banking & Financial Services Policy Report, 37, 2018.
- Goodman and Flaxman [2016] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a "right to explanation". stat, 1050:31, 2016.
- Ribeiro et al. [2016a] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. CoRR, abs/1602.04938, 2016a.
- Wachter et al. [2017] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. CoRR, abs/1711.00399, 2017.
- Vogel et al. [2001] Edward K Vogel, Geoffrey F Woodman, and Steven J Luck. Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27(1):92, 2001.
- Alvarez and Cavanagh [2004] George A Alvarez and Patrick Cavanagh. The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychological Science, 15(2):106–111, 2004.
- Huysmans et al. [2011] Johan Huysmans, Karel Dejaeger, Christophe Mues, Jan Vanthienen, and Bart Baesens. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems, 51(1):141–154, 2011.
- Guidotti et al. [2018] Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. A survey of methods for explaining black box models. CoRR, abs/1802.01933, 2018.
- Molnar [2018] Christoph Molnar. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/, 2018.
- Craven and Shavlik [1995] Mark Craven and Jude W. Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems 8, NIPS, Denver, CO, USA, November 27-30, 1995, pages 24–30, 1995.
- Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. Model-agnostic interpretability of machine learning. CoRR, abs/1606.05386, 2016b. URL http://arxiv.org/abs/1606.05386.
- Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017.
- Bien and Tibshirani [2011] Jacob Bien and Robert Tibshirani. Prototype selection for interpretable classification. The Annals of Applied Statistics, pages 2403–2424, 2011.
- Kim et al. [2016] Been Kim, Rajiv Khanna, and Oluwasanmi O Koyejo. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems, pages 2280–2288, 2016.
- Kurakin et al. [2016] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
- Thomas et al. [2017] Lyn Thomas, Jonathan Crook, and David Edelman. Credit scoring and its applications, volume 2. SIAM, 2017.
- Huang et al. [2004] Zan Huang, Hsinchun Chen, Chia-Jung Hsu, Wun-Hwa Chen, and Soushan Wu. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems, 37(4):543–558, 2004.
- Louzada et al. [2016] Francisco Louzada, Anderson Ara, and Guilherme B Fernandes. Classification methods applied to credit scoring: Systematic review and overall comparison. Surveys in Operations Research and Management Science, 21(2):117–134, 2016.
- Khandani et al. [2010] Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34:2767–2787, 2010.
- López and Ramon-Jeronimo [2015] Raquel Flórez López and Juan Manuel Ramon-Jeronimo. Enhancing accuracy and interpretability of ensemble strategies in credit risk assessment. A correlated-adjusted decision forest proposal. Expert Systems with Applications, 42(13):5737–5753, 2015. doi: 10.1016/j.eswa.2015.02.042. URL https://doi.org/10.1016/j.eswa.2015.02.042.
- Martens et al. [2007] David Martens, Bart Baesens, Tony Van Gestel, and Jan Vanthienen. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research, 183(3):1466–1476, 2007.
- Obermann and Waack [2015] Lennart Obermann and Stephan Waack. Demonstrating non-inferiority of easy interpretable methods for insolvency prediction. Expert Systems with Applications, 42(23):9117–9128, 2015.
- Obermann and Waack [2016] Lennart Obermann and Stephan Waack. Interpretable multiclass models for corporate credit rating capable of expressing doubt. Frontiers in Applied Mathematics and Statistics, 2:16, 2016.
- Volkov et al. [2017] Andrey Volkov, Dries F Benoit, and Dirk Van den Poel. Incorporating sequential information in bankruptcy prediction with predictors based on markov for discrimination. Decision Support Systems, 98:59–68, 2017.
- Xu et al. [2017] Pu Xu, Zhijun Ding, and MeiQin Pan. An improved credit card users default prediction model based on ripper. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pages 1785–1789. IEEE, 2017.
- Danenas and Garsva [2015] Paulius Danenas and Gintautas Garsva. Selection of support vector machines based classifiers for credit risk domain. Expert Systems with Applications: An International Journal, 42(6):3194–3204, 2015.
- Addo et al. [2018] Peter Martey Addo, Dominique Guegan, and Bertrand Hassani. Credit risk analysis using machine and deep learning models. Risks, 6(2):38, 2018.