On the Art and Science of Machine Learning Explanations

10/05/2018 ∙ by Patrick Hall, et al.

This text discusses several explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the methods are tools of the trade while others are rigorously derived and backed by long-standing theory. The methods, decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), partial dependence plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and in-depth software examples.




1 Introduction

Interpretability of complex machine learning models is a multifaceted, complex, and still evolving subject. Others have defined key terms and put forward general motivations for better interpretability of machine learning models (and advocated for stronger scientific rigor in certain cases) [9], [11], [13], [18]. This applied text side-steps some looming, unsettled intellectual matters to present viable practical methods for explaining the mechanisms and outputs of predictive models, typically supervised decision tree ensembles, for users who need to explain their work today.

Following Doshi-Velez and Kim, this discussion uses “the ability to explain or to present in understandable terms to a human,” as the definition of interpretable. “When you can no longer keep asking why,” will serve as the working definition for a good explanation of model mechanisms or predictions [11].

Figure 1: An augmented learning problem diagram in which several techniques create explanations for a credit scoring model. Adapted from Learning From Data [1].

As in Figure 1, the presented explanatory methods help practitioners make random forests, GBMs, and other types of popular supervised machine learning models more interpretable by enabling post-hoc explanations that are suitable for:

  • Facilitating regulatory compliance.

  • Understanding or debugging model mechanisms and predictions.

  • Preventing or debugging accidental or intentional discrimination by models.

  • Preventing or debugging the malicious hacking or adversarial attack of models.

Detailed discussions of the explanatory methods begin below by defining notation. Then Sections 3 through 6 discuss explanatory methods and present recommendations for each method. Section 7 presents some general interpretability recommendations for practitioners. Section 8 applies some of the techniques and recommendations to the well-known UCI credit card dataset [17]. Section 9 discusses several additional interpretability subjects that are likely important for practitioners, and finally, Section 10 highlights software resources that accompany this text.

2 Notation

To facilitate technical descriptions of explanatory techniques, notation for input and output spaces, datasets, and models is defined.

2.1 Spaces

  • Input features come from the set $\mathcal{X}$ contained in a P-dimensional input space, $\mathcal{X} \subset \mathbb{R}^P$.

  • Known labels corresponding to instances of $\mathcal{X}$ come from the set $\mathcal{Y}$.

  • Learned output responses come from the set $\hat{\mathcal{Y}}$.

2.2 Datasets

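  • The input dataset $\mathbf{X}$ is composed of observed instances of the set $\mathcal{X}$ with a corresponding dataset of labels $\mathbf{Y}$, observed instances of the set $\mathcal{Y}$.

  • Each $i$-th observation of $\mathbf{X}$ is denoted as $\mathbf{x}^{(i)} = [x_1^{(i)}, x_2^{(i)}, \dots, x_P^{(i)}]$, with corresponding $i$-th labels in $\mathbf{Y}$, $\mathbf{y}^{(i)}$, and corresponding predictions in $\hat{\mathbf{Y}}$, $\hat{\mathbf{y}}^{(i)}$.

  • $\mathbf{X}$ and $\mathbf{Y}$ consist of $N$ tuples of observations: $[(\mathbf{x}^{(1)}, \mathbf{y}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{y}^{(2)}), \dots, (\mathbf{x}^{(N)}, \mathbf{y}^{(N)})]$.

  • Each $j$-th input column vector of $\mathbf{X}$ is denoted as $X_j = [x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(N)}]^T$.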

2.3 Models

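  • A type of machine learning model $g$, selected from a hypothesis set $\mathcal{H}$, is trained to represent an unknown signal-generating function $f$ observed as $\mathbf{X}$ with labels $\mathbf{Y}$ using a training algorithm $\mathcal{A}$: $\mathbf{X}, \mathbf{Y} \xrightarrow{\mathcal{A}} g$.

  • $g$ generates learned output responses on the input dataset, $g(\mathbf{X}) = \hat{\mathbf{Y}}$, and on the general input space, $g(\mathcal{X}) = \hat{\mathcal{Y}}$.

  • The model to be explained is denoted as $g$.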

3 Surrogate Decision Trees

The phrase surrogate model is used here to refer to a simple model $h$ of a complex model $g$. This type of model is referred to by various other names, such as proxy or shadow models, and the process of training surrogate models is sometimes referred to as model extraction [8], [30], [5].

3.1 Description

Given a learned function $g$, a set of learned output responses $g(\mathbf{X}) = \hat{\mathbf{Y}}$, and a tree splitting and pruning approach $\mathcal{A}_{\text{surrogate}}$, a global, or overall, surrogate decision tree $h_{\text{tree}}$ can be extracted such that $h_{\text{tree}}(\mathbf{X}) \approx g(\mathbf{X})$:

$$\mathbf{X}, g(\mathbf{X}) \xrightarrow{\mathcal{A}_{\text{surrogate}}} h_{\text{tree}} \quad (1)$$

Decision trees can be represented as directed graphs where the relative positions of input features can provide insight into their importance and interactions [6]. This makes decision trees useful surrogate models. Input features that appear high and often in the directed graph representation of $h_{\text{tree}}$ are assumed to have high importance in $g$. Input features directly above or below one another in $h_{\text{tree}}$ are assumed to have potential interactions in $g$. These relative relationships between input features in $h_{\text{tree}}$ can be used to verify and analyze the feature importance, interactions, and predictions of $g$.

Figure 2: Shapley summary plot for known signal-generating function $f$, and for learned GBM response function $g_{\text{GBM}}$.

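Figures 2 and 3 use simulated data to empirically demonstrate the desired relationships between input feature importance and interactions in the input space $\mathbf{X}$, the label space $f(\mathbf{X}) = \mathbf{Y}$, a GBM model to be explained $g_{\text{GBM}}$, and a decision tree surrogate $h_{\text{tree}}$. Data with a known signal-generating function depending on four input features with interactions and with eight noise features is simulated such that:

$$f(\mathbf{X}) = \text{num}_1 \cdot \text{num}_4 + |\text{num}_8| \cdot \text{num}_9^2 + e \quad (2)$$

$g_{\text{GBM}}$ is trained: $\mathbf{X}, f(\mathbf{X}) \xrightarrow{\mathcal{A}} g_{\text{GBM}}$ such that $g_{\text{GBM}} \approx f$. Then $h_{\text{tree}}$ is extracted by $\mathbf{X}, g_{\text{GBM}}(\mathbf{X}) \xrightarrow{\mathcal{A}_{\text{surrogate}}} h_{\text{tree}}$, such that $h_{\text{tree}}(\mathbf{X}) \approx g_{\text{GBM}}(\mathbf{X}) \approx f(\mathbf{X})$.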

Figure 2 displays the local Shapley contribution values for an input feature's impact on each prediction. Analyzing local Shapley values can be a more holistic and consistent feature importance metric than traditional single-value quantities [20]. Features are ordered from top to bottom by their mean absolute Shapley value across observations in Figure 2 and, as expected, the four input features that appear in $f$ make the largest contributions to $g_{\text{GBM}}(\mathbf{X})$. Also as expected, noise features make minimal contributions to $g_{\text{GBM}}(\mathbf{X})$. Shapley values are discussed in detail in Section 6.

Figure 3 is a directed graph representation of $h_{\text{tree}}$ that prominently displays the importance of the input features that appear in $f$. Figure 3 also visually highlights the potential interactions between these inputs. URLs to the data and software used to generate Figures 2 and 3 are available in Section 10.

Figure 3: $h_{\text{tree}}$ for previously defined known signal-generating function $f$ and learned GBM response function $g_{\text{GBM}}$. An image of the entire directed graph is available in the supplementary materials described in Section 10.

3.2 Recommendations

  • A shallow-depth $h_{\text{tree}}$ displays a global, low-fidelity (i.e., approximate), high-interpretability flow chart of important features and interactions in $g$. Because there are few theoretical guarantees that $h_{\text{tree}}$ truly represents $g$, always use error measures to assess the trustworthiness of $h_{\text{tree}}$.

  • Prescribed methods for training $h_{\text{tree}}$ do exist [8], [5]. In practice, straightforward cross-validation approaches are often sufficient. Moreover, comparing cross-validated training error to traditional training error can give an indication of the stability of the single decision tree $h_{\text{tree}}$.

  • Hu et al. use local linear surrogate models, $h_{\text{GLM}}$, in $h_{\text{tree}}$ leaf nodes to increase overall surrogate model fidelity while also retaining a high degree of interpretability [16].
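The surrogate workflow and the fidelity checks recommended above can be sketched as follows. This is a minimal illustration, not the software used for the figures: scikit-learn stands in for the GBM and tree implementations, and the sample size, the column layout, the simulated functional form, and all hyperparameters are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Simulated inputs: 12 features, columns 0-3 carry signal, 4-11 are noise (assumed layout).
X = rng.normal(size=(2000, 12))
# Illustrative signal-generating function with interactions plus a small error term.
y = X[:, 0] * X[:, 1] + np.abs(X[:, 2]) * X[:, 3] ** 2 + rng.normal(scale=0.1, size=2000)

# g: the complex model to be explained.
g = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X, y)
g_pred = g.predict(X)

# h_tree: a shallow surrogate trained on the inputs and the predictions of g, not on y.
h_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, g_pred)

# Fidelity of h_tree to g: in-sample R2 compared with 3-fold cross-validated R2.
train_r2 = h_tree.score(X, g_pred)
cv_r2 = cross_val_score(
    DecisionTreeRegressor(max_depth=3, random_state=0), X, g_pred, cv=3, scoring="r2"
)
print(f"train R2: {train_r2:.3f}, CV R2: {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

A large gap between the in-sample and cross-validated scores, or a large standard deviation across folds, would suggest the single tree is unstable and should be read with caution.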

4 Partial Dependence and Individual Conditional Expectation (ICE) Plots

Partial dependence (PD) plots are a widely-used method for describing the average predictions of a complex model $g$ across some partition of data $\mathbf{X}$ for some interesting input feature $X_j$ [10]. Individual conditional expectation (ICE) plots are a newer method that describes the local behavior of $g$ for a single instance $\mathbf{x}$. Partial dependence and ICE can be combined in the same plot to identify interactions modeled by $g$ and to create a holistic portrait of the predictions of a complex model for some $X_j$ [12].

4.1 Description

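Following Friedman et al., a single feature $X_j \in \mathbf{X}$ and its complement set $\mathbf{X}_{(-j)} \in \mathbf{X}$ (where $X_j \cup \mathbf{X}_{(-j)} = \mathbf{X}$) are considered. $\text{PD}(X_j, g)$ for a given feature $X_j$ is estimated as the average output of the learned function $g(\mathbf{X})$ when all the components of $X_j$ are set to a constant $x \in \mathcal{X}$ and $\mathbf{X}_{(-j)}$ is left unchanged. $\text{ICE}(x_j, \mathbf{x}, g)$ for a given instance $\mathbf{x}$ and feature $x_j$ is estimated as the output of $g(\mathbf{x})$ when $x_j$ is set to a constant $x \in \mathcal{X}$ and all other features $\mathbf{x}_{(-j)}$ are left untouched. Partial dependence and ICE curves are usually plotted over some set of constants $x \in \mathcal{X}$.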
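The two estimators just described can be written directly in NumPy. The sketch below is illustrative only: `predict` is a hypothetical stand-in for a trained model, and the grid of constants (minimum, deciles, and maximum of the feature) is an assumption.

```python
import numpy as np

def ice_curve(predict, x, j, grid):
    """ICE: predictions for one row x as feature j sweeps over grid, other features fixed."""
    rows = np.tile(x, (len(grid), 1))
    rows[:, j] = grid
    return predict(rows)

def partial_dependence(predict, X, j, grid):
    """PD: for each grid value, set feature j to it in every row and average the predictions."""
    pd_vals = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        pd_vals.append(predict(X_mod).mean())
    return np.array(pd_vals)

# Hypothetical stand-in for a trained model: quadratic in feature 1, interacting with feature 0.
def predict(X):
    return np.abs(X[:, 0]) * X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Grid of constants: minimum, each decile, and maximum of the feature of interest.
grid = np.percentile(X[:, 1], np.arange(0, 101, 10))

pd_curve = partial_dependence(predict, X, j=1, grid=grid)
ice = ice_curve(predict, X[0], j=1, grid=grid)
```

Because partial dependence is the average of the ICE curves over all rows, an ICE curve whose shape departs from the PD curve indicates that the effect of the feature depends on the values of other features for that row.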

Figure 4: Partial dependence and ICE curves for previously defined known signal-generating function $f$, learned GBM response function $g_{\text{GBM}}$, and an important input feature.

As in Section 3, simulated data is used to highlight desirable characteristics of partial dependence and ICE plots. In Figure 4, partial dependence and ICE at the minimum, maximum, and each decile of $g_{\text{GBM}}(\mathbf{X})$ are plotted. The known quadratic behavior of the plotted feature is plainly visible, except for high-value predictions (the 80th percentile of $g_{\text{GBM}}(\mathbf{X})$ and above) and over part of the feature's range. When partial dependence and ICE curves diverge, this often points to an interaction that is being averaged out of the partial dependence. Given the form of Equation 2, there is a known interaction involving the plotted feature. Combining the information from partial dependence and ICE plots with $h_{\text{tree}}$ can help elucidate more detailed information about modeled interactions in $g$. For the simulated example, $h_{\text{tree}}$ shows this known interaction along with additional modeled interactions between other input features. URLs to the data and software used to generate Figure 4 are available in Section 10.

4.2 Recommendations

  • Combining $h_{\text{tree}}$ with partial dependence and ICE curves is a convenient method for detecting, confirming, and understanding important interactions in $g$.

  • As monotonicity is often a desired trait for interpretable models, partial dependence and ICE plots can be used to verify the monotonicity of $g$ on average and across percentiles of $g(\mathbf{X})$ w.r.t. some input feature $X_j$.
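The monotonicity check in the last recommendation reduces to inspecting the differences between consecutive points on each curve. A minimal sketch, assuming the PD and ICE curves are already available as arrays ordered by increasing feature value; the curve values and the tolerance below are hypothetical.

```python
import numpy as np

def is_monotonic(curve, increasing=True, tol=1e-12):
    """True if consecutive differences never move against the required direction."""
    diffs = np.diff(np.asarray(curve, dtype=float))
    if increasing:
        return bool(np.all(diffs >= -tol))
    return bool(np.all(diffs <= tol))

# Hypothetical curves evaluated on an increasing grid of feature values.
pd_curve = [0.10, 0.10, 0.12, 0.45, 0.47]
ice_curves = {
    "10th percentile": [0.02, 0.02, 0.03, 0.05, 0.05],
    "90th percentile": [0.60, 0.61, 0.59, 0.80, 0.82],  # dips once, so not monotonic
}

checks = {"partial dependence": is_monotonic(pd_curve)}
checks.update({name: is_monotonic(c) for name, c in ice_curves.items()})
```

Checking each ICE curve separately matters because a model can be monotonic on average while individual observations violate the constraint.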

5 Local Interpretable Model-agnostic Explanations (LIME)

Global and local scope are key concepts in explaining machine learning models and predictions. Section 3 presents decision trees as a global, or overall, surrogate model. As learned response functions, $g$, can be complex, simple global surrogate models can sometimes be too approximate to be trustworthy. LIME attempts to create more representative explanations by fitting a local surrogate model, $h$, in the local region of some observation of interest $\mathbf{x} \in \mathcal{X}$. Both $h$ and local regions can be defined to suit the needs of users.

5.1 Description

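Ribeiro et al. specify LIME for some observation $\mathbf{x} \in \mathcal{X}$ as:

$$\underset{h \in \mathcal{H}}{\arg\min}\; \mathcal{L}(g, h, \pi_{\mathbf{X}}) + \Omega(h) \quad (3)$$

where $h$ is an interpretable surrogate model of $g$, often a linear model $h_{\text{GLM}}$, $\pi_{\mathbf{X}}$ is a weighting function over the domain of $g$, and $\Omega(h)$ limits the complexity of $h$ [23]. Following Ribeiro et al., $h_{\text{GLM}}$ is often trained by:

$$\mathbf{X}', g(\mathbf{X}') \xrightarrow{\mathcal{A}_{\text{surrogate}}} h_{\text{GLM}} \quad (4)$$

where $\mathbf{X}'$ is sampled from $\mathcal{X}$, $\pi_{\mathbf{X}}$ weighs $\mathbf{X}'$ samples by their Euclidean similarity to $\mathbf{x}$ to enforce locality, local feature contributions are estimated as the product of $h_{\text{GLM}}$ coefficients and their associated observed values, $\beta_j x_j$, and $\Omega(h)$ is defined as a LASSO, or L1, penalty on $h_{\text{GLM}}$ coefficients inducing sparsity in $h_{\text{GLM}}$.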
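The training procedure described above can be sketched with scikit-learn. This is a simplified illustration rather than the reference LIME implementation: the Gaussian sampling around the observation, the exponential kernel and its width, the penalty strength, and the toy `predict` function are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(predict, x, n_samples=5000, scale=1.0, kernel_width=1.0, alpha=0.01, seed=0):
    """Fit a weighted L1-penalized linear surrogate around x and return local contributions."""
    rng = np.random.default_rng(seed)
    # Sample points near the observation of interest (Gaussian perturbation is an assumption).
    X_s = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    y_s = predict(X_s)
    # Weight each sample by an exponential kernel on its Euclidean distance to x.
    dist = np.linalg.norm(X_s - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # LASSO surrogate: the L1 penalty drives some coefficients to exactly zero.
    h = Lasso(alpha=alpha).fit(X_s, y_s, sample_weight=weights)
    contributions = h.coef_ * x  # coefficient times observed value, per feature
    local_r2 = h.score(X_s, y_s, sample_weight=weights)
    return h, contributions, local_r2

# Hypothetical model to explain: depends on features 0 and 1 only; feature 2 is noise.
def predict(X):
    return 2.0 * X[:, 0] - 1.0 * X[:, 1]

x = np.array([1.0, 0.5, -0.3])
h, contributions, local_r2 = lime_explain(predict, x)
surrogate_pred = h.predict(x.reshape(1, -1))[0]
model_pred = predict(x.reshape(1, -1))[0]
```

Comparing `surrogate_pred` with `model_pred`, and inspecting `local_r2` and `h.intercept_`, gives the same kind of fit information that Table 1 reports for each local model.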

Figure 5 displays estimated local feature contribution values for the same $g_{\text{GBM}}$ and simulated $\mathbf{X}$ with known signal-generating function $f$ used in previous sections. To increase the nonlinear capacity of the three $h_{\text{GLM}}$ models, information from the Shapley summary plot in Figure 2 is used to select inputs to discretize before training each $h_{\text{GLM}}$. Table 1 contains prediction and fit information for $g_{\text{GBM}}$ and $h_{\text{GLM}}$. This is critical information for analyzing LIMEs.

g_GBM(X) Percentile   g_GBM(x) Prediction   h_GLM(x) Prediction   h_GLM Intercept   h_GLM R2
10th                  0.16                  0.13                  0.53              0.72
Median                0.30                  0.47                  0.70              0.57
90th                  0.82                  0.86                  0.76              0.40

Table 1: $g_{\text{GBM}}$ and $h_{\text{GLM}}$ predictions and $h_{\text{GLM}}$ intercepts and fit measurements for the $h_{\text{GLM}}$ models trained to explain $g_{\text{GBM}}(\mathbf{x})$ at the 10th, median, and 90th percentiles of previously defined $g_{\text{GBM}}(\mathbf{X})$ and known signal-generating function $f$.

Figure 5: Sparse, low-fidelity local feature contributions found using LIME at three percentiles of $g_{\text{GBM}}(\mathbf{X})$ for known signal-generating function $f$.

Table 1 shows that LIME is not necessarily locally accurate, meaning that the predictions of $h_{\text{GLM}}(\mathbf{x})$ are not always equal to the prediction of $g_{\text{GBM}}(\mathbf{x})$. Moreover, the three $h_{\text{GLM}}$ models do not necessarily explain all of the variance of $g_{\text{GBM}}$ predictions in the local regions around the three $\mathbf{x}$ of interest. $h_{\text{GLM}}$ intercepts are also displayed because local feature contribution values, $\beta_j x_j$, are offsets from the local $h_{\text{GLM}}$ intercepts.

An immediately noticeable characteristic of the estimated local contributions in Figure 5 is their sparsity. LASSO input feature selection drives some $h_{\text{GLM}}$ coefficients to zero so that some local feature contributions are also zero. For the 10th percentile $g_{\text{GBM}}(\mathbf{X})$ prediction, the local $h_{\text{GLM}}$ R2 is adequate and the LIME values appear consistent with reasonable expectations. The contributions from the discretized features outweigh all noise feature contributions, and the contributions from the discretized features are all negative, as expected for the relatively low value of $g_{\text{GBM}}(\mathbf{x})$.

For the median prediction of $g_{\text{GBM}}(\mathbf{X})$, it could be expected that some estimated contributions for the discretized features should be positive and others should be negative. However, all local feature contributions are negative due to the relatively high value of the $h_{\text{GLM}}$ intercept at the median percentile of $g_{\text{GBM}}(\mathbf{X})$. Because the $h_{\text{GLM}}$ intercept is quite large compared to the $g_{\text{GBM}}(\mathbf{x})$ prediction, it is not alarming that all the discretized feature contributions are negative offsets w.r.t. the local intercept value. For the median prediction, $h_{\text{GLM}}$ also estimates that a noise feature has a fairly large contribution, and the local $h_{\text{GLM}}$ R2 is probably less than adequate to generate fully trustworthy explanations.

For the 90th percentile of $g_{\text{GBM}}(\mathbf{X})$ predictions, the local contributions for the discretized features are positive as expected for the relatively high value of $g_{\text{GBM}}(\mathbf{x})$, but the local $h_{\text{GLM}}$ R2 is somewhat poor and a noise feature has the highest local feature contribution. This large attribution to the noise feature could stem from problems in the LIME procedure or in the fit of $g_{\text{GBM}}$ to $f$. Further investigation, or model debugging, is conducted in Section 6.

Generally the LIMEs in Section 5 would be considered to be sparse or high-interpretability but also low-fidelity explanations. This is not always the case with LIME, and the fit of some $h_{\text{GLM}}$ to a local region around some $g(\mathbf{x})$ will vary in accuracy. URLs to the data and software used to generate Table 1 and Figure 5 are available in Section 10.

5.2 Recommendations

  • Always use fit measures to assess the trustworthiness of LIMEs.

  • Local feature contribution values are often offsets from a local intercept. Note that this intercept can sometimes account for the most important local phenomena. Each LIME feature contribution can be interpreted as the difference in $h(\mathbf{x})$ and some local offset, often $\beta_0$, associated with some feature $x_j$.

  • Some LIME methods can be difficult to deploy for explaining predictions in real-time. Consider highly deployable variants for real-time applications [15], [16].

  • Always investigate local intercept values. Generated LIME samples can contain large proportions of out-of-domain data that can lead to unrealistic intercept values.

  • To increase the fidelity of LIMEs, try LIME on discretized input features and on manually constructed interactions. Use $h_{\text{tree}}$ to construct potential interaction terms.

  • Use cross-validation to estimate standard deviations or even confidence intervals for local feature contribution values.

  • When relying only on local linear models, note that LIME can fail to create acceptable explanations, particularly in the presence of extreme nonlinearity or high-degree interactions. Other types of local models with model-specific explanatory mechanisms, such as decision trees or neural networks, can be used in these cases.

6 Tree Shap

Shapley explanations, including tree shap and even certain implementations of LIME, are a class of additive, consistent local feature contribution measures with long-standing theoretical support [20]. Shapley explanations are the only possible locally accurate and consistent feature contribution values, meaning that Shapley explanation values for input features always sum to $g(\mathbf{x})$ and that Shapley explanation values can never decrease for some $x_j$ when $g$ is changed such that $x_j$ truly makes a stronger contribution to $g(\mathbf{x})$ [20].

6.1 Description

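For some observation $\mathbf{x} \in \mathcal{X}$, Shapley explanations take the form:

$$g(\mathbf{x}) = \phi_0 + \sum_{j=1}^{P} \phi_j z_j \quad (5)$$

In Equation 5, $\mathbf{z} \in \{0, 1\}^P$ is a binary representation of $\mathbf{x}$ where 0 indicates missingness. Each $\phi_j$ is the local feature contribution value associated with $x_j$ and $\phi_0$ is the average of $g(\mathbf{X})$.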

Shapley values can be estimated in different ways. Tree shap is a specific implementation of Shapley explanations. It does not rely on surrogate models. Both tree shap and a related technique known as treeinterpreter rely instead on traversing internal tree structures to estimate the impact of each $x_j$ for some $g(\mathbf{x})$ of interest [19], [25].

$$\phi_j = \sum_{S \subseteq \mathcal{P} \setminus \{j\}} \frac{|S|!\,(P - |S| - 1)!}{P!} \left[ g_x(S \cup \{j\}) - g_x(S) \right] \quad (6)$$

Unlike treeinterpreter and as displayed in Equation 6, tree shap and other Shapley approaches estimate $\phi_j$ as the difference between the model prediction on a subset of features $S$ without $x_j$, $g_x(S)$, and the model prediction with $x_j$ and $S$, $g_x(S \cup \{j\})$, summed and weighed appropriately across all subsets $S$ of the feature set $\mathcal{P}$ that do not contain $x_j$, $S \subseteq \mathcal{P} \setminus \{j\}$. (Here $g_x$ incorporates the mapping between $\mathbf{x}$ and the binary vector $\mathbf{z}$.) Since trained decision tree response functions model complex dependencies between input features, removing different subsets of input features helps elucidate the true impact of removing $x_j$ from $g(\mathbf{x})$.
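The subset-weighted difference described above can be made concrete with a brute-force computation. The sketch below is not tree shap: it enumerates every feature subset, which is only feasible for a handful of features, and it treats a "missing" feature by averaging predictions over a background dataset, which is an assumption about how missingness is handled. The toy `predict` function and data are hypothetical.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x by enumerating all feature subsets."""
    P = x.shape[0]

    def value(subset):
        # Expected prediction when only the features in `subset` are fixed to x's values.
        Z = background.copy()
        idx = list(subset)
        if idx:
            Z[:, idx] = x[idx]
        return predict(Z).mean()

    phi = np.zeros(P)
    for j in range(P):
        others = [k for k in range(P) if k != j]
        for size in range(P):
            # Weight for subsets of this size: |S|! (P - |S| - 1)! / P!
            weight = math.factorial(size) * math.factorial(P - size - 1) / math.factorial(P)
            for S in itertools.combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    phi0 = value(())  # average prediction over the background data
    return phi0, phi

# Hypothetical model: interaction between features 0 and 1; feature 2 is ignored.
def predict(X):
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x = np.array([1.5, -2.0, 0.7])
phi0, phi = shapley_values(predict, x, background)
```

Two properties discussed in this section can be read off the result: `phi0 + phi.sum()` reproduces the model prediction for `x` (local accuracy), and the ignored feature receives a contribution of exactly zero.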

Figure 6: Complete, consistent local feature contributions found using tree shap at three percentiles of $g_{\text{GBM}}(\mathbf{X})$ and for known signal-generating function $f$.

Simulated data is used again to illustrate the utility of tree shap. Shapley explanations are estimated at the 10th, median, and 90th percentiles of $g_{\text{GBM}}(\mathbf{X})$ for simulated $\mathbf{X}$ with known signal-generating function $f$. Results are presented in Figure 6. In contrast to the LIME explanations in Figure 5, the Shapley explanations are complete, giving a numeric local contribution value for each non-missing input feature. At the 10th percentile of $g_{\text{GBM}}(\mathbf{X})$ predictions, all contributions from the features that appear in $f$ are negative, as expected for this relatively low value of $g_{\text{GBM}}(\mathbf{x})$, and their contributions clearly outweigh those of noise features.

For the median prediction of $g_{\text{GBM}}(\mathbf{X})$, the Shapley explanations are somewhat aligned with the expectation of a split between positive and negative contributions. Three of the four signal feature contributions are negative and the fourth is positive. Like the LIME explanations at this percentile in Figure 5, a noise feature has a relatively high contribution, higher than that of one of the signal features, likely indicating that $g_{\text{GBM}}$ is over-emphasizing that noise feature in the local region around the median prediction.

As expected at the 90th percentile of $g_{\text{GBM}}(\mathbf{X})$, all contributions from the signal features are positive and much larger than the contributions from noise features. Unlike the LIME explanations at the 90th percentile of $g_{\text{GBM}}(\mathbf{X})$ in Figure 5, tree shap estimates only a small contribution from the noise feature that LIME emphasized. This discrepancy may reveal a pair-wise linear correlation between that noise feature and $g_{\text{GBM}}(\mathbf{X})$ in the local region around the 90th percentile of $g_{\text{GBM}}(\mathbf{X})$ that fails to represent the true form of $g_{\text{GBM}}$ in this region, which can be highly nonlinear and incorporate high-degree interactions. Partial dependence and ICE for the noise feature, and two-dimensional partial dependence between the noise feature and the signal features, could be used to further investigate the form of $g_{\text{GBM}}$ w.r.t. the noise feature, along with model debugging techniques discussed briefly in Section 9. URLs to the data and software used to generate Figure 6 are available in Section 10.

6.2 Recommendations

  • Tree shap is ideal for estimating high-fidelity, consistent, and complete explanations of decision tree and decision tree ensemble models, perhaps even in regulated applications to generate regulator-mandated reason codes (also known as turn-down codes or adverse action codes).

  • Because tree shap explanations are offsets from a global intercept, each $\phi_j$ can be interpreted as the difference in $g(\mathbf{x})$ and the average of $g(\mathbf{X})$ associated with some input feature $x_j$ [21].

  • Currently treeinterpreter may be inappropriate for some GBM models. Treeinterpreter is locally accurate for some decision tree and random forest models, but is known to be inconsistent like all other feature importance methods aside from Shapley approaches [19]. In experiments available in the supplemental materials of this text, treeinterpreter is seen to be locally inaccurate for some XGBoost GBM models.

7 General Recommendations

The following recommendations apply to several or all of the described explanatory techniques or to the practice of applied interpretable machine learning in general.

  • Less complex models are typically easier to explain and some types of models are directly interpretable. Section 9 contains some information about directly interpretable white-box machine learning models.

  • Monotonicity is often a desirable characteristic in interpretable models. (Of course it should not be enforced when a modeled relationship is known to be non-monotonic.) White-box, monotonically constrained XGBoost models along with the explanatory techniques described in this text are a direct and open source way to train and explain an interpretable machine learning model. A monotonically constrained XGBoost GBM is trained and explained in Section 8.

  • Several explanatory techniques are usually required to create good explanations for any given complex model. Users should apply a combination of global and local, and low- and high-fidelity, explanatory techniques to a machine learning model and seek consistent results across multiple explanatory techniques. Simpler low-fidelity or sparse explanations can be used to understand more accurate, and sometimes more sophisticated, high-fidelity explanations.

  • Methods relying on surrogate models or generated data are sometimes unpalatable to users. Users sometimes need to understand their model on their data.

  • Surrogate models can provide low-fidelity explanations for an entire machine learning pipeline in the original feature space if $g$ is defined to include feature extraction or feature engineering steps.

  • Both understanding and trust are crucial to interpretability. The discussed explanatory techniques should engender a greater understanding of model mechanisms and predictions. But can a model be trusted to perform as expected on unseen data? Its predictions probably do not extrapolate linearly outside of the training, validation, or test data domains. Always conduct sensitivity analysis on your trained machine learning model to understand how it will behave on out-of-domain data.

  • Consider production deployment of explanatory methods carefully. Currently, the deployment of some open source software packages is not straightforward, especially for the generation of explanations on new data in real-time.

8 Credit Card Data Use Case

Some of the discussed explanatory techniques and recommendations will now be applied to a basic credit scoring problem using a monotonically constrained XGBoost binomial classifier and the UCI credit card dataset [17]. Referring back to Figure 1, a training set $\mathbf{X}$ and associated labels $\mathbf{Y}$ will be used to train a GBM with decision tree base learners, selected based on domain knowledge from many other types of hypothesis models $\mathcal{H}$, using a monotonic splitting strategy with gradient boosting as the training algorithm $\mathcal{A}_{\text{mono}}$, to learn a final hypothesis model $g_{\text{mono}}$ that approximates the true signal-generating function $f$ governing credit default in $\mathbf{X}$ and $\mathbf{Y}$ such that $g_{\text{mono}} \approx f$:

$$\mathbf{X}, \mathbf{Y} \xrightarrow{\mathcal{A}_{\text{mono}}} g_{\text{mono}} \quad (7)$$

$g_{\text{mono}}$ is globally explainable with aggregated local Shapley values, decision tree surrogate models $h_{\text{tree}}$, and partial dependence and ICE plots. Additionally each prediction made by $g_{\text{mono}}$ can be explained using local Shapley explanations.

To begin, Pearson correlations between inputs and the target, default payment next month, are calculated and stored. All other features except for the observation identifier, ID, are used as inputs. Then 30% of the credit card dataset observations are randomly partitioned into a labeled validation set. Pearson correlations are used to define monotonicity constraints w.r.t. each input feature. Input features with a positive correlation to the target are constrained to a monotonically increasing relationship with the target under $g_{\text{mono}}$. Input features with a negative correlation to the target are constrained to a monotonically decreasing relationship. (Features with small magnitude correlations or known non-monotonic behavior could also be left unconstrained.) Along with the monotonicity constraints, the non-default hyperparameter settings used to train $g_{\text{mono}}$ are presented in Table 2.

Hyperparameter      Value
eta                 0.08
subsample           0.9
colsample_bytree    0.9
max_depth           15

Table 2: $g_{\text{mono}}$ hyperparameters for the UCI credit card dataset. Adequate hyperparameters were found by Cartesian grid search.

A maximum of 1000 iterations was used to train $g_{\text{mono}}$, with early stopping triggered after 50 iterations without validation AUC improvement. This configuration led to a final validation AUC of 0.781 after only 100 iterations.
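The way correlation signs become monotonicity constraints can be sketched as follows. This is an illustration under stated assumptions, not the code used for the use case: the data are simulated stand-ins for the credit card inputs, the helper name and its `threshold` argument are hypothetical, and the resulting dictionary is only built, not passed to XGBoost, so the example runs with NumPy alone.

```python
import numpy as np

def monotone_constraints_from_corr(X, y, threshold=0.0):
    """Map each feature's Pearson correlation with the target to +1, -1, or 0 (unconstrained)."""
    signs = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) <= threshold:
            signs.append(0)  # leave weakly correlated features unconstrained
        else:
            signs.append(1 if r > 0 else -1)
    return tuple(signs)

# Simulated stand-in data: feature 0 raises risk, feature 1 lowers it, feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

constraints = monotone_constraints_from_corr(X, y)

# Parameter dictionary in the form accepted by xgboost.train; values follow Table 2.
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.08,
    "subsample": 0.9,
    "colsample_bytree": 0.9,
    "max_depth": 15,
    "monotone_constraints": str(constraints).replace(" ", ""),  # e.g. "(1,-1,1)"
}
# With xgboost installed, training as described would look roughly like:
# xgboost.train(params, dtrain, num_boost_round=1000,
#               evals=[(dvalid, "valid")], early_stopping_rounds=50)
```

Raising `threshold` above zero implements the parenthetical option in the text of leaving features with small magnitude correlations unconstrained.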

Figure 7: Shapley summary plot for $g_{\text{mono}}$ in a 30% validation set randomly sampled from the UCI credit card dataset.

The global feature importance of $g_{\text{mono}}$ evaluated in the validation set and ranked by mean absolute Shapley value is displayed in Figure 7. PAY_0 (a customer's most recent repayment status), LIMIT_BAL (a customer's credit limit), and BILL_AMT1 (a customer's most recent bill amount) are globally the most important features, which aligns with reasonable expectations and basic domain knowledge. (A real-world credit scoring application would be unlikely to use LIMIT_BAL as an input feature because this feature could cause target leakage. LIMIT_BAL is used in this small data example to improve fit.) The monotonic relationship between each input feature and $g_{\text{mono}}$ output is also visible in Figure 7. Numeric Shapley explanation values appear to increase only as an input feature value increases, as for PAY_0, or vice versa, as for LIMIT_BAL.

Figure 8: Partial dependence and ICE curves for learned GBM response function $g_{\text{mono}}$ and important input feature PAY_0 in a 30% validation set randomly sampled from the UCI credit card dataset.

Partial dependence and ICE for $g_{\text{mono}}$ and the important input feature PAY_0 verify the monotonic increasing behavior of $g_{\text{mono}}$ w.r.t. PAY_0. For several percentiles of predicted probabilities and on average, the output of $g_{\text{mono}}$ is low for PAY_0 values from -2 to 1, then increases dramatically. PAY_0 values from -2 to 1 are associated with on-time or 1 month late payments. A large increase in predicted probability of default occurs at PAY_0 = 2 and predicted probabilities plateau after PAY_0 = 2. The lowest and highest predicted probability customers do not display the same precipitous jump in predicted probability at PAY_0 = 2. If this dissimilar prediction behavior is related to interactions with other input features, that may be evident in a surrogate decision tree model.

Figure 9: $h_{\text{tree}}$ for $g_{\text{mono}}$ in a 30% validation set randomly sampled from the UCI credit card dataset. An image of a depth-five $h_{\text{tree}}$ directed graph is available in the supplementary materials described in Section 10.

To continue explaining $g_{\text{mono}}$, a simple depth-three $h_{\text{tree}}$ model is trained to represent $g_{\text{mono}}(\mathbf{X})$ in the validation set. $h_{\text{tree}}$ is displayed in Figure 9. $h_{\text{tree}}$ has a mean R2 across three random folds in the validation set of 0.86 with a standard deviation of 0.0011 and a mean RMSE across the same folds of 0.08 with a standard deviation of 0.0003, indicating $h_{\text{tree}}$ is likely accurate and stable enough to be a helpful explanatory tool. The global importance of PAY_0 and the increase in $g_{\text{mono}}(\mathbf{x})$ associated with PAY_0 = 2 are reflected in the simple $h_{\text{tree}}$ model, along with several potentially important interactions between input features. For instance, the lowest predicted probabilities from $h_{\text{tree}}$ occur when a customer's most recent repayment status, PAY_0, is less than 0.5 and their second most recent payment amount, PAY_AMT2, is greater than or equal to NT$ 4747.5. The highest predicted probabilities from $h_{\text{tree}}$ occur when PAY_0 ≥ 1.5, a customer's fifth most recent repayment status, PAY_5, is 1 or more months late, and a customer's fourth most recent bill amount, BILL_AMT4, is less than NT$ 17399.5. In this simple depth-three $h_{\text{tree}}$ model, it appears that an interaction between PAY_0 and PAY_AMT2 may be leading to the very low probability of default predictions displayed in Figure 8, while interactions between PAY_0, PAY_5, and BILL_AMT4 are potentially associated with the highest predicted probabilities. A more complex and accurate depth-five $h_{\text{tree}}$ model is available in the supplemental materials described in Section 10 and it presents greater detail regarding the interactions and decision paths that could lead to the modeled behavior for the lowest and highest probability of default customers.

Figure 10: Complete, consistent local feature contributions found using tree shap at three percentiles of $g_{\text{mono}}(\mathbf{X})$ in a 30% validation set randomly sampled from the UCI credit card dataset.

Figure 10 displays local Shapley explanation values for three customers at the 10th, median, and 90th percentiles of $g_{\text{mono}}(\mathbf{X})$ in the validation set. The plots in Figure 10 are representative of the local Shapley explanations that could be generated for any $g_{\text{mono}}(\mathbf{x}), \mathbf{x} \in \mathcal{X}$. The values presented in Figure 10 are aligned with the general expectation that Shapley contributions will increase for increasing values of $g_{\text{mono}}(\mathbf{x})$. Reason codes to justify decisions based on $g_{\text{mono}}(\mathbf{x})$ predictions can also be generated for arbitrary $g_{\text{mono}}(\mathbf{x})$ using local Shapley explanation values and the values of input features in $\mathbf{x}$. Observed values of $\mathbf{X}$ are available in the supplementary materials presented in Section 10. For the customer at the 90th percentile of $g_{\text{mono}}(\mathbf{X})$, the likely top three reason codes to justify declining further credit are:

  • Most recent payment is 2 months delayed.

  • Fourth most recent payment is 2 months delayed.

  • Third most recent payment amount is NT$ 0.
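Reason codes like those listed above can be derived by ranking one customer's local Shapley values. A minimal sketch follows; the Shapley values and observed feature values in it are hypothetical placeholders, not the values behind Figure 10, and the helper name is an assumption.

```python
import numpy as np

def top_reason_codes(shap_values, feature_names, feature_values, k=3):
    """Return up to k features that push the prediction upward the most, largest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    order = np.argsort(-shap_values)  # indices sorted by descending contribution
    codes = []
    for i in order[:k]:
        if shap_values[i] <= 0:
            break  # only positive contributions justify an adverse decision
        codes.append((feature_names[i], feature_values[i], float(shap_values[i])))
    return codes

# Hypothetical local explanation for a single high-risk customer.
names = ["PAY_0", "PAY_4", "PAY_AMT3", "LIMIT_BAL", "BILL_AMT1"]
values = [2, 2, 0, 200000, 5000]
shap_row = [0.45, 0.12, 0.08, -0.10, 0.01]

codes = top_reason_codes(shap_row, names, values)
for name, value, contribution in codes:
    print(f"{name} = {value} (contribution {contribution:+.2f})")
```

Each returned tuple pairs a feature with its observed value, which is the information needed to phrase a reason code such as a payment being two months delayed.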

Analysis for an operational, mission-critical machine learning model would likely involve further investigation of partial dependence and ICE plots and perhaps deeper analysis of $h_{\text{tree}}$ models following Hu et al. [16]. Analysis would also probably continue on to diagnostic, or model debugging, and fairness techniques such as:

  • Disparate impact analysis: to uncover any unfairness in model predictions or errors across demographic segments.

  • Residual analysis: to check the fundamental assumptions of the model against relevant data partitions and to investigate outliers or observations exerting undue influence on $g_{\text{mono}}$.

  • Sensitivity analysis: to explicitly test the trustworthiness of model predictions on simulated out-of-domain data or in other simulated scenarios of interest.

A successful explanatory and diagnostic analysis must also include remediating any discovered issues and documenting all findings. Examples of more detailed analyses along with the URLs to the data and software used to generate Figures 7 through 10 are available in Section 10.

9 Suggested Reading

As stated in the introduction, this text focuses on a fairly narrow but practical sub-discipline of machine learning interpretability. As interpretability truly is a diverse subject with many other practically useful areas of study, additional subjects are suggested for further reading.

9.1 White-box Models

The application of post-hoc explanatory techniques is convenient for previously existing machine learning models, workflows, or pipelines. However, a more direct approach may be to train an interpretable white-box machine learning model which may or may not require additional post-hoc explanatory analysis. Monotonic XGBoost is an excellent option to evaluate because the software is open source, readily available, easily installable and deployable, and highly scalable [7]. Acclaimed work by the Rudin group at Duke University is also likely of interest to many users. They have developed several types of rule-based models [3], [31], linear model variants [28], and many other novel algorithms suitable for use in high stakes, mission-critical prediction and decision-making scenarios.

9.2 Explainable Neural Networks (xNNs)

ANNs are often considered the least transparent of black-box models, but recent work in xNN implementation and in explaining artificial neural network (ANN) predictions may render that notion obsolete. Many of the breakthroughs in ANN explanation stem from the straightforward calculation of accurate derivatives of the trained ANN response function w.r.t. input features, made possible by the proliferation of deep learning toolkits such as tensorflow [22]. These derivatives allow for the disaggregation of the trained ANN response function prediction, $g_{\text{ANN}}(\mathbf{x})$, into input feature contributions for any observation in the domain of $g_{\text{ANN}}$. Popular techniques have names like DeepLIFT and integrated gradients [26], [27], [2]. Explaining ANN predictions is impactful for at least two major reasons. While most users will be familiar with the widespread use of ANNs in pattern recognition, they are also used for more traditional data mining applications such as fraud detection, and even for regulated applications such as credit scoring [14]. Moreover, ANNs can now be used as accurate and explainable surrogate models, potentially increasing the fidelity of both global and local surrogate model techniques. For an excellent discussion of xNNs in a practical setting see Explainable Neural Networks based on Additive Index Models by the Wells Fargo Corporate Model Risk group [29].

9.3 Fairness

Fairness is yet another important facet of interpretability, and an admirable goal for any machine learning project whose outcomes will affect human lives. Traditional checks for fairness include assessing the average prediction, accuracy, and error across demographic segments. Today the study of fairness in machine learning is widening and progressing rapidly. Users who would like to stay abreast of developments in the fairness space should follow the free online book by leading researchers, Fairness and Machine Learning [4]. The book’s website currently includes references to other fairness materials. Users may also be interested in the broader organization for fairness, accountability, and transparency in machine learning or FATML. FATML maintains a list of pertinent scholarship on their website: https://www.fatml.org/.

9.4 Model Debugging

As alluded to in previous sections, the techniques in this text enable practitioners to explain and hopefully understand complex models. Trust is a related and somewhat orthogonal concept in interpretability, because models can certainly be explained and not trusted, and conversely, be trusted and not explainable. The growing field of model debugging empowers users to trust their complex model predictions and to diagnose and treat any discovered undesirable behaviors. While residual and sensitivity analysis typically play a role in model debugging exercises, readers are encouraged to investigate newer methods such as anchors and burgeoning work in adversarial examples [24], [32].

10 Supplementary Materials and Software Resources

To make the discussed results useful and reproducible for practitioners, several online supporting materials and software resources are freely available.

11 Acknowledgements

The author wishes to thank Sri Ambati, Arno Candel, Mark Chan, Doug Deloy, Lauren DiPerna, Mateusz Dymczyk, Navdeep Gill, Megan and Michal Kurka, Lingyao Meng, Mattias Müller, Wen Phan and other makers past and present at H2O.ai who turned explainability into a software application and learned the lessons discussed in this text along with me. The author thanks Leland Wilkinson at H2O.ai and the University of Illinois at Chicago (UIC) for his continued mentorship and guidance.