
ViCE: Visual Counterfactual Explanations for Machine Learning Models

by   Oscar Gomez, et al.
NYU college

The continued improvements in the predictive accuracy of machine learning models have allowed for their widespread practical application. Yet, many decisions made with seemingly accurate models still require verification by domain experts. In addition, end-users of a model also want to understand the reasons behind specific decisions. Thus, the need for interpretability is increasingly paramount. In this paper we present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions. Each sample is assessed to identify the minimal set of changes needed to flip the model's output. These explanations aim to provide end-users with personalized actionable insights with which to understand, and possibly contest or improve, automated decisions. The results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model. The functionality of the tool is demonstrated by its application to a home equity line of credit dataset.





1. Introduction

The accessibility of high performing machine learning models has resulted in their integration into various applications pertaining to complex and high-risk data. Even in industries such as financial services and health care where the assimilation of predictive models has been slower due to the associated risks, machine learning is now seeing rapid adoption. However, simple accuracy measures that are used to describe models often fail to describe deeper flaws such as hidden biases and false generalizations. In high-risk situations such as cancer diagnosis or fiscal lending such oversight cannot be accommodated and can result in detrimental consequences.

In this paper we present ViCE, a novel design for an explainable machine learning visual analytics tool, and its evaluation using a case study. The tool is built to describe a machine learning model by breaking down individual predictions. This caters directly to our end-user, whom we envision as the client-facing person trying to better understand predictions made by the model. This could include doctors inferring why a patient is predicted as high risk for diabetes or admissions officers looking into why a particular candidate was rejected. Although ViCE could also be useful for model developers, that is not the tool’s primary purpose. The analysis is driven by the introduction of counterfactual explanations. Our tool relies on a new algorithm for calculating counterfactuals that is not limited to binary variables and is intended for use with tabular numerical data. Furthermore, we have created the first visual interface that is able to display these explanations effectively and coherently. ViCE is supplemented with functionality that contextualizes the targeted sample with regard to the rest of the dataset. The combination of these features ensures that the resulting interface not only clarifies the model’s decision but can also be used to pinpoint bias and undesired behaviour.

For the targeted end-user, each explanation provides actionable suggestions that can help adjust the model’s prediction. In other words, the tool establishes what changes to an instance are required to alter its predicted outcome. For example, it could be used by a loan officer looking to get a previously rejected application approved.

2. Background and Related Works

The problem of machine learning model interpretability and explanation has been recognized by many researchers and practitioners. Previous works (Biran and Cotton, 2017; Doshi-Velez and Kim, 2017; Lipton, 2016; Poursabzi-Sangdeh et al., 2018; Molnar, 2019; Guidotti et al., 2019; Adadi and Berrada, 2018; Carvalho et al., 2019) provide an overview of methods, opportunities and challenges in this area.

To interpret a machine learning model, methods vary according to the category of models. Machine learning models are often categorized into two classes: white-box and black-box. White-box models are those intrinsically interpretable models, where the logic of making a decision is transparent and intelligible (e.g. decision trees, linear regression, etc.) (Molnar, 2019); while black-box models tend to have complex structures and are hard to understand (e.g., neural networks, ensemble models, etc.). In this paper, we introduce an explanation algorithm that is model independent, that is, the method works with any model without having access to its internal logic.

Generally, model explanations can be categorized as local or global. Local explanations try to explain how a decision is made for a specific instance, while global explanation methods refer to showing the overall logical structure of a model. Some approaches such as LIME (Ribeiro et al., 2016), and SHAP (Lundberg and Lee, 2017), focus on generating a weight for each feature as its contribution to the final decision. Others provide explanations through a counterfactual, where the explanation consists of the minimal set of changes to the feature values that allows the prediction for the instance to change to a different outcome. For example, finding the smallest feature perturbation that would change the prediction of a loan application from rejected to approved. Wachter et al. (Wachter et al., 2017) provide a general framework for counterfactual generation using stochastic optimization, while Ustun et al. (Ustun et al., 2019) present an approach specific to linear classifiers.

As for the presentation of model explanations, visualization has been increasingly used to support the understanding, debugging, verification, and refinement of machine learning models.

As a black-box explanation tool, ViCE does not rely on the internal logic of the model, but is designed to let users explore the relationships between inputs (instances) and outputs (predictions). Some existing visual analytic systems follow the black-box approach. For example, Manifold (Zhang et al., 2018) enables the comparison of data distribution at different levels of granularity; RuleMatrix (Ming et al., 2019) visualizes extracted rules for a given model; iForest (Zhao et al., 2019) and EnsembleMatrix (Talbot et al., 2009) attempt to explain ensemble models.

Similarly to our work, the What-If Tool (WIT) (Wexler et al., 2019) tries to answer a what-if question. WIT shows how model predictions change after the user modifies the data, while our tool visualizes how the data would have to be modified in order to change a prediction to another class.

Likewise, Rivelo (Tamagnini et al., 2017) and the Workflow for Visual Diagnostics proposed by Krause et al. (Krause et al., 2017) also provide a solution to a counterfactual question of how to change data to achieve a target class for black-box models. However, their solution adapts an algorithm (Martens and Provost, 2013) originally designed for text documents and works only on binary inputs. Our work extends this algorithm to situations pertaining to continuous numerical data.


The main goal of the proposed tool is to support understanding of individual predictions through counterfactual explanations and to provide an intuitive visual representation for them. More precisely, our objective is to show, for a given instance, what is the minimal set of changes that is required to change the prediction. In our case, we focus on numeric features, therefore the tool has to provide two pieces of information: (1) which features need to change and (2) the extent to which they have to change.

ViCE was designed through an iterative design process. We analyzed published work to compile a list of questions end-users may want to answer when using counterfactual explanations and designed several solutions. The final result is the tool we present in the paper. The following list summarizes the desired functionality that we deemed essential to support our goal.

  • Data distribution (Q1) - How do the values of the instance compare to those across the rest of the dataset?
    Example: If a student has a GRE score of 320, how does it compare to the scores of their peers?

  • Relevant features (Q2) - Which features have the most considerable effect on the model’s prediction?
    Example: Identifying which variables in a patient’s blood work are significant contributors to a negative diagnosis.

  • Possible changes (Q3) - Are there changes that could alter the model’s current prediction?
    Example: If an applicant was rejected for a loan, what changes in their profile would be required for the application to be accepted?

  • Actionable changes (Q4) - Is it possible to change only a subset of actionable features to change the model’s prediction?
    Example: If a graduate school applicant knows certain features cannot be changed, such as Gender or Age, is it possible to generate an alternative explanation without altering these features?

In the following two sections we first describe the algorithm developed in detail and then describe the visual solution designed for communicating information about the counterfactual explanations.

3.1. Counterfactual Algorithm

The counterfactual algorithm aims to find the minimal set of changes needed to change the model’s output. We implement a simple heuristic algorithm to find changes that are at the same time interpretable (minimal set of features) and feasible (minimal amount of change); characteristics that are crucial for user-friendly explanations (Miller, 2019).

In order to extend the algorithm proposed in (Martens and Provost, 2013), the entire dataset is discretized by fitting a Gaussian on each of the features and splitting the values into n bins such that the middle n-2 bins capture four standard deviations from the mean, and the extreme bins capture data points beyond this. The algorithm greedily moves feature values across the bins until the predicted class is changed, or until the pre-defined constraints (a limit on the number of features changed in a single explanation and a limit on the number of bins any feature value is moved across) are reached.
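The discretization step can be illustrated with a minimal sketch. It is not the authors' code: the function names `gaussian_bin_edges` and `bin_index` are hypothetical, and the sketch assumes that "four standard deviations" means an interval of total width four standard deviations centred on the mean (mean ± 2 standard deviations), which the `span_sd` parameter makes adjustable.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def gaussian_bin_edges(values, n_bins, span_sd=4.0):
    """Return the n_bins - 1 bin edges for one feature.

    Assumption: the middle n_bins - 2 bins evenly divide an interval of
    span_sd standard deviations centred on the mean; the first and last
    bins collect every value that falls outside that interval.
    """
    if n_bins < 3:
        raise ValueError("need two extreme bins plus at least one middle bin")
    mu, sigma = mean(values), pstdev(values)
    low = mu - (span_sd / 2) * sigma
    width = span_sd * sigma / (n_bins - 2)
    return [low + i * width for i in range(n_bins - 1)]


def bin_index(value, edges):
    """Index of the bin containing value (0 is the lower extreme bin)."""
    return bisect_right(edges, value)


# Toy feature with mean 2 and standard deviation 1, split into 6 bins.
edges = gaussian_bin_edges([1, 3], n_bins=6)
print(edges, bin_index(2.5, edges))
```

Fitting the edges on the whole dataset once, and then looking up bins per instance, keeps the later search over bins cheap.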

The algorithm starts with the original feature values of the instance to explain, and it is given an arbitrary set of unlocked features which can be acted upon. In each iteration, it independently moves the value in each of the unlocked features to the bins above and below the current one and chooses the one eliciting the largest change in the model’s output (in the direction of the opposite category). It then takes the maximum change across all the unlocked features and uses this as the input for the next iteration. This greedy procedure continues until the modified instance crosses the model’s decision boundary or until the constraints can no longer be satisfied.
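The greedy procedure described above can be sketched roughly as follows. This is an illustrative reading of the text rather than the published implementation: the function name, the `bin_values` representation (one ascending list of representative values per feature), the parameter names `max_features` and `max_steps`, and the choice to stop when no single move improves the prediction are all assumptions.

```python
def greedy_counterfactual(x, predict_proba, bin_values, unlocked,
                          max_features, max_steps, threshold=0.5):
    """Greedily move unlocked features across bins until the class flips.

    x             -- list of original feature values
    predict_proba -- callable: list of feature values -> P(positive class)
    bin_values    -- bin_values[f] lists one representative value per bin
    Returns {feature: (old_value, new_value)} or None if no explanation
    exists within the constraints.
    """
    # Start every unlocked feature in the bin closest to its original value.
    start = {f: min(range(len(bin_values[f])),
                    key=lambda b, f=f: abs(bin_values[f][b] - x[f]))
             for f in unlocked}
    bins = dict(start)
    current = list(x)
    p = predict_proba(current)
    direction = 1 if p <= threshold else -1  # push towards the other class

    while True:
        changed = {f for f in bins if bins[f] != start[f]}
        best = None  # (gain, feature, new_bin, new_probability)
        for f in sorted(unlocked):
            if f not in changed and len(changed) >= max_features:
                continue  # would exceed the limit on changed features
            for step in (-1, 1):
                b = bins[f] + step
                if not 0 <= b < len(bin_values[f]) or abs(b - start[f]) > max_steps:
                    continue  # outside the bins or too far from the start
                candidate = list(current)
                candidate[f] = bin_values[f][b]
                new_p = predict_proba(candidate)
                gain = direction * (new_p - p)
                if best is None or gain > best[0]:
                    best = (gain, f, b, new_p)
        if best is None or best[0] <= 0:
            return None  # assumption: give up when no move helps
        _, f, b, p = best
        bins[f] = b
        current[f] = bin_values[f][b]
        flipped = p > threshold if direction == 1 else p <= threshold
        if flipped:
            return {g: (x[g], current[g]) for g in bins if bins[g] != start[g]}


# Toy linear "model" with exactly representable coefficients.
toy = lambda v: 0.25 + 0.125 * v[0] + 0.0625 * v[1]
grid = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
print(greedy_counterfactual([0, 0], toy, grid, {0, 1}, max_features=2, max_steps=3))
```

Because the model is only accessed through `predict_proba`, the sketch stays model independent, in line with the black-box framing of the paper. Locking a feature corresponds to leaving it out of `unlocked`.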

3.2. Visual Interface

Figure 1. Demonstrating a single local explanation using a diabetes dataset.

① model’s predicted probability, ② classification correctness of the model’s prediction, ③ frequency density distribution and the feature value for the given instance, ④ counterfactual explanation, ⑤ locking functionality, ⑥ lock, sort, and distribution toggles.

In Fig. 1, we present the explanation for an instance in a diabetes dataset (Johannes, 1988). For demonstration purposes, a support vector machine is used. The individual explanation view shows a detailed summary regarding the model’s decision for a single data point while also giving context to the values relative to the rest of the dataset. The percentage bar (Fig. 1 ①) is used to indicate the exact prediction made by the model, thereby quantifying the strength of a prediction beyond the binary result. In our solution, any model prediction value greater than 50% is classified as positive and shown in green while all other decisions are classified as negative and shown in red. The correctness of the prediction is also presented (Fig. 1 ②). It is important to note that knowledge of the ground truth is not a requirement as this information is unavailable in many real use-cases. However, when available, knowing whether the model’s decision is correct helps categorize the sample point as either an example of the model’s desired operation or of its potential shortcomings.
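The thresholding and correctness labelling described in this paragraph amount to a few lines of logic. The sketch below is illustrative only: `describe_prediction` is a hypothetical helper, and the TP/FP/FN labels are assumed by analogy with the TN label that appears in the case study.

```python
def describe_prediction(prob_positive, truth=None, threshold=0.5):
    """Return (is_positive, colour, correctness_label) for one prediction.

    Values strictly above the threshold are positive (green); everything
    else is negative (red). The label is None when no ground truth exists.
    """
    is_positive = prob_positive > threshold
    colour = "green" if is_positive else "red"
    if truth is None:
        return is_positive, colour, None
    correct = is_positive == bool(truth)
    label = ("T" if correct else "F") + ("P" if is_positive else "N")
    return is_positive, colour, label


print(describe_prediction(0.29, truth=0))  # a correctly rejected sample
```

Keeping the ground truth optional mirrors the point made above that the tool does not require it.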

The main part of the interface separates the data by features and displays their numerical values. These values are positioned relative to the distribution for that feature across the entire dataset (Fig. 1 ③). Each attribute column is also supplemented with a density distribution visualisation. Based on the opacity of the purple background, the frequency of occurrence at that position can be analysed. For example, in this explanation the patient’s age and glucose levels are clearly above the average. This information might suggest that these factors are contributing to the false positive prediction. By default, the tool displays the density distribution based on all the data points; however, the user has the option to map the densities based on points with positive or negative target values (Distribution selection in Fig. 1 ⑥). In other words, it is possible to see how the sample under consideration compares with known positive or negative predictions. This effectively helps contextualize the values of the sample and highlight the features with singular values.

The local view will also display counterfactual explanations if the conditions set by the algorithm are fulfilled (Fig. 1 ④). Arrow-shaped polygons are used to exhibit a single increment in the bins used to discretize the tabular data. The current value and the suggested new level are both shown numerically for clearer reading and detail. The color of these symbols is based on the binary decision made by the model. For a positive prediction red arrows are used to show what changes would result in the decision becoming negative, while green arrows are used for negative instances as indicators for a positive change. In this example, if two features had greater values then the patient would no longer be considered at risk for diabetes. Thus, according to the model, in order for this patient to become healthy they would need to slightly increase their blood pressure and skin thickness levels. This clearly exemplifies how the end-user benefits from having information that extends beyond a binary classifier.

Figure 2. Use case for a sample from the HELOC dataset

To guarantee versatility, a locking function is available to remove certain attributes from consideration (Fig. 1 ⑤). This can be useful if a user has certain features they deem unable to change or modify. In this case the age feature can be treated as infeasible to change and can be locked using the icon. In most cases, the counterfactuals are elicited in examples with prediction percentages nearer to the cutoff threshold of 50%. This is because samples for which the model predicts a very high or low percentage usually cannot be flipped by implementing a few changes and would require larger modifications than those allowed by the algorithm.

The tool also has a sorting option (Sort in Fig. 1 ⑥). Toggling the sort button orders the features based on their standardized values. In this way users can quickly identify singular feature values that are considerably above or below the average for a feature. The sort functionality can be very effective in comparing monotonic features and highlighting key attributes.
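One plausible reading of sorting by standardized values is to rank features by the z-score of the instance's value within each feature column. The sketch below follows that reading; the function name, the dictionary-based data layout, and the descending order are assumptions made for illustration.

```python
from statistics import mean, pstdev


def sort_by_standardized_value(instance, dataset):
    """Rank features by how far the instance sits from each feature mean.

    instance -- dict mapping feature name -> value
    dataset  -- list of dicts with the same keys
    Returns (feature, z_score) pairs, highest z-score first.
    """
    scores = {}
    for name, value in instance.items():
        column = [row[name] for row in dataset]
        sigma = pstdev(column)
        # A constant column carries no spread, so treat its score as zero.
        scores[name] = (value - mean(column)) / sigma if sigma else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


data = [{"age": 1, "glucose": 10}, {"age": 3, "glucose": 30}]
print(sort_by_standardized_value({"age": 4, "glucose": 10}, data))
```

With this ordering, values far above the average appear first and values far below it appear last, so both ends of the list point to singular features.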

Notice that the lack of a counterfactual explanation does not mean that no information can be derived from the visualization. Comparing the data values to the density distributions depicted by the shaded area can help identify anomalies and derive hypotheses on why and how the model produces a given prediction. It is also worth noticing that the visualization interface can accommodate any other counterfactual generation methods (Ustun et al., 2019; Wachter et al., 2017; Laugel et al., 2017; Mothilal et al., 2020).

3.3. Implementation

ViCE is built as a Flask web application with the back-end running on Python. The visualisations are created using D3 and JavaScript. With versatility in mind, the tool accepts any binary classification dataset in CSV format. A default SVM model is trained with scikit-learn; however, the program also accommodates custom input models as long as probability prediction methods are provided. The data is processed in real time to accommodate customized end-user inputs. For our implementation we split feature values across bins and set for the algorithm constraints.
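The requirement that custom models provide probability prediction methods could be expressed as a small interface check. The sketch below is an assumption about how such a check might look, not the tool's actual code: `ProbabilisticClassifier` and `positive_probability` are hypothetical names, and the method name `predict_proba` follows the scikit-learn convention of returning one row of class probabilities per sample.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class ProbabilisticClassifier(Protocol):
    """Anything exposing a scikit-learn style predict_proba method."""

    def predict_proba(self, X: Sequence[Sequence[float]]) -> Sequence[Sequence[float]]:
        ...


def positive_probability(model, row, positive_index=1):
    """Probability of the positive class for a single instance."""
    if not isinstance(model, ProbabilisticClassifier):
        raise TypeError("model must provide a predict_proba method")
    return float(model.predict_proba([row])[0][positive_index])


class ConstantModel:
    """Stand-in for a trained classifier, used only for illustration."""

    def predict_proba(self, X):
        return [[0.75, 0.25] for _ in X]


print(positive_probability(ConstantModel(), [1.0, 2.0]))
```

A scikit-learn `SVC` would satisfy this interface only when constructed with `probability=True`, since that option is what enables its `predict_proba` method.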

4. Case Study

To demonstrate how ViCE can help with ML explanation, we showcase its use with the Home Equity Line of Credit (HELOC) dataset. The design goals set out in Section 3 are used to evaluate the performance of the tool and are referenced directly in the use case.

The HELOC dataset was released as part of the FICO xML challenge (FICO, 2018). It comprises applications made by real homeowners attempting to get a credit line from the bank. The target is to predict the binary variable Risk Performance, where bad indicates that a consumer was at least 90 days past due once and good that they never were. Some initial testing revealed that the External Risk Estimate feature had a very strong correlation with the target class. Since this feature is not directly actionable, it is initialized as locked. However, the user retains the ability to unlock it if desired.

To simulate the end-user experience an arbitrary client was picked. The chosen instance seen in Fig. 2 shows a negative model prediction with 29% probability and the TN label on the upper left indicates that the model prediction matches the ground truth. Setting the data point into context using the density distributions reveals that there is considerable variation from the dataset averages in a number of the features (Q1). Toggling the sort functionality helps identify the most singular features to be Net Fraction Revolving Burden, Percentage Trades Never Delinquent and Months Since Most Recent Delinquency. The External Risk Estimate score is also considerably lower than the average. Since this sample elicits a number of uncommon values there is a certain degree of subjectivity involved with identifying the features that should be of particular interest. Redrawing the distribution for negative samples shows the frequency density distribution for other known negative points. Using this view it is possible to confirm that the features identified above are significantly different, even in the context of other poorly performing samples (Q2). Furthermore, changing between the general, positive, and negative density frequency distribution views gives an indication of the monotonicity of the features.

To understand what changes would be required by the user to receive a positive prediction we can examine the counterfactuals. This sample cannot be considered an edge case; however, since the predicted percentage of 29% is not too low, there still exist combinations of changes that would result in the model flipping the decision (Q3). Yet, as expected, these changes are significant. The tool suggests that a sizable increase in both the Number of Satisfying Trades and Months Since Most Recent Delinquency and a small rise in the Number of Inquiries in the Last 6 Months excluding last 7 days would be sufficient. Intuitively, all of these changes are manageable, but if the user were in a rush to get their credit line approved the time-based features might not be feasible (Q4). Locking these attributes and reloading the explanation generates a new explanation with changes in Average Months in File. Since this is also time dependent it was subsequently locked as well. With these limitations imposed by the user, the algorithm is no longer able to identify a way in which the decision can be changed within the pre-defined constraints. Therefore, it is apparent that the model weights features with time variables highly in its decision making for this instance. For further exploration, unlocking the External Risk Estimate variable instantly demonstrates the strength of its correlation with the model decision. The explanation now suggests large changes in External Risk Estimate as the optimal way of flipping the decision.

5. Limitations

This work is the first step in our goal to provide full end-user oriented model explanations. The tool currently has certain limitations. For example, the algorithm cannot effectively handle categorical features. Possible solutions might involve presetting a search path or performing a brute force analysis of features that are known to be categorical. In addition, the tool does not extend to multi-class classification or other contexts such as image classification. The visualization itself can realistically display a maximum of around 30 features. However, larger datasets can be accommodated by utilizing the sorting feature and only displaying the top k features.

6. Conclusions and Future Work

In this paper, we presented ViCE, a novel way for the end-user to gain insight into model predictions through counterfactual explanations. For each sample the minimal set of changes needed to alter the decision was shown. Interacting with the interface allows customizing the explanation according to the user’s requirements. A use case was chronicled by applying the tool to a loan dataset. To the best of our knowledge this tool is the first to visualise counterfactuals for non-binary data. While already providing increased model interpretability, the modular black-box based nature of the tool allows for a seamless integration of improvements such as including different methods to generate counterfactuals, or providing users with a set of alternatives to the displayed counterfactual explanation.

Future work will aim to introduce increased interactivity for the UI. This would include adding an option to view the impact of custom changes inputted by the user. To improve the visualization, additional explanation methods can be integrated. For example, customizing the sorting functionality to order the features according to their local importance magnitudes could provide a way to corroborate the insights gained from the counterfactuals. Finally, extending the tool to a global scale through the aggregation of instance explanations could further increase its usefulness for model developers.


This work was partially supported by the DARPA D3M program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA. We thank the nexquare team, especially Haseeb Ahmed and Sarah Ameen, for providing valuable feedback and sharing their unique perspective, having led multiple data science engagements in the education domain. We also thank the anonymous reviewers for their valuable feedback.


  • A. Adadi and M. Berrada (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, pp. 52138–52160.
  • O. Biran and C. Cotton (2017) Explanation and justification in machine learning: a survey. In IJCAI-17 workshop on explainable AI (XAI), Vol. 8.
  • D. V. Carvalho, E. M. Pereira, and J. S. Cardoso (2019) Machine learning interpretability: a survey on methods and metrics. Electronics 8 (8), pp. 832.
  • F. Doshi-Velez and B. Kim (2017) Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
  • FICO (2018) Explainable machine learning challenge.
  • R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi (2019) A survey of methods for explaining black box models. ACM computing surveys (CSUR) 51 (5), pp. 93.
  • R. S. Johannes (1988) Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Johns Hopkins APL Technical Digest 10, pp. 262–266.
  • J. Krause, A. Dasgupta, J. Swartz, Y. Aphinyanaphongs, and E. Bertini (2017) A workflow for visual diagnostics of binary classifiers using instance-level explanations. In 2017 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 162–172.
  • T. Laugel, M. Lesot, C. Marsala, X. Renard, and M. Detyniecki (2017) Inverse classification for comparison-based interpretability in machine learning. arXiv preprint arXiv:1712.08443.
  • Z. C. Lipton (2016) The mythos of model interpretability. arXiv preprint arXiv:1606.03490.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774.
  • D. Martens and F. Provost (2013) Explaining data-driven document classifications.
  • T. Miller (2019) Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence 267, pp. 1–38.
  • Y. Ming, H. Qu, and E. Bertini (2019) RuleMatrix: visualizing and understanding classifiers with rules. IEEE transactions on visualization and computer graphics 25 (1), pp. 342–352.
  • C. Molnar (2019) Interpretable machine learning.
  • R. K. Mothilal, A. Sharma, and C. Tan (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, New York, NY, USA, pp. 607–617. ISBN 9781450369367.
  • F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Vaughan, and H. Wallach (2018) Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) Why should I trust you?: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144.
  • J. Talbot, B. Lee, A. Kapoor, and D. S. Tan (2009) EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1283–1292.
  • P. Tamagnini, J. Krause, A. Dasgupta, and E. Bertini (2017) Interpreting black-box classifiers using instance-level visual explanations. In Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, pp. 6.
  • B. Ustun, A. Spangher, and Y. Liu (2019) Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19.
  • S. Wachter, B. Mittelstadt, and C. Russell (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL & Tech. 31, pp. 841.
  • J. Wexler, M. Pushkarna, T. Bolukbasi, M. Wattenberg, F. Viégas, and J. Wilson (2019) The what-if tool: interactive probing of machine learning models. IEEE transactions on visualization and computer graphics.
  • J. Zhang, Y. Wang, P. Molino, L. Li, and D. S. Ebert (2018) Manifold: a model-agnostic framework for interpretation and diagnosis of machine learning models. IEEE transactions on visualization and computer graphics 25 (1), pp. 364–373.
  • X. Zhao, Y. Wu, D. L. Lee, and W. Cui (2019) iForest: interpreting random forests via visual analytics. IEEE transactions on visualization and computer graphics 25 (1), pp. 407–416.