Counterfactual Instances Explain Little

by Adam White, et al.
City, University of London

In many applications, it is important to be able to explain the decisions of machine learning systems. An increasingly popular approach has been to seek to provide counterfactual instance explanations. These specify close possible worlds in which, contrary to the facts, a person receives their desired decision from the machine learning system. This paper will draw on literature from the philosophy of science to argue that a satisfactory explanation must consist of both counterfactual instances and a causal equation (or system of equations) that supports the counterfactual instances. We will show that counterfactual instances by themselves explain little. We will further illustrate how explainable AI methods that provide both causal equations and counterfactual instances can successfully explain machine learning predictions.




1 Introduction

Machine learning systems are increasingly being used for automated decision making. It is important that these systems’ decisions can be trusted. This is particularly the case in mission critical situations such as medical diagnosis, airport security or high-value financial trading. But a machine learning system cannot be trusted simply on the basis of its accuracy with a test data set. The user also needs to be able to understand why the system is making its specific predictions. One possible solution is to treat machine learning systems as ‘black-boxes’ and to explain their input-output behaviour. Such approaches can be divided into two broad types: those providing global explanations of the entire system and those providing local explanations of single predictions. Local explanations are needed when a machine learning system’s decision boundary is too complex to allow for global explanations.

Miller (miller2018explanation) carried out a review of over 250 papers on explanation, taken from the disciplines of philosophy, psychology and cognitive science. He states that perhaps his most important finding was that explanations are contrastive counterfactuals, seeking to answer the question ‘Why event E rather than event F?’ F is referred to as E’s foil and comes from a contrast class of events that were alternatives to E, but which did not happen. When a person asks for an explanation, the relevant contrast class is often not explicitly conveyed but instead is implicit in the explanatory question.

Multiple explainable AI (XAI) methods have been proposed that claim to provide ‘counterfactual explanations’ of single machine learning predictions. The explanation consists of either a single or multiple counterfactual instances. It is the argument of this paper that a satisfactory explanation must consist of both counterfactual instances and a causal equation (or system of equations) that supports the counterfactual instances. Counterfactual instances by themselves explain little.

The remainder of this paper is organised as follows: Section 2 provides an overview of counterfactual instances and highlights Karimi et al.’s (karimi2021algorithmic) argument that existing XAI methods often fail to successfully generate actionable counterfactuals. Section 3 specifies the requirements for a satisfactory scientific explanation and argues that these are not satisfied by statements of counterfactual instances. Section 4 illustrates how XAI methods that provide both causal equations and counterfactual instances can satisfy these requirements. Section 5 discusses aspects of understandability, summarises the paper and discusses future directions of work.

2 Counterfactual Instances

A counterfactual instance specifies a close possible world in which, contrary to the facts, a person gets their desired outcome. For example, suppose that a banking machine learning system declined Mr Jones’ loan application and that the feature vector for Mr Jones was {income: $32,000, age: 45, education: graduate}; then a counterfactual instance for Mr Jones might be {income: $35,000, age: 45, education: graduate} where the $3000 increase in salary is sufficient to flip Mr Jones to the desired side of the banking system’s decision boundary.
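The Mr Jones example can be sketched in code. The decision rule, scoring weights and threshold below are hypothetical, invented purely to mirror the example:

```python
# Hypothetical loan model mirroring the Mr Jones example: approve when a
# simple score over income and education crosses a threshold (age is ignored).
def loan_decision(applicant):
    score = applicant["income"] / 1000 + (1 if applicant["education"] == "graduate" else 0)
    return "approved" if score >= 36 else "declined"

jones = {"income": 32000, "age": 45, "education": "graduate"}
counterfactual = {**jones, "income": 35000}  # the $3,000 salary increase

print(loan_decision(jones))           # declined
print(loan_decision(counterfactual))  # approved: the counterfactual instance
                                      # lies on the desired side of the boundary
```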

Wachter et al. (wachter2017counterfactual) first proposed using counterfactual instances as explanations of single machine learning predictions. This type of explanation was intended primarily to be for the benefit of the data subject, defined as ‘the natural person whose data is being collected or evaluated’. It has two main objectives: (i) understanding: to help a person understand why a machine learning system made a particular prediction affecting them; (ii) recourse: to specify what the person would need to change for the machine learning system to then produce the person’s desired outcome. Wachter et al. define a counterfactual explanation as having the form:

“Score p was returned because variables V had values (v1, v2) associated with them. If instead V had values (v1′, v2′), and all other variables had remained constant, score p′ would have been returned.” (wachter2017counterfactual)

Wachter et al. (wachter2017counterfactual) state that their approach does not rely on ‘any knowledge of the causal structure of the world’. Nor is it intended to provide approximate explanations of a machine learning system’s algorithm or to provide feature scores. They note that their proposed type of explanation is different to those found in the previous ‘machine learning legal and ethics literature’. As we will illustrate, it is also substantially different to those proposed in the philosophy of science. They state that a similar type of explanation has been proposed by Nozick (1983, p.172:174) within the philosophy of knowledge. Nozick is providing an analysis of the necessary conditions for propositional knowledge. He argues that satisfying the following subjunctive conditional may be necessary for a person S to know proposition p:

If p weren’t true, S wouldn’t believe that p

As Wachter et al. note, Nozick’s conditional can be understood as a counterfactual (given that p is true). Wachter et al.’s paper then goes directly from considering Nozick’s proposal to providing their definition of a counterfactual explanation. No philosophical argument is given to explain why Nozick’s conditional provides a justification for a form of explanation applicable to XAI.

There are now many XAI methods that attempt to generate ‘optimal’ counterfactual instances; for example, Karimi et al. (karimi2020survey) review sixty counterfactual instance methods. The algorithms differ in the constraints they place and in the attributes referenced in their loss functions (verma2020counterfactual). Desiderata often include that a counterfactual instance is: (1) actionable, e.g. it does not recommend that a person reduces their age; (2) near to the original observation, with common measures including Manhattan distance, L1 norm and L2 norm; (3) sparse, changing the values of only a small number of features; (4) plausible, e.g. the counterfactual instance must correspond to a high-density part of the training data; (5) efficient to compute. These XAI methods can be broadly viewed as having the following objective function for a counterfactual (this is adapted from karimi2021algorithmic and ustun2019actionable):

$x^{CF} \in \operatorname*{argmin}_{x'} dist(x, x') \quad \text{s.t.} \quad f(x') = c, \; x' \in \mathcal{F}, \; x' \in \mathcal{P}$

where $f$ is the machine learning system, $x$ is the observation, class $c$ is the label for the desired prediction, $x^{CF}$ is a nearest counterfactual, and $\mathcal{F}$ and $\mathcal{P}$ are optional feasibility and plausibility constraints.
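Under these assumptions, the objective function can be sketched as a brute-force search: among all candidate inputs that the model maps to the desired class, return the one nearest to the observation. The classifier `f` and the candidate grids below are hypothetical stand-ins, not any published method:

```python
import itertools

# Nearest-counterfactual search over a hypothetical two-feature model
# f(income, prior_defaults): 1 = loan approved, 0 = declined.
def f(x):
    income, defaults = x
    return 1 if income - 5000 * defaults >= 35000 else 0

def nearest_counterfactual(x, target=1):
    # Candidate grid (a crude stand-in for the feasibility set F).
    candidates = itertools.product(range(20000, 60001, 1000), range(0, 4))
    valid = [c for c in candidates if f(c) == target]
    # Manhattan (L1) distance plays the role of the dist(x, x') term.
    return min(valid, key=lambda c: abs(c[0] - x[0]) + abs(c[1] - x[1]))

x = (32000, 0)
print(f(x))                       # 0: declined
print(nearest_counterfactual(x))  # (35000, 0): the nearest approved instance
```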

Karimi et al. (karimi2021algorithmic) show that counterfactual instances satisfying this function will often be suboptimal or infeasible, because they do not take account of the causal structure that determines the consequences of the person’s actions. The underlying problem is that unless all of the person’s features are causally independent of each other, then when the person acts to change the value of one feature, other downstream dependent features may also change. Karimi et al.’s proposed solution uses Pearl’s interventionist calculus to reformulate the recourse problem. The objective function is now to find the minimum-cost set of actions that will achieve the desired outcome, where these actions are modelled as interventions on the structural causal model that subsumes the relevant portion of the person’s world. In order to reliably compute a set of recourse actions, the structural causal model will need to be at least approximately true. Unfortunately, as Karimi et al. note, the true causal model of the world is rarely known, and therefore their proposed solution currently has limited applicability. Nevertheless, the key point for this paper is that the counterfactual instances being produced by current XAI methods are often likely to fail the second of Wachter et al.’s objectives: identifying how a person should act in order to achieve their desired outcome.
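The point about downstream dependent features can be illustrated with a toy structural causal model (the equations and numbers below are invented for illustration): when savings causally depend on income, an action on income also moves savings, so a counterfactual computed feature-by-feature can overstate the action needed.

```python
# Toy structural causal model: savings is a downstream effect of income.
def scm(income_raise):
    income = 32000 + income_raise   # the feature the person acts on
    savings = 0.1 * income          # downstream: savings track income
    return income, savings

# Hypothetical decision rule of the machine learning system.
def f(income, savings):
    return "approved" if income + savings >= 38000 else "declined"

# A method treating features as independent recommends a $3,000 raise
# (income 35,000 with savings held fixed). Modelling the causal structure
# shows that a smaller action already flips the decision:
print(f(*scm(3000)))  # approved
print(f(*scm(2600)))  # approved: the $3,000 recommendation overshoots
print(f(*scm(2000)))  # declined
```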

3 Theories of Scientific Explanation

The structure of a satisfactory scientific explanation has been extensively studied throughout the history of philosophy. The deductive-nomological (henceforth: D-N) account of scientific explanation once dominated the philosophy of science. According to this, to explain a phenomenon is to subsume it under general laws of nature. For systems with deterministic laws, successful scientific explanations were taken to have the form of the D-N model, as specified by Hempel and Oppenheim (1948):

Figure 1: The schema for a D-N explanation

The explanans must include statements of general laws of nature that are essential to the derivation of the explanandum. The laws are general, in the sense that statements of the laws make no reference to particulars and are true without exception. The explanandum is explained by showing that it is an instantiation of these laws. A similar account of explanation was held to apply for systems with probabilistic laws, but it is sufficient for our purposes just to focus on D-N explanations. The D-N Model has been subject to some well-known counterexamples (Salmon, salmon2020scientific, p.46-50). Consider, for example, the following deductive argument:

Every man who regularly takes birth control pills avoids pregnancy
John Jones regularly takes birth control pills
John Jones avoids becoming pregnant

This satisfies the criteria for being a D-N explanation but clearly fails to be explanatory. Such counterexamples have highlighted serious shortcomings with the D-N Model, including that: (i) irrelevant premises can be used to deduce and hence ‘explain’ an explanandum (ii) no temporal priority is required between the explanans and the explanandum (iii) perhaps most importantly, D-N explanations fail because a successful explanation of an event requires stating its causes.

The dominant accounts of scientific explanation are now the counterfactual theories of causal explanation found in Woodward (Woodward) and in Halpern and Pearl (halpern2005causes). This paper will primarily focus on the account found in Woodward’s book ‘Making Things Happen: A Theory of Causal Explanation’, which won the Lakatos Award for an outstanding contribution to the philosophy of science. Woodward and Pearl (Pearl) share similar analyses, with Woodward focussing more on the philosophy rather than the calculus of causality (see woodward2003critical for an analysis of the differences between Woodward’s and Pearl’s specifications of an intervention; this paper’s arguments for XAI also apply, mutatis mutandis, to Pearl’s work).

Before providing Woodward’s definition of causal explanation, it is first necessary to briefly outline his notions of ideal intervention and invariance. For Woodward, causal relationships relate variables. Variables are properties or magnitudes that can have more than one value; and the values of variables are possessed by particular entities. (Woodward, 2003, p.39).

An ‘ideal intervention’ on X with respect to Y exogenously changes the value of X, such that any change that occurs to the value of Y occurs only because of the change in the value of X. Woodward’s specification of an ideal intervention involves an ‘intervention variable’ I which acts like a ‘switch’. When I is ‘switched on’:

  1. I causes X. X’s value is solely a function of I.

  2. This means that all connections between X and its pre-intervention causes are ‘broken’.

  3. I changes the value of Y, if at all, only by changing X.

  4. I does not alter the relationship between Y and any of its causes that are not on a directed path from X to Y. (Woodward, woodward2008invariance, p.202-203)

An ideal intervention on X with respect to Y consists in I being ‘switched on’ (see Figure 2). The relationship between X and Y is ‘invariant’ if it holds for at least one ‘testing intervention’. Let the relationship between X and Y be represented by the generalisation:

$Y = g(X)$

A testing intervention is an ideal intervention that changes the value of X from, say, $x_1$ to $x_2$ and establishes that:

$Y$ correspondingly changes from $g(x_1)$ to $g(x_2)$

A necessary and sufficient condition for a generalisation between variables X and Y to represent a causal relationship is that it is invariant (Woodward, 2003, p.250). X is a direct cause of Y with respect to a variable set V if there is a possible change to the value of X that will change the value of Y when all other variables in V (besides X) are held fixed.
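A testing intervention can be sketched in code. The data-generating mechanisms below are hypothetical: Z causes both X and Y, so a generalisation g fitted to observational data predicts Y well, yet fails once X is set exogenously (the Z-to-X connection is ‘broken’), showing that g is not invariant and so does not represent a causal relationship.

```python
import random

random.seed(0)

def observe():
    z = random.uniform(0, 10)
    return z, 2 * z          # X is caused by Z; Y is caused by Z, not by X

def intervene(x_value):
    z = random.uniform(0, 10)
    return x_value, 2 * z    # ideal intervention: X set exogenously, Z -> X broken

g = lambda x: 2 * x          # generalisation fitted to observational data

x, y = observe()
print(y == g(x))             # True: g predicts Y perfectly from observations

x, y = intervene(4.0)
print(y == g(x))             # False (with this seed): g fails the testing intervention
```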

Figure 2: Example of an ideal intervention. Z and I are direct causes of X. I is an intervention variable for X with respect to Y. When I is switched on, X ceases to depend on Z and depends only on I.

Woodward provides the following definition of a causal explanation for a simple system without multiple connections:

“Suppose that M is an explanandum consisting in the statement that some variable Y takes the particular value y. Then an explanans E for M will consist of (a) a generalization G relating changes in the value(s) of a variable X (where X may itself be a vector or n-tuple of variables $X_i$) with changes in Y, and (b) a statement (of initial or boundary conditions) that the variable X takes the particular value x. A necessary and sufficient condition for E to be (minimally) explanatory with respect to M is that (i) E and M be true or approximately so; (ii) according to G, Y takes the value y under an intervention in which X takes the value x; (iii) there is some intervention that changes the value of X from x to x′ where x ≠ x′, with G correctly describing the value y′ that Y would assume under this intervention, where y′ ≠ y.” (Woodward, 2003, p.203)

Central to Woodward’s account is the requirement for an invariant generalisation. This is a causal equation in which the dependent variable is the effect and the independent variables are a complete set of its direct causes. This equation (or system of equations) supports counterfactuals. An explanation reveals patterns of counterfactual dependence; and the purpose of counterfactual instances is to provide answers to a set of ‘what-if-things-had-been-done-differently’ questions. The generalisation(s) might only be true for a particular domain, in which case it provides a local explanation. For Woodward, all causal claims are counterfactual and contrastive: ‘to causally explain an outcome is always to explain why it, rather than some alternative, occurred’ (Woodward, 2003, p.146).

This paper endorses Woodward’s theory as specifying the structure of a satisfactory XAI explanation. A machine learning system’s prediction y is caused by the values of the features of observation x. A satisfactory explanation therefore needs to specify (i) an approximately true invariant generalisation relating the prediction to the input features and (ii) a set of counterfactual instances that are supported by the invariant generalisation. The resulting explanation reveals the (approximate) direct causal relationships that the machine learning system has learnt between the features of x and prediction y.

It can be argued that counterfactual instances are a type of contrastive causal explanation (e.g. see karimi2020survey). They identify a subset of the causes of a machine learning system predicting class c for observation x, and specify counterfactual changes in the values of those causes that would result in predicting class c′. But these are, at best, shallow causal explanations. They have multiple failings. For example, they do not identify: the relative importance of different causes, each cause’s functional form, or any interactions between the causes. Furthermore, they might not state all the key causes of an event. Consider again Mr Jones’ failed loan application. It may have been the case that the machine learning system was completely insensitive to any changes in Mr Jones’ salary until it exactly reached $35,000, or perhaps Mr Jones only needs to increase his salary to $34,000 if he also enrols on an MBA. Providing only counterfactual instances is clearly insufficient for understanding the machine learning system’s local behaviour. Imagine another science, say physics, treating a statement of counterfactual instances as an explanation, rather than seeking to discover the governing equation(s).
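The Mr Jones discussion can be made concrete. In the hypothetical model below (thresholds invented for illustration), the offered counterfactual instance ‘raise income to $35,000’ is true but conceals both the system’s insensitivity below the threshold and an interaction with the MBA feature:

```python
# Hypothetical decision boundary with a threshold and an interaction:
# holding an MBA lowers the income threshold from $35,000 to $34,000.
def decision(income, mba):
    threshold = 34000 if mba else 35000
    return "approved" if income >= threshold else "declined"

print(decision(32000, False))  # declined: the original observation
print(decision(35000, False))  # approved: the offered counterfactual instance
print(decision(34999, False))  # declined: completely insensitive below $35,000
print(decision(34000, True))   # approved: cheaper recourse via the interaction
```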

4 Current XAI Methods

Two XAI methods will now be outlined, in order to illustrate how satisfactory causal explanations can be produced. (It is unclear to what extent other prominent XAI methods could be adapted to satisfy Woodward’s specification: e.g. LORE (guidotti2018local) generates local decision trees rather than equations, and Kernel SHAP’s (lundberg2017unified) regression coefficients are SHAP values and therefore do not directly calculate the effect of changing a feature’s value on the machine learning system’s prediction.) The first is LIME: Local Interpretable Model-agnostic Explanations (LIME), which generates a causal equation, though LIME would have to be enhanced to produce counterfactual instances. The second is CLEAR: Counterfactual Local Explanations viA Regression (white2019measurable; white2021contrastive), which was explicitly developed to satisfy Woodward’s definition. Both explain single predictions by performing local weighted regressions in which the dependent variable is the prediction y and the independent variables are the input features, a subset of which are the direct causes of y. The regressions are carried out on a synthetic dataset, with each synthetic observation being generated by replacing subsets of the observation x’s feature values with random samples from parametric distributions and passing the result through the machine learning system to determine the corresponding prediction. Hence both methods assume that the values of each direct cause can be independently changed (a requirement of Woodward’s theory). CLEAR was developed after LIME, with key differences including that: CLEAR discovers actual counterfactual instances by sampling, includes the counterfactual instances in its regression dataset, and uses GAN-generated images for explaining image classifications. CLEAR also measures its fidelity, i.e. how faithfully it mimics the input-output behaviour of the machine learning system it seeks to explain. Figure 3 shows extracts from a CLEAR explanation for the classification probability of an observation taken from the Adult dataset. This illustrates how an XAI method can satisfy Woodward’s requirements, with CLEAR providing a causal equation and counterfactual instances, and reporting fidelity statistics (see white2019measurable for a detailed comparison of LIME and CLEAR; see white2021contrastive for how CLEAR uses a GAN to provide contrastive explanations of image classifications).
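The shared LIME/CLEAR recipe described above can be sketched as follows. The black-box model `b`, the sampling scale and the proximity kernel are hypothetical choices, not either method’s actual settings: sample a synthetic neighbourhood around x, label it with the black box, fit a locally weighted linear regression (the ‘causal equation’), and compare the equation’s estimate for a counterfactual instance against the black box’s actual output (the fidelity error).

```python
import numpy as np

rng = np.random.default_rng(0)

def b(X):
    # Hypothetical black-box model: a logistic score over two features.
    return 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.3 * X[:, 1] - 1.0)))

x = np.array([1.0, 2.0])                      # observation to be explained
X = x + rng.normal(scale=0.3, size=(500, 2))  # synthetic neighbourhood of x
y = b(X)                                      # labels from the black box
w = np.exp(-np.sum((X - x) ** 2, axis=1))     # proximity weights

# Weighted least squares: y ~ w0 + w1*x1 + w2*x2 is the local causal equation.
A = np.hstack([np.ones((500, 1)), X])
sw = np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)

x_cf = np.array([2.0, 2.0])                   # a counterfactual instance
estimate = np.array([1.0, *x_cf]) @ coef      # the equation's estimate for x_cf
actual = b(x_cf[None])[0]                     # the black box's actual prediction
print(float(abs(estimate - actual)))          # the fidelity error (small here)
```

The regression coefficients give both the relative importance of each feature locally and a check on how faithful the local equation is, which is the information a bare counterfactual instance omits.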

Figure 3: Extracts from a CLEAR report. This explains a prediction made by a neural network for an observation x from the Adult dataset. CLEAR states the causal equation that applies to the neighbourhood around x. Three counterfactuals are identified; for example, intervening to change marital status to ‘married’ changes the prediction to 0.57. CLEAR’s causal equation estimates that the prediction will change to 0.61, hence there is a fidelity error of 0.04.

5 Discussion

To explain an event is “to provide information about the factors on which it depends and exhibit how it depends on those factors” (Woodward, 2003, p.204). This requires counterfactual explanations that reveal the causal structure producing the event. This paper has argued that counterfactual instances by themselves do not do this. They need to be supported by either a causal equation or a system of equations.

This paper’s focus is in the opposite direction to Karimi et al.’s (karimi2021algorithmic). Their focus is on recourse, whilst this paper’s focus is on providing an understanding of why the machine learning system made its prediction. Karimi et al.’s criticism is that XAI methods do not provide actionable counterfactual instances, as they do not take account of the causal structure that generates the machine learning system’s input data. This paper’s criticism concerns counterfactual instances being insufficient to explain the direct causal structure linking the values of an observation x’s features with the machine learning system’s prediction. Existing XAI counterfactual instance methods can be relevant here, as they help to illustrate the direct causal relationships that the machine learning system has learnt; but they need a supporting causal equation.

A concern with providing Woodward’s form of explanation is understandability. In order to satisfy the requirement that the causal equation(s) is at least approximately true, the equation(s) may need to be too complex for non-technical users to understand. For example, CLEAR’s equations may include logistic functions, interaction terms, exponents and so forth. The solution to this is to recognise that different types of users require different levels of explanation. A data scientist may want to see the causal equation that most faithfully mimics the local behaviour of the machine learning system. Sometimes these equations may include terms with functions that contradict their understanding of the domain, suggesting that the machine learning system’s prediction is untrustworthy. By contrast, a bank customer may only want to know which features most impacted the machine learning system’s prediction. The issue then becomes one of deciding how much of the full explanation to provide to a particular user, in what format, and when to drill down into progressively more complex explanations. For example, for the bank customer, perhaps a bar chart showing the relative weightings of the most important features plus some counterfactual instances might be sufficient.

Since Wachter et al.’s (wachter2017counterfactual) paper, considerable resources have been devoted to developing counterfactual instance methods. Greater focus is now needed on developing methods for discovering the direct causal relations that generate these counterfactual instances, and on making them useful in practice as an explanatory tool that can help increase trust in machine learning systems.