Machine learning is increasingly used in a wide range of decision-making scenarios that have serious implications for individuals and society, including financial lending [10, 36], hiring [8, 28], online advertising [27, 41], pretrial and immigration detention [5, 43], child maltreatment screening [14, 47], health care [19, 32], and social services [1, 23]. Whilst this has the potential to overcome undesirable aspects of human decision-making, there is concern that biases in the training data and model inaccuracies can lead to decisions that treat historically discriminated groups unfavourably. The research community has therefore started to investigate how to ensure that learned models do not take decisions that are unfair with respect to sensitive attributes (e.g. race or gender).
This effort has led to the emergence of a rich set of fairness definitions [13, 16, 21, 24, 38] providing researchers and practitioners with criteria to evaluate existing systems or to design new ones. Many such definitions have been found to be mathematically incompatible [7, 13, 15, 16, 30], and this has been viewed either as representing an unavoidable trade-off establishing fundamental limits on fair machine learning, or as an indication that certain definitions do not map onto social or legal understandings of fairness.
Most fairness definitions express properties of the model output with respect to sensitive information, without considering the relations among the relevant variables underlying the data-generation mechanism. As different relations would require a model to satisfy different properties in order to be fair, this could lead to erroneously classifying as fair models that exhibit undesirable biases, or as unfair models that exhibit legitimate biases.
In this manuscript, we use the causal Bayesian network framework to draw attention to this point, by visually describing unfairness in a dataset as the presence of an unfair causal path in the data-generation mechanism. We then use this viewpoint to raise concern on the fairness debate surrounding the COMPAS pretrial risk assessment tool. Finally, we show that causal Bayesian networks offer a powerful tool for representing, reasoning about, and dealing with complex unfairness scenarios.
2 A Graphical View of (Un)fairness
Consider a dataset $\mathcal{D} = \{(a^n, x^n, y^n)\}_{n=1}^N$ corresponding to $N$ individuals, where $a^n$ indicates a sensitive attribute, and $x^n$ a set of observations that can be used (possibly together with $a^n$) to form a prediction $\hat y^n$ of outcome $y^n$. We assume a binary setting $a^n, y^n, \hat y^n \in \{0, 1\}$ (unless otherwise specified), and indicate with $A$, $\mathcal{X}$, $Y$ and $\hat Y$ the (set of) random variables corresponding to $a^n$, $x^n$, $y^n$ and $\hat y^n$ respectively. (Throughout the paper, we use capital and small letters for random variables and their values, and calligraphic capital letters for sets of variables.)
In this section we show at a high level that a correct use of fairness definitions concerned with statistical properties of $\hat Y$ with respect to $A$ requires an understanding of the patterns of unfairness underlying $\mathcal{D}$, and therefore of the relationships among $A$, $\mathcal{X}$ and $Y$. More specifically, we show that:
Using the framework of causal Bayesian networks (CBNs), unfairness in $\mathcal{D}$ can be viewed as the presence of an unfair causal path from $A$ to $\mathcal{X}$ or $Y$.
In order to determine which properties $\hat Y$ should possess to be fair, it is necessary to question and understand unfairness in $\mathcal{D}$.
Assume a dataset corresponding to a college admission scenario in which applicants are admitted based on qualifications $Q$, choice of department $D$, and gender $A$; and in which female applicants apply more often to certain departments. This scenario can be represented by the CBN on the left (see Appendix A for an overview of BNs, and Sect. 3 for a detailed treatment of CBNs). The causal path $A \to Y$ represents direct influence of gender $A$ on admission $Y$, capturing the fact that two individuals with the same qualifications and applying to the same department can be treated differently depending on their gender. The indirect causal path $A \to D \to Y$ represents influence of $A$ on $Y$ through $D$, capturing the fact that female applicants more often apply to certain departments. Whilst the direct path $A \to Y$ is certainly an unfair one, the paths $A \to D$ and $D \to Y$, and therefore $A \to D \to Y$, could either be considered as fair or as unfair. For example, rejecting women more often due to department choice could be considered fair with respect to college responsibility. However, this could be considered unfair with respect to societal responsibility if the departmental differences were a result of systemic historical or cultural factors (e.g. if female applicants apply to specific departments at lower rates because of overt or covert societal discouragement). Finally, if the college were to lower the admission rates for departments chosen more often by women, then the path $D \to Y$ would be unfair.
Deciding whether a path is fair or unfair (a path could also be only partially fair; we omit this case for simplicity) requires careful ethical and sociological considerations and/or might not be possible from a dataset alone. Nevertheless, this example illustrates that we can view unfairness in a dataset as the presence of an unfair causal path from the sensitive attribute $A$ to $\mathcal{X}$ or $Y$.
Different (un)fair path labelings require $\hat Y$ to have different characteristics in order to be fair.
In the case in which the causal paths from $A$ to $Y$ are all unfair (e.g. if $A \to D \to Y$ is considered unfair), a $\hat Y$ that is statistically independent of $A$ (denoted with $\hat Y \perp A$) would not contain any of the unfair influence of $A$ on $Y$. In such a case, $\hat Y$ is said to satisfy demographic parity.
Demographic Parity (DP). $\hat Y$ satisfies demographic parity if $\hat Y \perp A$, i.e. $p(\hat Y = 1 \mid A = 0) = p(\hat Y = 1 \mid A = 1)$, where e.g. $p(\hat Y = 1 \mid A = a)$ can be estimated as

$$p(\hat Y = 1 \mid A = a) \approx \frac{1}{N_a} \sum_{n=1}^N \mathbb{1}_{\hat y^n = 1, a^n = a},$$

with $\mathbb{1}_{\hat y^n = 1, a^n = a} = 1$ if $\hat y^n = 1$ and $a^n = a$ (and zero otherwise), and where $N_a$ indicates the number of individuals with $a^n = a$. Notice that many classifiers, rather than a binary prediction $\hat y^n$, output a degree of belief that the individual belongs to class 1, $r^n$, also called score. For example, in the case of logistic regression $r^n$ corresponds to the probability of class 1, i.e. $r^n = p(y^n = 1 \mid x^n)$. To obtain the prediction $\hat y^n$ from $r^n$, it is common to use a threshold $\theta$, i.e. $\hat y^n = \mathbb{1}_{r^n > \theta}$. In this case, we can rewrite the estimate for $p(\hat Y = 1 \mid A = a)$ as

$$p(\hat Y = 1 \mid A = a) \approx \frac{1}{N_a} \sum_{n=1}^N \mathbb{1}_{r^n > \theta, a^n = a}.$$

Notice that $R \perp A$ implies $\hat Y \perp A$ for all values of $\theta$.
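As a minimal sketch of the estimate above (with made-up scores and group labels), the demographic parity gap at a given threshold can be computed as:

```python
import numpy as np

def demographic_parity_gap(scores, groups, theta):
    """Difference in positive-prediction rates between the two groups
    at threshold theta: |p(Y_hat=1 | A=0) - p(Y_hat=1 | A=1)|."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    preds = (scores > theta).astype(int)
    rate0 = preds[groups == 0].mean()
    rate1 = preds[groups == 1].mean()
    return abs(rate0 - rate1)

# Hypothetical scores for six individuals, three per group.
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.3]
groups = [0, 0, 0, 1, 1, 1]
print(demographic_parity_gap(scores, groups, theta=0.5))
```

A gap of zero at every threshold corresponds to $R \perp A$, and hence to demographic parity for any choice of $\theta$.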
In the case in which the causal paths from $A$ to $Y$ are all fair (e.g. if $A \to Y$ is absent and $A \to D \to Y$ is considered fair), a $\hat Y$ such that $\hat Y \perp A \mid Y$ or $Y \perp A \mid \hat Y$ would be allowed to contain such a fair influence, but the (dis)agreement between $\hat Y$ and $Y$ would not be allowed to depend on $A$. In these cases, $\hat Y$ is said to satisfy equal false positive/false negative rates and calibration respectively.
Equal False Positive and Negative Rates (EFPRs/EFNRs). $\hat Y$ satisfies EFPRs and EFNRs if $\hat Y \perp A \mid Y$, i.e. $p(\hat Y = 1 \mid Y = 0, A = 0) = p(\hat Y = 1 \mid Y = 0, A = 1)$ (EFPRs) and $p(\hat Y = 0 \mid Y = 1, A = 0) = p(\hat Y = 0 \mid Y = 1, A = 1)$ (EFNRs).
Calibration. $\hat Y$ satisfies calibration if $Y \perp A \mid \hat Y$. In the case of score output $r^n$, this condition is often instead called predictive parity at threshold $\theta$, $p(Y = 1 \mid R > \theta, A = 0) = p(Y = 1 \mid R > \theta, A = 1)$, and calibration defined as requiring $Y \perp A \mid R$.
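A minimal sketch (on hypothetical labels and predictions) for checking EFPRs/EFNRs empirically; the criteria hold when the per-group rates match:

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group false positive and false negative rates.
    EFPRs/EFNRs hold when these rates match across groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        t, p = y_true[groups == g], y_pred[groups == g]
        fpr = p[t == 0].mean()        # p(Y_hat=1 | Y=0, A=g)
        fnr = (1 - p[t == 1]).mean()  # p(Y_hat=0 | Y=1, A=g)
        rates[int(g)] = (fpr, fnr)
    return rates

# Hypothetical data: group 0 suffers both kinds of error, group 1 neither.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(group_rates(y_true, y_pred, groups))
```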
In the case in which at least one causal path from $A$ to $Y$ is unfair (e.g. if $A \to Y$ is present), EFPRs/EFNRs and calibration are inappropriate criteria, as they would not require the unfair influence of $A$ on $Y$ to be absent from $\hat Y$ (e.g. a perfect model ($\hat Y = Y$) would automatically satisfy EFPRs/EFNRs and calibration, but would contain the unfair influence). This observation is particularly relevant to the recent debate surrounding the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) pretrial risk assessment tool. We revisit this debate in the next section.
2.1 The COMPAS Debate
Over the past few years, numerous state and local governments around the United States have sought to reform their pretrial court systems with the aim of reducing unprecedented levels of incarceration, and specifically the population of low-income defendants and racial minorities in America’s prisons and jails [2, 25, 31]. As part of this effort, quantitative tools for estimating a person’s likelihood of reoffending or failing to appear, called risk assessment instruments (RAIs), were introduced to replace previous systems driven largely by opaque discretionary decisions and money bail [6, 26]. However, the expansion of pretrial RAIs has unearthed new concerns of racial discrimination which would nullify the purported benefits of these systems and adversely impact defendants’ civil liberties.
An intense ongoing debate, in which the research community has also been heavily involved, was triggered by an exposé from investigative journalists at ProPublica on the COMPAS pretrial RAI developed by Equivant (formerly Northpointe) and deployed in Broward County, Florida.
The COMPAS General Recidivism Risk Scale (GRRS) and Violent Recidivism Risk Scale (VRRS), the focus of ProPublica’s investigation, sought to leverage machine learning techniques to improve the predictive accuracy of recidivism compared to older RAIs such as the Level of Service Inventory-Revised, which were primarily based on theories and techniques from a sub-field of psychology known as the psychology of criminal conduct [4, 9]. (While the exact methodology underlying GRRS and VRRS is proprietary, publicly available reports suggest that the process begins with a defendant being administered a 137-point assessment during intake. This is used to create a series of dynamic risk factor scales such as the Criminal Involvement Scale and History of Violence Scale. In addition, COMPAS also includes static attributes such as the defendant’s age and prior police contact (number of prior arrests). The raw COMPAS scores are transformed into decile values by ranking and calibration with a normative group to ensure an equal proportion of scores within each scale value. Lastly, to aid practitioner interpretation, the scores are grouped into three risk categories: the scale values are displayed to court officials as Low (1-4), Medium (5-7), or High (8-10) risk.)
ProPublica’s criticism of COMPAS centered on two concerns. First, the authors argued that the distribution of the risk score $R$ exhibited discriminatory patterns: black defendants displayed a fairly uniform distribution across each value, while white defendants exhibited a right-skewed distribution, suggesting that the COMPAS recidivism risk scores disproportionately rated white defendants as lower risk than black defendants. Second, the authors claimed that the GRRS and VRRS did not satisfy EFPRs and EFNRs, as the false positive rate was substantially higher, and the false negative rate substantially lower, for black defendants than for white defendants (see Fig. 1). This evidence led ProPublica to conclude that COMPAS had a disparate impact on black defendants, leading to public outcry over potential biases in RAIs and machine learning writ large.
In response, Equivant published a technical report refuting the claims of bias made by ProPublica and concluded that COMPAS is sufficiently calibrated, in the sense that it satisfies predictive parity at key thresholds. Subsequent analyses [13, 16, 30] confirmed Equivant’s claims of calibration, but also demonstrated the incompatibility of EFPRs/EFNRs and calibration due to differences in base rates across groups ($p(Y = 1 \mid A = 0) \neq p(Y = 1 \mid A = 1)$) (see Appendix B). Moreover, the studies suggested that attempting to satisfy these competing forms of fairness forces unavoidable trade-offs between criminal justice reformers’ purported goals of racial equity and public safety.
As explained in Sect. 2, by requiring the rate of (dis)agreement between $\hat Y$ and $Y$ to be the same for black and white defendants (and therefore by not being concerned with dependence of $\hat Y$ on $A$), EFPRs/EFNRs and calibration are inappropriate fairness criteria if the dependence of $Y$ on $A$ includes influence of $A$ on $Y$ through an unfair causal path.
As previous research has shown [29, 35, 44], modern policing tactics center around targeting a small number of neighborhoods, often disproportionately populated by non-white and low income residents, with recurring patrols and stops. This uneven distribution of police attention, as well as other factors such as funding for pretrial services [31, 46], means that differences in base rates between racial groups are not reflective of ground-truth rates. We can rephrase these findings as indicating the presence of a direct path $A \to Y$ (through unobserved neighborhood) in the CBN representing the data-generation mechanism (Fig. 2). Such tactics also imply an influence of $A$ on $Y$ through the set of observed variables $\mathcal{X}$, which contains the number of prior arrests. In addition, the influence of $A$ on $Y$ through $\mathcal{X}$ could be more prominent or contain more unfairness due to racial discrimination.
These observations indicate that EFPRs/EFNRs and calibration are inappropriate criteria for this case (and therefore that their incompatibility is irrelevant), and more generally that the current fairness debate surrounding COMPAS gives insufficient consideration to the patterns of unfairness underlying the training data. Our analysis formalizes the concerns raised by social scientists and legal scholars on mismeasurement and unrepresentative data in the US criminal justice system. Multiple studies [22, 34, 37, 46] have argued that the core premise of RAIs, to assess the likelihood a defendant reoffends, is impossible to measure and that the empirical proxy used (e.g. arrest or conviction) introduces embedded biases and norms which render existing fairness tests unreliable.
This section used the CBN framework to describe at a high level different patterns of unfairness that can underlie a dataset and to point out issues with current deployment of fairness definitions. In the remainder of the manuscript, we use this framework more extensively to further advance our analysis of fairness. Before doing that, we give some background on CBNs [18, 39, 40, 42, 45], assuming that all variables except $A$ are continuous.
3 Causal Bayesian Networks
A Bayesian network is a directed acyclic graph where nodes and edges represent random variables and statistical dependencies. Each node $X_i$ in the graph is associated with the conditional distribution $p(X_i \mid \mathrm{pa}(X_i))$, where $\mathrm{pa}(X_i)$ is the set of parents of $X_i$. The joint distribution of all nodes, $p(X_1, \dots, X_I)$, is given by the product of all conditional distributions, i.e. $p(X_1, \dots, X_I) = \prod_{i=1}^I p(X_i \mid \mathrm{pa}(X_i))$ (see Appendix A for more details on Bayesian networks).
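To make the factorization concrete, here is a minimal sketch with made-up conditional probability tables for a three-node chain $A \to B \to C$, where the joint is the product of the conditionals:

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C (binary variables).
p_a = {0: 0.6, 1: 0.4}                  # p(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},     # p(B | A)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},     # p(C | B)
               1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """p(A, B, C) = p(A) p(B|A) p(C|B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The joint must sum to one over all configurations.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
print(total)
```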
When equipped with causal semantics, namely when representing the data-generation mechanism, Bayesian networks can be used to visually express causal relationships. More specifically, CBNs enable us to give a graphical definition of causes and causal effects: if there exists a directed path from $A$ to $Y$, then $A$ is a potential cause of $Y$. Directed paths are also called causal paths.
The causal effect of $A$ on $Y$ can be seen as the information traveling from $A$ to $Y$ through causal paths, or as the conditional distribution of $Y$ given $A$ restricted to causal paths. This implies that, to compute the causal effect, we need to disregard the information that travels along non-causal paths, which occurs if such paths are open. Since paths with an arrow emerging from $A$ are either causal or closed (blocked) by a collider, the problematic paths are only those with an arrow pointing into $A$, called back-door paths, which are open if they do not contain a collider.
An example of an open back-door path is given by $A \leftarrow C \rightarrow Y$ in the CBN of Fig. 3(a): the variable $C$ is said to be a confounder for the effect of $A$ on $Y$, as it confounds the causal effect with non-causal information. To understand this, assume that $A$ represents hours of exercise in a week, $Y$ cardiac health, and $C$ age: observing cardiac health conditioned on exercise level, i.e. $p(Y \mid A)$, does not enable us to understand the effect of exercise on cardiac health, since $p(Y \mid A)$ includes the dependence between $A$ and $Y$ induced by age.
Each parent-child relationship in a CBN represents an autonomous mechanism, and therefore it is conceivable to change one such relationship without changing the others. This enables us to express the causal effect of $A$ on $Y$ as the conditional distribution $p_{\to}(Y \mid A = a)$ on the modified CBN of Fig. 3(b), resulting from replacing $p(A \mid C)$ with a Dirac delta distribution $\delta_{A=a}$ (thereby removing the link from $C$ to $A$) and leaving the remaining conditional distributions $p(Y \mid A, C)$ and $p(C)$ unaltered; this process is called intervention on $A$. The distribution $p_{\to}(Y \mid A = a)$ can be estimated as $\sum_c p(Y \mid A = a, C = c)\, p(C = c)$. This is a special case of the following back-door adjustment formula.
Back-door Adjustment. If a set of variables $\mathcal{Z}$ satisfies the back-door criterion relative to $(A, Y)$, the causal effect of $A$ on $Y$ is given by $p_{\to}(Y \mid A = a) = \sum_z p(Y \mid A = a, \mathcal{Z} = z)\, p(\mathcal{Z} = z)$. $\mathcal{Z}$ satisfies the back-door criterion if (a) no node in $\mathcal{Z}$ is a descendant of $A$ and (b) $\mathcal{Z}$ blocks every back-door path from $A$ to $Y$.
The equality follows from the fact that the graph obtained by removing from the CBN all links emerging from $A$ retains all (and only) the back-door paths from $A$ to $Y$. As $\mathcal{Z}$ blocks all such paths, $Y \perp A \mid \mathcal{Z}$ in this graph. This means that there is no non-causal information traveling from $A$ to $Y$ when conditioning on $\mathcal{Z}$, and therefore conditioning on $\mathcal{Z}$ coincides with intervening.
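As an illustrative sketch of the adjustment formula (with invented conditional probability tables for a discrete confounder $C$, as in Fig. 3), conditioning and adjusting give different answers precisely because $C$ opens a back-door path:

```python
# Hypothetical CPTs for the confounded CBN C -> A, C -> Y, A -> Y.
p_c = {0: 0.5, 1: 0.5}              # p(C)
p_a_given_c = {0: 0.8, 1: 0.2}      # p(A=1 | C=c)
p_y = {(0, 0): 0.1, (0, 1): 0.5,    # p(Y=1 | A=a, C=c), keyed by (a, c)
       (1, 0): 0.3, (1, 1): 0.7}

def p_y_do_a(a):
    """Back-door adjustment: p(Y=1 | do(A=a)) = sum_c p(Y=1|a,c) p(c)."""
    return sum(p_y[(a, c)] * p_c[c] for c in p_c)

def p_y_cond_a(a):
    """Observational conditioning p(Y=1 | A=a), via Bayes' rule."""
    w = lambda c: p_a_given_c[c] if a == 1 else 1 - p_a_given_c[c]
    num = sum(p_y[(a, c)] * w(c) * p_c[c] for c in p_c)
    den = sum(w(c) * p_c[c] for c in p_c)
    return num / den

# These differ because C confounds the effect of A on Y.
print(p_y_do_a(1), p_y_cond_a(1))
```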
Conditioning on $\mathcal{Z}$ to block an open back-door path may open a closed path on which a variable $Z \in \mathcal{Z}$ is a collider. For example, in the CBN of Fig. 4(a), conditioning on $Z$ closes the back-door paths on which $Z$ is not a collider, but opens the path on which $Z$ is a collider (additional conditioning on a variable blocking that path would close it again).
The back-door criterion can also be derived from the rules of do-calculus [39, 40], which indicate whether and how $p_{\to}(Y \mid A)$ can be estimated using observations from the pre-intervention distribution: for many graph structures with unobserved confounders the only way to compute causal effects is by collecting observations directly from the post-intervention distribution; in this case the effect is said to be non-identifiable.
Potential Outcome Viewpoint.
Let $Y_{A=a}$ be the random variable with distribution $p(Y_{A=a}) = p_{\to}(Y \mid A = a)$. $Y_{A=a}$ is called potential outcome and, when not ambiguous, we will refer to it with the shorthand $Y_a$. The relation between $Y_a$ and all the variables in the CBN other than $Y$ can be expressed by the graph obtained by removing from the CBN all the links emerging from $A$, and by replacing $Y$ with $Y_a$. If $Y_a$ is independent of $A$ in this graph, then $p(Y_a) = p(Y_a \mid A = a) = p(Y \mid A = a)$ (the second equality is called consistency). If $Y_a$ is independent of $A$ in this graph when conditioning on $\mathcal{Z}$, then

$$p(Y_a) = \sum_z p(Y_a \mid \mathcal{Z} = z)\, p(\mathcal{Z} = z) = \sum_z p(Y_a \mid A = a, \mathcal{Z} = z)\, p(\mathcal{Z} = z) = \sum_z p(Y \mid A = a, \mathcal{Z} = z)\, p(\mathcal{Z} = z),$$

i.e. we retrieve the back-door adjustment formula.
In the remainder of the section we show that, by performing different interventions on $A$ along different causal paths, it is possible to isolate the contribution of the causal effect of $A$ on $Y$ along a group of paths.
Direct and Indirect Effect
Consider the CBN of Fig. 4(b), containing the direct path $A \to Y$ and one indirect causal path $A \to M \to Y$ through the variable $M$. Let $Y_{a, M_{\bar a}}$ be the random variable with distribution equal to the conditional distribution of $Y$ given $A$ restricted to causal paths, with $A = a$ along $A \to Y$ and $A = \bar a$ along $A \to M \to Y$. The average direct effect (ADE) of $A = a$ with respect to $A = \bar a$, defined as

$$\mathrm{ADE} = \mathbb{E}[Y_{a, M_{\bar a}}] - \mathbb{E}[Y_{\bar a}],$$

where e.g. $\mathbb{E}[Y_{\bar a}] = \int_y y\, p(Y_{\bar a} = y)$, measures the difference in flow of causal information from $A$ to $Y$ between the case in which $A = a$ along $A \to Y$ and $A = \bar a$ along $A \to M \to Y$, and the case in which $A = \bar a$ along both paths.

Analogously, the average indirect effect (AIE) of $A = a$ with respect to $A = \bar a$ is defined as $\mathrm{AIE} = \mathbb{E}[Y_{\bar a, M_a}] - \mathbb{E}[Y_{\bar a}]$.
To estimate the effect along a specific group of causal paths, we can generalize the formulas for the ADE and AIE by replacing the variable in the first term with the one resulting from performing the intervention $A = a$ along the group of interest and $A = \bar a$ along the remaining causal paths. For example, consider the CBN of Fig. 5 (top) and assume that we are interested in isolating the effect of $A$ on $Y$ along the direct path $A \to Y$ and the paths passing through $C$, namely along the red links. The path-specific effect (PSE) of $A = a$ with respect to $A = \bar a$ for this group of paths is defined as

$$\mathrm{PSE} = \mathbb{E}[Y_{a, C_a, M_{\bar a, C_a}}] - \mathbb{E}[Y_{\bar a}],$$

where $\mathbb{E}[Y_{a, C_a, M_{\bar a, C_a}}]$ is given by the mean of $Y$ with $A = a$ along $A \to Y$ and $A \to C$, and $A = \bar a$ along $A \to M$.
In the simple case in which the CBN corresponds to a linear model, e.g.

$$C = \theta^C_A A + \epsilon_C, \quad M = \theta^M_A A + \theta^M_C C + \epsilon_M, \quad Y = \theta^Y_A A + \theta^Y_C C + \theta^Y_M M + \epsilon_Y,$$

where $\epsilon_C$, $\epsilon_M$, and $\epsilon_Y$ are unobserved independent zero-mean Gaussian variables, we can compute $\mathbb{E}[Y_{a, C_a, M_{\bar a, C_a}}]$ by expressing $Y$ as a function of $A$ and the Gaussian variables, by recursive substitutions in $C$ and $M$, i.e.

$$Y = \theta^Y_A A + \theta^Y_C (\theta^C_A A + \epsilon_C) + \theta^Y_M \big(\theta^M_A A + \theta^M_C (\theta^C_A A + \epsilon_C) + \epsilon_M\big) + \epsilon_Y,$$

and then take the mean with the appropriate value of $A$ along each path, obtaining $\mathbb{E}[Y_{a, C_a, M_{\bar a, C_a}}] = \theta^Y_A a + \theta^Y_C \theta^C_A a + \theta^Y_M (\theta^M_A \bar a + \theta^M_C \theta^C_A a)$. Analogously

$$\mathbb{E}[Y_{\bar a}] = (\theta^Y_A + \theta^Y_C \theta^C_A + \theta^Y_M \theta^M_A + \theta^Y_M \theta^M_C \theta^C_A)\, \bar a.$$

For $a = 1$ and $\bar a = 0$, this gives

$$\mathrm{PSE} = \theta^Y_A + \theta^Y_C \theta^C_A + \theta^Y_M \theta^M_C \theta^C_A.$$
The same conclusion could have been obtained by looking at the graph annotated with path coefficients (Fig. 5 (bottom)). The PSE is obtained by summing over the three causal paths of interest ($A \to Y$, $A \to C \to Y$, and $A \to C \to M \to Y$) the product of all coefficients in each path.
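As a quick numerical sketch (all coefficient values invented), assuming the linear-Gaussian model $C = \theta^C_A A + \epsilon_C$, $M = \theta^M_A A + \theta^M_C C + \epsilon_M$, $Y = \theta^Y_A A + \theta^Y_C C + \theta^Y_M M + \epsilon_Y$, the PSE along $A \to Y$, $A \to C \to Y$ and $A \to C \to M \to Y$ can be obtained both by simulating the two interventions and by summing products of path coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented path coefficients for the linear CBN
# A -> C, A -> M, C -> M, A -> Y, C -> Y, M -> Y.
t_ca, t_ma, t_mc = 0.5, 0.4, 0.3
t_ya, t_yc, t_ym = 0.2, 0.6, 0.7

def simulate_y(a_direct, a_indirect, n=200_000):
    """Mean of Y when A takes value a_direct along the paths of interest
    (A->Y, A->C->Y, A->C->M->Y) and a_indirect along A->M->Y."""
    eps_c, eps_m, eps_y = rng.standard_normal((3, n))
    c = t_ca * a_direct + eps_c
    m = t_ma * a_indirect + t_mc * c + eps_m
    y = t_ya * a_direct + t_yc * c + t_ym * m + eps_y
    return y.mean()

# PSE for a=1, abar=0: Monte Carlo vs sum of path-coefficient products.
pse_mc = simulate_y(1, 0) - simulate_y(0, 0)
pse_paths = t_ya + t_yc * t_ca + t_ym * t_mc * t_ca
print(pse_mc, pse_paths)  # the two estimates should be close
```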
Notice that the expression for the PSE obtained by recursive substitution coincides with the one obtained by summing, over the paths of interest, the products of path coefficients.
Effect of Treatment on Treated.
Consider the conditional distribution $p(Y_{\bar a} \mid A = a)$. This distribution measures the information travelling from $A$ to $Y$ along all open paths, when $A$ is set to $\bar a$ along causal paths and to $a$ along non-causal paths. The effect of treatment on treated (ETT) of $A = a$ with respect to $A = \bar a$ is defined as $\mathrm{ETT} = \mathbb{E}[Y \mid A = a] - \mathbb{E}[Y_{\bar a} \mid A = a]$. Like the PSE, the ETT measures a difference in flow of information from $A$ to $Y$ when $A$ takes different values along different paths. However, the PSE considers only causal paths and different values for $A$ along different causal paths, whilst the ETT considers all open paths and different values for $A$ along causal and non-causal paths respectively. Similarly to the PSE, the ETT for the CBN of Fig. 4(b) can be expressed in closed form by recursive substitution in the linear case.

Notice that, if we define the difference in flow of non-causal information (along the open back-door paths) from $A$ to $Y$ when $A = a$ with respect to when $A = \bar a$, the ETT can be decomposed into the sum of causal and non-causal contributions.
4 Fairness Considerations using CBNs
Equipped with the background on CBNs from Sect. 3, in this section we further investigate unfairness in a dataset $\mathcal{D}$, discuss issues that might arise when building a decision system from it, and show how to measure and deal with unfairness in complex scenarios, revisiting and extending material from [11, 33, 48].
4.1 Back-door Paths from $A$ to $Y$
In Sect. 2 we have introduced a graphical interpretation of unfairness in a dataset as the presence of an unfair causal path from $A$ to $\mathcal{X}$ or $Y$. More specifically, we have shown through a college admission example that unfairness can be due to an unfair link emerging (a) from $A$ or (b) from a subsequent variable in a causal path from $A$ to $Y$ (e.g. $D \to Y$ in the example). Our discussion did not mention paths from $A$ to $Y$ with an arrow pointing into $A$, namely back-door paths. This is because such paths are not problematic.
To understand this, consider the hiring scenario described by the CBN on the left, where $A$ represents religious belief and $E$ the educational background of the applicant, which influences religious participation ($E \to A$). Whilst $Y \not\perp A$ due to the open back-door path $A \leftarrow E \to Y$ from $A$ to $Y$, the hiring decision $Y$ is only based on $E$.
4.2 Opening Closed Unfair Paths from $A$ to $Y$
In Sect. 2, we have seen that, in order to reason about fairness of $\hat Y$, it is necessary to question and understand unfairness in $\mathcal{D}$. In this section, we warn that another crucial element needs to be considered in the fairness discussion around $\hat Y$, namely

The variables used to form $\hat Y$ could project into $\hat Y$ unfair patterns in $\mathcal{D}$ that do not concern $Y$.

This could happen, for example, if a closed unfair path from $A$ to $Y$ is opened when conditioning on the variables used to form $\hat Y$.
As an example, assume the CBN in Fig. 6 representing the data-generation mechanism underlying a music degree scenario, where $A$ corresponds to gender, $M$ to music aptitude (unobserved, i.e. $M \notin \mathcal{D}$), $X$ to the score obtained from an ability test taken at the beginning of the degree, and $Y$ to the score obtained from an ability test taken at the end of the degree. Individuals with higher music aptitude are more likely to obtain higher initial and final scores ($M \to X$, $M \to Y$). Due to discrimination occurring at the initial testing, women are assigned a lower initial score than men for the same aptitude level ($A \to X$). The only path from $A$ to $Y$, $A \to X \leftarrow M \to Y$, is closed as $X$ is a collider on this path. Therefore the unfair influence of $A$ on $X$ does not reach $Y$ ($Y \perp A$). Nevertheless, as $Y \not\perp A \mid X$, a prediction based on the initial score only would contain the unfair influence of $A$ on $X$. For example, assume the following linear model: $Y = M$, $X = A + M$, with $M \sim \mathcal{N}(0, 1)$ and $A$ taking values 0 and 1 with equal probability. A linear predictor of the form $\hat Y = \theta X$ minimizing $\mathbb{E}[(Y - \hat Y)^2]$ would have parameter $\theta = 2/3$, giving $\hat Y = 2(A + M)/3$, i.e. $\hat Y \not\perp A$. Therefore, this predictor would be using the sensitive attribute to form a decision, although implicitly rather than explicitly. Instead, a predictor explicitly using the sensitive attribute, $\hat Y = \theta_X X + \theta_A A$, would have parameters $\theta_X = 1$, $\theta_A = -1$, i.e. $\hat Y = M$. Therefore, this predictor would be fair. From the CBN we can see that the explicit use of $A$ can be of help in retrieving $M$. Indeed, since $M = X - A$, using $A$ in addition to $X$ can give information about $M$. In general (e.g. in a non-linear setting) it is not guaranteed that using $A$ would ensure $\hat Y \perp A$. Nevertheless, this example shows how explicit use of the sensitive attribute in a model can ensure fairness rather than leading to unfairness.
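The music-degree example can be simulated directly; a minimal sketch under an assumed linear model ($Y = M$, $X = A + M$, with $M$ standard normal and $A$ binary) shows that the predictor using $X$ alone picks up the unfair dependence on $A$, while the predictor also given $A$ recovers the aptitude:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed model: M is unobserved aptitude, A a binary sensitive attribute,
# X = A + M the (discriminatory) initial score, Y = M the final score.
a = rng.integers(0, 2, n).astype(float)
m = rng.standard_normal(n)
x = a + m
y = m

def least_squares(features, target):
    """Ordinary least squares fit (no intercept), returning coefficients."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

# Predictor using X only: inherits the unfair influence of A on X.
theta_x_only = least_squares(np.column_stack([x]), y)
# Predictor using X and A: can subtract A and recover M exactly.
theta_with_a = least_squares(np.column_stack([x, a]), y)

print(theta_x_only)   # approximately [2/3]
print(theta_with_a)   # approximately [1, -1]
```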
This observation is relevant to one of the simplest fairness definitions, motivated by legal requirements, called fairness through unawareness, which states that $\hat Y$ is fair as long as it does not make explicit use of the sensitive attribute $A$. Whilst this fairness criterion is often indicated as problematic because some of the variables used to form $\hat Y$ could be a proxy for $A$ (such as neighborhood for race), the example above shows a more subtle issue with it.
4.3 Path-Specific Population-level Unfairness
In this section, we show that the path-specific effect introduced in Sect. 3 can be used to quantify unfairness in $\mathcal{D}$ in complex scenarios.
Consider the college admission example discussed in Sect. 2 (Fig. 7). In the case in which the path $D \to Y$, and therefore $A \to D \to Y$, is considered unfair, unfairness over the whole population can be quantified with $\mathbb{E}[Y_a] - \mathbb{E}[Y_{\bar a}]$ (coinciding with $\mathbb{E}[Y \mid A = a] - \mathbb{E}[Y \mid A = \bar a]$, as there are no back-door paths from $A$ to $Y$), where, for example, $a$ and $\bar a$ indicate female and male applicants respectively.
In the more complex case in which the path $A \to D \to Y$ is considered fair, unfairness can instead be quantified with the path-specific effect along the direct path $A \to Y$, given by

$$\mathrm{PSE} = \mathbb{E}[Y_{a, D_{\bar a}}] - \mathbb{E}[Y_{\bar a}].$$
Notice that computing $\mathbb{E}[Y_{a, D_{\bar a}}]$ requires knowledge of the CBN. If the CBN structure is not known or estimating its conditional distributions is challenging, the resulting estimate could be imprecise.
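As a sketch (with an invented linear model for the structure of Fig. 7: $D = \theta^D_A A + \epsilon_D$, $Y = \theta^Y_A A + \theta^Y_D D + \theta^Y_Q Q + \epsilon_Y$), the population-level PSE along the direct path can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented coefficients: gender A, department D, qualifications Q, admission Y.
t_da, t_ya, t_yd, t_yq = 0.8, -0.4, 0.5, 0.9

def mean_y(a_direct, a_dept, n=200_000):
    """Mean admission score with A = a_direct along A->Y
    and A = a_dept along A->D->Y."""
    eps_d, q, eps_y = rng.standard_normal((3, n))
    d = t_da * a_dept + eps_d
    y = t_ya * a_direct + t_yd * d + t_yq * q + eps_y
    return y.mean()

# PSE along the direct path A->Y for a=1 vs abar=0:
# only the direct coefficient t_ya should remain.
pse_direct = mean_y(1, 0) - mean_y(0, 0)
print(pse_direct)  # close to t_ya = -0.4
```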
4.4 Path-Specific Individual-level Unfairness
In the college admission example of Fig. 7 in which the path $A \to D \to Y$ is considered fair, rather than measuring unfairness over the whole population, we might want to know e.g. whether a rejected female applicant was treated unfairly. We can answer this question by estimating whether the applicant would have been admitted had she been male ($A = \bar a$) along the direct path $A \to Y$ (notice that the outcome in the actual world, $y^n$, corresponds to the potential outcome $Y_a$ for that individual).
To understand how this can be achieved, consider the following linear model associated to a CBN with the same structure as the one in Fig. 7:

$$D = \theta^D_A A + \epsilon_D, \quad Y = \theta^Y_A A + \theta^Y_D D + \theta^Y_Q Q + \epsilon_Y,$$

where $\epsilon_D$ and $\epsilon_Y$ are unobserved independent zero-mean Gaussian variables.

The relationships between the factual and counterfactual variables in this model can be inferred from the twin Bayesian network on the left, resulting from the intervention $A = \bar a$ along $A \to Y$ and $A = a$ along $A \to D$: in addition to $A$, $Q$, $D$ and $Y$, the network contains the variables $D_a$ and $Y_{\bar a, D_a}$ corresponding to the counterfactual world in which $A = \bar a$ along $A \to Y$, with $D_a = \theta^D_A a + \epsilon_D$ and $Y_{\bar a, D_a} = \theta^Y_A \bar a + \theta^Y_D D_a + \theta^Y_Q Q + \epsilon_Y$. The two groups of variables are connected through $\epsilon_D$ and $\epsilon_Y$, indicating that the factual and counterfactual worlds share the same unobserved randomness. From this network, we can deduce how to compute the counterfactual outcome for the individual under consideration.