Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality

03/27/2019 ∙ by Aria Khademi, et al. ∙ Purdue University ∙ Penn State University

As virtually all aspects of our lives are increasingly impacted by algorithmic decision making systems, it is incumbent upon us as a society to ensure such systems do not become instruments of unfair discrimination on the basis of gender, race, ethnicity, religion, etc. We consider the problem of determining whether the decisions made by such systems are discriminatory, through the lens of causal models. We introduce two definitions of group fairness grounded in causality: fair on average causal effect (FACE), and fair on average causal effect on the treated (FACT). We use the Rubin-Neyman potential outcomes framework for the analysis of cause-effect relationships to robustly estimate FACE and FACT. We demonstrate the effectiveness of our proposed approach on synthetic data. Our analyses of two real-world data sets, the Adult income data set from the UCI repository (with gender as the protected attribute), and the NYC Stop and Frisk data set (with race as the protected attribute), show that the evidence of discrimination obtained by FACE and FACT, or lack thereof, is often in agreement with the findings from other studies. We further show that FACT, being somewhat more nuanced compared to FACE, can yield findings of discrimination that differ from those obtained using FACE.




1. Introduction

With the growing adoption of algorithmic decision making systems, e.g., AI and machine learning systems, across many real-world decision making scenarios on the Web and elsewhere, there is a pressing need to make sure that such systems do not become vehicles of unfair discrimination, inequality, and social injustice (Barocas and Selbst, 2016; Barocas et al., 2017). Of particular interest in this context is the task of detecting and preventing discrimination or unfair treatment of individuals or groups on the basis of gender, race, religion, etc. Such discrimination is traditionally addressed using one of two legal frameworks: disparate treatment (which aims to enforce procedural fairness, namely, the equality of treatment that prohibits the use of the protected attribute in the decision process); and disparate impact (Barocas and Selbst, 2016) (which aims to guarantee outcome fairness, namely, the equality of outcomes between protected groups relative to other groups). Enforcing procedural fairness within the disparate treatment framework does not guarantee non-discrimination within the disparate impact framework.

There is growing interest in algorithmic decision making systems that are demonstrably fair (see (Berk et al., 2017) for a review). Much of this literature relies on precise definitions that quantify fairness to avoid discrimination with respect to protected attributes, e.g., race, gender, on the basis of the legal notions of disparate treatment or disparate impact (Barocas and Selbst, 2016) (see (Romei and Ruggieri, 2014; Zliobaite, 2015; Barocas and Selbst, 2016; Berk et al., 2017; Loftus et al., 2018) for reviews). Some examples include: fairness through unawareness (Grgic-Hlaca et al., 2016), individual fairness (Dwork et al., 2012), equalized odds (Hardt et al., 2016; Zafar et al., 2017a), calibration (Chouldechova, 2017), demographic (or statistical) parity (Calders et al., 2009; Kamishima et al., 2012; Kamiran and Calders, 2009; Johndrow and Lum, 2017), the 80% rule (disparate impact) (Feldman et al., 2015; Zafar et al., 2017b), representational fairness (Zemel et al., 2013; Louizos et al., 2015), and fairness under composition (Dwork and Ilvento, 2018).

Unfortunately, choosing the appropriate definition of fairness in a given context is extremely challenging for a number of reasons. First, depending on the relationship between a protected attribute and data, enforcing certain definitions of fairness can actually increase discrimination (Kusner et al., 2017). Second, different definitions of fairness can be impossible to satisfy simultaneously (Kleinberg et al., 2016; Berk et al., 2017; Chouldechova, 2017). Many of these difficulties can be attributed to the fact that fairness criteria are based solely on the joint probability distribution of the random variables of interest, namely, $\hat{Y}$ (predicted outcome), $Y$ (actual outcome), $X$ (features), and $A$ (sensitive attributes). Hardt et al. (2016) showed that any definition of fairness of a predictor that depends merely on the joint probability distribution is not necessarily capable of detecting discrimination. Hence, it is tempting to approach the problem of fairness through the lens of causality (Barabas et al., 2018).

Answering questions of fairness through the lens of causality entails replacing the question “Is the decision discriminatory with respect to a protected attribute?” by: “Does the protected attribute have a causal effect on the decision?” A practical difficulty in using this approach is that, in general, establishing a causal relationship between a protected attribute and a decision requires the results of experimental manipulation of the protected attribute. Fortunately, however, existing frameworks for determining causal effects from observational data (Pearl, 2009; Imbens and Rubin, 2015) provide a rich set of theoretical results as well as practical tools for elucidating causal effects, and specifically, answering questions about counterfactuals or potential outcomes, i.e., results of hypothetical experimental interventions from observational data, whenever it is possible to do so. Hence, there is a growing body of work (see (Loftus et al., 2018) for a recent review) focused on explicitly causal (as opposed to purely joint distribution based or observational) definitions for fairness (e.g., (Kusner et al., 2017, 2018; Zhang and Bareinboim, 2018; Nabi and Shpitser, 2018; Kilbertus et al., 2017; Chiappa and Gillam, 2018; Zhang et al., 2017; Bonchi et al., 2017; Li et al., 2017; Zhang et al., 2016; Russell et al., 2017; VanderWeele and Robinson, 2014)). While some, e.g., (Zhang et al., 2017), have focused on testing fairness (or conversely, determining whether there is discrimination), others, e.g., (Kilbertus et al., 2017; Kusner et al., 2017), have sought to design machine learning algorithms that yield predictive models that are demonstrably fair. However, most of the existing work on defining fairness in causal terms has focused on variants of individual fairness. Against this background, we focus on robust methods for detecting and quantifying discrimination against protected groups, which is a necessary prerequisite for developing predictive models that are provably non-discriminatory.


We reduce the problem of quantifying discrimination against protected groups to the well-studied problem of estimating the causal effect of some variable(s) on a target (outcome) variable. We introduce two explicitly causal definitions of fairness: fair on average causal effect (FACE), defined over a population, and fair on average causal effect on the treated (FACT), defined over a protected group, both with respect to a protected attribute (e.g., gender, race). We use the Rubin-Neyman potential outcomes framework (Rubin, 1974, 2005; Imbens and Rubin, 2015) for robust estimation of FACE and FACT. We demonstrate the effectiveness of the proposed approach in detecting and quantifying group fairness using synthetic data, as well as two real-world data sets: the Adult income data from the UCI repository (Dheeru and Karra Taniskidou, 2017) (with gender being the protected attribute), and the NYC Stop and Frisk data (with race being the protected attribute). We show that the evidence of discrimination, or lack thereof, obtained by FACE and FACT is often in agreement with other studies. We further show that FACT, being somewhat more nuanced compared to FACE, can yield findings of discrimination that differ from those obtained using FACE.

2. Fairness: A Causal Perspective

Assume we have observational data on a population of individuals. Let $X$ be the vector of non-protected attributes, $A$ be a binary protected attribute, and $Y$ an outcome of interest. The question we want to answer is: Are individuals being discriminated against, on average, with respect to outcomes or decisions $Y$ on the basis of a protected attribute $A$? From a causal perspective, such a question is equivalent to the following question: Does $A$ have a causal effect on $Y$? In other words, how much would $Y$ change, on average, were the value of $A$ to change? Both Structural Causal Models (Pearl, 2009) and the Rubin-Neyman Causal Model (RCM) (Imbens and Rubin, 2015) (also called the potential outcomes model) offer methods for estimating such causal effects from observational data.

We introduce two explicitly causal definitions for fairness “on average” in a population or a protected group (as opposed to causal definitions of individual fairness, e.g., counterfactual fairness (Kusner et al., 2017)) with respect to a protected attribute $A$ (e.g., gender, race). Let $Y_i^{(a)}$ and $Y_i^{(a')}$ be the potential outcomes of a data point $i$ had their value of $A$ been $a$ and $a'$, respectively. Let $h$ be a decision function (or a predictive model trained using machine learning) that is used to support decision making. $E[\cdot]$ is the expectation of a random variable. We define the following.

Definition 2.1 (FACE: Fair on Average Causal Effect). A decision function $h$ is said to be fair, on average over all individuals in the population, with respect to $A$, if $E[Y_i^{(a)} - Y_i^{(a')}] = 0$.

Definition 2.2 (FACT: Fair on Average Causal Effect on the Treated). A decision function $h$ is said to be fair with respect to $A$, on average over individuals with the same value of $A$, if $E[Y_i^{(a)} - Y_i^{(a')} \mid A_i = a] = 0$.


Imagine we are given the hiring data of a company containing demographic information $X$ about applicants, as well as $A \in$ {male, female} as their gender, and $Y \in$ {hired, rejected} as whether they were hired by the company. Our task is to determine whether the company’s hiring decisions are fair on average with respect to gender. FACE contrasts the expected outcomes (i.e., hiring) between men vs. women with the expectation taken over the entire population. FACT contrasts the expected outcomes observed for a specific protected group (e.g., women) and the hypothetical (counterfactually inferred) outcomes for the group had they not been members of the protected group (with the expectation taken only over the members of the protected group), e.g., hiring outcomes for women contrasted with outcomes for the same individuals had their gender been different with all other attributes remaining unchanged. Such counterfactual outcomes cannot be obtained from observational data¹ and must be estimated.

¹ This is called the Fundamental Problem of Causal Inference (FPCI) from observational data (Holland, 1986).
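The contrast between the two definitions can be illustrated with a toy population in which both potential outcomes of every individual are known, which is never the case with real observational data. The following is a minimal sketch under that assumption; the `Unit` record, the function names, and the four data points are hypothetical and chosen only to show that a population can satisfy FACE while violating FACT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    a: int     # observed protected attribute (1 = protected group, 0 = other)
    y1: float  # potential outcome had the attribute been 1
    y0: float  # potential outcome had the attribute been 0


def face(units):
    """Average of Y(1) - Y(0) over the whole population."""
    return sum(u.y1 - u.y0 for u in units) / len(units)


def fact(units):
    """Average of Y(1) - Y(0) over units whose observed attribute is 1."""
    treated = [u for u in units if u.a == 1]
    return sum(u.y1 - u.y0 for u in treated) / len(treated)


# Hypothetical population: individual effects are -1, 0, +1, 0.
population = [
    Unit(a=1, y1=0.0, y0=1.0),
    Unit(a=1, y1=1.0, y0=1.0),
    Unit(a=0, y1=1.0, y0=0.0),
    Unit(a=0, y1=1.0, y0=1.0),
]

face_value = face(population)  # effects cancel over the whole population
fact_value = fact(population)  # but not within the protected group
print(face_value, fact_value)
```

Here the population-wide average effect is zero while the average effect within the protected group is negative, which is the kind of disagreement between FACE and FACT discussed later in the paper.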

3. Estimating FACE and FACT

We use tools offered by the potential outcomes framework (Imbens and Rubin, 2015) to estimate FACE and FACT. These tools rely on the following key assumptions:

i) Consistency, which requires that for a data point $i$, the potential outcome of $i$ under any level of treatment $a$, i.e., $Y_i^{(a)}$, equals the actual outcome observed for that data point, $Y_i$, had they been exposed to treatment $a$. Formally, under consistency, $Y_i = Y_i^{(a)}$ whenever $A_i = a$ would hold for all $i$. This assumption, used in existing literature (Nabi and Shpitser, 2018; Chiappa and Gillam, 2018; Pearl, 2019; Madras et al., 2019), is a rather natural one to make in our setting.

ii) Positivity, which asserts that the probability $P(A = a \mid X) > 0$ for all values of $a$. In our setting, this means each value of the protected attribute has a non-zero probability.

iii) Stable Unit Treatment Value Assumption (SUTVA) (Rubin, 1980), which consists of two sub-assumptions: 1) Absence of interference between individuals (Cox, 1958), which means that an individual’s potential outcome is unaffected by the treatment assigned to any other individual. While this assumption is plausible in our setting, it may be violated in some settings, in which case, such violations should be accounted for (Hernan and Robins, 2018). 2) Presence of only one form of treatment (and control). For example, if a treatment involves administering a drug, then all individuals who take the drug, take it in the same form (e.g., injection). This assumption is trivially satisfied in our setting because treatment is simulated by the protected attribute.

iv) Unconfoundedness of the treatment mechanism, which implies that given a set of observables, the potential outcomes of each individual are jointly independent of the corresponding treatment (Rubin, 1978). Unconfoundedness cannot be verified or contradicted entirely on the basis of observational data. However, sensitivity analysis (Rosenbaum, 2005; Liu et al., 2013) can be a useful tool for analyzing the estimated causal effects under violations of the unconfoundedness assumption.

Strong ignorability refers to the combination of unconfoundedness and positivity (Rosenbaum and Rubin, 1983). Strong ignorability is a sufficient condition for the causal effect to be identifiable (Hernan and Robins, 2018) and is equivalent to the back-door criterion (Pearl, 2010), which is required for identifiability of the causal effects in Pearl’s model of causality (Pearl, 2010). In our work, as in the case of existing work on causal definitions of fairness (Nabi and Shpitser, 2018), we assume strong ignorability.

3.1. Estimating and Interpreting FACE

We use Inverse Probability Weighting (IPW), also known as Inverse Probability of Treatment Weighting (IPTW) in Marginal Structural Models (MSM) (Robins et al., 2000), to estimate FACE. Specifically, for each individual $i$, we calculate a stabilized weight $sw_i = P(A = a_i) / P(A = a_i \mid X = x_i)$ (call it the weight model). We obtained stabilized weights using the R package ipw (version 1.0-11) (van der Wal et al., 2011). Assigning such a weight to every data point, we generate a “pseudo-population” in which there are $sw_i$ copies of each data point $i$. Subsequently, the associative parameter $\beta$ in the weighted regression (call it the outcome model) of the (continuous) outcome $Y$ on the protected attribute $A$, $E[Y \mid A] = \beta_0 + \beta A$, would be the causal effect of $A$ on $Y$. For a binary outcome $Y$, we use the weighted logistic regression model $\operatorname{logit} P(Y = 1 \mid A) = \beta_0 + \beta A$.

In the absence of unmeasured confounders, if either the weight model or the outcome model is correctly specified, then $\hat{\beta}$ is an unbiased estimator of the average causal effect (Robins et al., 2000). For example, suppose $Y$ is salary and $A$ is gender. At the chosen level of statistical significance $\alpha$, $\hat{\beta} = 0$ implies that salary is fair with respect to gender on average over the entire population of individuals; $\hat{\beta} \neq 0$ implies that, on average, women’s salary differs from that of men (across the entire population), with the magnitude of the difference determined by $\hat{\beta}$. For a continuous outcome $Y$, $\hat{\beta}$ is simply the average causal effect of $A$ on $Y$. For a binary outcome $Y$, $e^{\hat{\beta}}$ corresponds to the causal odds ratio of salary for women versus men.
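The weighting idea can be sketched in a few lines. This is not the authors' pipeline (they use the R package ipw with a fitted weight model); it is a minimal illustration that assumes a single discrete confounder, so that the conditional probability of the attribute can be estimated by simple frequencies. The data set, in which the true effect of the attribute on the outcome is 1 by construction, and all function names are hypothetical. With a binary attribute, the slope of a weighted least-squares regression of the outcome on the attribute equals the difference of weighted group means, which is what the second function computes.

```python
from collections import Counter


def stabilized_weights(data):
    """data: list of (x, a, y) with a discrete confounder x and binary a.

    Returns P(A = a_i) / P(A = a_i | X = x_i) for each record,
    with both probabilities estimated by empirical frequencies.
    """
    n = len(data)
    count_a = Counter(a for _, a, _ in data)
    count_x = Counter(x for x, _, _ in data)
    count_xa = Counter((x, a) for x, a, _ in data)
    weights = []
    for x, a, _ in data:
        marginal = count_a[a] / n                    # P(A = a)
        conditional = count_xa[(x, a)] / count_x[x]  # P(A = a | X = x)
        weights.append(marginal / conditional)
    return weights


def weighted_mean_difference(data, weights):
    """Weighted mean of y for a = 1 minus weighted mean of y for a = 0."""
    totals = {0: 0.0, 1: 0.0}
    norms = {0: 0.0, 1: 0.0}
    for (_, a, y), w in zip(data, weights):
        totals[a] += w * y
        norms[a] += w
    return totals[1] / norms[1] - totals[0] / norms[0]


# Hypothetical data: y = 2 * x + 1 * a, and x makes a = 1 more likely.
data = (
    [(0, 1, 1.0)] * 10 + [(0, 0, 0.0)] * 30
    + [(1, 1, 3.0)] * 30 + [(1, 0, 2.0)] * 10
)

naive = weighted_mean_difference(data, [1.0] * len(data))
ipw = weighted_mean_difference(data, stabilized_weights(data))
print(naive, ipw)
```

The unweighted contrast is inflated by the confounder, while the weighted contrast recovers the effect built into the toy data.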

3.2. Estimating and Interpreting FACT

We use matching to estimate FACT. Consider the example of salary discrimination based on gender. For a woman, we can never observe what the salary would have been, had she been a man (i.e., her counterfactual salary). Hence, we estimate the counterfactual salary as follows (Imbens and Rubin, 2015): 1) Using a suitable matching technique (see Section Matching Methods), we match the woman $i$ to a man $j$ who is closest to $i$ with respect to a distance measure $D_{ij}$. 2) The matching process is repeated as needed until matches are of acceptable quality (see Section Quality of Matches). 3) After matching, we use the salary of the matched man (i.e., $Y_j$) as the counterfactual salary of the woman $i$.
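The three steps above can be sketched as follows. This is a minimal illustration rather than the MatchIt procedure used in the paper: it assumes a single numeric covariate and uses the absolute difference as the distance measure, matching with replacement. The function name and the data are hypothetical.

```python
def nearest_neighbor_fact(treated, controls):
    """Estimate the average effect on the treated by 1-nearest-neighbor matching.

    treated, controls: lists of (x, y) pairs, x being one numeric covariate.
    Each treated unit is matched (with replacement) to the control whose x is
    closest, and that control's y stands in for the unobserved counterfactual.
    """
    differences = []
    for x_i, y_i in treated:
        _, y_j = min(controls, key=lambda c: abs(c[0] - x_i))
        differences.append(y_i - y_j)
    return sum(differences) / len(differences)


# Hypothetical salaries (in thousands) with years of experience as covariate.
women = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.0)]
men = [(0.9, 6.0), (2.2, 8.0), (3.1, 10.0), (8.0, 20.0)]

estimate = nearest_neighbor_fact(women, men)
print(estimate)
```

The unmatched man with eight years of experience never serves as a counterfactual, which is the point of matching: only comparable individuals contribute to the contrast.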

Matching Methods

The results of matching depend on the choice of distance measure as well as the matching process. Several matching methods exist (see (Stuart, 2010) for a survey). In what follows, for simplicity and brevity, we refer to individuals with the protected attribute set to $A = 1$ as the treated individuals and those with the protected attribute set to $A = 0$ as the controlled individuals. We used the matching methods implemented within the R package MatchIt (version 3.0.2) (Ho et al., 2011) with all parameters set to their default values unless otherwise noted:

(i) Exact Matching (EM).

(ii) Nearest Neighbor Matching (NNM) with propensity score (Rosenbaum and Rubin, 1983). Following (Rubin, 2001), we estimated the propensity scores using the logit link and transformed them to the linear scale. Then, we ran NNM with replacement, based on the linear propensity scores, and discarded the data points (both from treated and controlled) that fall outside the support of the distance measure.

(iii) Nearest Neighbor Matching with a Propensity Caliper (NNMPC). NNMPC includes only matches within a certain number of standard deviations of the distance measure and discards the rest. In NNMPC, we use the same procedure as in NNM, augmented with a caliper = 0.25 (Rosenbaum and Rubin, 1985), so that matches farther apart than 0.25 times the standard deviation of the (transformed) linear propensity score are discarded.

(iv) Mahalanobis Metric Matching within the Propensity Caliper (MMMPC) (Rubin, 2001). MMMPC determines, for each data point, a “donor pool” of available matches within the propensity caliper. Mahalanobis metric matching is then performed among the data points chosen in the previous step, mimicking blocking in randomized experiments (Rubin, 2001). We ran MMMPC with caliper, replacement, and discarding strategy as described above in NNM.

(v) Full Matching (FM) (Rosenbaum, 1991). We used the same distance measure and discarding strategy as described above in NNM.
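The caliper rule can be made concrete with a small sketch. It assumes that the propensity scores have already been estimated, and that the caliper width is 0.25 times the sample standard deviation of the linear (logit-scale) scores pooled over both groups; how MatchIt computes that standard deviation internally is not restated here, so treat the pooling as an assumption. Function names and scores are hypothetical.

```python
import math
import statistics


def linear_propensity(p):
    """Map a propensity score in (0, 1) to the linear (logit) scale."""
    return math.log(p / (1.0 - p))


def caliper_match(treated, controls, caliper=0.25):
    """Nearest-neighbor matching with replacement inside a caliper.

    treated, controls: lists of linear propensity scores.
    Returns {treated index: control index, or None if no control is within
    caliper * (standard deviation of the pooled scores)}.
    """
    width = caliper * statistics.stdev(treated + controls)
    matches = {}
    for i, score in enumerate(treated):
        j = min(range(len(controls)), key=lambda k: abs(controls[k] - score))
        matches[i] = j if abs(controls[j] - score) <= width else None
    return matches


# Hypothetical propensity scores for two treated and two control units.
treated_scores = [linear_propensity(p) for p in (0.50, 0.88)]
control_scores = [linear_propensity(p) for p in (0.52, 0.10)]

matches = caliper_match(treated_scores, control_scores)
print(matches)
```

The second treated unit has no control within the caliper and is therefore discarded rather than paired with a dissimilar control.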

Quality of Matches

To ensure accurate estimation of FACT, it is crucial to measure the “goodness-of-match.” If the data points are well matched, then one can proceed to estimate FACT. Common diagnostics for examining the quality of match include both numerical and graphical criteria. Among the numerical criteria, following (Rubin, 2001), we compare the standardized difference in the means of the treated and the controlled data points in terms of the distance measure. We compute the absolute value of this difference in means on the original data and on the matched data. For the match to be of good quality, the value on the matched data has to be close to zero. Among the graphical criteria, we use quantile-quantile (QQ) and jitter plots recommended by (Stuart, 2010; Ho et al., 2011).²

² We avoid the commonly used hypothesis tests for assessing feature balance in diagnosing the quality of matches because such tests have been shown to be misleading in general (Imai et al., 2008).
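A minimal sketch of the numerical diagnostic follows. It assumes the difference in means is standardized by the standard deviation of the treated group in the original data, which is one common convention; the choice of denominator, the function name, and the scores are assumptions made for illustration.

```python
import statistics


def standardized_difference(treated, control, reference_sd):
    """Absolute difference in means, in units of a reference standard deviation."""
    gap = statistics.mean(treated) - statistics.mean(control)
    return abs(gap) / reference_sd


# Hypothetical distance-measure values (e.g., linear propensity scores).
treated = [1.0, 2.0, 3.0]
all_controls = [-1.0, 0.0, 1.0, 2.0]   # before matching
matched_controls = [1.0, 2.0, 2.0]     # controls selected by matching

reference_sd = statistics.stdev(treated)
before = standardized_difference(treated, all_controls, reference_sd)
after = standardized_difference(treated, matched_controls, reference_sd)
print(before, after)
```

A value after matching that is much closer to zero than the value before matching indicates that the matched groups are better balanced on the distance measure.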

Outcome Analysis After Matching

With good quality matched pairs identified, we can proceed to conduct outcome analysis for FACT estimation. Matching methods often assign appropriate weights to the matched data points to balance the treated and controlled data distributions. After obtaining the weights via matching, we run a weighted linear regression for continuous outcomes, and a weighted logistic regression for binary outcomes, both on the matched data set, to estimate FACT. The estimated coefficient for the protected attribute $A$ in these models, i.e., $\hat{\beta}$, estimates FACT. The resulting estimate is “doubly robust” in that if either the matching model or the outcome model is correctly specified, $\hat{\beta}$ would be statistically consistent (Ho et al., 2011).

Interpreting $\hat{\beta}$ as a Measure of FACT

Suppose $Y$ is salary and $A$ is gender. At the chosen level of statistical significance $\alpha$, $\hat{\beta} = 0$ implies that there is no significant difference in expected salary for women compared to what their salary would have been had they been men (with all non-protected attributes remaining unchanged, a condition that is approximated by counterfactual inference using matching), thus implying no gender-based discrimination in salary for women; $\hat{\beta} \neq 0$ implies that, on average, women’s salary is statistically significantly different from what it would have been, had they been men, thus implying gender-based discrimination in salary. For a continuous outcome $Y$, e.g., the salary in US dollars, $\hat{\beta}$, if statistically significant, means that on average, considering men and women that are matched based on their feature vector $X$, the difference between women’s salary and that of men is $\hat{\beta}$. For a binary outcome, e.g., salaries binarized with an arbitrary threshold, $e^{\hat{\beta}}$ is the causal odds ratio of women’s salary compared to that of men, for those women and men who are similar.
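For a binary outcome and a binary protected attribute, the coefficient of a (weighted) logistic regression of the outcome on the attribute alone equals the log of the odds ratio of the corresponding (weighted) 2x2 table, so the odds-ratio reading can be checked without fitting a model. The sketch below assumes that no other covariates enter the outcome model; the function name, the counts, and the unit weights are hypothetical.

```python
import math


def log_odds_ratio(records):
    """records: iterable of (a, y, weight) with binary a and binary y.

    Returns log[(n11 * n00) / (n10 * n01)], where n_ay is the total weight
    of records with attribute a and outcome y.
    """
    cell = {(a, y): 0.0 for a in (0, 1) for y in (0, 1)}
    for a, y, w in records:
        cell[(a, y)] += w
    return math.log((cell[(1, 1)] * cell[(0, 0)]) / (cell[(1, 0)] * cell[(0, 1)]))


# Hypothetical matched sample with unit weights:
# women (a = 1): 10 above the salary threshold, 40 below;
# men   (a = 0): 20 above the salary threshold, 30 below.
records = (
    [(1, 1, 1.0)] * 10 + [(1, 0, 1.0)] * 40
    + [(0, 1, 1.0)] * 20 + [(0, 0, 1.0)] * 30
)

beta = log_odds_ratio(records)
odds_ratio = math.exp(beta)
print(beta, odds_ratio)
```

An odds ratio below 1 (a negative coefficient) means that the odds of the favorable outcome are lower for the protected group than for their matched counterparts.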

Impact of Unmeasured Confounders on $\hat{\beta}$

What if the strong ignorability assumption (i.e., no hidden confounders) is violated? In the absence of unmeasured confounding, matching estimators are unbiased if the matching model is specified correctly, i.e., if balance is achieved over the observed attributes. However, it is conceivable that the results of matching could change in the presence of unobserved confounders (i.e., hidden bias). We perform sensitivity analysis (Rosenbaum, 2005; Liu et al., 2013) to investigate the degree to which the unmeasured confounders impact $\hat{\beta}$. Let $\Gamma$ be the odds ratio of matched (using any matching method) data points $i$ and $j$ receiving a treatment. Sensitivity analysis proceeds by first assuming $\Gamma = 1$ (i.e., no hidden bias). Then, it increases $\Gamma$ above 1, thus mimicking the presence of hidden bias, and examines the resulting changes to the statistical significance of $\hat{\beta}$. The $\Gamma$ at which the upper bound for the p-value crosses the significance level $\alpha$ (i.e., the estimate changes from significant to non-significant) is the point at which $\hat{\beta}$ is no longer robust to hidden bias. We ran sensitivity analysis using the R package rbounds (version 2.1) (Keele, 2010).
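The procedure can be sketched for the simplest case of matched pairs with a binary outcome, where the test statistic is the number of discordant pairs in which the treated unit experienced the event (McNemar's test). Under hidden bias of magnitude $\Gamma$, the chance that the treated unit is the one with the event in a discordant pair is at most $\Gamma / (1 + \Gamma)$, which gives a binomial upper bound on the one-sided p-value. This is a minimal sketch of that bound, not a reimplementation of rbounds; the counts, the grid of $\Gamma$ values, and the significance level of 0.05 are hypothetical.

```python
from math import comb


def upper_bound_p_value(discordant_pairs, treated_events, gamma):
    """Upper bound on the one-sided McNemar p-value under hidden bias gamma >= 1."""
    p_plus = gamma / (1.0 + gamma)
    return sum(
        comb(discordant_pairs, k)
        * p_plus ** k
        * (1.0 - p_plus) ** (discordant_pairs - k)
        for k in range(treated_events, discordant_pairs + 1)
    )


def robustness_threshold(discordant_pairs, treated_events, alpha, gammas):
    """Smallest gamma on the grid whose upper-bound p-value exceeds alpha."""
    for gamma in gammas:
        if upper_bound_p_value(discordant_pairs, treated_events, gamma) > alpha:
            return gamma
    return None  # the finding stays significant over the whole grid


# Hypothetical matched sample: 20 discordant pairs, 17 with the event
# occurring for the treated unit; alpha = 0.05 is an assumed level.
grid = [1.0, 1.5, 2.0, 2.5, 3.0]
p_no_bias = upper_bound_p_value(20, 17, 1.0)
threshold = robustness_threshold(20, 17, alpha=0.05, gammas=grid)
print(p_no_bias, threshold)
```

In this toy example the finding is significant with no hidden bias and remains so at $\Gamma = 1.5$, but the bound crosses the assumed level at $\Gamma = 2$, so the conclusion would be described as robust up to a hidden bias of roughly that size.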

4. Experiments and Results

We tested our approach on a synthetic data set (where the discrimination based on a protected attribute can be varied in a controlled fashion), and two real-world data sets that have been previously used in studies of fairness. In each case, we designated a protected attribute and estimated FACE and FACT as measures of discrimination based on that attribute. We ran all of our statistical significance tests at the same significance level $\alpha$. We proceed to describe the data sets, experiments, as well as our FACE and FACT analyses in detail.

4.1. Data sets

Synthetic data set

We generated 1000 data points, each with a feature vector , a protected attribute , and an outcome variable according to the following: ; ; , where is a weight vector (fixed for all data points) with each element drawn randomly in [0,1]. The resulting generative model ensures there are no hidden confounders and there is no discrimination, as measured by FACE and FACT, with respect to the outcome variable on the basis of the protected attribute .
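A generator with the stated properties (no hidden confounders, and no effect of the protected attribute on the outcome) can be sketched as follows. The distributions here are assumptions made for illustration and are not taken from the paper: features are standard normal, and both the protected attribute and the outcome are Bernoulli draws whose probability depends only on a weighted sum of the features. The number of features, the seed, and the function name are likewise hypothetical.

```python
import math
import random


def generate_synthetic(n=1000, d=5, seed=0):
    """Generate n records (x, a, y) where y depends on x but never on a."""
    rng = random.Random(seed)
    # One weight vector shared by all data points, each element in [0, 1).
    w = [rng.random() for _ in range(d)]
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        score = sum(wk * xk for wk, xk in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-score))
        a = 1 if rng.random() < p else 0  # attribute depends on x only
        y = 1 if rng.random() < p else 0  # outcome depends on x only, not on a
        rows.append((x, a, y))
    return w, rows


weights, rows = generate_synthetic()
print(len(rows), sum(a for _, a, _ in rows), sum(y for _, _, y in rows))
```

Because the features are the only common cause of the attribute and the outcome, adjusting for them should remove the association between the two, so an estimator of FACE or FACT applied to such data is expected to find no effect.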

The Adult data set

The Adult income data set (Kohavi, 1996)³ contains information about individuals as well as their salaries. The data set includes 48842 individuals each with 14 attributes, 6 continuous and 8 categorical, including demographic and work-related information such as age, gender, hours of work per week, etc. We examined whether there is gender-based discrimination in salaries by designating gender as the sensitive attribute. We encoded categorical variables using one-hot-encoding and removed data records with missing values, yielding a data set with 46033 individuals and 45 features (excluding gender, the protected attribute). We designated the outcome $Y$ to be a binary variable denoting whether the person’s annual salary is above \$50K ($Y = 1$) or at most \$50K ($Y = 0$).

³ https://archive.ics.uci.edu/ml/datasets/adult

The NYC Stop and Frisk (NYCSF) data set

We retrieved the publicly available stop, search, and frisk data from the New York Police Department (NYPD) website,⁴ which serves demographic and other information about drivers stopped by the NYC police force. Our question is whether the arrests made after stops have been discriminatory with respect to race. Following (Kusner et al., 2017), we restricted our experiment to the year 2014, yielding a total of 45787 records. We selected the subset of records corresponding to only Black-Hispanic and White men. We designated race as the protected attribute, with $A = 1$ denoting Black-Hispanic and $A = 0$ denoting White. We dropped the data records with missing values and encoded categorical variables with one-hot-encoding. The resulting data consist of 7593 records each with 73 features (excluding race, the sensitive attribute). The outcome $Y$ denotes whether an arrest was made ($Y = 1$), or not ($Y = 0$).

⁴ https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page

4.2. FACE Check: Fairness Analysis Using FACE

We report our analysis of fairness using FACE, for the synthetic, Adult, and NYCSF data sets. The estimated FACE ($\hat{\beta}$) are shown in Table 1. In all cases, the null hypothesis is $H_0: \beta = 0$. In the case of synthetic data, we find insufficient evidence to reject $H_0$, suggesting the outcome is fair with respect to the protected attribute (an expected conclusion given the design of the generative model in Section 4.1, which ensures that the outcome is fair with respect to the protected attribute). In the case of Adult data, we reject $H_0$: the estimated average causal effect of gender on salaries is statistically significant. This means that on average, over the entire population, the odds of women having a salary above \$50K a year are $e^{\hat{\beta}}$ times those of men, suggesting gender-based discrimination against women as measured by FACE. This finding is in agreement with the conclusions reported in (Nabi and Shpitser, 2018; Li et al., 2017). In the case of NYCSF data, we also reject $H_0$, which means that on average, the odds of Black-Hispanics being arrested after a stop by the police are $e^{\hat{\beta}}$ times those of Whites, suggesting possible racial bias against non-Whites.

Table 1. Estimates of FACE ($\hat{\beta}$) obtained on the synthetic, Adult, and NYCSF data sets. Columns: data set, $\hat{\beta}$, standard error, p-value.

4.3. FACT Check: Fairness Analysis Using FACT

We report our analysis of fairness using FACT, for the synthetic, Adult, and NYCSF data sets.

Matching Quality Analyses

Because the quality of matched pairs used to estimate FACT impacts the conclusions that can be drawn using it, we compare the FACT estimates obtained using several widely-used matching methods described in Section 3.2. We present some analyses to verify that the generated matches are of sufficiently high quality for estimating FACT.

We observe that before matching, the standardized difference in means is 1.6400, 3.3508, and 1.1616, on the synthetic, Adult, and NYCSF data sets, respectively. The matching methods dramatically reduced this difference on all of the data sets (see Table 2). Overall, NNM and FM achieved the lowest standardized difference after matching as compared to other matching methods on all data sets. The greater the number of pairs that are matched, the harder it is to achieve balance, and the trade-off between the two can be application dependent. Considering the trade-off between the number of matches and the standardized difference after matching, we observed that FM yields higher quality matches on all data sets as compared to other methods.

The QQ plots are generated for each feature in each data set. In Figure 1 we show the QQ plots before and after FM for the first three features of the synthetic data set. The features lie far away from the 45 degree line before FM. After FM, the features are much better aligned to the diagonal line showing a more desirable feature balance. We also show the jitter plots of FM on all data sets in Figure 2. It is clear that the distribution of propensity scores of the treated and controlled data points are very similar to each other after matching. Having verified that the results of matching are of adequate quality, we proceed to use them for estimating FACT.

Figure 1. QQ plots of the first three features from the synthetic data set before (left) and after (right) FM.
Figure 2. Jitter plots of the distribution of the propensity scores on the linear logit scale after FM on (a) the synthetic data set (left), (b) the NYCSF data set (middle), and (c) the Adult data set (right). Each circle represents a data point. Area of the circle is proportional to the weight given to the data point. Female, Black-Hispanic = treated, and male, White = controlled.
Table 2. Estimates of FACT ($\hat{\beta}$) obtained via various matching methods on the synthetic, NYCSF, and Adult data sets. Columns reported for each data set: matching method, number of treated matches, number of control matches, standardized difference in means after matching, $\hat{\beta}$, standard error, and p-value.

FACT Estimates

The results of FACT analyses on the synthetic, Adult, and NYCSF data sets are summarized in Table 2 (note that EM did not yield any matches and hence is omitted from Table 2). In all cases, the null hypothesis is $H_0: \beta = 0$. In the case of synthetic data, FACT analyses show that for NNM and MMMPC, there is not enough evidence to reject $H_0$. The p-values in the case of NNMPC and FM fall below the significance level, but the magnitude of the estimated $\hat{\beta}$ is close to zero. We conclude that the synthetic data set is fair on average with respect to FACT. On the Adult data, we can reject $H_0$, suggesting that salaries of women are significantly lower than those of men who match them on the non-protected attributes. For example, using FM, the odds of women earning more than \$50K a year are $e^{\hat{\beta}}$ times those of men (Table 2). We conclude that in the Adult data, there is evidence of gender-based discrimination in salary, on average, against women. On the NYCSF data, interestingly, FACT analyses show that $H_0$ cannot be rejected, suggesting a lack of evidence for racial bias, on average, in arrests after stops (when Black-Hispanics are compared with Whites who match them on non-protected attributes). This conclusion contradicts the finding of racial bias based on counterfactual fairness analysis (Supplementary Material S6 in (Kusner et al., 2017)), which suggests discrimination against individuals, as well as FACE analysis (see Section 4.2). We conjecture that the apparent discrepancy can be explained by noting that (i) fairness (or discrimination) on average does not necessarily imply individual-level fairness (or individual-level discrimination), and (ii) FACT compares the observed outcomes of members of a protected group with the hypothetical (counterfactual) outcomes they would have experienced had they not been members of the protected group (with all non-protected attributes remaining unchanged), whereas FACE compares such counterfactual outcomes on the entire population.

Impact of Unmeasured Confounders

We ran sensitivity analysis of our estimates of FACT for increasing values of $\Gamma$ (where larger values of $\Gamma$ correspond to greater bias introduced by hidden confounders) on the Adult and NYCSF data sets. We find that all of our estimates obtained with various matching methods are quite robust to hidden confounder bias. Specifically, on the Adult data set, for all matching methods except FM, the estimates are robust to such bias, and for FM, they are robust up to a value of $\Gamma$ that corresponds to a fairly large amount of bias. On the NYCSF data set, estimates obtained via NNM and MMMPC are robust to hidden confounder bias, and NNMPC and FM are robust up to $\Gamma = 8.5$ and $\Gamma = 3$, respectively. These results mean that our FACT estimates (and hence our findings of discrimination on the basis of protected attributes, or lack thereof) are fairly robust to hidden confounder bias.

5. Summary and Discussion

We have approached the problem of detecting whether a group of individuals that share a sensitive attribute, e.g., race, gender, have been subjected to discrimination in an algorithmic decision-making system, through the lens of causality. We have introduced two explicitly causal definitions of group fairness: fair on average causal effect (FACE), and fair on average causal effect on the treated (FACT). We have shown how to robustly estimate FACE and FACT, and use the resulting estimates to detect and quantify discrimination based on specific attributes (e.g., gender, race). The results of our experiments on synthetic data show that our proposed methods are effective at detecting and quantifying group fairness. Our analyses of the Adult data set for evidence of gender-based discrimination in salary, and of the NYCSF data set for evidence of racial bias in arrests after traffic stops, yield evidence of discrimination, or lack thereof, that is often in agreement with other studies.⁵ We show on the real-world data that our estimates of FACE and FACT are robust to unmeasured confounding. Our results further show on the real-world data that FACE and FACT based findings do not always agree. Our FACT analyses also demonstrate that group-fairness (or discrimination) does not necessarily imply individual-level fairness (or individual-level discrimination).

⁵ The regression and matching-based methods we employed to estimate FACE and FACT adjust for covariates that might be potential confounders of the protected attribute, which although necessary in general, may be unnecessary in the case of gender and race, because they are unlikely to be caused by any other covariate. Consequently, the reported estimates of FACE and FACT are likely to represent direct causal effects as opposed to total causal effects.

Some directions for further research include: relaxing the assumption that the data are independent and identically distributed (i.i.d.) in settings where individuals are related to each other through family ties or other relationships; examining the relationships between different causal notions of fairness; and designing automated decision support systems that are demonstrably non-discriminatory with respect to given outcome(s) and protected attribute(s).


This work was funded in part by the NIH NCATS through grants UL1 TR000127 and TR002014, by the NSF through grants 1518732, 1640834, and 1636795, by the Edward Frymoyer Endowed Professorship in Information Sciences and Technology at Pennsylvania State University, and by the Sudha Murty Distinguished Visiting Chair in Neurocomputing and Data Science funded by the Pratiksha Trust at the Indian Institute of Science (both held by Vasant Honavar). The content is solely the responsibility of the authors and does not necessarily represent the official views of the sponsors.

