Discrimination-aware classification is receiving increasing attention in the data mining and machine learning fields. Many methods have been proposed for constructing discrimination-free classifiers, which can be broadly classified into three categories: pre-process methods that modify the training data [Kamiran and Calders2009a, Feldman et al.2015, Zhang et al.2017, Calmon et al.2017, Nabi and Shpitser2017], in-process methods that adjust the learning process of the classifier [Calders and Verwer2010, Kamishima et al.2011, Kamishima et al.2012, Zafar et al.2017], and post-process methods that directly change the predicted labels [Kamiran et al.2010, Hardt et al.2016]. All three categories have their respective limitations. In-process methods usually tweak the classifier or add regularizers to correct or penalize discriminatory outcomes during the learning process. However, since the discrimination or fairness constraints are generally not convex functions, surrogate functions are usually used for the minimization. For example, in [Zafar et al.2017], the covariance between the protected attribute and the signed distance from the data point to the decision boundary is used as the surrogate function. As a result, additional bias might be introduced due to the approximation errors associated with the surrogate function. Post-process methods are restricted to those that can modify the predicted label of each individual independently. For example, in [Hardt et al.2016], the probability of changing the predicted label to one given an individual’s protected attribute is optimized. Thus, methods that map the whole dataset or population to a new non-discriminatory one cannot be adopted for post-processing, which means that a number of causal-based discrimination removal methods (e.g., [Zhang et al.2017, Nabi and Shpitser2017]) cannot be applied.
In our work, we target the pre-process methods that modify the training data. Some of these methods only modify the label, such as Massaging [Kamiran and Calders2009a, Žliobaitė et al.2011] and Causal-Based Removal [Zhang et al.2017], while others also modify the data attributes other than the label, such as Preferential Sampling [Kamiran and Calders2012, Žliobaitė et al.2011], Reweighing [Calders et al.2009], and Disparate Impact Removal [Feldman et al.2015, Adler et al.2016]. The fundamental assumption of the pre-process methods is that, since the classifier is learned from a discrimination-free dataset, it is likely that the future prediction will also be discrimination-free [Kamiran and Calders2009b]. Although this assumption is plausible, there is no theoretical guarantee that shows “how likely” it is and “how discrimination-free” the prediction would be given the training data and the classifier. The lack of theoretical guarantees places great uncertainty on the performance of all pre-process methods.
In this paper, we fill this gap by modeling the discrimination in prediction using the causal model. A causal model [Pearl2009] is a structural equation-based mathematical object that describes the causal mechanism of a system. We assume that there exists a fixed but unknown causal model that represents the underlying data generation mechanism of the population. Based on the causal model, we define the causal measure of discrimination in the population as well as in prediction. We then formalize two problems, namely discovering and removing discrimination in prediction. Based on specific assumptions regarding the causal model and the causal measure of discrimination, we conduct a concrete analysis of discrimination. We derive the formula for quantitatively measuring the discriminatory effect in the population from the observable probability distributions. We then derive the corresponding causal measure of the discrimination in prediction, as well as their approximations from the sample dataset. Finally, we link the discrimination in prediction with the discrimination in the training data by a probabilistic condition, which mathematically bounds the probability of the discrimination in prediction being within a given interval in terms of the training data and the classifier.
As a consequence, we obtain two important theoretical results: (1) even if the discrimination in the training data is completely removed, the discrimination in prediction can still exist due to the bias in the classifier; and (2) contrary to the claims of many previous works, not all pre-process methods can ensure non-discrimination in prediction even though they can achieve non-discrimination in the modified training data. We show that to guarantee non-discrimination in prediction, the pre-process methods should only modify the label. Based on the results, we develop a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee, which provides a guideline for employing existing pre-process methods or designing new ones. The experiments demonstrate the theoretical results and show the effectiveness of our two-phase framework.
2 Problem Formulation
2.1 Notations and Preliminaries
We consider an attribute space which consists of some protected attributes, the label of a certain decision attribute, and the non-protected attributes. Throughout the paper, we use an uppercase letter, e.g., $X$, to represent an attribute; a bold uppercase letter, e.g., $\mathbf{X}$, to represent a subset of attributes. We use a lowercase letter, e.g., $x$, to represent a realization or instantiation of attribute $X$; a bold lowercase letter, e.g., $\mathbf{x}$, to represent a realization or instantiation of $\mathbf{X}$. For ease of representation, we assume that there is only one protected attribute, denoted by $C$, which is a binary attribute associated with the domain values of the non-protected group $c^+$ and the protected group $c^-$. We denote the label of the decision attribute by $L$, which is a binary attribute associated with the domain values of the positive label $l^+$ and the negative label $l^-$. According to the convention in machine learning, we also define that $l^+ = 1$ and $l^- = 0$. The set of all the non-protected attributes is denoted by $\mathbf{Z}$.
A causal model is formally defined as follows.
Definition 1 (Causal Model).
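A causal model $\mathcal{M}$ is a triple $\mathcal{M} = \langle \mathbf{U}, \mathbf{V}, \mathbf{F} \rangle$ where
$\mathbf{U}$ is a set of hidden contextual variables that are determined by factors outside the model.
$\mathbf{V}$ is a set of observed variables that are determined by variables in $\mathbf{U} \cup \mathbf{V}$.
$\mathbf{F}$ is a set of equations mapping from $\mathbf{U} \times \mathbf{V}$ to $\mathbf{V}$. Specifically, for each $V_i \in \mathbf{V}$, there is an equation $f_i$ mapping from $\mathbf{U} \times (\mathbf{V} \setminus V_i)$ to $V_i$, i.e.,
$$v_i = f_i(pa_i, u_i),$$
where $pa_i$ is a realization of a set of observed variables $PA_i \subseteq \mathbf{V} \setminus V_i$ called the parents of $V_i$, and $u_i$ is a realization of a set of hidden variables $U_i \subseteq \mathbf{U}$.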
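The causal effect in the causal model is defined over the intervention that fixes the value of one or more observed variables $\mathbf{X}$ to constants $\mathbf{x}$ while keeping the rest of the model unchanged. The intervention is mathematically formalized as $do(\mathbf{X} = \mathbf{x})$ or simply $do(\mathbf{x})$. Then, for any variables $\mathbf{Y} \subseteq \mathbf{V}$, the distribution of $\mathbf{Y}$ after $do(\mathbf{x})$ is defined as [Pearl2009]
$$P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\{\mathbf{u} \,:\, \mathbf{Y}_{\mathbf{x}}(\mathbf{u}) = \mathbf{y}\}} P(\mathbf{u}),$$
where $\mathbf{Y}_{\mathbf{x}}(\mathbf{u})$ denotes the value of $\mathbf{Y}$ after intervention $do(\mathbf{x})$ under context $\mathbf{U} = \mathbf{u}$.
Here $P(\mathbf{u})$ is an unknown joint distribution of all hidden variables. If the causal model satisfies the Markovian assumption, i.e., (1) the associated causal graph of the causal model is acyclic, and (2) all variables in $\mathbf{U}$ are mutually independent, then $P(\mathbf{y} \mid do(\mathbf{x}))$ can be computed from the joint distribution of $\mathbf{V}$ according to the truncated factorization formula [Pearl2009]
$$P(\mathbf{y} \mid do(\mathbf{x})) = \sum_{\mathbf{V}'} \prod_{V_i \notin \mathbf{X}} P(v_i \mid pa_i) \, \delta_{\mathbf{X} = \mathbf{x}},$$
where the summation is a marginalization that traverses all value combinations of $\mathbf{V}' = \mathbf{V} \setminus (\mathbf{X} \cup \mathbf{Y})$, and $\delta_{\mathbf{X} = \mathbf{x}}$ means replacing $\mathbf{X}$ with $\mathbf{x}$ in each term.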
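As a minimal illustration of an intervention and of the truncated factorization, the following Python sketch uses a toy Markovian model with a binary cause, one binary mediator, and a binary outcome. The structural equations, the probability values, and all function names are assumptions made for this example only; they are not taken from the paper. The sketch estimates the post-intervention probability of a positive outcome by simulation and compares it with the value obtained by marginalizing over the mediator.

```python
import random

# Hypothetical toy model with edges C -> Z, C -> L, Z -> L (all binary).
P_Z1_GIVEN_C = {0: 0.4, 1: 0.7}  # assumed P(z=1 | c)


def p_l1_given(c, z):
    """Assumed P(l=1 | c, z) for the toy model."""
    return 0.2 + 0.3 * c + 0.4 * z


def simulate_do(c, n=20000, seed=0):
    """Monte Carlo estimate of P(l=1 | do(C=c)): C is forced to c,
    every other variable follows its own structural equation."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n):
        u_z, u_l = rng.random(), rng.random()  # independent hidden context
        z = int(u_z < P_Z1_GIVEN_C[c])         # z = f_Z(c, u_z)
        l = int(u_l < p_l1_given(c, z))        # l = f_L(c, z, u_l)
        positives += l
    return positives / n


def truncated_factorization(c):
    """P(l=1 | do(C=c)) as the sum over z of P(z | c) * P(l=1 | c, z).
    The factor for C itself is dropped because C is intervened on."""
    total = 0.0
    for z in (0, 1):
        p_z = P_Z1_GIVEN_C[c] if z == 1 else 1.0 - P_Z1_GIVEN_C[c]
        total += p_z * p_l1_given(c, z)
    return total
```

Under these assumed numbers, the simulated and the factorized values agree up to sampling noise, and their difference across the two interventions is the total effect of the cause on the outcome in the toy model.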
2.2 Model Discrimination in Prediction
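Assume that there exists a fixed population over the space $C \times \mathbf{Z} \times L$. The values of all the attributes in the population are determined by a causal model $\mathcal{M}$, which can be written as
$$\mathcal{M}: \quad c = f_C(pa_C, u_C), \qquad z_i = f_i(pa_i, u_i) \;\; \text{for each } Z_i \in \mathbf{Z}, \qquad l = f_L(pa_L, u_L),$$
where $f_L$ can be considered as the decision making process in the real system. Without ambiguity, we can also use $\mathcal{M}$ to denote the population, and use the terms mechanism and population interchangeably. In practice, $\mathcal{M}$ is unknown and we can only observe a sample dataset $\mathcal{D}$ drawn from the population.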
A classifier $h$ is a function mapping from $C \times \mathbf{Z}$ to $L$. A hypothesis space $\mathcal{H}$ is a set of candidate classifiers. A learning algorithm analyzes dataset $\mathcal{D}$ as the training data to find a classifier from $\mathcal{H}$ that minimizes the difference between the predicted labels $h(c^{(j)}, \mathbf{z}^{(j)})$ and the true labels $l^{(j)}$ ($j = 1, \dots, n$). Once training completes, the classifier is deployed to make predictions on any new unlabeled data. It is usually assumed that the unlabeled data is drawn from the same population as the training data, i.e., $\mathcal{M}$. Therefore, in prediction, the values of all the attributes other than the label are determined by the same mechanisms as those in $\mathcal{M}$, and meanwhile the classifier acts as a new mechanism for determining the value of the label. We consider $\mathcal{M}$ with function $f_L$ being replaced by classifier $h$ as the causal model of classifier $h$, denoted by $\mathcal{M}_h$, which is written as
$$\mathcal{M}_h: \quad c = f_C(pa_C, u_C), \qquad z_i = f_i(pa_i, u_i) \;\; \text{for each } Z_i \in \mathbf{Z}, \qquad l = h(c, \mathbf{z}).$$
If we apply the classifier on $\mathcal{D}$, we can obtain a new dataset $\mathcal{D}_h$ by replacing the original labels with the predicted labels, i.e., $\mathcal{D}_h = \{(c^{(j)}, \mathbf{z}^{(j)}, h(c^{(j)}, \mathbf{z}^{(j)}))\}$. Straightforwardly, $\mathcal{D}_h$ can be considered as a sample drawn from $\mathcal{M}_h$.
The discrimination in prediction made by classifier $h$ can be naturally defined as the discrimination in $\mathcal{M}_h$. To this end, we first define a measure of discrimination in $\mathcal{M}$ based on the causal relationship specified by $\mathcal{M}$, denoted by $DE_{\mathcal{M}}$ and called the true discrimination. By adopting the same measure, we denote the discrimination in $\mathcal{M}_h$ by $DE_{\mathcal{M}_h}$, called the true discrimination in prediction. Then, we denote the approximation of $DE_{\mathcal{M}}$ from dataset $\mathcal{D}$ by $DE_{\mathcal{D}}$, and similarly denote the approximation of $DE_{\mathcal{M}_h}$ from dataset $\mathcal{D}_h$ by $DE_{\mathcal{D}_h}$.
Our target is to discover and remove the true discrimination in prediction, i.e., $DE_{\mathcal{M}_h}$, based on a certain causal measure of discrimination defined on $\mathcal{M}$, i.e., $DE_{\mathcal{M}}$. When calculating $DE_{\mathcal{M}_h}$ from dataset $\mathcal{D}$, we may encounter disturbances such as the sampling error of $\mathcal{D}$ and the misclassification of $h$. We then need to compute an analytic approximation to $DE_{\mathcal{M}_h}$. Thus, we define the problem of discovering discrimination in prediction as follows.
Problem 1 (Discover Discrimination in Prediction).
Given a causal measure of discrimination defined on $\mathcal{M}$, i.e., $DE_{\mathcal{M}}$, a sample dataset $\mathcal{D}$ and a classifier $h$ trained on $\mathcal{D}$, compute an analytic approximation to the true discrimination in prediction, i.e., $DE_{\mathcal{M}_h}$.
If the true discrimination in prediction is detected according to the approximation, the next step is to remove the discrimination through tweaking the dataset and/or the classifier. Thus, we define the problem of removing discrimination in prediction as follows.
Problem 2 (Remove Discrimination in Prediction).
Given $DE_{\mathcal{M}}$, $\mathcal{D}$ and $h$, tweak $\mathcal{D}$ and/or $h$ in order to make $DE_{\mathcal{M}_h}$ be bounded by a user-defined threshold $\tau$.
3 Discover Discrimination in Prediction
In the above general problem definitions, $DE_{\mathcal{M}}$ can be any reasonable causal measure of discrimination defined on any causal model. However, a concrete analysis of discrimination must rely on specific assumptions regarding the causal measure of discrimination and the causal model. The remainder of the paper is based on the following assumptions: (1) $\mathcal{M}$ satisfies the Markovian assumption; (2) we consider all causal effects (the total effect) of $C$ on $L$ as discriminatory; (3) we assume that $C$ has no parent and $L$ has no child. The first assumption is necessary for computing the causal effect from the observable probability distributions. The second assumption is made because the total causal effect is the causal relationship that is easiest to interpret and estimate. We will extend our results to other discrimination definitions such as those in [Zhang et al.2017, Bonchi et al.2017] in future work. The last assumption makes our theoretical results more concise and can be easily relaxed.
3.1 Causal Measure of Discrimination
We first derive the true discrimination in $\mathcal{M}$. The key to discrimination is a counterfactual question: would the decision for an individual be different had the individual been of a different protected/non-protected group (e.g., sex, race, age, religion, etc.)? To answer this question, we can perform an intervention on each individual to change the value of the protected attribute $C$ and see how the label $L$ will change. We consider the difference between the expectation of the labels when performing $do(c^+)$ for all individuals and the expectation of the labels when performing $do(c^-)$ for all individuals, and use it as the causal measure of discrimination.
To obtain the above difference, note that the causal model is completely specified at the individual level when the context $\mathbf{u}$ is specified. For each individual specified by $\mathbf{u}$, denote the label of individual $\mathbf{u}$ by $L_{c^+}(\mathbf{u})$ (resp. $L_{c^-}(\mathbf{u})$) when $C$ is fixed according to $do(c^+)$ (resp. $do(c^-)$). Then, the difference in the label of individual $\mathbf{u}$ is given by $L_{c^+}(\mathbf{u}) - L_{c^-}(\mathbf{u})$. The expected difference of the labels over all individuals is hence given by $\mathbb{E}[L_{c^+}(\mathbf{u}) - L_{c^-}(\mathbf{u})]$. Based on this analysis, we obtain the following proposition.
Proposition 1.
Given a causal model $\mathcal{M}$, the true discrimination is given by
$$DE_{\mathcal{M}} = P(l^+ \mid c^+) - P(l^+ \mid c^-).$$
Interestingly, the obtained discrimination causal measure is the same as the classic statistical discrimination metric risk difference, which is widely used as the non-discrimination constraint in discrimination-aware learning [Romei and Ruggieri2014]. Our analysis can help understand the assumptions and scenarios in which the risk difference applies.
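Given dataset $\mathcal{D}$, we approximate $DE_{\mathcal{M}}$ using maximum likelihood estimation, denoted by $DE_{\mathcal{D}}$, as shown below.
Proposition 2.
Given a dataset $\mathcal{D}$, the discrimination in $\mathcal{D}$ is given by
$$DE_{\mathcal{D}} = \hat{P}(l^+ \mid c^+) - \hat{P}(l^+ \mid c^-),$$
with $\hat{P}(\cdot)$ being the conditional frequency in $\mathcal{D}$.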
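The estimate described above is the difference between two conditional frequencies, which the following Python sketch computes directly. The record layout (pairs of a binary protected attribute and a binary label, with 1 standing for the non-protected group and for the positive label) and the function name are assumptions made for this illustration.

```python
def risk_difference(records):
    """Difference between the frequency of positive labels in the
    non-protected group (c == 1) and in the protected group (c == 0).

    records: iterable of (c, l) pairs with c, l in {0, 1}.
    """
    non_protected = [l for c, l in records if c == 1]
    protected = [l for c, l in records if c == 0]
    if not non_protected or not protected:
        raise ValueError("both groups must be present in the data")
    return (sum(non_protected) / len(non_protected)
            - sum(protected) / len(protected))


# Tiny hand-made sample: 3 of 4 positives for c = 1, 1 of 4 for c = 0.
sample = [(1, 1), (1, 1), (1, 0), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0)]
```

On the hand-made sample the two frequencies are 0.75 and 0.25, so the measured discrimination is 0.5.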
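Given a classifier $h$, denote the predicted labels by $\tilde{L}$. By adopting the same causal measure of discrimination as $DE_{\mathcal{M}}$, we obtain $DE_{\mathcal{M}_h}$ as follows.
Proposition 3.
Given a causal model $\mathcal{M}$ and a classifier $h$, the true discrimination in prediction is given by
$$DE_{\mathcal{M}_h} = P(\tilde{l}^+ \mid c^+) - P(\tilde{l}^+ \mid c^-),$$
where $P(\tilde{l}^+ \mid c^+)$ (resp. $P(\tilde{l}^+ \mid c^-)$) is the probability of the positive predicted labels for the data with $c^+$ (resp. $c^-$).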
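We similarly define $DE_{\mathcal{D}_h}$ as the maximum likelihood estimation of $DE_{\mathcal{M}_h}$.
Proposition 4.
Given a dataset $\mathcal{D}$ and a classifier $h$ trained on $\mathcal{D}$, the discrimination in $\mathcal{D}_h$ is given by
$$DE_{\mathcal{D}_h} = \hat{P}(\tilde{l}^+ \mid c^+) - \hat{P}(\tilde{l}^+ \mid c^-).$$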
3.2 Bounding Discrimination in prediction
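To approximate $DE_{\mathcal{M}}$ from $\mathcal{D}$, sampling error cannot be avoided since $\mathcal{D}$ is only a subset of the entire population. Although exact measurement of the sampling error is generally not feasible as $\mathcal{M}$ is unknown, it can be probabilistically bounded. In the following, we first bound the difference between $DE_{\mathcal{M}}$ and its approximation $DE_{\mathcal{D}}$, and then extend the result to the difference between $DE_{\mathcal{M}_h}$ and its approximation $DE_{\mathcal{D}_h}$.
Proposition 5.
Given a causal model $\mathcal{M}$ and a sample dataset $\mathcal{D}$ of size $n$, the probability of the difference between $DE_{\mathcal{M}}$ and $DE_{\mathcal{D}}$ being no larger than $t$ is bounded by
$$P\big(|DE_{\mathcal{M}} - DE_{\mathcal{D}}| \le t\big) \ge 1 - 4\exp\!\left(-\frac{2 n^+ n^-}{(\sqrt{n^+} + \sqrt{n^-})^2}\, t^2\right),$$
where $n^+$ and $n^-$ ($n^+ + n^- = n$) are the numbers of individuals with $c^+$ and $c^-$ in $\mathcal{D}$.
Proof.
By the definitions of $DE_{\mathcal{M}}$ and $DE_{\mathcal{D}}$ we have
$$DE_{\mathcal{M}} - DE_{\mathcal{D}} = \big(P(l^+ \mid c^+) - \hat{P}(l^+ \mid c^+)\big) - \big(P(l^+ \mid c^-) - \hat{P}(l^+ \mid c^-)\big).$$
Denoting by $l_j$ the label of the $j$th individual in $\mathcal{D}$ with $c^+$, we can write $\hat{P}(l^+ \mid c^+)$ as
$$\hat{P}(l^+ \mid c^+) = \frac{1}{n^+} \sum_{j=1}^{n^+} \mathbb{1}[l_j = l^+],$$
where the indicators $\mathbb{1}[l_j = l^+]$ ($j = 1, \dots, n^+$) can be considered as independent random variables bounded by the interval $[0, 1]$. Note that $\mathbb{E}\big[\mathbb{1}[l_j = l^+]\big] = P(l^+ \mid c^+)$. According to Hoeffding’s inequality [Murphy2012], for any $t_1 > 0$ we have
$$P\big(|P(l^+ \mid c^+) - \hat{P}(l^+ \mid c^+)| \ge t_1\big) \le 2 e^{-2 n^+ t_1^2}.$$
Similarly, for any $t_2 > 0$ we have $P\big(|P(l^+ \mid c^-) - \hat{P}(l^+ \mid c^-)| \ge t_2\big) \le 2 e^{-2 n^- t_2^2}$. Therefore, for $t_1 + t_2 = t$, we have
$$P\big(|DE_{\mathcal{M}} - DE_{\mathcal{D}}| \le t\big) \ge 1 - 2 e^{-2 n^+ t_1^2} - 2 e^{-2 n^- t_2^2} = 1 - 4\exp\!\left(-\frac{2 n^+ n^-}{(\sqrt{n^+} + \sqrt{n^-})^2}\, t^2\right),$$
where the last line is by substituting $t_1$ with $\frac{\sqrt{n^-}}{\sqrt{n^+} + \sqrt{n^-}}\, t$ and $t_2$ with $t - t_1$.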
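To make the sampling-error argument concrete, the following Python sketch evaluates a Hoeffding-style lower bound on the probability that the measured and the true discrimination differ by at most a tolerance. It applies Hoeffding's inequality to each group separately and combines the two tails with a union bound, splitting the tolerance between the groups so that both tails are equal. The function name and the particular split are choices made for this illustration, and the constants need not coincide with those of the proposition.

```python
import math


def discrimination_bound(n_pos, n_neg, t):
    """Lower bound on P(|true - measured discrimination| <= t).

    n_pos, n_neg: numbers of individuals in the two groups.
    The tolerance t is split into t_pos + t_neg = t such that
    n_pos * t_pos**2 == n_neg * t_neg**2 (equal Hoeffding tails).
    """
    root_sum = math.sqrt(n_pos) + math.sqrt(n_neg)
    t_pos = t * math.sqrt(n_neg) / root_sum
    t_neg = t * math.sqrt(n_pos) / root_sum
    failure = (2.0 * math.exp(-2.0 * n_pos * t_pos ** 2)
               + 2.0 * math.exp(-2.0 * n_neg * t_neg ** 2))
    # A probability bound below zero carries no information; clip it.
    return max(0.0, 1.0 - failure)
```

For two groups of 1000 individuals each and a tolerance of 0.1, each group receives a tolerance of 0.05 and the bound evaluates to 1 - 4e^{-5}, roughly 0.973. The bound tightens as either group grows and becomes vacuous for very small samples.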
To extend Proposition 5 to Proposition 6, the difference is that, since $h$ is a classifier that depends on the training data $\mathcal{D}$, the indicators of the predicted labels cannot be considered as independent. Thus, Hoeffding’s inequality cannot be directly applied and a uniform bound for all hypotheses in $\mathcal{H}$ is needed.
Given a causal model , a sample dataset , and a classifier from hypothesis space , the probability of the distance between and no larger than is bounded by
with being the VC dimension of .
According to the definitions of and ,
Similar to the proof of Proposition 5, we treat $\hat{P}(\tilde{l}^+ \mid c^+)$ as the mean of indicators $\mathbb{1}[\tilde{l}_j = l^+]$ ($j = 1, \dots, n^+$). According to the generalization bound in statistical learning theory [Vapnik1998], if $\mathcal{H}$ is finite we have
where is the size of . If is infinite we have
where is the VC dimension of . Then the proposition can be proven similarly to Eq. (6). ∎
Proposition 6 provides an approximation of $DE_{\mathcal{M}_h}$ from $DE_{\mathcal{D}_h}$. However, since pre-process methods deal with the training data, it is imperative to further link $DE_{\mathcal{M}_h}$ with $DE_{\mathcal{D}}$. Next, we give the relation between $DE_{\mathcal{D}_h}$ and $DE_{\mathcal{D}}$ in terms of a bias metric that we refer to as the error bias.
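Definition 2 (Error Bias).
For any classifier $h$ trained on a training dataset $\mathcal{D}$, the error bias is given by
$$\varepsilon_{h,\mathcal{D}} = (\varepsilon_1^+ - \varepsilon_2^+) - (\varepsilon_1^- - \varepsilon_2^-),$$
where $\varepsilon_1^+, \varepsilon_1^-$ are the percentages of false positives on data with $c^+$ and $c^-$ respectively, and $\varepsilon_2^+, \varepsilon_2^-$ are the percentages of false negatives on data with $c^+$ and $c^-$ respectively, each taken as a fraction of the corresponding group.
For any classifier $h$ that is trained on $\mathcal{D}$, we have
$$DE_{\mathcal{D}_h} - DE_{\mathcal{D}} = \varepsilon_{h,\mathcal{D}}.$$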
By definition, is given by
which can be rewritten as
Similarly, is given by
Subtracting from , we obtain
which is equivalent to
Similarly for data with , we have
Letting completes the proof. ∎
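The relation just proved can be checked numerically. The Python sketch below computes the discrimination in the true labels, the discrimination in the predicted labels, and the error bias, where false positives and false negatives are counted as fractions of each group. The list-based data layout, the encoding (1 for the non-protected group and for the positive label), and the function names are assumptions made for this illustration.

```python
def group_mean(values, groups, g):
    """Mean of `values` restricted to the individuals with group == g."""
    selected = [v for v, c in zip(values, groups) if c == g]
    return sum(selected) / len(selected)


def risk_difference(values, groups):
    """Positive rate in group 1 minus positive rate in group 0."""
    return group_mean(values, groups, 1) - group_mean(values, groups, 0)


def error_bias(labels, preds, groups):
    """(FP+ - FN+) - (FP- - FN-), each a fraction of its own group."""
    def fp_minus_fn(g):
        pairs = [(y, p) for y, p, c in zip(labels, preds, groups) if c == g]
        false_pos = sum(1 for y, p in pairs if p == 1 and y == 0)
        false_neg = sum(1 for y, p in pairs if p == 0 and y == 1)
        return (false_pos - false_neg) / len(pairs)
    return fp_minus_fn(1) - fp_minus_fn(0)


# Hand-made example: one false positive in group 1, one false negative in group 0.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0]
```

In this example the discrimination in the true labels is 0.25, the error bias is 0.5, and the discrimination in the predicted labels is 0.75, so a classifier can add discrimination even when its overall accuracy looks acceptable.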
Given a causal model , a sample dataset and a classifier trained on , is bounded by
where the meaning of is same as that in Proposition 6.
Theorem 1 gives a criterion of non-discrimination in prediction that incorporates both the discrimination in the training data and the error bias of the classifier, i.e., $|DE_{\mathcal{D}} + \varepsilon_{h,\mathcal{D}}|$ being bounded by a threshold $\tau$. It shows that neither a discrimination-free dataset $\mathcal{D}$, i.e., $|DE_{\mathcal{D}}| \le \tau$, nor a “balanced” classifier, i.e., $|\varepsilon_{h,\mathcal{D}}| \le \tau$, is sufficient on its own to guarantee non-discriminatory prediction. Instead, the sum of $DE_{\mathcal{D}}$ and $\varepsilon_{h,\mathcal{D}}$ must be within the threshold.
Table 2: Measured discrimination after discrimination removal (decision tree as the classifier). Columns: Size; Two-phase framework (MSG), with and without classifier tweaking; DI. [Table values not recoverable.]
4 Remove Discrimination in Prediction
This section solves the problem of removing discrimination in prediction: if the criterion $|DE_{\mathcal{D}} + \varepsilon_{h,\mathcal{D}}| \le \tau$ is not satisfied for a classifier, how can we meet the criterion through modifying the training data and tweaking the classifier? Denote by $\mathcal{D}^*$ a dataset obtained by modifying $\mathcal{D}$, and by $h^*$ a new classifier trained on $\mathcal{D}^*$. Note that when the training data is modified, the error bias of the classifier built on it will also change. Thus, it is difficult to perform the training data modification and the classifier tweaking simultaneously. We propose a framework for modifying the training data and the classifier in two successive phases, as summarized in Algorithm 1.
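In the first phase, we modify $\mathcal{D}$ to remove the discrimination it contains. The modified dataset $\mathcal{D}^*$ can be considered as being generated by a causal model $\mathcal{M}^*$ that is different from $\mathcal{M}$ with respect to the modification. Note that if $|DE_{\mathcal{D}^*}| \le \tau$ is achieved, Theorem 1 ensures the bound of discrimination for $\mathcal{M}^*_{h^*}$, i.e., the discrimination of $h^*$ performed on $\mathcal{M}^*$, but not for $\mathcal{M}_{h^*}$, i.e., the discrimination of $h^*$ performed on the original population. If we only modify the label of $\mathcal{D}$, $\mathcal{M}^*$ can be written as
$$\mathcal{M}^*: \quad c = f_C(pa_C, u_C), \qquad z_i = f_i(pa_i, u_i) \;\; \text{for each } Z_i \in \mathbf{Z}, \qquad l = f^*_L(pa_L, u_L).$$
Then, the causal model of any classifier $h^*$ trained on $\mathcal{D}^*$ and performed on $\mathcal{M}^*$ is given by
$$\mathcal{M}^*_{h^*}: \quad c = f_C(pa_C, u_C), \qquad z_i = f_i(pa_i, u_i) \;\; \text{for each } Z_i \in \mathbf{Z}, \qquad l = h^*(c, \mathbf{z}),$$
which is equivalent to $\mathcal{M}_{h^*}$. Thus, non-discrimination in $\mathcal{M}^*_{h^*}$ also means non-discrimination in $\mathcal{M}_{h^*}$. On the other hand, if we modify attributes other than $L$, since the new unlabeled data is drawn from the original population, $\mathcal{M}^*_{h^*}$ is inconsistent with $\mathcal{M}_{h^*}$. As a result, the non-discrimination result of the training data cannot be applied to the prediction of the new data. Therefore, we have the following corollary derived from Theorem 1.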
Let be a modified dataset from , and be a new classifier trained on . If only modifies the labels, then is a sufficient condition to achieve
where the meaning of is same as that in Proposition 6.
In the second phase, we make modifications to to reduce the error bias. Although dealing with a different fairness criterion, existing methods for balancing the misclassification rates (e.g., [Hardt et al.2016]) can be easily extended for solving this problem. For the purpose of evaluating the correctness of our theoretical results, here we use a simple algorithm RandomFlip for reducing the error bias that can be applied to any classifier. After the classifier makes a prediction, RandomFlip randomly flips the predicted label with certain probability (resp. ) if the individual has (resp. ) to achieve , where can be computed according to the prediction of over . Denoting , it suffices to make and . Assume that , then it can be easily shown that should satisfy . Similar result can be obtained if .
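The flipping step described above can be sketched in Python as follows. This is only an illustration of group-dependent random flipping of predicted labels: the function name `random_flip`, its signature, and the encoding (1 for the non-protected group and for the positive label) are assumptions, and the two flip probabilities are taken as inputs rather than derived, since the conditions they must satisfy are stated in the text and not reproduced here.

```python
import random


def random_flip(preds, groups, p_pos, p_neg, seed=None):
    """Flip each predicted label independently with a group-dependent
    probability: p_pos for individuals with group == 1, p_neg otherwise.

    preds, groups: sequences of 0/1 values of equal length.
    Returns a new list; the input predictions are left untouched.
    """
    rng = random.Random(seed)
    flipped = []
    for y, c in zip(preds, groups):
        p = p_pos if c == 1 else p_neg
        # rng.random() lies in [0, 1), so p = 0 never flips and p = 1 always flips.
        flipped.append(1 - y if rng.random() < p else y)
    return flipped
```

In use, one would measure the error bias of the classifier on the training data, choose the two probabilities accordingly, and apply `random_flip` to the classifier's output before measuring the discrimination in prediction again.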
5 Empirical Evaluation
5.1 Experimental Setup
In this section, we conduct experiments to evaluate our theoretical results. For simulating a population, we adopt a semi-synthetic data generation method. We first learn a causal model for a real dataset, the Adult dataset [Lichman2013], and treat it as the ground-truth. We then generate the training data based on . The causal model is built using the open-source software Tetrad [Glymour and others2004].
The Adult dataset consists of 11 attributes including age, education, sex, occupation, income, marital_status, etc. Due to the sparse data issue, we binarize each attribute’s domain values into two classes to reduce the domain sizes. We treat sex as $C$ and income as $L$. The discrimination is measured as in , i.e., . Based on the underlying distribution of $\mathcal{M}$, we generate a number of training data sets with different sample sizes.
When constructing discrimination-free classifiers using the two-phase framework, we select one representative data modifying algorithm that only modifies the label $L$, the Massaging (MSG) algorithm [Kamiran and Calders2009a]. For other algorithms, we will evaluate their performance in preserving data utility in future work. For comparison, we also include an algorithm that modifies the non-protected attributes $\mathbf{Z}$, the Disparate Impact Removal (DI) algorithm [Adler et al.2016]. The proposed RandomFlip algorithm is used for tweaking the classifier. We assume a discrimination threshold $\tau = 0.05$, i.e., we want to ensure that the discrimination in prediction is not larger than 0.05.
5.2 Experiment Results
We first measure the discrimination in each training data set, i.e., $DE_{\mathcal{D}}$. Then, we learn the classifier $h$ from the training data, where two classifiers, decision tree (DT) and support vector machine (SVM), are used. By replacing the labels in the training data with the labels predicted by the classifier, we obtain $\mathcal{D}_h$, whose discrimination is measured as $DE_{\mathcal{D}_h}$. Finally, we measure the discrimination in prediction, i.e., $DE_{\mathcal{M}_h}$, according to Proposition 3. For each sample size, we repeat the experiments 100 times by randomly generating 100 different sets of training data. We report the average and variance of each measured quantity of discrimination.
The results are shown in Table 1. As expected, with the increase of the sample size, the difference between and decreases as shown by the variance. Since , the training data contains discrimination. As a result, both the training data with predicted labels, i.e., , and the prediction, i.e., , also contain discrimination.
To show the effectiveness of the two-phase framework, we first apply MSG to completely remove the discrimination in the above training data, obtaining the modified training data . Then, a decision tree is built on , and the RandomFlip algorithm is executed to tweak the classifier so that the error bias is less than 0.05, i.e., . Finally, we measure the discrimination in . For comparison, the same process is also performed for DI.
The results are shown in Table 2. By using the two-phase framework, discrimination is removed from the training data as shown by , and more importantly, removed from the prediction as shown by . We also see that, if the classifier tweaking is not performed, the prediction still contains discrimination. However, for DI, even when the discrimination is removed from the training data, and the error bias in the classifier is also removed, there still exists discrimination in prediction. These results are consistent with our theoretical conclusions.
6 Conclusions
In this paper, we addressed the limitation of the pre-process methods that there is no guarantee about the discrimination in prediction. Our theoretical results show that: (1) only removing discrimination from the training data cannot ensure non-discrimination in prediction for any classifier; and (2) when removing discrimination from the training data, one should only modify the labels in order to obtain a non-discrimination guarantee. Based on the results, we developed a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee. The experiments adopting a semi-synthetic data generation method demonstrate the theoretical results and show the effectiveness of our two-phase framework.
References
- [Adler et al.2016] Philip Adler, Casey Falk, Sorelle A Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. In Proceedings of ICDM 2016, 2016.
- [Bonchi et al.2017] Francesco Bonchi, Sara Hajian, Bud Mishra, and Daniele Ramazzotti. Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics, 3(1):1–21, 2017.
- [Calders and Verwer2010] Toon Calders and Sicco Verwer. Three naive bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.
- [Calders et al.2009] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In Data mining workshops, 2009. ICDMW’09. IEEE international conference on, pages 13–18. IEEE, 2009.
- [Calmon et al.2017] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3995–4004, 2017.
- [Feldman et al.2015] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In KDD, pages 259–268. ACM, 2015.
- [Glymour and others2004] Clark Glymour et al. The TETRAD project. http://www.phil.cmu.edu/tetrad, 2004.
- [Hardt et al.2016] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.
- [Kamiran and Calders2009a] Faisal Kamiran and Toon Calders. Classifying without discriminating. In Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on, pages 1–6. IEEE, 2009.
- [Kamiran and Calders2009b] Faisal Kamiran and Toon Calders. 21st Benelux Conference on Artificial Intelligence (BNAIC), pages 333–334, 2009.
- [Kamiran and Calders2012] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. KAIS, 33(1):1–33, 2012.
- [Kamiran et al.2010] Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Discrimination aware decision tree learning. In ICDM, pages 869–874. IEEE, 2010.
- [Kamishima et al.2011] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In ICDMW, pages 643–650. IEEE, 2011.
- [Kamishima et al.2012] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.
- [Lichman2013] M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.
- [Murphy2012] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
- [Nabi and Shpitser2017] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. arXiv preprint arXiv:1705.10378, 2017.
- [Pearl2009] Judea Pearl. Causality: models, reasoning and inference. Cambridge university press, 2009.
- [Romei and Ruggieri2014] Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(05):582–638, 2014.
- [Vapnik1998] Vladimir Vapnik. Statistical learning theory, volume 1. Wiley New York, 1998.
- [Žliobaitė et al.2011] Indre Žliobaitė, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In ICDM, pages 992–1001. IEEE, 2011.
- [Zafar et al.2017] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970, 2017.
- [Zhang et al.2017] Lu Zhang, Yongkai Wu, and Xintao Wu. A causal framework for discovering and removing direct and indirect discrimination. In IJCAI, 2017.