Automated decision making has become increasingly successful over the last few decades and has therefore been used in a growing number of domains, either stand-alone or in support of human decision makers. This includes many sensitive domains that significantly impact people’s livelihoods, such as granting loans, university admissions, recidivism prediction, or insurance rate setting. It has been found that many such decision tools, often unintentionally, are biased against minority groups and therefore lead to discrimination. In response to these concerns, the machine learning research community has been devoting effort to developing clear notions of fair decision making, and to designing algorithms that implement fair machine learning.
A common approach to address the important issue of fair algorithmic decision making is through fair data representation. The idea is that some regulator, or a responsible data curator, transforms collected data into a format (a representation) that can then be used for solving downstream classification tasks, while providing guarantees of fairness. This approach was put forward by the seminal paper of Zemel et al. In their words: "our intermediate representation can be used for other classification tasks (i.e., transfer learning is possible)"… "We further posit that such an intermediate representation is fundamental to progress in fairness in classification, since it is composable and not ad hoc; once such a representation is established, it can be used in a blackbox fashion to turn any classification algorithm into a fair classifier, by simply applying the classifier to the sanitized representation of the data". Many followup papers aim to realize this paradigm, solving technical and algorithmic issues [9, 5, 10, 12, 2] (to mention just a few). The main contribution of this paper is showing that, basically, it is impossible to achieve this goal. Namely, for Demographic Parity (DP) fairness, given any domain partitioned into two non-empty groups, no data representation can guarantee that every classifier expressible under that representation is DP fair for all possible probability distributions over that domain. For fairness notions that take ground truth classification into account, like Odds Equality (EO), given any two different non-redundant tasks (namely, tasks in which the true label has some correlation with the group membership), no data representation can simultaneously allow accurate label classifiers for both while guaranteeing that any classifier expressible over that representation is EO fair for both these tasks. This impossibility applies even if one restricts the tasks in question to share the same marginal (unlabeled) data distribution.
Our results answer negatively the main two open questions posed in the discussion section of Creager et al.
There is an apparent discrepancy between our impossibility results and the long list of papers claiming to achieve fair representations. What is the source of that discrepancy? The answer lies in a difference in the setup of the problem. In most (if not all) of the papers that claim positive results about fair representations, the designer of the fair representation has access to the data distribution w.r.t. which the fairness is being evaluated. When the notion of fairness is independent of the ground truth classification (the case of Demographic Parity), the distribution in question is the marginal (unlabeled) one. When the notion of fairness does involve true labels (such as Odds Equality or Group Calibration), the algorithms that define the representations require, on top of that, access to the ground truth labels of sample instances. What we show here is that this access to the data distribution at evaluation (or test) time is necessary for the ability to guarantee the fairness of representations. That common (often implicit) assumption can be justified only in very limited situations. For example, the definition of Demographic Parity for acceptance of students to a given university program depends on the distribution of applicants to that program in the given term. This may change between universities, between programs and between academic years. Therefore, based on our results, no a priori designed data representation can be guaranteed to provide the Demographic Parity fairness it aims to establish for acceptance of students to academic programs. The situation is similar when it comes to granting loans: the distribution of applicants changes between loan granting institutions, branch locations, requested sums, dates, etc.
In fact, it is hard to come up with any realistic scenario in which a fixed data distribution remains unchanged throughout the various classification tasks that may use the data down the road. Therefore no data representation can meet the goal stated by Zemel et al., namely to "be used in a blackbox fashion to turn any classification algorithm into a fair classifier, by simply applying the classifier to the sanitized representation of the data".
1.1 What is fair representation?
The term ‘fair data representation’ encompasses a wide range of different meanings. When a word embedding results in a smaller distance between the vectors representing ‘woman’ and ‘nurse’ than between the representations of ‘woman’ and ‘doctor’, and the other way around for ‘man’, is it an indication of bias in the representation or just a faithful reflection of a bias in society? Rather than delving into such issues, we discuss an arguably more concrete facet of data representation: we examine representation fairness from the perspective of its effect on the fairness of the classification rules that agents using data represented that way may come up with. Such a view takes into consideration two setup characteristics:
- The objective of the agent using the data
We distinguish three types of classification prediction agents (formal definitions of these aspects of fairness are provided in Section 3.2):
- Malicious
- driven by a bias against a group of subjects. To protect against such an agent, a fair representation (or feature set) should be such that every classifier based on data represented that way is fair. This is apparently the most common approach to fair representations in the literature (e.g., [14, 9]).
- Accuracy Driven
- focusing on traditional measures of learning efficiency, ignoring fairness considerations. A representation is accuracy-driven fair if every loss minimizing classifier based on that representation is fair.
- Fairness Driven
- aiming to find a decision rule that is fair while maintaining meaningful accuracy. A representation is fairness-driven fair if there exists a loss minimizing (or an approximate minimizer) classifier based on that representation that is fair.
- The notion of group fairness applied to the classification decisions
The wide range of group fairness notions (for classification) can be taxonomized along several dimensions: Does the notion depend on the ground truth classification or only on the agent’s decision (like demographic parity)? Is a perfectly accurate decision (matching the ground truth classification) always considered fair (like in odds equality)? Does the fairness notion depend on unobservable features (like intention or causality)? In this work we focus on fairness notions that are ground-truth-dependent, view the ground truth classification as fair and depend only on observable features.
Picking which notion of fairness one wishes to abide by depends on societal goals and may vary from one task to another. This is outside the scope of this paper. Just the same, let us briefly explain why the requirements listed above are natural in many situations.
- The dependence on the ground truth classification
is almost inevitable from a utilitarian perspective: taking into account the probability that a student succeeds or fails when making acceptance decisions should not be considered unfair. Put more formally, whenever there is any correlation between group membership and the ground truth classification, any classifier that is fair w.r.t. a notion that ignores the ground truth (like demographic parity) is bound to suffer prediction error proportional to that correlation.
- Viewing perfectly accurate decisions as fair
can be viewed as a distinction between notions that do or do not try to impose affirmative action. It makes a lot of sense in tasks like conviction for a crime: if you convict all criminals and no one else, you should not be accused of unfairness.
- Relying only on observable features
fosters objectivity and allows scrutiny of the decisions made. Our running example of such a notion is odds equality; however, our results hold as well for other common notions of fairness that meet the above conditions (like calibration within groups). We provide formal definitions of these notions in Section 3.1.
1.2 Our results
We prove the following inherent limitations of notions of fair representations:
The impossibility to be task-independent. There is a host of literature proposing methods for coming up with data representations that guarantee the fairness of classifiers based on those representations (e.g., [2, 9, 11]). We elaborate on some of these works in our Related Work section. Contrary to the impression conveyed by many such papers, we show that the ability to guarantee multi-task fairness is inherently limited. Much of that work addresses Demographic Parity (DP). We prove that if two tasks have different marginal data distributions (that is, different distributions of unlabeled instances), then no representation can guarantee that any non-trivial classifier trained on it satisfies DP for both. We show that the only classifiers that are guaranteed to satisfy any significant level of DP fairness w.r.t. all marginal distributions are the redundant constant functions. From a practical point of view, since DP fairness of some decision (say, acceptance to some university program) requires the ratio of positive decisions between groups to match the ratio of applicants from those groups, a representation that guarantees DP fairness cannot be constructed a priori: it must have access to the distribution of groups among applicants for that specific program. Furthermore, we prove that for every fixed marginal data distribution, if two ground truth classifications differ with non-zero probability over it, there can be no data representation that enjoys Odds Equality fairness and accuracy with respect to both tasks over that shared marginal distribution (except for the redundant case where the success rates of both groups are equal for both tasks). These results answer negatively the main two open problems posed in the Discussion section of Creager et al.
The impossibility to evaluate the fairness contribution of a given feature devoid of the other features used (again, for each agent objective and several common group fairness notions). We show that for a fixed task, for each notion of fairness of representation, there are features that when added to one set of features render the resulting representation more fair, and when added to a different set of features render the resulting representation less fair.
The inherent dependence of the effect on fairness of adding/deleting a feature on the type of agent using the representation (on top of the above mentioned dependence on other features), even when the feature in question does not correlate with membership in the protected group.
(These come on top of the obvious dependence on the notion of fair classification sought).
Paper road map: Section 2 gives an overview of the related work. Section 3 introduces our setup including our taxonomy for fair representations. Section 4 contains our main results on the impossibility of generic fairness of a representation. Section 5 addresses the impossibility of defining the fairness effect of a single feature without considering the other components of a representation. Section 6 briefly shows the impossibility of having fair representations w.r.t. Predictive Rate Parity. Section 7 is our concluding remarks.
2 Related Work
Most, if not all, of the literature concerning the creation of fair data representations addresses this task in a setup where some input data (or a probability distribution over some domain of individuals) is given to the agent building the representation (e.g., [5, 9, 14, 12]). Such a probability distribution is essential to any common definition of fairness. However, in many cases the probability distribution with respect to which the fairness is defined remains implicit. For example, Zemel et al. define their notion of fairness by saying: "We formulate this using the notion of statistical parity, which requires that the probability that a random element from $X^+$ maps to a particular prototype is equal to the probability that a random element from $X^-$ maps to the same prototype" (where $X^+$ and $X^-$ are the two groups w.r.t. which one aims to respect fairness). However, they do not specify the meaning of "a random element". The natural interpretation is that "random" refers to the uniform distribution over the finite set of individuals from which the algorithm selects. In that case, that information varies with each concrete task and is not available to the task-independent representation designer. Alternatively, one could interpret those "random" selections as picking uniformly at random from some established large training set that is fixed for all tasks. Such randomness may well be available to the representation designer, but it misses the intention of statistical parity fairness. For example, the fixed training set may have 10,000 individuals from one group and 20,000 from the other group, but when some local bank branch allocates loans it has 80 applicants from the first group and 37 applicants from the other. For the fairness of these loan allocation decisions, the relevant ratio between the groups is 80/37 rather than the 10,000/20,000 ratio available to the representation designer.
Almost all the work on fair representations focuses on the demographic parity (DP) notion of fairness [5, 9, 14, 12]. To achieve DP fairness, a classifier has to induce a success ratio between the groups of subjects that matches the ratio between these groups in the input data. However, as demonstrated above, that ratio varies from one application to another and cannot be determined a priori. We show that any fixed representation that allows expressing non-trivial classifiers cannot guarantee DP fairness in the face of a shifting marginal (that is, unlabeled) data distribution (see Section 4).
When the data marginal distribution w.r.t. which the fairness is defined is fixed and available to the designer of a representation, then, as shown by  and followup papers, DP fairness is indeed possible. However, we further show that even under these assumptions, no data representation can guarantee fairness with respect to notions of fairness that do rely on the correct ground truth, such as equalized odds (EO) , for arbitrary tasks (see Section 4).
To the best of our knowledge this fact has also not been explicitly stated (and proved) before, although it seems that some previous works were aware of this concern: in previous work discussing fair representation w.r.t. notions of fairness that take the ground truth classification into account, the algorithms that design the representations require access to task-specific labeled data (e.g. [15, 1, 12, 4]). Such a requirement defies the goal of having a fixed representation that guarantees fairness for many tasks.
The effect of the motivation of the user of the representation on the fairness of the resulting decision rule has been considered by Madras et al. and Zhang et al. These papers identify two motivations. The first is malicious, which is the intent to discriminate without regard for accuracy. The second is accuracy-driven, which is the intent to maximize accuracy. We address these effects as part of our taxonomy of notions of fair representations. Additionally, we discuss fairness-driven agents that aim to achieve fairness while maintaining some level of accuracy.
The question of feature deletion has also been considered in real world examples, such as the "ban the box" policy, which disallowed employers from using criminal history in hiring decisions. The effect of allowing or disallowing features on fairness has been studied before, for example by Grgic-Hlaca et al. However, in previous works the effect of a feature on fairness has been discussed in isolation. In contrast, we show that the fairness of a feature should not be considered in isolation, but should also take into account the remaining features available.
3 Formal Setup
We consider a binary classification problem with label set $\{0,1\}$ over a domain $X$ of instances we wish to classify, e.g. individuals applying for a loan. We assume the task to be given by some distribution $P$ over $X \times \{0,1\}$ from which instances are sampled i.i.d. We denote the ground-truth labeling rule as $t : X \to [0,1]$. We will think of the label 1 as denoting ‘qualified’ and the label 0 as ‘unqualified’ and $t(x) = \Pr_{(x',y) \sim P}[y = 1 \mid x' = x]$. For concreteness, we focus here on the case of deterministic labeling (that is, $t(x) \in \{0,1\}$). Most of our discussion can readily be extended to the probabilistic labeling case. In a slight abuse of notation we will sometimes use $t(x)$ to indicate the label coordinate of an instance.
A data representation is determined by a mapping $F : X \to Z$, for some set $Z$, and the learner only sees $F(x)$ for any instance $x$ (both in the training and the test/decision stages). We denote the hypothesis class of all feature based decision rules as
$$\mathcal{H}_F = \{\, g \circ F \;:\; g : Z \to \{0,1\} \,\}.$$
As a loss function we consider a weighted sum of false positives and false negatives, i.e.
$$\ell^{\alpha}(h, x, y) = \alpha \, \mathbb{1}[h(x) = 1,\ y = 0] + (1 - \alpha) \, \mathbb{1}[h(x) = 0,\ y = 1]$$
for some weight $\alpha \in (0,1)$. We denote the true risk with such weighted loss as $L^{\alpha}_P(h)$ and the empirical risk, with respect to a training sample $S$, as $L^{\alpha}_S(h)$. In this version of the paper we focus on the case of equal weights to both types of errors and use $L_P$ and $L_S$ to denote $L^{0.5}_P$ and $L^{0.5}_S$.
3.1 Notions of group fairness
For our fairness analysis we assume the population $X$ to be partitioned into two sub-populations $A$ and $D$ (namely, we restrict our discussion to the case of one binary protected attribute). We sometimes use a function notation $G : X \to \{A, D\}$ to indicate the group-membership of an instance. Of course in reality there are often many protected attributes with more than two values. However, as our goal is to show limitations and impossibility results for fair representation learning, it suffices to only consider one binary protected attribute: the same impossibilities readily follow for the more complex settings.
We now define two widely used notions of group-fairness that we will refer to throughout the paper, namely, equalized odds and demographic parity. In the following we will denote with $X^{y,g}$ the subset of $X$ with label $y$ and group membership $g$, i.e. $X^{y,g} = \{ x \in X : t(x) = y,\ G(x) = g \}$.
The notion of group-fairness we will focus on in this paper is the ground-truth-dependent notion of odds equality (equalized odds) as introduced by Hardt et al.
Definition 1 (Group fairness; Equalized odds)
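A classifier $h$ is considered fair w.r.t. odds equality ($EO$) and a distribution $P$ if for $(x, y) \sim P$ we have the statistical independence $h(x) \perp G(x) \mid y$, where $G(x) \in \{A, D\}$ denotes the group membership of $x$. For $g \in \{A, D\}$ let the false positive rate and the false negative rate be defined as $FPR_g(h, P) = \Pr_{(x,y) \sim P}[h(x) = 1 \mid y = 0,\ G(x) = g]$ and $FNR_g(h, P) = \Pr_{(x,y) \sim P}[h(x) = 0 \mid y = 1,\ G(x) = g]$ respectively. The EO unfairness is then given by the sum of differences in false positive rate and false negative rate between groups:
$$U^{EO}_P(h) = \left| FPR_A(h, P) - FPR_D(h, P) \right| + \left| FNR_A(h, P) - FNR_D(h, P) \right|.$$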
If we say a classifier is fair, without referring to any particular group-fairness notion, we mean fairness w.r.t. equalized odds.
Definition 2 (Demographic parity)
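A classifier $h$ is considered fair w.r.t. demographic parity ($DP$) and a distribution $P$ if for $x \sim P$ we have $\Pr[h(x) = 1 \mid G(x) = A] = \Pr[h(x) = 1 \mid G(x) = D]$, where $G(x) \in \{A, D\}$ denotes the group membership of $x$. The respective unfairness is given by the difference in positive classification rates between groups:
$$U^{DP}_P(h) = \left| \Pr_{x \sim P}[h(x) = 1 \mid G(x) = A] - \Pr_{x \sim P}[h(x) = 1 \mid G(x) = D] \right|.$$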
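To make the two unfairness measures concrete, the following minimal sketch computes them for a finite population with explicit probability weights. The function names, the dictionary-based encoding and the group labels "A" and "D" are illustrative assumptions rather than notation from the paper; rates conditioned on an event of zero mass are set to zero, in line with the convention mentioned in Section 4.

```python
def conditional_rate(weights, event, condition):
    """P[event | condition] under `weights`; 0.0 if the condition has zero mass."""
    mass = sum(w for x, w in weights.items() if condition(x))
    if mass == 0:
        return 0.0
    hit = sum(w for x, w in weights.items() if condition(x) and event(x))
    return hit / mass


def dp_unfairness(weights, group, h):
    """Absolute gap in positive classification rates between groups A and D."""
    rate_a = conditional_rate(weights, lambda x: h[x] == 1, lambda x: group[x] == "A")
    rate_d = conditional_rate(weights, lambda x: h[x] == 1, lambda x: group[x] == "D")
    return abs(rate_a - rate_d)


def eo_unfairness(weights, group, truth, h):
    """Sum of the absolute FPR gap and the absolute FNR gap between groups."""
    def fpr(g):
        return conditional_rate(weights, lambda x: h[x] == 1,
                                lambda x: group[x] == g and truth[x] == 0)

    def fnr(g):
        return conditional_rate(weights, lambda x: h[x] == 0,
                                lambda x: group[x] == g and truth[x] == 1)

    return abs(fpr("A") - fpr("D")) + abs(fnr("A") - fnr("D"))


# Hypothetical population: 3 of 4 members of A are qualified, 1 of 4 members of D.
population = {  # instance -> (group, true label)
    "a1": ("A", 1), "a2": ("A", 1), "a3": ("A", 1), "a4": ("A", 0),
    "d1": ("D", 1), "d2": ("D", 0), "d3": ("D", 0), "d4": ("D", 0),
}
weights = {x: 1 / 8 for x in population}
group = {x: g for x, (g, _) in population.items()}
truth = {x: y for x, (_, y) in population.items()}

perfect = dict(truth)                    # predicts the ground truth exactly
accept_all = {x: 1 for x in population}  # constant classifier

print(dp_unfairness(weights, group, perfect))         # DP gap of the perfect classifier
print(eo_unfairness(weights, group, truth, perfect))  # EO gap of the perfect classifier
```

In this toy population the perfectly accurate classifier is EO fair but not DP fair, because the success rates of the two groups differ, while the constant classifier is fair under both notions.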
3.2 The role of the agent’s objective
We will phrase our definitions of representation fairness in terms of a general group fairness notion with unfairness measure .
We start by considering a malicious decision maker who tries to actively discriminate against one group. To protect against this kind of decision maker, we need to give a guarantee such that based on the feature set it is not possible to discriminate against one group. This corresponds to the notion of adversarial fairness.
Definition 3 (Adversarial fairness)
A representation is considered to be adversarial fair w.r.t. the distribution and group fairness objective , if every classifier is group-fair. We define the adversarial unfairness of a representation by .
Furthermore, we consider an accuracy-driven decision maker, who aims to label instances correctly and is agnostic about fairness. For this kind of decision maker, we only need to make sure that optimizing for correct classification results in a fair classifier. The following definition ensures that the Bayes optimal classifier for a representation is fair.
Definition 4 (Accuracy-driven fairness)
A representation is considered to be accuracy-driven fair w.r.t. the fairness objective and distribution , and a threshold , if every classifier with is group-fair with respect to this objective. The accuracy-driven unfairness for a particular threshold parameter is given by .
We note that in cases where the decision maker does not have access to the distribution , but only to a labelled sample, this requirement might not be sufficient for guaranteeing that an accuracy-driven decision maker arrives at a fair decision.
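On a finite domain, the adversarial and accuracy-driven notions can be evaluated by brute force, enumerating every classifier expressible over a representation. The sketch below is only an illustration under stated assumptions: the representation is a dictionary from instances to cell identifiers, the group fairness notion is EO, the risk is the plain misclassification probability, and "accuracy-driven" is read as "within `eps` of the minimal risk over the representation". All names are hypothetical.

```python
from itertools import product


def eo_unfairness(weights, group, truth, h):
    """EO unfairness; a rate conditioned on a zero-mass event is taken to be 0."""
    def rate(pred, g, label):
        cond = [x for x in weights if group[x] == g and truth[x] == label]
        mass = sum(weights[x] for x in cond)
        if mass == 0:
            return 0.0
        return sum(weights[x] for x in cond if h[x] == pred) / mass

    fpr_gap = abs(rate(1, "A", 0) - rate(1, "D", 0))
    fnr_gap = abs(rate(0, "A", 1) - rate(0, "D", 1))
    return fpr_gap + fnr_gap


def classifiers_over(representation):
    """Yield every classifier of the form g(F(x)) as a dict x -> {0, 1}."""
    cells = sorted(set(representation.values()))
    for bits in product((0, 1), repeat=len(cells)):
        g = dict(zip(cells, bits))
        yield {x: g[z] for x, z in representation.items()}


def risk(weights, truth, h):
    """Probability mass of misclassified instances."""
    return sum(w for x, w in weights.items() if h[x] != truth[x])


def adversarial_unfairness(weights, group, truth, representation):
    """Worst EO unfairness over all classifiers expressible over the representation."""
    return max(eo_unfairness(weights, group, truth, h)
               for h in classifiers_over(representation))


def accuracy_driven_unfairness(weights, group, truth, representation, eps=0.0):
    """Worst EO unfairness among classifiers within eps of the minimal risk."""
    candidates = list(classifiers_over(representation))
    best = min(risk(weights, truth, h) for h in candidates)
    near_optimal = [h for h in candidates if risk(weights, truth, h) <= best + eps]
    return max(eo_unfairness(weights, group, truth, h) for h in near_optimal)


points = ["a0", "a1", "d0", "d1"]
weights = {x: 0.25 for x in points}
group = {"a0": "A", "a1": "A", "d0": "D", "d1": "D"}
truth = {"a0": 0, "a1": 1, "d0": 0, "d1": 1}

identity = {x: x for x in points}           # reveals every instance
label_only = {x: truth[x] for x in points}  # reveals only the true label

print(adversarial_unfairness(weights, group, truth, identity))
print(accuracy_driven_unfairness(weights, group, truth, identity))
print(adversarial_unfairness(weights, group, truth, label_only))
```

The identity representation lets a malicious agent accept all of one group and reject all of the other, yet an accuracy-driven agent using it recovers the ground truth and is EO fair. The enumeration is exponential in the number of cells, so it is only meant for toy domains.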
Lastly, we also consider a fairness-driven decision maker who actively tries to find a fair and accurate decision rule, while maintaining some accuracy guarantees. For such a decision maker a representation should allow for fair and accurate decision rules. If a representation fulfills this requirement, we call it fairness-enabling.
Definition 5 (-fairness-enabling representation)
A representation is considered to be -fairness-enabling w.r.t. a fairness objective , if there exists a classifier that such that and .
Our discussion focuses primarily on the case of malicious and accuracy-driven decision makers. These notions of fair representation can be defined with respect to any group-fairness notion. In our paper we will mainly focus on the equalized odds notion of fairness .
4 Can there be a generic fair representation?
We address the existence of a multi-task fair representation. We prove that for the adversarial agent scenario (which is the setup that most previous work on fair representations is concerned with), it is impossible to have generic non-trivial fair representations: no useful representation can guarantee fairness for all "downstream" classifications that are based on that representation (even if the ground truth classification remains unchanged and only the marginal may change between tasks).
We start by considering scenarios in which only the marginals shift between two tasks, e.g. two openings for different jobs, requiring similar skills, for which different pools of people would apply. Such a distribution shift can likely affect one group more than another and would thus affect the classification rates of both groups differently. We show that we cannot guarantee fairness of a fixed data representation for general shifts of this kind, even for the simplest case of demographic parity.
Claim 1
Pick any domain set $X$ and any partition of $X$ into non-empty subsets $A, D$. For every non-constant function $f : X \to \{0,1\}$ there exists a probability distribution $P$ over $X$ such that $f$ is arbitrarily DP-unfair w.r.t. $P$.
In particular, for an agent that makes some non-trivial binary valued decision over a set of individuals divided into advantaged ($A$) and disadvantaged ($D$), there will always be a probability distribution over the set of individuals (or, a subset of that set with the uniform distribution over it) relative to which that decision will be acutely Demographic Parity unfair. In other words, any representation that allows a non-constant classifier cannot provide a DP fairness guarantee for all possible tasks over the same set of individuals.
Proof: If $f$ is constant on any of the groups $A$ or $D$ then, since $f$ is not constant over $X$, there are points in the other group on which $f$ has the opposite value. Thus, from $f$ not being constant, we can conclude that there are two labels $y_1 \neq y_2$ such that the sets $A \cap f^{-1}(y_1)$ and $D \cap f^{-1}(y_2)$ are both non-empty. Now we choose the marginal $P$ to assign probability 0.5 to $A \cap f^{-1}(y_1)$ and probability 0.5 to $D \cap f^{-1}(y_2)$. Clearly $f$ fails DP w.r.t. this $P$: all the mass of $A$ receives the label $y_1$ and all the mass of $D$ receives the label $y_2 \neq y_1$, so the positive classification rates of the two groups differ by 1.
Corollary 1
No data representation can guarantee the DP fairness of any non-trivial classifier w.r.t. all possible data generating distributions (over any fixed domain set with any fixed partition into non-empty groups). That is, any non-constant representation $F$ cannot be adversarially fair with respect to DP and an arbitrary task $P$.
Proof of Corollary 1: For any non-constant function $f$, we have seen that there exists a marginal $P$ such that $f$ does not fulfill demographic parity with respect to $P$ (Claim 1). Now if a representation $F$ is non-constant, it allows some non-constant function $f$ using that representation. Thus no non-constant representation can fulfill adversarial demographic parity with respect to every distribution $P$.
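The construction in the proof of Claim 1 is simple enough to spell out in code. The sketch below, with hypothetical names and a finite domain assumed for illustration, searches for one point of group A and one point of group D that receive opposite labels and places probability 0.5 on each.

```python
def positive_rate(dist, group, f, g):
    """P[f(x) = 1 | group(x) = g] under dist; 0.0 if group g has zero mass."""
    mass = sum(p for x, p in dist.items() if group(x) == g)
    if mass == 0:
        return 0.0
    return sum(p for x, p in dist.items() if group(x) == g and f(x) == 1) / mass


def dp_unfairness(dist, group, f):
    """Absolute gap in positive classification rates between groups A and D."""
    return abs(positive_rate(dist, group, f, "A") - positive_rate(dist, group, f, "D"))


def dp_witness(domain, group, f):
    """Return a distribution on which a non-constant f has DP unfairness 1."""
    for y_a, y_d in ((1, 0), (0, 1)):
        a_points = [x for x in domain if group(x) == "A" and f(x) == y_a]
        d_points = [x for x in domain if group(x) == "D" and f(x) == y_d]
        if a_points and d_points:
            # Half the mass on a point of A labelled y_a, half on a point of D labelled y_d.
            return {a_points[0]: 0.5, d_points[0]: 0.5}
    raise ValueError("f is constant, or one of the groups is empty")


# Hypothetical toy domain: instances 0..5, the first three belong to group A.
domain = list(range(6))
group = lambda x: "A" if x < 3 else "D"
f = lambda x: x % 2  # a non-constant classifier

witness = dp_witness(domain, group, f)
print(witness, dp_unfairness(witness, group, f))
```

If both groups are non-empty and `f` is non-constant, one of the two label pairings always succeeds, which mirrors the case analysis at the start of the proof.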
We can now show a similar effect for EO-fairness, i.e. we show that there is no representation that can guarantee EO fairness for arbitrary marginal shifts. This result is directly implied by the following claim.
Claim 2
For every non-constant function $t : X \to \{0,1\}$ and every non-constant classifier $h$ with $h \neq t$ and $h \neq 1 - t$ (where $1 - t$ denotes the function that maps every element $x$ to $1 - t(x)$), there exists a marginal $P_X$ such that $h$ has high EO unfairness with respect to $t$ and $P_X$.
Proof of Claim 2: Let be any non-constant function and be any non-constant classifier with . Then we know that at least three of the four sets , , and are non-empty. Thus two of these three sets, agree on the ground truth. Call them and (and let the remaining set be ). W.l.o.g. , .
Case 1: and . Then we can choose the marginal as and . Yielding,
Case 2: and : Analogous to Case 1
Case 3: there is , such that . W.l.o.g. . Then and and . In this case we can choose the marginal as and . Then all elements of will be misclassified and all elements of will either be classified correctly or be misclassified in the opposite direction, yielding to high EO unfairness. (In the case where the ground truth labeling is constant on one group, we define the misclassification rate with respect to the label it will not achieve to be zero. Then we get .)
Corollary 2
No data representation can guarantee EO fairness of any non-constant predictor based on that representation for all "downstream" classification learning tasks. That is, any representation $F$ that is not constant on any group cannot be adversarially fair with respect to EO and an arbitrary task $P$. This holds even if one restricts the claim to tasks sharing a fixed marginal data distribution.
Proof of Corollary 2: For any ground truth $t$ and any representation $F$ that allows a classifier $h$ as described in Claim 2, there exists a marginal $P_X$ such that $h$ is highly EO unfair with respect to $t$ and $P_X$. Note that as long as $F$ is not constant on either group, we can find an $h$ such that the requirements from Claim 2 are fulfilled. Thus the representation is not adversarially fair with respect to $t$ and $P_X$. Thus any sufficiently complex representation cannot guarantee fairness for every possible covariate shift.
The results above showed that there is no representation that can guarantee fairness for an arbitrary task. But what happens if we limit our discussion to a predefined selection of tasks? We will show that even in this restricted case, there can be no representation that guarantees EO fairness with respect to a general predefined selection of tasks. We say a distribution $P$ has equal success rates if both groups have the same conditional probability of label 1, i.e. $\Pr_{(x,y) \sim P}[y = 1 \mid x \in A] = \Pr_{(x,y) \sim P}[y = 1 \mid x \in D]$. We will now state the main result of this section.
Theorem 1
Let $P_1$ and $P_2$ be the distributions defining two different tasks with the same marginal $P_X$ such that at least one of the tasks does not have equal success rates. There can be no data representation $F$ such that the following criteria simultaneously hold:
- $F$ is adversarially fair w.r.t. $P_1$ and EO
- $F$ is adversarially fair w.r.t. $P_2$ and EO
- $F$ allows for perfect accuracy w.r.t. $P_1$ and $P_2$, i.e., there are classifiers $h_1, h_2$, both expressible over the representation $F$, such that $h_1$ has zero risk on $P_1$ and $h_2$ has zero risk on $P_2$.
In order to prove this theorem we use the following lemma.
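Lemma 1
Pick any set $X$ and a partition of $X$ into two non-empty (disjoint) sets $A$ and $D$. Let $P$ be any probability distribution over $X$ such that both $P(A) \neq 0$ and $P(D) \neq 0$. Let $t, f : X \to \{0,1\}$ be such that $\Pr_{x \sim P}[t(x) \neq f(x)] \neq 0$. If $f$ is an EO fair classification w.r.t. $t$ (as the labeling rule) and $t$ is an EO fair classification w.r.t. $f$ (as the labeling rule), then $\Pr[t(x) = 1 \mid x \in A] = \Pr[t(x) = 1 \mid x \in D]$ and $\Pr[f(x) = 1 \mid x \in A] = \Pr[f(x) = 1 \mid x \in D]$.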
Proof of Lemma 1:
Consider the following four sets:
, , , .
Let , , , , denote the intersections of these sets with the set , (e.g., ), and similarly, , , , , denote the intersections of these sets with the set . Notice that
It follows that once one shows that each of these quantities can be expressed in terms of
the false positive and false negative rates when each of or is considered the true classification and the other as the predicted labeling, then the conclusion of the lemma is implied by its EO assumptions.
Using the above notation, when is the true classification,
and (and similarly for ).
And when the true classification is ,
and (and similarly for ).
We will start with the case where all eight sets and are non-empty. We note, that in this case equalized false positive rates and false negative rates of with respect to gives us the following two equations,
This implies that there are two constants with and and and .
Furthermore, being EO fair with respect to gives us
This implies that there is a constant such that and .
The cases in which one or several of these sets are empty can be shown in an analogous way. This proves our claim.
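For the case in which all eight sets are non-empty, the argument can be written out explicitly as follows. This is a sketch in notation assumed here for illustration, not the paper's: $t$ and $f$ are the two labelings, $a_{ij}$ is the probability mass of the part of group $A$ with $t = i$ and $f = j$, $d_{ij}$ is the analogous mass in group $D$, and $r$, $s$, $u$ are auxiliary constants. It uses only that $\frac{x}{x+y} = \frac{x'}{x'+y'}$ implies $\frac{x}{y} = \frac{x'}{y'}$ for positive quantities.

```latex
% Assumed notation: a_{ij} = P(A \cap \{t = i,\, f = j\}),
%                   d_{ij} = P(D \cap \{t = i,\, f = j\}), all strictly positive.
\begin{align*}
\text{$f$ EO fair w.r.t. $t$ (FPR):}\quad
  & \frac{a_{01}}{a_{01} + a_{00}} = \frac{d_{01}}{d_{01} + d_{00}}
    \;\Longrightarrow\; \frac{a_{01}}{a_{00}} = \frac{d_{01}}{d_{00}} =: r, \\
\text{$f$ EO fair w.r.t. $t$ (FNR):}\quad
  & \frac{a_{10}}{a_{10} + a_{11}} = \frac{d_{10}}{d_{10} + d_{11}}
    \;\Longrightarrow\; \frac{a_{10}}{a_{11}} = \frac{d_{10}}{d_{11}} =: s, \\
\text{$t$ EO fair w.r.t. $f$ (FPR):}\quad
  & \frac{a_{10}}{a_{10} + a_{00}} = \frac{d_{10}}{d_{10} + d_{00}}
    \;\Longrightarrow\; \frac{a_{10}}{a_{00}} = \frac{d_{10}}{d_{00}} =: u.
\end{align*}
% Hence both groups have the same profile up to scaling:
\begin{align*}
(a_{00}, a_{01}, a_{10}, a_{11}) &= a_{00}\,\bigl(1,\; r,\; u,\; u/s\bigr), &
(d_{00}, d_{01}, d_{10}, d_{11}) &= d_{00}\,\bigl(1,\; r,\; u,\; u/s\bigr),
\end{align*}
% so the success rates of both labelings agree across the groups:
\begin{align*}
\Pr[t = 1 \mid A] &= \frac{a_{10} + a_{11}}{a_{00} + a_{01} + a_{10} + a_{11}}
  = \frac{u + u/s}{1 + r + u + u/s} = \Pr[t = 1 \mid D], \\
\Pr[f = 1 \mid A] &= \frac{a_{01} + a_{11}}{a_{00} + a_{01} + a_{10} + a_{11}}
  = \frac{r + u/s}{1 + r + u + u/s} = \Pr[f = 1 \mid D].
\end{align*}
```

In this non-degenerate case three of the four EO equations already force the two groups to have proportional mass profiles, which is exactly the statement that both labelings have equal success rates.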
Now we can prove our theorem.
5 Fairness of a feature set vs. fairness of a feature
In this section we discuss feature deletion and its impact on the fairness of a representation. For this we assume our representation to consist of finitely many features i.e. for every and . We limit our discussion to cases where all are finite. While this assumption facilitates our analysis, we do not expect our results to be different in the cases of continuous features. We will denote the set of features as . Unless otherwise stated, we focus on the equalized Odds (EO) notion of group fairness. We denote by and the adversarial and accuracy-driven EO fairness of the representation induced by the feature set respectively. We show that it is in general not possible to determine the effect a single feature has on the fairness of a representation without considering the full representation. This is the case even if our considered feature is not correlated with the protected attribute.
5.1 Opposing effects of a feature for accuracy-driven fairness of a representation
We start our discussion with accuracy-driven fairness w.r.t. equalized odds. In this case we show that the deletion of a feature can lead to an increase in accuracy-driven unfairness for one set of other given features, and that the deletion of the same feature can lead to a decrease in accuracy-driven unfairness for another set of other available features. This implies that the fairness of the feature cannot be evaluated without context. We show that this phenomenon holds for a general class of features that satisfy some non-triviality properties: on the one hand they do not reveal too much information about group membership and labels (non-committing), and on the other hand they do not reveal identity when label and group information is given ($k$-anonymity). We will start by stating the non-triviality requirements for our theorem.
We define the following two non-triviality requirements for a feature:
Non-committing We will call a feature non-committing if it leaves some ambiguity about label and group membership. That is, a feature is non-committing if there are two distinct values and , such that assigns each of these values to at least one instance of each . i.e. and for every
k-anonymity A feature is k-anonymous if knowing this feature, group membership and label reveals the identity of an individual only up to a set of at least k individuals. Namely, for every combination of a value of this feature, group membership and class label, there are either no instances satisfying this combination or there are at least k such instances.
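As an informal illustration of the two requirements, the following sketch checks them on a small finite domain. The encoding of instances as `(id, group, label)` triples, the function names, and the toy data are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import product


def combo_counts(instances, feature):
    """Count instances per (feature value, group, label) combination."""
    return Counter((feature(x), g, y) for x, g, y in instances)


def is_non_committing(instances, feature, groups, labels):
    """True if at least two distinct feature values occur with every
    (group, label) combination."""
    counts = combo_counts(instances, feature)
    values = {v for v, _, _ in counts}
    ambiguous = [
        v for v in values
        if all(counts[(v, g, y)] > 0 for g, y in product(groups, labels))
    ]
    return len(ambiguous) >= 2


def is_k_anonymous(instances, feature, k):
    """True if every occurring (value, group, label) combination is shared
    by at least k instances."""
    return all(c >= k for c in combo_counts(instances, feature).values())


# Hypothetical toy domain: 16 instances, four per (group, label) pair.
instances = [
    (i, g, y) for i, (g, y, _) in enumerate(product("AB", (0, 1), range(4)))
]


def parity(x):
    """Binary feature: parity of the instance id."""
    return x % 2


def identity(x):
    """Feature that reveals the instance id itself."""
    return x


print(is_non_committing(instances, parity, "AB", (0, 1)))    # True
print(is_k_anonymous(instances, parity, 2))                  # True
print(is_non_committing(instances, identity, "AB", (0, 1)))  # False
print(is_k_anonymous(instances, identity, 2))                # False
```

In this toy domain the parity feature satisfies both requirements (with k = 2), while the identity feature satisfies neither for k = 2, since it pins down each individual.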
Theorem 2 (Context-relevance for fairness of features) For every k-anonymous non-committing feature , there exists a probability function over and feature sets and such that:
The accuracy-driven fairness w.r.t , and of is greater than that of , i.e.
Thus, deleting in this context will increase unfairness.
The accuracy-driven fairness w.r.t , and of is less than that of , i.e.
Thus, deleting in this context will decrease unfairness.
We note that this phenomenon can occur for quite general pairs and that we mainly need to exclude pathological cases for our construction to work.
In particular, we note that this phenomenon can occur even if is uncorrelated with the group membership and the label for the ground-truth distribution . We give an example illustrating this last point and refer the reader to the appendix for the proof and a general discussion of the requirements on .
Before giving our example, we need to introduce some concepts.
- feature-induced cells
A set of features induces an equivalence relation , by iff for all . We call the equivalence classes with respect to cells and denote the set of cells for a featureset as .
- ground-truth score function
We define the ground truth score function . is the probability, w.r.t. , of having the true-label , i.e., . In cases where the distribution is unambiguous we will use the abbreviated notation instead of .
- Bayes-optimal predictor
The predictor in that minimizes is the Bayes Optimal predictor that for a cell assigns the label 1 if and 0 otherwise.
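The three concepts above can be sketched for a finite domain as follows. This is a minimal illustration, not the formal definition: the dictionary-based encoding of the distribution and labels, the function names, and the choice of threshold 1/2 with ties resolved towards label 1 are assumptions made here.

```python
from collections import defaultdict
from fractions import Fraction


def cells(points, features):
    """Partition points into cells: points with identical feature vectors."""
    grouped = defaultdict(list)
    for p in points:
        grouped[tuple(f(p) for f in features)].append(p)
    return dict(grouped)


def score(cell, prob, label):
    """Ground-truth score of a cell: P(label = 1 | cell) under prob."""
    mass = sum(prob[p] for p in cell)
    return sum(prob[p] for p in cell if label[p] == 1) / mass


def bayes_predictor(points, features, prob, label, threshold=Fraction(1, 2)):
    """Label every point of a cell 1 if the cell's score reaches the
    (assumed) threshold, and 0 otherwise."""
    prediction = {}
    for cell in cells(points, features).values():
        decision = int(score(cell, prob, label) >= threshold)
        for p in cell:
            prediction[p] = decision
    return prediction


# Hypothetical toy data: six points under the uniform distribution.
points = list(range(6))
prob = {p: Fraction(1, 6) for p in points}
label = {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1}


def block(p):
    """Single feature splitting the domain into {0, 1, 2} and {3, 4, 5}."""
    return p // 3


toy_cells = cells(points, [block])
toy_prediction = bayes_predictor(points, [block], prob, label)
print(toy_cells)       # {(0,): [0, 1, 2], (1,): [3, 4, 5]}
print(toy_prediction)  # {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
```

Exact rational arithmetic (`Fraction`) is used so that comparisons against the threshold are not affected by floating-point rounding.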
We will now give an example in which both and are adversarially fair w.r.t. and in which the phenomenon from Theorem 2 holds:
Let the domain with , and . Furthermore consider the uniform distribution over , i.e. for every . For the construction of the feature set, we only consider binary features . Now let be defined by . Furthermore, let and with , , , and . The resulting cells for and are and . It is easy to see that and are adversarially fair w.r.t. and . Furthermore, we have:
Thus we see that there are indeed features which are adversarially fair w.r.t. and equalized odds, for which there is this opposing effect of feature deletion.
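As a separate, self-contained sanity check, the opposing effect can be reproduced numerically on a different toy instance (this is not the construction used in the example above). The sketch assumes that accuracy-driven unfairness is measured as the equalized-odds gap of the Bayes-optimal predictor, that this gap is the largest difference in group-conditional positive rates over the two true labels, and that cells are thresholded at 1/2. The domain, the sets `C` and `M`, and the features `a` and `b` are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy domain: points (group, label, f-value, copy), uniform distribution.
POINTS = list(product("AB", (0, 1), (0, 1), (0, 1)))


def f(p):
    """Feature under study; independent of (group, label) by construction."""
    return p[2]


# Two hypothetical mixed cells; everything outside them is separated by label.
C = {("A", 0, 0, 0), ("A", 1, 1, 0), ("B", 1, 1, 0)}
M = C | {("B", 0, 1, 0), ("A", 1, 1, 1)}


def a(p):
    """Context feature 1: reveals the label except on the mixed cell C."""
    return "mix" if p in C else p[1]


def b(p):
    """Context feature 2: reveals the label except on the mixed cell M."""
    return "mix" if p in M else p[1]


def bayes(features):
    """Bayes-optimal prediction per cell (assumed threshold 1/2)."""
    cells = {}
    for p in POINTS:
        cells.setdefault(tuple(h(p) for h in features), []).append(p)
    pred = {}
    for cell in cells.values():
        cell_score = Fraction(sum(p[1] for p in cell), len(cell))
        for p in cell:
            pred[p] = int(cell_score >= Fraction(1, 2))
    return pred


def eo_gap(pred):
    """Assumed EO measure: max over true labels of the gap between the two
    groups' rates of being predicted 1."""
    gaps = []
    for y in (0, 1):
        rates = []
        for g in "AB":
            members = [p for p in POINTS if p[0] == g and p[1] == y]
            rates.append(Fraction(sum(pred[p] for p in members), len(members)))
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)


def unfairness(features):
    return eo_gap(bayes(features))


print(unfairness([a]), unfairness([a, f]))  # 1/4 0
print(unfairness([b]), unfairness([b, f]))  # 0 1/4
```

In the first context, deleting `f` raises the gap from 0 to 1/4, because the single misclassified point of the mixed cell belongs to group A. In the second context, deleting `f` lowers the gap from 1/4 to 0, because without `f` the two misclassified points are balanced across the groups.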
5.2 Dependence of the fairness of a feature on the agent’s objective
We will now briefly discuss the effect of a single feature on fairness for an adversarial (malicious) agent and for a fairness-driven agent. In contrast to the accuracy-driven case, adding features has a monotone effect on fairness for both the fairness-driven and the malicious decision maker. We show in Theorem 3 that, in the adversarial case, adding any feature only gives the decision maker more information and thus more opportunities for discrimination. Similarly, in the fairness-driven case, any feature only gives the decision maker another option for fair decision making (Theorem 4). However, the quantitative effect of adding a feature on the unfairness can still range from having no effect to achieving perfect fairness or perfect unfairness, in the fairness-driven and the malicious case respectively. As in the accuracy-driven case, we show (Theorem 3 and Theorem 4) that it is impossible to evaluate the quantitative effect of a feature on the fairness of a representation without considering the context of the other available features.
For any feature and any featureset we have .
For every distribution and feature , there exists a feature set , such that adding will not impact the fairness of the distribution, i.e. .
There exist distributions , features and , such that and , but .
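A standard way to see how two individually harmless features can become harmful together is an XOR construction. The sketch below is only an illustration under stated assumptions: adversarial unfairness is taken to be the worst equalized-odds gap over all classifiers that are constant on cells, the gap is the largest difference in group-conditional positive rates over the two true labels, and the domain and the features `f1` and `f2` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy domain: points (group, label, hidden bit), uniform distribution.
POINTS = list(product((0, 1), repeat=3))


def f1(p):
    """First feature: the hidden bit, independent of the group."""
    return p[2]


def f2(p):
    """Second feature: hidden bit XOR group, also independent of the group."""
    return p[2] ^ p[0]


def eo_gap(pred):
    """Assumed EO measure: max over true labels of the gap between the two
    groups' rates of being predicted 1."""
    gaps = []
    for y in (0, 1):
        rates = []
        for g in (0, 1):
            members = [p for p in POINTS if p[0] == g and p[1] == y]
            rates.append(Fraction(sum(pred[p] for p in members), len(members)))
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)


def adversarial_unfairness(features):
    """Worst EO gap over all classifiers that are constant on each cell."""
    keys = sorted({tuple(h(p) for h in features) for p in POINTS})
    worst = Fraction(0)
    for bits in product((0, 1), repeat=len(keys)):
        rule = dict(zip(keys, bits))
        pred = {p: rule[tuple(h(p) for h in features)] for p in POINTS}
        worst = max(worst, eo_gap(pred))
    return worst


print(adversarial_unfairness([f1]))      # 0
print(adversarial_unfairness([f2]))      # 0
print(adversarial_unfairness([f1, f2]))  # 1
```

Each feature alone carries no information about the group, so no classifier built on it can treat the groups differently. Together they reveal the group exactly (`f1 XOR f2`), so a malicious decision maker can reach the maximal gap.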
For any feature , if a representation is -fairness-enabling, the representation is also -fairness-enabling.
For every distribution and every feature , there exists a feature set , such that is -fairness-enabling, if and only if is -fairness-enabling. Furthermore, there exists a distribution , a feature and a feature set , such that both and are not ()-fairness-enabling for any , but such that is ()-fairness-enabling.
6 Impossibility of adversarially fair representations with respect to predictive rate parity
We now show that not all acceptable notions of group fairness always allow an adversarially fair representation, even in a single-task setting. One such notion is predictive rate parity.
(Predictive rate parity (PRP)) A classifier is considered PRP fair w.r.t. a distribution and if for the ground truth is statistically independent of the group membership , given the classification . We denote this fairness objective with .
Adversarial fairness w.r.t. and is only possible, if has equal success rates for both groups.
Proof of Theorem 5: We note that in order to achieve adversarial fairness with respect to any representation, the all-one classifier needs to be fair, as any representation admits any constant classifier. We furthermore note that the all-one classifier is fair with respect to predictive rate parity if and only if the ground truth has equal success rates. This shows our claim.
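The key step of the proof, that the all-one classifier satisfies predictive rate parity exactly when the groups have equal success rates, can be checked on a small sample. The encoding of the sample as `(id, group, label)` triples, the function names, and the data are assumptions made for this illustration.

```python
from fractions import Fraction


def positive_predictive_value(sample, classifier, group):
    """Empirical P(label = 1 | classifier = 1, group)."""
    selected = [y for x, g, y in sample if g == group and classifier(x) == 1]
    return Fraction(sum(selected), len(selected))


def base_rate(sample, group):
    """Empirical success rate P(label = 1 | group)."""
    labels = [y for _, g, y in sample if g == group]
    return Fraction(sum(labels), len(labels))


def all_one(x):
    """Constant classifier, expressible under any representation."""
    return 1


# Hypothetical sample of (id, group, label) triples with unequal success rates.
sample = [
    (0, "A", 1), (1, "A", 1), (2, "A", 0),
    (3, "B", 1), (4, "B", 0), (5, "B", 0),
]

for group in "AB":
    # For the all-one classifier, conditioning on the prediction is vacuous,
    # so the predictive value coincides with the group's success rate.
    print(group,
          positive_predictive_value(sample, all_one, group),
          base_rate(sample, group))
# A 2/3 2/3
# B 1/3 1/3
```

Since the two groups in this sample have success rates 2/3 and 1/3, the all-one classifier already violates predictive rate parity, regardless of the representation.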
While many papers in this domain propose algorithmic solutions to fairness-related issues, the main contributions of this paper are conceptual. We believe that, to a much larger extent than many other facets of machine learning, fundamental concepts of fairness in machine learning require better understanding. Some basic questions, among others, are still far from being satisfactorily elucidated: What should be considered fair decision making? (Various mutually incompatible notions have been proposed, but how to pick between them for a given real-life application is far from clear.) What is a fair data representation? To what extent should accuracy or other practical utilities be compromised to achieve fairness goals?
The answers to these questions are not generic. They vary with the principles and the goals guiding the agents involved (decision makers, subjects of such a decision, policy regulators, etc.), as well as with what can be assumed regarding the underlying learning setup. We view these as the primary issues facing the field, deserving explicit research attention (in addition to the more commonly discussed algorithmic and optimization aspects).
Our main result addressed the existence of generic fair representations. We show that even label-independent fairness notions like demographic parity are vulnerable to shifts in marginals between tasks. For fairness notions that do rely on the true classification, we show that fairness and accuracy cannot be simultaneously achieved by the same data representation for any two different tasks even if they are defined over the same marginal (unlabeled) data distributions. We conclude the impossibility of having generic data representations that guarantee (even just) DP fairness with respect to tasks whose marginal distributions are not accessible when designing the representation.
These insights stand in contrast to the impression arising from many recent papers [9, 5, 10, 12, 2] that claim to learn transferable fairness-ensuring representations.
We also considered the question of "fairness of a feature", which has been used in legal scenarios. We showed that the fairness of a single feature is an ill-defined notion. Namely, the impact of a feature on the fairness of a decision cannot be determined without considering the other features of the representation. (While we focused on the equalized odds notion of fairness, similar results can be shown for demographic parity and for other common notions of group fairness; for example, a feature that satisfies demographic parity by itself can still make a representation demographic-parity unfair in the adversarial sense. This is simply due to the fact that pairwise statistical independence for a set of random variables does not imply mutual statistical independence of the whole set.)
One obvious direction for further research is extending our impossibility results to quantitative accuracy-fairness trade-offs and bounds on what a data representation can guarantee over multiple tasks as a function of appropriate measures of task similarities.
-  Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. CoRR, abs/1707.00075, 2017.
-  Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, and Richard Zemel. Flexibly fair representation learning by disentanglement. In ICML, 2019.
-  Jennifer L Doleac and Benjamin Hansen. Does “ban the box” help or hurt low-skilled workers? statistical discrimination and employment outcomes when criminal histories are hidden. Technical report, National Bureau of Economic Research, 2016.
-  Flávio du Pin Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems 30, pages 3992–4001, 2017.
-  Harrison Edwards and Amos J. Storkey. Censoring representations with an adversary. In ICLR, 2016.
-  Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. In AAAI, 2018.
-  Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In NIPS, 2016.
-  Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. CoRR, abs/1609.05807, 2016.
-  David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In ICML, 2018.
-  Daniel McNamara, Cheng Soon Ong, and Robert C Williamson. Costs and benefits of fair representation learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 263–270, 2019.
-  Luca Oneto, Michele Donini, Andreas Maurer, and Massimiliano Pontil. Learning fair and transferable representations. arXiv preprint arXiv:1906.10673, 2019.
-  Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2164–2173, 2019.
-  Latanya Sweeney. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 10(5):557–570, 2002.
-  Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In ICML, 2013.
-  Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In AAAI/ACM Conference on AI, Ethics, and Society, 2018.
Additional remarks on Theorem 2
Theorem 2 states that for every feature fulfilling some non-triviality requirements, there exist a distribution and feature sets and such that adding to either of the feature sets has opposing effects on the accuracy-driven fairness of the respective representations. We will now state a condition on and for this phenomenon to occur. It will be easy to see that this condition is fulfilled for a very general class of distributions and features, excluding only pathological examples.
In the following let denote a label and a group. The opposing label and group will be denoted by and respectively. A pair of a feature and a distribution is called generic if there exist sets with the following properties.
and are separated by the feature , i.e. there are such that and
and are label-homogeneous for different labels and is group homogeneous, i.e. and .
is not split by the feature, i.e. there is such that
has the same majority label as , i.e.
The fraction of elements of group and label in is sufficiently big in comparison to , i.e. .
For every generic feature-distribution pair , there are two feature sets and such that:
The accuracy-driven fairness w.r.t , and of is greater than that of , i.e.
Thus, deleting in this context will increase unfairness.
The accuracy-driven fairness w.r.t , and of is less than that of , i.e.
Thus, deleting in this context will decrease unfairness.
Proof: We define as a representation which separates everything but a cell by labels. For such a representation enables perfect accuracy and therefore perfect fairness. However is constructed in such a way that thresholding at leads to unfair classification, as only elements of are misclassified. Furthermore we can define as a representation that separates all but two cells and perfectly by labels. As the only misclassification of the Bayes classifier occurs on and it labels it has unfairness . Furthermore the only misclassification for the Bayes classifier occurs on and which are both labeled as , yielding the unfairness . As , by property (6.) of Definition 8, we thus get , concluding our proof.
We will now see how the non-triviality criteria for a feature from Theorem 2 imply the existence of a generic pair .
For every non-committing, k-anonymous feature , there exists a distribution , such that the pair is generic.
Proof: We need to show that it is possible to define three sets and a distribution such that the requirements of Definition 8 are fulfilled. From the fact that is non-committing we know that there are such that none of the subsets and is empty for any . We can thus define the non-empty set . Furthermore, we know that is also k-anonymous and thus we can split further into two non-empty subsets and . Furthermore, we can define and as disjoint non-empty subsets of , such that and such that for any . Thus the properties (2.), (3.) and (4.) of Definition 8 are fulfilled for the sets .
We can now choose to pick probability weights as follows:
Clearly (1.) is fulfilled as . Furthermore (5.) is fulfilled as . Lastly, (6.) is fulfilled as:
Proof of Theorem 3:
We note that . Thus any , proving the inequality for adversarial fairness.
For any distribution and feature we can choose a representation such that