Inherent Tradeoffs in Learning Fair Representations

by Han Zhao et al.
Carnegie Mellon University

With the prevalence of machine learning in high-stakes applications, especially the ones regulated by anti-discrimination laws or societal norms, it is crucial to ensure that the predictive models do not propagate any existing bias or discrimination. Due to the ability of deep neural nets to learn rich representations, recent advances in algorithmic fairness have focused on learning fair representations with adversarial techniques to reduce bias in data while preserving utility simultaneously. In this paper, through the lens of information theory, we provide the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups. Specifically, when the base rates differ between groups, we show that any method aiming to learn fair representations admits an information-theoretic lower bound on the joint error across these groups. To complement our negative results, we also prove that if the optimal decision functions across different groups are close, then learning fair representations leads to an alternative notion of fairness, known as accuracy parity, which states that the error rates are close between groups. Our theoretical findings are also confirmed empirically on real-world datasets. We believe our insights contribute to a better understanding of the tradeoff between utility and different notions of fairness.








1 Introduction

With the prevalence of machine learning applications in high-stakes domains, e.g., criminal judgement, medical testing, online advertising, etc., it is crucial to ensure that the automated decision making systems do not propagate existing bias or discrimination that might exist in historical data [32, 3, 5]. Among many recent proposals for achieving different notions of algorithmic fairness [37, 13, 35, 17, 36], learning fair representations has received increasing attention due to recent advances in learning rich representations with deep neural networks [14, 27, 29, 38, 6, 34]. In fact, a line of work has proposed to learn group-invariant representations with adversarial learning techniques in order to achieve statistical parity, also known as demographic parity in the literature. This line of work dates at least back to Zemel et al. [37], where the authors proposed to learn predictive models that are independent of the group membership attribute. At a high level, the underlying idea is that if representations of instances from different groups are similar to each other, then any predictive model built on top of them will necessarily make decisions independent of group membership.

On the other hand, it has long been observed that there is an underlying tradeoff between utility and demographic parity: “All methods have in common that to some extent accuracy must be traded-off for lowering the dependency.” [8] In particular, it is easy to see that in an extreme case where the group membership coincides with the target task, a call for exact demographic parity will inevitably remove the perfect predictor [17]. Empirically, a tradeoff between accuracy and fairness in binary classification has also been observed [40]. Clearly, methods based on learning fair representations are also bound by such an inherent tradeoff between utility and fairness. But exactly how much utility must be traded for the fairness constraint? Will learning fair representations help to achieve other notions of fairness besides demographic parity? If yes, what is the fundamental limit of utility that we can hope to achieve under such a constraint?

To answer the above questions, through the lens of information theory, in this paper we provide the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups. Specifically, when the base rates differ between groups, we provide a tight information-theoretic lower bound on the joint error across these groups. Our lower bound is algorithm-independent so it holds for all methods aiming to learn fair representations. When only approximate demographic parity is achieved, we also present a family of lower bounds to quantify the tradeoff of utility introduced by such approximate constraint. As a side contribution, our proof technique is simple but general, and we expect it to have broader applications in other learning problems using adversarial techniques, e.g., unsupervised domain adaptation [15, 39] and privacy-preservation under attribute inference attacks [16].

To complement our negative results, we show that if the optimal decision functions across different groups are close, then learning fair representations helps to achieve an alternative notion of fairness, i.e., accuracy parity, which states that the error rates are close between groups. Empirically, we conduct experiments on a real-world dataset that corroborate both our positive and negative results. We believe our theoretical insights contribute to a better understanding of the tradeoff between utility and different notions of fairness, and they are also helpful in guiding the future design of representation learning algorithms to achieve algorithmic fairness.

2 Preliminary

We first introduce the notations used throughout the paper and formally describe the problem setup. We then briefly discuss some information-theoretic concepts that will be used in our analysis.


Notation

We use $\mathcal{X}$ and $\mathcal{Y}$ to denote the input and output space. Accordingly, we use $X$ and $Y$ to denote the random variables which take values in $\mathcal{X}$ and $\mathcal{Y}$, respectively. Lower case letters $x$ and $y$ are used to denote their instantiations. To simplify the presentation, we use $A \in \{0, 1\}$ as the sensitive attribute, e.g., race, gender, etc.¹ Let $\mathcal{H}$ be the hypothesis class of classifiers; in other words, for $h \in \mathcal{H}$, $h: \mathcal{X} \to \mathcal{Y}$ is the predictor that outputs a prediction. Note that even if the predictor does not explicitly take the sensitive attribute $A$ as input, this fairness through blindness mechanism can still be biased due to the potential correlations between $X$ and $A$. In this work we study the stochastic setting where there is a joint distribution $\mathcal{D}$ over $X$, $Y$ and $A$ from which the data are sampled. To keep the notation consistent, for $a \in \{0, 1\}$, we use $\mathcal{D}_a$ to denote the conditional distribution of $\mathcal{D}$ given $A = a$. For an event $E$, $\mathcal{D}(E)$ denotes the probability of $E$ under $\mathcal{D}$.

¹ Our main results could also be straightforwardly extended to the setting where $A$ is a categorical variable.

Problem Setup

Given a joint distribution $\mathcal{D}$, the error of a predictor $\hat{Y}$ under $\mathcal{D}$ is defined as $\varepsilon_{\mathcal{D}}(\hat{Y}) := \mathbb{E}_{\mathcal{D}}[|Y - \hat{Y}|]$. Note that for binary classification problems, when $\hat{Y} \in \{0, 1\}$, $\varepsilon_{\mathcal{D}}(\hat{Y})$ reduces to the true error rate of binary classification. To make the notation more compact, we may drop the subscript $\mathcal{D}$ when it is clear from the context. In this work we focus on group fairness, where the group membership is given by the sensitive attribute $A$. Even in this context there are many possible definitions of fairness [31], and in what follows we provide a brief review of the ones most relevant to this work.

Definition 2.1 (Demographic Parity).

Given a joint distribution $\mathcal{D}$, a classifier $\hat{Y}$ satisfies demographic parity if $\hat{Y}$ is independent of $A$.

When $\hat{Y}$ is a deterministic classifier, demographic parity reduces to the requirement that $\Pr_{\mathcal{D}_0}(\hat{Y} = 1) = \Pr_{\mathcal{D}_1}(\hat{Y} = 1)$, i.e., positive outcomes are given to the two groups at the same rate. Demographic parity is also known as statistical parity, and it has been adopted as the definition of fairness in a series of work [7, 8, 14, 18, 19, 20, 27, 37, 29]. However, as we shall quantify precisely in Section 3, demographic parity may cripple the utility that we hope to achieve, especially in the common scenario where the base rates differ between the two groups, e.g., $\Pr_{\mathcal{D}_0}(Y = 1) \neq \Pr_{\mathcal{D}_1}(Y = 1)$ [17]. In light of this, an alternative definition is accuracy parity:
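For a finite sample, the demographic parity gap of Definition 2.1 can be estimated directly from predictions and group labels. The sketch below (a hypothetical helper, not from the paper) computes $|\Pr(\hat{Y} = 1 \mid A = 0) - \Pr(\hat{Y} = 1 \mid A = 1)|$:

```python
# Empirical demographic parity gap of a binary predictor:
# |Pr(Yhat = 1 | A = 0) - Pr(Yhat = 1 | A = 1)|.
def dp_gap(y_hat, a):
    g0 = [p for p, s in zip(y_hat, a) if s == 0]
    g1 = [p for p, s in zip(y_hat, a) if s == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

y_hat = [1, 0, 1, 1, 0, 0, 1, 0]   # predictions
a     = [0, 0, 0, 0, 1, 1, 1, 1]   # group membership
print(dp_gap(y_hat, a))            # 0.75 - 0.25 = 0.5
```

A gap of zero corresponds to exact demographic parity on the sample.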

Definition 2.2 (Accuracy Parity).

Given a joint distribution $\mathcal{D}$, a classifier $\hat{Y}$ satisfies accuracy parity if $\varepsilon_{\mathcal{D}_0}(\hat{Y}) = \varepsilon_{\mathcal{D}_1}(\hat{Y})$.

In the literature, a violation of accuracy parity is also known as disparate mistreatment [36]. Again, when $\hat{Y}$ is a deterministic binary classifier, accuracy parity reduces to $\Pr_{\mathcal{D}_0}(\hat{Y} \neq Y) = \Pr_{\mathcal{D}_1}(\hat{Y} \neq Y)$. Different from demographic parity, the definition of accuracy parity does not eliminate the perfect predictor when the base rates differ between the two groups. When the costs of different error types matter, more refined definitions exist:

Definition 2.3 (Positive Rate Parity).

Given a joint distribution $\mathcal{D}$, a deterministic classifier $\hat{Y}$ satisfies positive rate parity if $\Pr_{\mathcal{D}_0}(\hat{Y} = 1 \mid Y = y) = \Pr_{\mathcal{D}_1}(\hat{Y} = 1 \mid Y = y)$, $\forall y \in \{0, 1\}$.

Positive rate parity is also known as equalized odds [17], which essentially requires equal true positive and false positive rates between different groups. Furthermore, Hardt et al. [17] also defined true positive parity, or equal opportunity, to be $\Pr_{\mathcal{D}_0}(\hat{Y} = 1 \mid Y = 1) = \Pr_{\mathcal{D}_1}(\hat{Y} = 1 \mid Y = 1)$, for settings where the positive outcome is desirable. Last but not least, predictive value parity, also known as test fairness [9], asks for an equal chance of positive outcomes across groups given the predictions:

Definition 2.4 (Predictive Value Parity).

Given a joint distribution $\mathcal{D}$, a probabilistic classifier $\hat{Y}$ satisfies predictive value parity if $\Pr_{\mathcal{D}_0}(Y = 1 \mid \hat{Y} = c) = \Pr_{\mathcal{D}_1}(Y = 1 \mid \hat{Y} = c)$, $\forall c \in [0, 1]$.

When $\hat{Y}$ is a deterministic binary classifier that only takes values in $\{0, 1\}$, Chouldechova [9] showed an intrinsic tradeoff between predictive value parity and positive rate parity:

Theorem 2.1 (Chouldechova [9]).

Assume $\Pr_{\mathcal{D}_0}(Y = 1) \neq \Pr_{\mathcal{D}_1}(Y = 1)$. Then for any deterministic classifier $\hat{Y}$ that is not perfect, i.e., $\hat{Y} \neq Y$, positive rate parity and predictive value parity cannot hold simultaneously.

A similar tradeoff for probabilistic classifiers has also been observed by Kleinberg et al. [24], where the authors showed that for any non-perfect predictor, calibration and positive rate parity cannot be achieved simultaneously if the base rates differ across groups. Here a classifier $\hat{Y}$ is said to be calibrated if $\Pr(Y = 1 \mid \hat{Y} = c) = c$, $\forall c \in [0, 1]$, i.e., if we look at the set of data that receive a predicted probability of $c$ by $\hat{Y}$, we would like a $c$-fraction of them to be positive instances according to $Y$ [33].
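One quick way to see the tension behind these impossibility results is through the identity $\mathrm{PPV} = \mathrm{TPR} \cdot p \,/\, (\mathrm{TPR} \cdot p + \mathrm{FPR} \cdot (1 - p))$, where $p$ is the base rate: if two groups share the same TPR and FPR (positive rate parity) but have different base rates, their positive predictive values must differ. A small numeric sketch with toy numbers (not from the paper):

```python
# Positive predictive value as a function of the shared (TPR, FPR)
# and the group-specific base rate p.
def ppv(tpr, fpr, p):
    return tpr * p / (tpr * p + fpr * (1 - p))

tpr, fpr = 0.8, 0.2    # identical across groups: positive rate parity holds
p0, p1 = 0.5, 0.2      # but the base rates differ
print(ppv(tpr, fpr, p0))   # 0.8
print(ppv(tpr, fpr, p1))   # 0.5 -> predictive value parity fails
```

The same arithmetic underlies the calibration version of the result by Kleinberg et al. [24].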


f-divergence

Introduced by Ali and Silvey [2] and Csiszár [11, 12], the $f$-divergence, also known as the Ali–Silvey distance, is a general class of statistical divergences used to measure the difference between two probability distributions $\mathcal{P}$ and $\mathcal{Q}$ over the same measurable space.

Definition 2.5 ($f$-divergence).

Let $\mathcal{P}$ and $\mathcal{Q}$ be two probability distributions over the same space, and assume $\mathcal{P}$ is absolutely continuous w.r.t. $\mathcal{Q}$ ($\mathcal{P} \ll \mathcal{Q}$). Then for any convex function $f$ that is strictly convex at 1 and satisfies $f(1) = 0$, the $f$-divergence of $\mathcal{Q}$ from $\mathcal{P}$ is defined as

$$D_f(\mathcal{P} \,\|\, \mathcal{Q}) := \mathbb{E}_{\mathcal{Q}}\!\left[f\!\left(\frac{d\mathcal{P}}{d\mathcal{Q}}\right)\right].$$

The function $f$ is called the generator function of $D_f(\cdot \,\|\, \cdot)$.

Different choices of the generator function recover popular statistical divergences as special cases, e.g., the KL-divergence. From Jensen’s inequality it is easy to verify that $D_f(\mathcal{P} \,\|\, \mathcal{Q}) \geq 0$ and that $D_f(\mathcal{P} \,\|\, \mathcal{Q}) = 0$ iff $\mathcal{P} = \mathcal{Q}$ almost surely. Note that an $f$-divergence does not necessarily lead to a distance metric, and it is not symmetric in general, i.e., $D_f(\mathcal{P} \,\|\, \mathcal{Q}) \neq D_f(\mathcal{Q} \,\|\, \mathcal{P})$ for general $\mathcal{P}$ and $\mathcal{Q}$. We list some common choices of the generator function and their corresponding properties in Table 1. Notably, Khosravifard et al. [22] proved that total variation is the only $f$-divergence that serves as a metric, i.e., satisfies the triangle inequality.

| Name | Generator $f(t)$ | Symm. | Tri. |
|---|---|---|---|
| KL-divergence | $t \log t$ | No | No |
| Reverse-KL | $-\log t$ | No | No |
| Jensen-Shannon | $\frac{1}{2}\big(t \log t - (t + 1)\log\frac{t + 1}{2}\big)$ | Yes | No |
| Squared Hellinger | $\frac{1}{2}(\sqrt{t} - 1)^2$ | Yes | No |
| Total Variation | $\frac{1}{2}\lvert t - 1 \rvert$ | Yes | Yes |

Table 1: List of different $f$-divergences and their corresponding properties. $D_{KL}(\mathcal{P} \,\|\, \mathcal{Q})$ denotes the KL-divergence of $\mathcal{Q}$ from $\mathcal{P}$, and $\mathcal{M} := (\mathcal{P} + \mathcal{Q})/2$ is the average distribution of $\mathcal{P}$ and $\mathcal{Q}$. Symm. stands for Symmetric and Tri. stands for Triangle Inequality.
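The properties in Table 1 are easy to check numerically for discrete distributions. The sketch below (illustrative, not from the paper) verifies that total variation is symmetric while KL is not, and that Jensen-Shannon is bounded:

```python
import math

# f-divergences between discrete distributions on a common support.
def tv(p, q):  # total variation, generator f(t) = |t - 1| / 2
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):  # KL-divergence, generator f(t) = t log t
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def js(p, q):  # Jensen-Shannon, defined via the mixture M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.9, 0.1], [0.5, 0.5]
assert tv(p, q) == tv(q, p)              # TV is symmetric
assert abs(kl(p, q) - kl(q, p)) > 1e-3   # KL is not
assert 0.0 < js(p, q) <= math.log(2)     # JS is bounded by log 2 (nats)
```

For identical distributions all three quantities vanish, matching the Jensen's-inequality argument above.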

3 Theoretical Analysis

As we briefly mentioned in Section 2, it is impossible to have an imperfect predictor that is both calibrated and satisfies positive rate parity when the base rates differ between the two groups. A similar impossibility result also holds between positive rate parity and predictive value parity. On the other hand, while it has long been observed that demographic parity may eliminate the perfect predictor [17], and previous work has empirically verified a tradeoff between accuracy and demographic parity [8, 19, 40] on various datasets, a quantitative characterization of the exact tradeoff between accuracy and various notions of parity is still missing. In what follows we prove a family of information-theoretic lower bounds on the accuracy that hold for all methods. Due to space limits, we defer most of the proofs to the appendix, leaving only one in the main text to showcase the high-level idea of our proof technique.

3.1 Tradeoff between Accuracy and Demographic Parity

Essentially, every prediction function induces a Markov chain $X \xrightarrow{\,g\,} Z \xrightarrow{\,h\,} \hat{Y}$, where $g$ is the feature transformation, $h$ is the classifier on the feature space, $Z$ is the feature and $\hat{Y}$ is the predicted target variable. Note that simple models, e.g., linear classifiers, are also included by specifying $g$ to be the identity map. With this notation, we first state the following theorem that quantifies an information-theoretic lower bound on the joint error across different groups:

Theorem 3.1.

Let $\hat{Y} = h(g(X))$ be the predictor. If $\hat{Y}$ satisfies demographic parity, then $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1)$, where $\Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) := d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) = |\Pr_{\mathcal{D}_0}(Y = 1) - \Pr_{\mathcal{D}_1}(Y = 1)|$ is the difference of base rates across groups.


First of all, $\Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1)$ essentially measures the discrepancy of base rates across groups, and it achieves its maximum value of 1 iff $Y = A$ almost surely, i.e., $Y$ indicates group membership. Second, Theorem 3.1 applies to all possible feature transformations $g$ and predictors $h$. In particular, if we choose $g$ to be the identity map, then Theorem 3.1 says that when the base rates differ, no algorithm can achieve a small joint error on both groups, and it also recovers the previous observation that demographic parity can eliminate the perfect predictor [17]. Third, the lower bound in Theorem 3.1 is insensitive to the marginal distribution of $A$, i.e., it treats the errors from both groups equally. As a comparison, let $\alpha := \Pr_{\mathcal{D}}(A = 1)$; then $\varepsilon_{\mathcal{D}}(h \circ g) = (1 - \alpha)\,\varepsilon_{\mathcal{D}_0}(h \circ g) + \alpha\,\varepsilon_{\mathcal{D}_1}(h \circ g)$. In this case $\varepsilon_{\mathcal{D}}(h \circ g)$ could still be small even if the minority group suffers a large error.

Before we give the proof, we first present a useful lemma that lower bounds the prediction error by the total variation distance.

Lemma 3.1.

Let $\hat{Y} = h(g(X))$ be the predictor. Then for $a \in \{0, 1\}$, $d_{TV}(\mathcal{D}_a(Y), \mathcal{D}_a(\hat{Y})) \leq \varepsilon_{\mathcal{D}_a}(h \circ g)$.


For $a \in \{0, 1\}$, since both $Y$ and $\hat{Y}$ are binary, we have:

$$d_{TV}(\mathcal{D}_a(Y), \mathcal{D}_a(\hat{Y})) = |\Pr_{\mathcal{D}_a}(Y = 1) - \Pr_{\mathcal{D}_a}(\hat{Y} = 1)| \leq \mathbb{E}_{\mathcal{D}_a}[|Y - \hat{Y}|] = \varepsilon_{\mathcal{D}_a}(h \circ g). \qquad \square$$


Now we are ready to prove Theorem 3.1:

Proof of Theorem 3.1.

First of all, we show that if $\hat{Y}$ satisfies demographic parity, then $\mathcal{D}_0(\hat{Y}) = \mathcal{D}_1(\hat{Y})$, hence:

$$d_{TV}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) = |\Pr_{\mathcal{D}_0}(\hat{Y} = 1) - \Pr_{\mathcal{D}_1}(\hat{Y} = 1)| = 0,$$

where the last equality follows from the definition of demographic parity. Now since, from Table 1, $d_{TV}(\cdot, \cdot)$ is symmetric and satisfies the triangle inequality, we have:

$$\Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) = d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \leq d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_0(\hat{Y})) + d_{TV}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) + d_{TV}(\mathcal{D}_1(\hat{Y}), \mathcal{D}_1(Y)) = d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_0(\hat{Y})) + d_{TV}(\mathcal{D}_1(Y), \mathcal{D}_1(\hat{Y})).$$

The last step is to bound $d_{TV}(\mathcal{D}_a(Y), \mathcal{D}_a(\hat{Y}))$ in terms of $\varepsilon_{\mathcal{D}_a}(h \circ g)$ for $a \in \{0, 1\}$ using Lemma 3.1:

$$d_{TV}(\mathcal{D}_a(Y), \mathcal{D}_a(\hat{Y})) \leq \varepsilon_{\mathcal{D}_a}(h \circ g), \qquad a \in \{0, 1\}.$$

Combining the two inequalities above completes the proof. ∎

It is not hard to show that the lower bound in Theorem 3.1 is tight. To see this, consider the case $Y = A$, where the lower bound achieves its maximum value of 1. Now consider the constant predictor $\hat{Y} \equiv 0$ or $\hat{Y} \equiv 1$, which clearly satisfies demographic parity by definition. But in this case either $\varepsilon_{\mathcal{D}_0}(h \circ g) = 0$ and $\varepsilon_{\mathcal{D}_1}(h \circ g) = 1$, or vice versa; hence $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) = 1$, achieving the lower bound.
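The tightness argument above can be checked mechanically. The toy sketch below (illustrative numbers, not from the paper) encodes $Y = A$ and the constant predictor $\hat{Y} \equiv 0$, which satisfies demographic parity and attains $\varepsilon_{\mathcal{D}_0} + \varepsilon_{\mathcal{D}_1} = \Delta_{BR} = 1$:

```python
# Tightness of Theorem 3.1 in the extreme case Y = A.
def group_error(y_true, y_pred):
    # empirical version of E[|Y - Yhat|] within one group
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y0, y1 = [0, 0, 0, 0], [1, 1, 1, 1]   # labels: Y = A in each group
yhat = [0, 0, 0, 0]                   # constant predictor Yhat = 0

delta_br = abs(sum(y0) / len(y0) - sum(y1) / len(y1))  # base-rate gap
joint = group_error(y0, yhat) + group_error(y1, yhat)
assert delta_br == 1.0
assert joint == delta_br   # the lower bound is met with equality
```

Group 0 is classified perfectly, while group 1 absorbs the entire error budget.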

To conclude this subsection, we point out that the choice of total variation in the lower bound is not unique. As we will see shortly, similar lower bounds can be attained using specific choices of the general $f$-divergence with the desired properties.

3.2 Tradeoff in Adversarial Representation Learning

Theorem 3.1 is an impossibility result on achieving accuracy and exact demographic parity jointly. But what if we only aim to achieve approximate demographic parity? What is the tradeoff between demographic parity and accuracy in this scenario? In fact, from the perspective of representation learning, recent work [14, 6, 38, 29] has proposed to learn an intermediate feature representation $Z$ through deep neural networks, aiming to maintain task-relevant information while at the same time removing sensitive information related to $A$. To study the tradeoff in this setting, we introduce a relaxed version of total variation, known as the $\mathcal{H}$-divergence [4]:

Definition 3.1 ($\mathcal{H}$-divergence).

Let $\mathcal{H}$ be a hypothesis class on the feature space $\mathcal{Z}$, and let $\mathcal{A}_{\mathcal{H}}$ be the collection of subsets of $\mathcal{Z}$ that are the support of some hypothesis in $\mathcal{H}$, i.e., $\mathcal{A}_{\mathcal{H}} := \{h^{-1}(1) \mid h \in \mathcal{H}\}$. The distance between two distributions $\mathcal{D}$ and $\mathcal{D}'$ over $\mathcal{Z}$ based on $\mathcal{H}$ is: $d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') := \sup_{E \in \mathcal{A}_{\mathcal{H}}} |\Pr_{\mathcal{D}}(E) - \Pr_{\mathcal{D}'}(E)|$.

The $\mathcal{H}$-divergence is particularly favorable in the analysis of adversarial representation learning for binary classification problems, and it has also been generalized to the discrepancy distance [10, 30] for general loss functions. When $\mathcal{H}$ has a finite VC-dimension, the $\mathcal{H}$-divergence can be estimated using finite unlabeled samples from $\mathcal{D}$ and $\mathcal{D}'$ [23]. From an algorithmic viewpoint, the $\mathcal{H}$-divergence admits a natural interpretation: $1 - d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}')$ corresponds to the minimum sum of Type-I and Type-II errors in distinguishing $\mathcal{D}$ and $\mathcal{D}'$. To see this, realize that

$$d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') = \sup_{h \in \mathcal{H}} |\Pr_{\mathcal{D}}(h(Z) = 1) - \Pr_{\mathcal{D}'}(h(Z) = 1)| = 1 - \inf_{h \in \mathcal{H}} \big(\Pr_{\mathcal{D}}(h(Z) = 0) + \Pr_{\mathcal{D}'}(h(Z) = 1)\big).$$

The second equality holds because for $h \in \mathcal{H}$, we also have $1 - h \in \mathcal{H}$. In the equation above, the hypothesis $h$ acts as a discriminator trying to distinguish between $\mathcal{D}$ and $\mathcal{D}'$. This probabilistic interpretation exactly serves as the theoretical justification of recent work on using adversarial training to learn group-invariant representations through a transformation $g$ such that $d_{\mathcal{H}}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$ is small, where $\mathcal{D}_a(Z)$ is the induced distribution of $\mathcal{D}_a$ under $g$. The following proposition characterizes an intrinsic tradeoff of these methods:
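The Type-I/Type-II interpretation can be made concrete with a one-dimensional feature and a family of threshold discriminators $h_t(z) = \mathbb{1}[z \geq t]$ standing in for $\mathcal{H}$ (a toy sketch, not the paper's estimator):

```python
# H-divergence between two empirical samples under threshold
# discriminators: d_H = max_t |Pr_0(z >= t) - Pr_1(z >= t)|.
# 1 - d_H is then the best achievable sum of Type-I and Type-II errors.
def h_divergence(z0, z1):
    best = 0.0
    for t in sorted(set(z0) | set(z1)):
        r0 = sum(z >= t for z in z0) / len(z0)
        r1 = sum(z >= t for z in z1) / len(z1)
        best = max(best, abs(r0 - r1))
    return best

z0 = [0.1, 0.2, 0.3, 0.4]   # features from group A = 0
z1 = [0.6, 0.7, 0.8, 0.9]   # features from group A = 1
d = h_divergence(z0, z1)
assert d == 1.0                        # separable groups
assert 1 - d == 0.0                    # some threshold makes no error
assert h_divergence(z0, z0) == 0.0     # identical samples
```

Adversarial training drives the analogous quantity toward zero by reshaping $g$ so that no discriminator in the class can separate the two groups.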

Proposition 3.1.

Let $\hat{Y} = h(g(X))$ be the predictor, and let $\tilde{\mathcal{H}}$ be a class of binary discriminators over $\hat{Y}$ that is closed under complement. Then $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) - d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$.


First, it is easy to show that for any hypothesis class $\tilde{\mathcal{H}}$ and any distributions $\mathcal{D}$, $\mathcal{D}'$ and $\mathcal{D}''$, the following triangle inequality holds:

$$d_{\tilde{\mathcal{H}}}(\mathcal{D}, \mathcal{D}') \leq d_{\tilde{\mathcal{H}}}(\mathcal{D}, \mathcal{D}'') + d_{\tilde{\mathcal{H}}}(\mathcal{D}'', \mathcal{D}').$$

Again, we apply a chain of triangle inequalities as we did in the proof of Theorem 3.1:

$$\Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) = d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \leq d_{TV}(\mathcal{D}_0(Y), \mathcal{D}_0(\hat{Y})) + d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) + d_{TV}(\mathcal{D}_1(\hat{Y}), \mathcal{D}_1(Y)) \leq \varepsilon_{\mathcal{D}_0}(h \circ g) + d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) + \varepsilon_{\mathcal{D}_1}(h \circ g).$$

The first inequality combines the triangle inequality with the definition of the $\tilde{\mathcal{H}}$-divergence (since $\hat{Y}$ is binary, $d_{TV}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) = d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$), and the second is due to Lemma 3.1. Rearranging the terms on both sides completes the proof. ∎


As we showed above, $1 - d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$ is the minimum sum of Type-I and Type-II errors in discriminating $\mathcal{D}_0(\hat{Y})$ and $\mathcal{D}_1(\hat{Y})$ using discriminators from $\tilde{\mathcal{H}}$. Hence if the optimal discriminator from $\tilde{\mathcal{H}}$ fails to distinguish between $\mathcal{D}_0(\hat{Y})$ and $\mathcal{D}_1(\hat{Y})$, i.e., $d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$ is small, the lower bound on the joint error across different groups will get larger.

In fact, a close scrutiny of the proof above shows that the lower bound in Proposition 3.1 holds even if different transformation functions are used on the corresponding groups:

Corollary 3.1.

Let $\hat{Y}_a = h_a(g_a(X))$, $a \in \{0, 1\}$, be the predictors for the two groups, and let $\tilde{\mathcal{H}}$ be as defined in Proposition 3.1. Then $\varepsilon_{\mathcal{D}_0}(h_0 \circ g_0) + \varepsilon_{\mathcal{D}_1}(h_1 \circ g_1) \geq \Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) - d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}_0), \mathcal{D}_1(\hat{Y}_1))$.

One interesting fact implied by Proposition 3.1 is that the lower bound on the joint error across groups scales linearly with $1 - d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$, the optimal sum of Type-I and Type-II errors in distinguishing between $\mathcal{D}_0(\hat{Y})$ and $\mathcal{D}_1(\hat{Y})$. In the work of Zhang et al. [38], the authors proposed a model (Fig. 1 in [38]) that precisely tries to maximize this quantity by learning the model parameters through adversarial techniques. In this case our lower bound directly quantifies the loss of utility due to the increase of $1 - d_{\tilde{\mathcal{H}}}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y}))$.

3.3 A Family of Information-Theoretic Lower Bounds

In the last subsection we showed that any adversarial discriminator that tries to distinguish between $\mathcal{D}_0(\hat{Y})$ and $\mathcal{D}_1(\hat{Y})$ by taking the predicted target variable $\hat{Y}$ as input admits an inherent lower bound on the joint target error. This is the setting of the algorithm proposed by Zhang et al. [38] for mitigating biases. As a comparison, most other variants [14, 27, 6, 1] build an adversarial discriminator that instead takes as input the feature vector $Z$. In this subsection we generalize our previous analysis with $f$-divergences to prove a family of lower bounds on the joint target prediction error for the latter variants. Based on our theoretical analysis, we conclude that matching the distributions from different groups within the feature space does not remove the tradeoff: a family of lower bounds also exists for these approaches.

We require one last ingredient before we state and prove the main results of this section. The following lemma, proved by Liese and Vajda [25], is a generalization of the data processing inequality to $f$-divergences:

Lemma 3.2 (Liese and Vajda [25]).

Let $\mu(\mathcal{Z})$ be the space of all probability distributions over $\mathcal{Z}$. Then for any $f$-divergence $D_f(\cdot \,\|\, \cdot)$, any stochastic kernel $\kappa: \mathcal{Z} \to \mu(\mathcal{Z}')$, and any distributions $\mathcal{P}$ and $\mathcal{Q}$ over $\mathcal{Z}$, $D_f(\kappa \circ \mathcal{P} \,\|\, \kappa \circ \mathcal{Q}) \leq D_f(\mathcal{P} \,\|\, \mathcal{Q})$.

Roughly speaking, Lemma 3.2 says that data processing cannot increase discriminating information. Define $d_{JS}(\mathcal{P}, \mathcal{Q}) := \sqrt{D_{JS}(\mathcal{P} \,\|\, \mathcal{Q})}$ and $H(\mathcal{P}, \mathcal{Q}) := \sqrt{H^2(\mathcal{P}, \mathcal{Q})}$. Both $d_{JS}$ and $H$ form a bounded distance metric over the space of probability distributions. Realize that $d_{TV}$, $D_{JS}$ and $H^2$ are all $f$-divergences. The following corollary holds:

Corollary 3.2.

Let $g: \mathcal{X} \to \mathcal{Z}$ be any (randomized) feature transformation, and let $\mathcal{D}_a(Z)$ be the induced distribution of $\mathcal{D}_a$ by $g$, $a \in \{0, 1\}$. Let $\hat{Y} = h(Z)$ be the predictor. Then: 1). $d_{TV}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) \leq d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$. 2). $d_{JS}(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) \leq d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$. 3). $H(\mathcal{D}_0(\hat{Y}), \mathcal{D}_1(\hat{Y})) \leq H(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$.

Now we are ready to present the following main theorem of this subsection:

Theorem 3.2.

Let $\hat{Y} = h(g(X))$ be the predictor. Assume $d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \geq d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$ and $H(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \geq H(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$. Then the following three inequalities hold:

  1. Total variation lower bound: $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1) - d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$.

  2. Jensen-Shannon lower bound: $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \frac{1}{2}\big(d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) - d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))\big)^2$.

  3. Hellinger lower bound: $\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \frac{1}{2}\big(H(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) - H(\mathcal{D}_0(Z), \mathcal{D}_1(Z))\big)^2$.


We prove the three inequalities respectively. The total variation lower bound follows the same argument as the proof of Theorem 3.1, combined with the first inequality of Corollary 3.2. To prove the Jensen-Shannon lower bound, realize that $d_{JS}(\cdot, \cdot)$ is a distance metric over probability distributions. Combining this with the second inequality of Corollary 3.2, we have:

$$d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \leq d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_0(\hat{Y})) + d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) + d_{JS}(\mathcal{D}_1(\hat{Y}), \mathcal{D}_1(Y)).$$

Now by Lin’s lemma [26, Theorem 3], for any two distributions $\mathcal{P}$ and $\mathcal{Q}$, we have $D_{JS}(\mathcal{P} \,\|\, \mathcal{Q}) \leq d_{TV}(\mathcal{P}, \mathcal{Q})$. Combining Lin’s lemma with Lemma 3.1, we get the following bound:

$$d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) - d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) \leq \sqrt{\varepsilon_{\mathcal{D}_0}(h \circ g)} + \sqrt{\varepsilon_{\mathcal{D}_1}(h \circ g)}.$$

Applying a simple AM-GM inequality, we can further bound the R.H.S. by

$$\sqrt{\varepsilon_{\mathcal{D}_0}(h \circ g)} + \sqrt{\varepsilon_{\mathcal{D}_1}(h \circ g)} \leq \sqrt{2\big(\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g)\big)}.$$

Under the assumption that $d_{JS}(\mathcal{D}_0(Y), \mathcal{D}_1(Y)) \geq d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$, taking the square on both sides then completes the proof of the second inequality. The proof of the Hellinger lower bound follows exactly the same route, except that instead of Lin’s lemma we use the inequalities $H^2(\mathcal{P}, \mathcal{Q}) \leq d_{TV}(\mathcal{P}, \mathcal{Q}) \leq \sqrt{2}\,H(\mathcal{P}, \mathcal{Q})$. ∎


All three lower bounds in Theorem 3.2 imply a tradeoff between demographic parity and the joint error across groups when learning group-invariant feature representations. When $\mathcal{D}_0(Z) = \mathcal{D}_1(Z)$, which also implies $d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) = d_{JS}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) = H(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) = 0$, all three lower bounds attain their largest values; in this case we have

$$\varepsilon_{\mathcal{D}_0}(h \circ g) + \varepsilon_{\mathcal{D}_1}(h \circ g) \geq \Delta_{BR}(\mathcal{D}_0, \mathcal{D}_1),$$

and this reduces to the tight lower bound in Theorem 3.1.
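Lin's lemma invoked in the proof of Theorem 3.2, $D_{JS}(\mathcal{P} \,\|\, \mathcal{Q}) \leq d_{TV}(\mathcal{P}, \mathcal{Q})$ with the Jensen-Shannon divergence measured in bits, can be sanity-checked numerically on random discrete distributions (a randomized check, not a proof):

```python
import math
import random

# Randomized sanity check of Lin's lemma: D_JS(P||Q) <= d_TV(P, Q)
# when JS is measured with log base 2.
def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl2(p, q):  # KL-divergence in bits
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js2(p, q):  # Jensen-Shannon divergence in bits
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl2(p, m) + 0.5 * kl2(q, m)

def rand_dist(k, rng):
    w = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p, q = rand_dist(4, rng), rand_dist(4, rng)
    assert js2(p, q) <= tv(p, q) + 1e-12
```

Equality is approached for distributions with disjoint supports, where both sides equal 1.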

3.4 Group-Invariant Representation Leads to Accuracy Parity

In the previous subsections we proved a family of information-theoretic lower bounds that demonstrate an inherent tradeoff between demographic parity and the joint error across groups. Specifically, we showed that group-invariant representations inevitably compromise joint utility. A natural question to ask, then, is what kind of parity group-invariant representations can bring us. To complement our negative results, in this subsection we show that learning group-invariant representations helps to reduce the discrepancy of errors (utilities) across groups.

First of all, since we work in the stochastic setting where $\mathcal{D}$ is a joint distribution over $X$, $A$ and $Y$, and $Y$ is in general not a deterministic function of the input, any hypothesis will inevitably incur an error due to the noise in the conditional distribution of $Y$. Formally, writing $Z = g(X)$ for the representation, for $a \in \{0, 1\}$ define the optimal decision function under the absolute error to be $m^*_a(z) := \mathrm{med}_{\mathcal{D}_a}(Y \mid Z = z)$, where $\mathrm{med}_{\mathcal{D}_a}(Y \mid Z = z)$ denotes the median of $Y$ given $Z = z$ under the distribution $\mathcal{D}_a$. Now define the noise of distribution $\mathcal{D}_a$ to be $n_a := \mathbb{E}_{\mathcal{D}_a}[|Y - m^*_a(Z)|]$. With these notations, we are now ready to present the following theorem:

Theorem 3.3.

For any hypothesis $h: \mathcal{Z} \to \mathcal{Y}$, the following inequality holds:

$$|\varepsilon_{\mathcal{D}_0}(h \circ g) - \varepsilon_{\mathcal{D}_1}(h \circ g)| \leq n_0 + n_1 + d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)) + \max_{a \in \{0, 1\}} \mathbb{E}_{\mathcal{D}_a}[|m^*_0(Z) - m^*_1(Z)|].$$

First, we show that for $a \in \{0, 1\}$, $\varepsilon_{\mathcal{D}_a}(h \circ g)$ cannot be too far from the distance of $h$ to the optimal decision function $m^*_a$: by the triangle inequality,

$$\big|\varepsilon_{\mathcal{D}_a}(h \circ g) - \mathbb{E}_{\mathcal{D}_a}[|m^*_a(Z) - h(Z)|]\big| \leq \mathbb{E}_{\mathcal{D}_a}[|Y - m^*_a(Z)|] = n_a.$$

Next, we bound $|\varepsilon_{\mathcal{D}_0}(h \circ g) - \varepsilon_{\mathcal{D}_1}(h \circ g)|$ by:

$$|\varepsilon_{\mathcal{D}_0}(h \circ g) - \varepsilon_{\mathcal{D}_1}(h \circ g)| \leq n_0 + n_1 + \big|\mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - h(Z)|] - \mathbb{E}_{\mathcal{D}_1}[|m^*_1(Z) - h(Z)|]\big|.$$

To simplify the notation, define $F := \big|\mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - h(Z)|] - \mathbb{E}_{\mathcal{D}_1}[|m^*_1(Z) - h(Z)|]\big|$, so that $|\varepsilon_{\mathcal{D}_0}(h \circ g) - \varepsilon_{\mathcal{D}_1}(h \circ g)| \leq n_0 + n_1 + F$. To bound $F$, realize that $|m^*_a(Z) - h(Z)| \in [0, 1]$. On one hand, we have:

$$\mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - h(Z)|] \leq \mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - m^*_1(Z)|] + \mathbb{E}_{\mathcal{D}_0}[|m^*_1(Z) - h(Z)|] \leq \mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - m^*_1(Z)|] + \mathbb{E}_{\mathcal{D}_1}[|m^*_1(Z) - h(Z)|] + d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z)),$$

where the last inequality is due to $|m^*_1(Z) - h(Z)| \in [0, 1]$, so that the difference of its expectations under $\mathcal{D}_0(Z)$ and $\mathcal{D}_1(Z)$ is at most $d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$. Similarly, by subtracting and adding back $m^*_0(Z)$ instead, we can also show that $\mathbb{E}_{\mathcal{D}_1}[|m^*_1(Z) - h(Z)|] \leq \mathbb{E}_{\mathcal{D}_1}[|m^*_0(Z) - m^*_1(Z)|] + \mathbb{E}_{\mathcal{D}_0}[|m^*_0(Z) - h(Z)|] + d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$. Combining all the inequalities above finishes the proof. ∎


Theorem 3.3 upper bounds the discrepancy of errors across groups by three terms: the noise, the distance between the representation distributions of the two groups, and the discrepancy of the optimal decision functions. In an ideal setting where both conditional distributions are noiseless, i.e., individuals within the same group who share a representation are always given the same label, the upper bound simplifies to the latter two terms. If we further require that the optimal decision functions $m^*_0$ and $m^*_1$ are close to each other, i.e., optimal decisions are insensitive to group membership, then Theorem 3.3 implies that a sufficient condition to guarantee accuracy parity is to find a group-invariant representation that minimizes $d_{TV}(\mathcal{D}_0(Z), \mathcal{D}_1(Z))$.

4 Experiments

Our theoretical results on the lower bound relating demographic parity and the sum of joint errors across groups imply that over-training the feature transformation function to achieve group-invariant representations will lead to large joint errors. On the other hand, our upper bound also implies that group-invariant representations help to achieve accuracy parity. To verify these theoretical implications, in this section we conduct experiments on a real-world benchmark dataset, the UCI Adult dataset, and present empirical results with various metrics.


The Adult dataset contains 30,162/15,060 training/test instances for income prediction. Each instance in the dataset describes an adult from the 1994 US Census. Attributes include gender, education level, age, etc. In this experiment we use gender (binary) as the sensitive attribute, and we preprocess the dataset to convert categorical variables into one-hot representations. The processed data contain 114 attributes. The target variable (income) is also binary: 1 if the income is over 50K/year, and 0 otherwise. For the sensitive attribute $A$, $A = 1$ means Male and $A = 0$ means Female. In this dataset, the base rates differ across groups, and so do the group ratios.
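The one-hot preprocessing step can be sketched with stdlib Python (hypothetical column names and toy rows; the real pipeline operates on the full Adult attribute set):

```python
# One-hot expansion of categorical attributes; numeric attributes pass
# through unchanged. A minimal sketch of the preprocessing described
# above, not the authors' code.
def one_hot_rows(rows, categorical):
    # category vocabulary per categorical column, in sorted order
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical}
    encoded = []
    for r in rows:
        feats = []
        for key in sorted(r):
            if key in categorical:
                feats.extend(1.0 if r[key] == v else 0.0 for v in vocab[key])
            else:
                feats.append(float(r[key]))
        encoded.append(feats)
    return encoded

rows = [
    {"age": 39, "workclass": "State-gov", "sex": "Male"},
    {"age": 50, "workclass": "Private", "sex": "Female"},
]
X = one_hot_rows(rows, categorical={"workclass", "sex"})
assert len(X[0]) == 1 + 2 + 2   # age + two workclass + two sex indicators
```

On the full dataset, the analogous expansion of all categorical columns yields the 114-dimensional input described above.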

Network Architecture

To validate the effect of learning group-invariant representations with adversarial debiasing techniques [38, 29, 6], we perform a controlled experiment by fixing the baseline network architecture to be a three-hidden-layer feed-forward network with ReLU activations. The numbers of units in the hidden layers are 500, 200, and 100, respectively. The output layer corresponds to a logistic regression model. This baseline without debiasing is denoted as NoDebias. For debiasing with adversarial learning techniques, the adversarial discriminator network takes the feature from the last hidden layer as input, and connects it to a hidden layer with 50 units, followed by a binary classifier whose goal is to predict the sensitive attribute $A$. This model is denoted as AdvDebias. Compared with NoDebias, the only difference of AdvDebias in terms of the objective function is that, besides the cross-entropy loss for target prediction, AdvDebias also contains an adversarial classification loss for predicting the sensitive attribute $A$. In the experiment, all other factors are fixed to be the same for the two methods, including the learning rate, optimization algorithm, number of training epochs, and batch size. To see how the adversarial loss affects the joint error, demographic parity and accuracy parity, we vary the coefficient $\lambda$ of the adversarial loss between 0.1, 1.0 and 5.0.
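The combined objective can be written schematically as follows (a per-example sketch of the loss described above, with made-up probabilities; not the authors' implementation):

```python
import math

# AdvDebias-style objective for one example: the encoder/classifier
# minimize the target cross-entropy while being rewarded (weighted by
# lam) when the adversary's cross-entropy on the sensitive attribute
# is large, i.e., when A is well hidden in the representation.
def bce(p, y):
    # binary cross-entropy between predicted probability p and label y
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def advdebias_loss(p_target, y, p_sensitive, a, lam):
    return bce(p_target, y) - lam * bce(p_sensitive, a)

# confident, correct target prediction; adversary reduced to a coin flip
loss = advdebias_loss(p_target=0.99, y=1, p_sensitive=0.5, a=1, lam=1.0)
assert loss < 0   # hiding A dominates when the adversary is clueless
```

Larger $\lambda$ shifts the optimization toward confusing the adversary, which is exactly the regime where the lower bounds of Section 3 predict a growing joint error.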

The experimental results are listed in Table 2. Note that $|\varepsilon_{\mathcal{D}_0} - \varepsilon_{\mathcal{D}_1}|$ in the table can be understood as measuring an approximate version of accuracy parity, and similarly $\Delta_{DP}$ measures the closeness of the prediction function to demographic parity. From the table, it is clear that with increasing $\lambda$, both the overall error $\varepsilon_{\mathcal{D}}$ (sensitive to the marginal distribution of $A$) and the joint error $\varepsilon_{\mathcal{D}_0} + \varepsilon_{\mathcal{D}_1}$ (insensitive to the imbalance of $A$) increase. As expected, $\Delta_{DP}$ decreases drastically with increasing $\lambda$. Furthermore, $|\varepsilon_{\mathcal{D}_0} - \varepsilon_{\mathcal{D}_1}|$ also gradually decreases, but much more slowly than $\Delta_{DP}$. This is due to the noise in the data as well as the shift between the optimal decision functions across groups, as indicated by our upper bound in Theorem 3.3. To conclude, all the empirical results are consistent with our theoretical findings.

| Method | $\varepsilon_{\mathcal{D}}$ | $\varepsilon_{\mathcal{D}_0} + \varepsilon_{\mathcal{D}_1}$ | $\lvert\varepsilon_{\mathcal{D}_0} - \varepsilon_{\mathcal{D}_1}\rvert$ | $\Delta_{DP}$ |
|---|---|---|---|---|
| NoDebias | 0.157 | 0.275 | 0.115 | 0.189 |
| AdvDebias, $\lambda = 0.1$ | 0.159 | 0.278 | 0.116 | 0.190 |
| AdvDebias, $\lambda = 1.0$ | 0.162 | 0.286 | 0.106 | 0.113 |
| AdvDebias, $\lambda = 5.0$ | 0.166 | 0.295 | 0.106 | 0.032 |

Table 2: The effect of adversarial debiasing on demographic parity, joint error across groups, and accuracy parity.

5 Related Work

Fairness Frameworks

There is a broad literature on fairness, notably in social choice theory, ethics, economics and machine learning. Two central notions of fairness have been extensively studied, i.e., group fairness and individual fairness. In a seminal work, Dwork et al. [13] defined individual fairness as a measure of smoothness of the classification function. Under the assumption that the number of individuals is finite, the authors proposed a linear programming framework to maximize utility under their fairness constraint. However, their framework requires a priori a distance function that computes the similarity between individuals, and their optimization formulation does not produce an inductive rule that generalizes to unseen data. Based on the definition of positive rate parity, Hardt et al. [17] proposed a post-processing method to achieve fairness by taking as input the model's prediction and the sensitive attribute. In a concurrent work, Kleinberg et al. [24] offered a calibration technique to achieve fairness as well. However, both of these approaches require the sensitive attribute during the inference phase, which is not available in many real-world scenarios.

Regularization Techniques

The line of work on fairness-aware learning through regularization dates at least back to Kamishima et al. [21], where the authors argue that simple deletion of sensitive features in data is insufficient for eliminating biases in automated decision making, due to the possible correlations among attributes and sensitive information [28]. In light of this, the authors proposed a prejudice remover regularizer that essentially penalizes the mutual information between the predicted goal and the sensitive information. In a more recent approach, Zafar et al. [35] leveraged a measure of decision boundary fairness and incorporated it via constraints into the objective functions of logistic regression as well as support vector machines. As discussed in Section 2, both approaches essentially reduce to achieving demographic parity through regularization.

Representation Learning

In a pioneering work, Zemel et al. [37] proposed to preserve both group and individual fairness through the lens of representation learning, where the main idea is to find a good representation of the data with two competing goals: to encode the data for utility maximization while at the same time obfuscating any information about membership in the protected group. Due to the power of learning rich representations offered by deep neural nets, recent advances in building fair automated decision making systems focus on using adversarial techniques to learn fair representations that also preserve enough information for the prediction vendor to achieve its utility [14, 27, 6, 38, 1, 34]. Madras et al. [29] further extended this approach by incorporating the reconstruction loss of an autoencoder into the objective function to preserve demographic parity, equalized odds, and equal opportunity.

6 Conclusion

In this paper we theoretically and empirically study the important problem of quantifying the tradeoff between utility and fairness in learning group-invariant representations. Specifically, we prove a novel lower bound characterizing the tradeoff between demographic parity and the joint utility across different population groups when the base rates differ between groups. In particular, our results imply that any method aiming to learn fair representations admits an information-theoretic lower bound on the joint error, and the better the representation, the larger the joint error. Complementary to our negative results, we also show that learning fair representations leads to accuracy parity if the optimal decision functions across different groups are close. These theoretical findings are also confirmed empirically on real-world datasets. We believe our results take an important step towards better understanding the tradeoff between utility and different notions of fairness. Inspired by our lower bound, one interesting direction for future work is to design instance-weighting algorithms that balance the base rates during representation learning.


  • Adel et al. [2019] Tameem Adel, Isabel Valera, Zoubin Ghahramani, and Adrian Weller. One-network adversarial fairness. In 33rd AAAI Conference on Artificial Intelligence, 2019.
  • Ali and Silvey [1966] Syed Mumtaz Ali and Samuel D Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B (Methodological), 28(1):131–142, 1966.
  • Barocas and Selbst [2016] Solon Barocas and Andrew D Selbst. Big data’s disparate impact. Calif. L. Rev., 104:671, 2016.
  • Ben-David et al. [2007] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In Advances in neural information processing systems, pages 137–144, 2007.
  • Berk et al. [2018] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, page 0049124118782533, 2018.
  • Beutel et al. [2017] Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075, 2017.
  • Calders and Verwer [2010] Toon Calders and Sicco Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.
  • Calders et al. [2009] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18. IEEE, 2009.
  • Chouldechova [2017] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.
  • Cortes et al. [2008] Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. Sample selection bias correction theory. In International Conference on Algorithmic Learning Theory, pages 38–53. Springer, 2008.
  • Csiszár [1964] Imre Csiszár. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:85–108, 1964.
  • Csiszár [1967] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica, 2:229–318, 1967.
  • Dwork et al. [2012] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226. ACM, 2012.
  • Edwards and Storkey [2015] Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
  • Ganin et al. [2016] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • Hamm [2017] Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. The Journal of Machine Learning Research, 18(1):4704–4734, 2017.
  • Hardt et al. [2016] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
  • Johndrow et al. [2019] James E Johndrow, Kristian Lum, et al. An algorithm for removing sensitive information: application to race-independent recidivism prediction. The Annals of Applied Statistics, 13(1):189–220, 2019.
  • Kamiran and Calders [2009] Faisal Kamiran and Toon Calders. Classifying without discriminating. In 2009 2nd International Conference on Computer, Control and Communication, pages 1–6. IEEE, 2009.
  • Kamishima et al. [2011] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 643–650. IEEE, 2011.
  • Kamishima et al. [2012] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.
  • Khosravifard et al. [2007] Mohammadali Khosravifard, Dariush Fooladivanda, and T Aaron Gulliver. Confliction of the convexity and metric properties in f-divergences. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 90(9):1848–1853, 2007.
  • Kifer et al. [2004] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pages 180–191. VLDB Endowment, 2004.
  • Kleinberg et al. [2016] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.
  • Liese and Vajda [2006] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.
  • Lin [1991] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.
  • Louizos et al. [2015] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.
  • Lum and Johndrow [2016] Kristian Lum and James Johndrow. A statistical framework for fair predictive algorithms. arXiv preprint arXiv:1610.08077, 2016.
  • Madras et al. [2018] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pages 3381–3390, 2018.
  • Mansour et al. [2009] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation: Learning bounds and algorithms. arXiv preprint arXiv:0902.3430, 2009.
  • Narayanan [2018] Arvind Narayanan. Translation tutorial: 21 fairness definitions and their politics. In Proc. Conf. Fairness Accountability Transp., New York, USA, 2018.
  • Executive Office of the President [2016] Executive Office of the President. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, 2016.
  • Pleiss et al. [2017] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.
  • Song et al. [2019] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In Artificial Intelligence and Statistics, pages 2164–2173, 2019.
  • Zafar et al. [2015] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259, 2015.
  • Zafar et al. [2017] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.
  • Zemel et al. [2013] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
  • Zhang et al. [2018] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340. ACM, 2018.
  • Zhao et al. [2019] Han Zhao, Remi Tachet des Combes, Kun Zhang, and Geoffrey J Gordon. On learning invariant representation for domain adaptation. In International Conference on Machine Learning, 2019.
  • Zliobaite [2015] Indre Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv:1505.05723, 2015.