Transfer of Machine Learning Fairness across Domains

June 24, 2019 · Candice Schumann et al. · Google and the University of Maryland

If our models are used in new or unexpected cases, do we know if they will make fair predictions? Previously, researchers developed ways to debias a model for a single problem domain. However, this is often not how models are trained and used in practice. For example, labels and demographics (sensitive attributes) are often hard to observe, resulting in auxiliary or synthetic data being used for training, and proxies of the sensitive attribute being used to evaluate fairness. A model trained for one setting may be picked up and used in many others, particularly as is common with pre-training and cloud APIs. Despite the pervasiveness of these complexities, remarkably little work in the fairness literature has theoretically examined these issues. We frame all of these settings as domain adaptation problems: how can we use what we have learned in a source domain to debias in a new target domain, without directly debiasing on the target domain as if it were a completely new problem? We offer new theoretical guarantees of improving fairness across domains, and offer a modeling approach to transfer to data-sparse target domains. We give empirical results validating the theory and showing that these modeling approaches can improve fairness metrics with less data.




1 Introduction

Much of machine learning research, and especially machine learning fairness, focuses on optimizing a model for a single use case Agarwal et al. (2018); Beutel et al. (2017). However, the reality of machine learning applications is far more chaotic. It is common for models to be used on multiple tasks, frequently different in a myriad of ways from the dataset that they were trained on, often coming at significant cost Sculley et al. (2015). This is especially concerning for machine learning fairness – we want our models to obey strict fairness properties, but we may have far less data on how the models will actually be used. How do we understand our fairness metrics in these more complex environments?

In traditional machine learning, domain adaptation techniques are used when the distribution of training and validation data does not match the target distribution that the model will ultimately be tested against. Therefore, in this paper we ask: if the model is trained to be “fair” on one dataset, will it be “fair” over a different distribution of data? Instead of starting again with this new dataset, can we use the knowledge gained during the original debiasing to more effectively debias in the new space?

It turns out that this framing covers many important cases for machine learning fairness. We will use, as a running example, the task of income prediction, where some decisions will be made based on the person’s predicted income and we want the model to perform “fairly” over a sensitive attribute such as gender. We primarily follow the equality of opportunity Hardt et al. (2016) perspective where we are concerned with one group (broken down by gender or race) having worse accuracy than another. In this setting, there are a myriad of fairness issues that arise that we find domain adaptation can shed light on:

Lacking sensitive features for training: There may be few examples where we know the sensitive attribute. In these cases, a proxy of the sensitive attribute has been used Gupta et al. (2018), or researchers need very sample-efficient techniques Agarwal et al. (2018); Beutel et al. (2017). For distant proxies, researchers have asked how well fairness transfers across attributes Lan and Huan (2017). Here the sensitive attribute differs in the source and target domains.

Data is not representative of application: Dataset augmentation, models offered as an API, or models used in multiple unanticipated settings, are all increasingly common design patterns. Even for machine learning fairness, researchers often believe limited training data is a primary source of fairness issues Chen et al. (2018) and will employ dataset augmentation techniques to try to improve fairness Dixon et al. (2018). How can we best make use of auxiliary data during training and evaluation when it differs in distribution from the real application?

Multiple tasks: In some cases having accurate labels for model training is difficult and instead proxy tasks with more labeled data are used to train the model, e.g., using pre-trained image or text models or using income brackets as a proxy for defaulting on a loan. Again we ask: when does satisfying a fairness property on the original task help satisfy that same property on the new task?

Each of these cases is common throughout machine learning, but each presents challenges for fairness. In this work, we explore mapping domain adaptation principles to machine learning fairness. In particular, we offer the following contributions:

  1. Theoretical Bounds: We provide theoretical bounds on transferring equal opportunity and equalized odds metrics across domains. Perhaps more importantly, we discuss insights gained from these bounds.

  2. Modeling for Fairness Transfer: We offer a general, theoretically-backed modeling objective that enables transferring fairness across domains.

  3. Empirical validation: We demonstrate when transferring machine learning fairness works successfully, and when it does not, through both synthetic and realistic experiments.

2 Related Work

This work lies at the intersection of traditional domain adaptation and recent work on ML fairness.

Domain Adaptation

Pan et al. (2010) and Weiss et al. (2016) both survey current work in transfer learning. One case of transfer learning is domain adaptation, where the task remains the same, but the distribution of features that the model is trained on (the source domain) does not match the distribution that the model is tested against (the target domain).

Ben-David et al. (2007) provide theoretical analysis of domain adaptation. Ben-David et al. (2010) extend this analysis to provide a theoretical understanding of how much source and target data should be used to successfully transfer knowledge. Mansour et al. (2009) provide theoretical bounds on domain adaptation using Rademacher Complexity analysis. In later research, Ganin et al. (2016) build on this theory to use an adversarial training procedure over latent representations to improve domain adaptation.

Fairness in Machine Learning

A large thread of recent research has studied how to optimize for fairness metrics during model training. Li et al. (2018) empirically show that adversarial learning helps preserve privacy over sensitive attributes. Beutel et al. (2017) focus on using adversarial learning to optimize different fairness metrics, and Madras et al. (2018) provide a theoretical framework for understanding how adversarial learning optimizes these fairness goals. Zhang et al. (2018) use adversarial training over logits rather than hidden representations. Other work has focused on constraint-based optimization of fairness objectives Goh et al. (2016); Agarwal et al. (2018). Tsipras et al. (2018), however, provide a theoretical bound on the accuracy of adversarially robust models, showing that even with infinite data there is still a trade-off between accuracy and robustness.

Domain Adaptation & Fairness

Despite the prevalence of using one model across multiple domains, in practice little work has studied domain adaptation and transfer learning of fairness metrics. Kallus and Zhou (2018) use covariate shift correction when computing fairness metrics to address bias in label collection. More related, Madras et al. (2018) show empirically that their method allows for fair transfer. The transfer learning here corresponds to preserving fairness for a single sensitive attribute but over different tasks. However, Lan and Huan (2017) found empirically that fairness does not transfer well to a new domain: as accuracy increased in the transfer process, fairness decreased in the new domain. It is concerning that these papers show opposing effects. Both of these papers offer empirical results on the UCI Adult dataset, but neither provides a theoretical understanding of how and when fairness in one domain transfers to another.

3 Problem Formulation

We begin with some notation to make the problem formulation precise. Building on our running example, we have two domains: a source domain $\mathcal{D}_S$, which is a feature distribution influenced by a sensitive attribute $A_S$ (e.g., gender), as well as a target domain $\mathcal{D}_T$ influenced by a sensitive attribute $A_T$ (e.g., race). In order for this to be a domain adaptation problem, we assume $\mathcal{D}_S \neq \mathcal{D}_T$. Note, this can be true even if $A_S = A_T$ but the distributions conditioned on $A_S$ and $A_T$ differ. We focus on binary classification tasks with label $Y \in \{0, 1\}$, e.g. income classification, shared over both domains. For this task we can create a classifier by finding a hypothesis $h$ from a hypothesis space $\mathcal{H}$.

Let us assume that we can learn a "fair" classifier for the source domain and task. If we use a small amount of data from the target domain, will the fairness from the source sensitive attribute $A_S$ transfer to the target domain and sensitive attribute $A_T$? We can define the notion of a "fairness" distance – how far away the classifier is from perfectly fair – in a given domain as $\Gamma_{\mathcal{D}}(h)$. Within this formulation we consider two definitions of fairness.

The first distance is equality of opportunity Hardt et al. (2016). A classifier is said to be fair under equality of opportunity if the false positive rates (FPR) over sensitive attributes are equal. In other words, if we have a binary sensitive attribute $A \in \{0, 1\}$, then equality of opportunity requires that $\Pr(h(x) = 1 \mid A = 0, Y = 0) = \Pr(h(x) = 1 \mid A = 1, Y = 0)$, where $h(x)$ gives the outcome of classifier $h$. Thus, how far away a classifier is from equal opportunity (or the fairness distance of equal opportunity) can be defined as

$$\Gamma^{\mathrm{Opp}}_{\mathcal{D}}(h) = \left| \mathbb{E}_{x \sim \mathcal{D}^{A=0,Y=0}}[h(x)] - \mathbb{E}_{x \sim \mathcal{D}^{A=1,Y=0}}[h(x)] \right|,$$

where $\mathcal{D}^{A=a,Y=y}$ denotes the distribution $\mathcal{D}$ conditioned on $A = a$ and $Y = y$. In our running example, $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_S}(h)$, where $A_S$ is gender, is the difference between the likelihood that a low-income man is predicted to be high-income and the likelihood that a low-income woman is predicted to be high-income. A symmetric definition and set of analysis can be made for the false negative rate (FNR).
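As a concrete illustration, the empirical version of this distance can be computed directly from predictions. The helper below is our own hypothetical sketch (names and data are illustrative, not from the paper):

```python
import numpy as np

def equal_opportunity_distance(y_true, y_pred, group):
    """Absolute FPR gap between two groups: the empirical analogue of the
    equal opportunity distance (difference of positive-prediction rates on
    negatively labeled examples)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    fprs = []
    for a in (0, 1):
        neg = (y_true == 0) & (group == a)  # negatively labeled members of group a
        fprs.append(y_pred[neg].mean())     # fraction predicted positive
    return abs(fprs[0] - fprs[1])

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 0, 1, 1])
group  = np.array([0, 0, 1, 1, 0, 1])
print(equal_opportunity_distance(y_true, y_pred, group))  # → 0.5 (FPR 1/2 vs 0)
```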

The second definition of fairness which we consider is equalized odds Hardt et al. (2016). A classifier is said to be fair under equalized odds if both the FPR and FNR over the sensitive attribute are equal. Similar to equal opportunity, we define the fairness distance of equalized odds as:

$$\Gamma^{\mathrm{Odds}}_{\mathcal{D}}(h) = \left| \mathbb{E}_{x \sim \mathcal{D}^{A=0,Y=0}}[h(x)] - \mathbb{E}_{x \sim \mathcal{D}^{A=1,Y=0}}[h(x)] \right| + \left| \mathbb{E}_{x \sim \mathcal{D}^{A=0,Y=1}}[1 - h(x)] - \mathbb{E}_{x \sim \mathcal{D}^{A=1,Y=1}}[1 - h(x)] \right|.$$

Again using our running example, the distance of equalized odds in the source domain is given by the difference of expected FPRs between females and males (as above), plus the difference of expected FNRs (high-income predicted to be low-income) between females and males.
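The equalized odds distance can likewise be estimated empirically as the FPR gap plus the FNR gap. The sketch below (our own illustrative code, not the paper's) makes the two-term sum explicit:

```python
import numpy as np

def rate(y_true, y_pred, group, a, y):
    """Empirical P(h(x) != y | A=a, Y=y): error rate within one quadrant."""
    mask = (np.asarray(group) == a) & (np.asarray(y_true) == y)
    return np.mean(np.asarray(y_pred)[mask] != y)

def equalized_odds_distance(y_true, y_pred, group):
    # FPR gap on the Y=0 examples plus FNR gap on the Y=1 examples,
    # matching the two-term sum in the definition above.
    fpr_gap = abs(rate(y_true, y_pred, group, 0, 0) - rate(y_true, y_pred, group, 1, 0))
    fnr_gap = abs(rate(y_true, y_pred, group, 0, 1) - rate(y_true, y_pred, group, 1, 1))
    return fpr_gap + fnr_gap

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0])
group  = np.array([0, 1, 0, 1])
print(equalized_odds_distance(y_true, y_pred, group))  # → 2.0 (maximally unfair)
```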

Given a classifier that has a fairness guarantee in the source domain, the fairness distance in the target domain should be bounded by the fairness distance in the source domain plus some additional terms:

$$\Gamma_{\mathcal{D}_T}(h) \leq \Gamma_{\mathcal{D}_S}(h) + \Delta. \qquad (1)$$

The key question we hope to answer is: what is $\Delta$?

4 Bounds on Fairness in the Target Domain

To expand inequality (1) we need to start with some definitions. Given a hypothesis space $\mathcal{H}$ and a true labeling function $f$, we can define the error of a hypothesis $h$ as $\epsilon(h) = \mathbb{E}_{x \sim \mathcal{D}}\left[\,|h(x) - f(x)|\,\right]$, the expectation of disagreement between the hypothesis $h$ and the true label $f(x)$. We can then define the ideal joint hypothesis that minimizes the combined error over both the source and target domains as $h^{*} = \arg\min_{h \in \mathcal{H}} \epsilon_S(h) + \epsilon_T(h)$.
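For intuition, the empirical counterpart of $\epsilon(h)$ is simply the mean disagreement over a sample. A minimal sketch with hypothetical labeling functions:

```python
import numpy as np

def empirical_error(h, f, xs):
    """Empirical version of eps(h) = E_x |h(x) - f(x)| over a sample xs."""
    return np.mean(np.abs(h(xs) - f(xs)))

xs = np.linspace(-1, 1, 9)
f = lambda x: (x > 0).astype(float)      # "true" labeling function (illustrative)
h = lambda x: (x > 0.25).astype(float)   # hypothesis that disagrees near the boundary
print(empirical_error(h, f, xs))         # → 0.111... (disagrees on 1 of 9 points)
```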

Following Ben-David et al. (2010) we define the $\mathcal{H}$-divergence between probability distributions $\mathcal{D}$ and $\mathcal{D}'$ as

$$d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') = 2 \sup_{h \in \mathcal{H}} \left| \Pr_{\mathcal{D}}[I(h)] - \Pr_{\mathcal{D}'}[I(h)] \right|,$$

where $I(h)$ is the set for which $h$ is the characteristic function ($x \in I(h) \Leftrightarrow h(x) = 1$). We can compute an approximation by finding a hypothesis that finds the largest difference between the samples from $\mathcal{D}$ and $\mathcal{D}'$ Ben-David et al. (2007). This divergence can be used to look at the differences in distributions, which is important when moving from a source domain to a target domain.

Additionally, we define the symmetric difference hypothesis space $\mathcal{H} \Delta \mathcal{H}$ as the set of hypotheses

$$\mathcal{H} \Delta \mathcal{H} = \{\, h(x) \oplus h'(x) : h, h' \in \mathcal{H} \,\},$$

where $\oplus$ is the XOR function. The symmetric difference hypothesis space is used to find disagreements between a potential classifier $h$ and a true labeling function $f$.

Theorem 1.

Let $\mathcal{H}$ be a hypothesis space of VC dimension $d$. If $\mathcal{U}_S^{A_S=0}, \mathcal{U}_S^{A_S=1}, \mathcal{U}_T^{A_T=0}, \mathcal{U}_T^{A_T=1}$ are samples of size $m$, each drawn from $\mathcal{D}_S^{A_S=0,Y=0}$, $\mathcal{D}_S^{A_S=1,Y=0}$, $\mathcal{D}_T^{A_T=0,Y=0}$, and $\mathcal{D}_T^{A_T=1,Y=0}$ respectively, then for any $\delta \in (0,1)$, with probability at least $1-\delta$ (over the choice of samples), for every $h \in \mathcal{H}$ (where $\mathcal{H} \Delta \mathcal{H}$ is a symmetric hypothesis space) the distance from equal opportunity in the target space is bounded by

$$\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}(h) \leq \Gamma^{\mathrm{Opp}}_{\mathcal{D}_S}(h) + \sum_{a \in \{0,1\}} \left[ \tfrac{1}{2}\, d_{\mathcal{H} \Delta \mathcal{H}}\!\left(\mathcal{U}_S^{A_S=a}, \mathcal{U}_T^{A_T=a}\right) + \lambda^{a} + 4\sqrt{\frac{2d \log(2m) + \log(2/\delta)}{m}} \right],$$

where $\lambda^{a} = \min_{h \in \mathcal{H}}\, \epsilon_S^{A_S=a,Y=0}(h) + \epsilon_T^{A_T=a,Y=0}(h)$.

Using both the definition of $\mathcal{H}$-divergence and the symmetric difference hypothesis space, Theorem 1 provides a VC-dimension bound on the equal opportunity distance in the target domain given the equal opportunity distance in the source domain. Due to space limitations, full proofs for all theorems can be found in Appendix B.

Figure 1: Both the source and target distributions can be split into four quadrants: 1) negative minority, 2) negative majority, 3) positive minority, and 4) positive majority.

This theorem provides insights on when domain adaptation for fairness can be used. First, the divergence terms in the bound suggest that 1) the source and target distributions of negatively labeled items that have a sensitive attribute label of 0 should be close, and 2) the source and target distributions of negatively labeled items that have a sensitive attribute label of 1 should be close. In Figure 1, the red quadrant of the source should be close to the red quadrant of the target, and likewise for the orange quadrants. In traditional domain adaptation, ignoring fairness, the domains should be close in their entirety (the entire circle), which means that if there are few minority data points then the distance between the minority subspaces is effectively ignored. The fairness bound instead puts equal emphasis on both the majority and the minority.

Second, the $\lambda$ terms become small when the hypothesis space contains a function that has low error on the two negative segments in both the source and target domains (the red and orange spaces in Figure 1). Since we are looking at equal opportunity, the function only needs to have low error on the negative space for both the majority and minority. Therefore, we can use the trivial all-negative function $h(x) = 0$, and the $\lambda$ terms go to 0.

Lastly, Theorem 1 depends on the VC dimension $d$. Since VC-dimension bounds explode for models like neural networks, we also provide bounds using Rademacher complexity in the appendix.

Equalized odds, while similar to equal opportunity, is a stricter fairness constraint. Theorem 2 provides a VC-dimension bound on the equalized odds distance in the target domain given the source domain.

Theorem 2.

Let $\mathcal{H}$ be a hypothesis space of VC dimension $d$. If $\mathcal{U}_S^{A_S=a,Y=y}, \mathcal{U}_T^{A_T=a,Y=y}$ are samples of size $m$, each drawn from $\mathcal{D}_S^{A_S=a,Y=y}$ and $\mathcal{D}_T^{A_T=a,Y=y}$ for all $a \in \{0,1\}$ and $y \in \{0,1\}$, then for any $\delta \in (0,1)$, with probability at least $1-\delta$ (over the choice of samples), for every $h \in \mathcal{H}$ (where $\mathcal{H} \Delta \mathcal{H}$ is a symmetric hypothesis space) the distance from equalized odds in the target space is bounded by

$$\Gamma^{\mathrm{Odds}}_{\mathcal{D}_T}(h) \leq \Gamma^{\mathrm{Odds}}_{\mathcal{D}_S}(h) + \sum_{a \in \{0,1\}} \sum_{y \in \{0,1\}} \left[ \tfrac{1}{2}\, d_{\mathcal{H} \Delta \mathcal{H}}\!\left(\mathcal{U}_S^{A_S=a,Y=y}, \mathcal{U}_T^{A_T=a,Y=y}\right) + \lambda^{a,y} + 4\sqrt{\frac{2d \log(2m) + \log(2/\delta)}{m}} \right],$$

where $\lambda^{a,y} = \min_{h \in \mathcal{H}}\, \epsilon_S^{A_S=a,Y=y}(h) + \epsilon_T^{A_T=a,Y=y}(h)$.

The divergence terms suggest that, in order for equalized odds to transfer successfully, 1) the source and target distributions of negatively labeled items for both sensitive attribute labels 0 and 1 should be close, and 2) the source and target distributions of positively labeled items for both sensitive attribute labels 0 and 1 should be close. In other words, each of the four quadrants of the source should individually be close to the corresponding quadrant of the target in Figure 1.

Additionally, the $\lambda$ terms show that there should be a hypothesis that performs well over all of these subspaces. This implication is intuitive given that equalized odds, by definition, requires a classifier to perform well in both the negative and positive space across both groups.

5 Modeling to Transfer Fairness

Figure 2: At a high level, our general framework combines a primary training objective, a fairness objective, and a transfer objective to improve fairness goals in a target domain. Table 1 provides mathematical details for different configurations.

With this theoretical understanding, how should we change our training? As motivated previously, we consider the case where we have a small amount of labelled data (both labels $Y$ and sensitive attributes $A_T$) in the target domain and a large amount of labelled data in the source domain.

As shown in the previous section, equality of opportunity will transfer if the respective distributions of source and target are close together, as visually portrayed in Figure 1. Ganin et al. (2016) showed that traditional domain adaptation can be framed as minimizing the distance between source and target with adversarial training. Louizos et al. (2016); Edwards and Storkey (2016); Beutel et al. (2017); Li et al. (2018) have similarly applied adversarial training to achieve fairness goals, and Madras et al. (2018) proved that equalized odds can be optimized with adversarial training similar to domain adaptation.

We build on this intuition to design a learning objective for transferring equality of opportunity to a target domain. Adversarial training conceptually enables minimizing the divergence terms from Theorem 1, and $\Gamma_{\mathcal{D}_S}$ can be optimized using Beutel et al. (2017); Madras et al. (2018) or one of the other myriad traditional fairness learning objectives. As such, we begin with the following loss:

$$\mathcal{L} = \mathcal{L}_Y\big(f(z), y\big) + \mathcal{L}_{\mathrm{Adv}:A}\big(a(J_\lambda(z)), A_S\big) + \mathcal{L}_{\mathrm{Adv}:D}\big(d(J_\lambda(z)), \mathbb{1}[x \in \mathcal{D}_S]\big), \qquad (4)$$

where $\mathcal{L}_Y$ is the loss function training $f$ over hidden representation $z$ to predict the task label $y$, and $J_\lambda$ is a gradient-reversal layer. To optimize $\Gamma_{\mathcal{D}_S}$, head $a$ tries to predict the sensitive attribute $A_S$ from the source and provides an adversarial loss that includes a negated gradient on $z$, following Beutel et al. (2017). For transfer, we minimize the divergence terms by including another adversarial loss $\mathcal{L}_{\mathrm{Adv}:D}$, where head $d$ tries to predict whether a sample comes from the source or target domain. Each of these loss components maps to terms in Theorem 1 as laid out in Table 1.

Loss Term | Theorem 1 | Adversarial (Eq. 4) | Regularization (Eq. 5)
Fairness head | $\Gamma_{\mathcal{D}_S}$ | $\mathcal{L}_{\mathrm{Adv}:A}$ | $\mathcal{L}_{\mathrm{MMD}:A}$
Transfer head | $d_{\mathcal{H} \Delta \mathcal{H}}$ terms | $\mathcal{L}_{\mathrm{Adv}:D}$ | $\mathcal{L}_{\mathrm{MMD}:D}$
Table 1: Relationship between terms in Theorem 1 and loss functions

Recently, Zhang et al. (2018) used adversarial training on a one-dimensional representation of the data (effectively the model's prediction). From this perspective, we can replace adversarial losses with a wide variety of losses over predictions, such as minimizing the correlation between group and the one-dimensional representation of the data Zafar et al. (2017); Beutel et al. (2019). Like previous work, we find these approaches to be more stable and still effective in comparison to adversarial training, despite not being provably optimal. In our experiments we use an MMD loss Gretton et al. (2012); Long et al. (2015); Bousmalis et al. (2016) over predictions:

$$\mathcal{L} = \mathcal{L}_Y\big(f(z), y\big) + \mathcal{L}_{\mathrm{MMD}:A}\big(f(z), A_S\big) + \mathcal{L}_{\mathrm{MMD}:D}\big(f(z), \mathbb{1}[x \in \mathcal{D}_S]\big), \qquad (5)$$

where $\mathcal{L}_{\mathrm{MMD}:A}$ is the MMD regularization over the sensitive attributes in the source domain and $\mathcal{L}_{\mathrm{MMD}:D}$ is the MMD regularization over source/target membership. Again, Table 1 maps the terms in Eq. 5 to those in Theorem 1.
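An illustrative implementation of the MMD penalty over one-dimensional predictions might look as follows; the RBF kernel and its bandwidth are our own assumptions for the sketch, not the paper's exact configuration:

```python
import numpy as np

def rbf_mmd2(p, q, sigma=1.0):
    """Squared MMD with an RBF kernel between two 1-D samples, e.g. the model
    predictions for the two groups. Small when the two prediction
    distributions match, large when they differ."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    k = lambda u, v: np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    # Biased V-statistic estimate of the squared MMD.
    return k(p, p).mean() + k(q, q).mean() - 2 * k(p, q).mean()

rng = np.random.default_rng(0)
close = rbf_mmd2(rng.normal(0, 1, 300), rng.normal(0, 1, 300))
far   = rbf_mmd2(rng.normal(0, 1, 300), rng.normal(3, 1, 300))
print(close < far)  # matched prediction distributions incur a smaller penalty
```

In training, such a term would be added to the task loss with a tunable weight, once over group membership (fairness head) and once over source/target membership (transfer head).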

Care must be taken when performing domain adaptation with regards to fairness. Either multiple transfer heads should be included in the loss, one for each necessary quadrant (see Figure 1 and Eq. 4), or balanced data – equally representing all necessary quadrants – should be used, as in Madras et al. (2018) and Eq. 5. Experiments in this paper use the MMD regularization of Eq. 5, with balanced data for both the fairness head and the transfer head.

6 Experiments

To better understand the theoretical results presented above, we now present both synthetic and realistic experiments exploring tightness of our theoretical bound as well as the ability to improve the transfer of fairness across domains during model training.

6.1 Synthetic Examples

(a) Source
(b) Target -1
(c) Target 0
(d) Target 1
(e) Target Fairness
Figure 3: Synthetic examples showing how the distribution shift of the minority group in the target domain affects theoretical and empirical equality of opportunity (best viewed in color). In the title of each plot we give the equal opportunity distance in the target domain.

We show how well the theoretical bounds align with the actual transfer of fairness. A synthetic dataset is used to examine how the distribution distance terms in Eq. (1) affect the fairness distance of equal opportunity $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}$.

In this synthetic example, we generate data using Gaussian distributions. As we can see in Figure 2(a), the source domain consists of four Gaussians – one positive and one negative cluster for each group – with the positive clusters largely lying above the negative ones; the majority group accounts for most of the data, with far fewer minority samples. The target domain, like the source domain, consists of majority data drawn from the same distribution in both domains. However, in order to understand the transfer of fairness, we shift the minority distributions in the target domain (Figures 2(b), 2(c) and 2(d), respectively). By varying the overlap between these distributions, and their alignment with the source data, we are able to understand the relationship between the divergence terms above and the fairness distance of equal opportunity $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}$. For each setting, we train linear classifiers on the source domain and examine the performance in the target domain.

Qualitative Analysis

We see in Fig. 2(b) that when the minority distribution is close across domains, and thus the divergence term is small, there is better transfer of fairness from the source to the target domain, seen in the smaller $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}$. As the distribution distance gets larger, $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}$ also increases. Consider the worst case of a sign flip for the minority, as shown in Fig. 2(d): the FPR for the majority is close to 0, while the FPR for the minority is close to 1.

Quantitative Analysis

In Figure 2(e), we compare the derived bound of $\Gamma^{\mathrm{Opp}}_{\mathcal{D}_T}$ (Eq. 1) with its empirical estimate as we vary the minority shift.¹ As shown in Figure 2(e), the theoretical bound on the equal opportunity distance is close to the observed equal opportunity distance when the distance between the negative minority space across domains is small. This suggests that minimizing the domain distance terms in Eq. 1 could lead to better equal opportunity transfer.

¹ As in Ben-David et al. (2007), the divergence is estimated by a linear classifier trained to separate source from target samples. The plot omits the VC term for simplicity, which is relatively small when the sample size is large and the VC dimension is low.

6.2 Real Data

(a) Effect of fairness head: improving the gender FPR gap with varying numbers of gender-balanced samples.
(b) Some natural transfer occurs without explicit transfer: the race FPR gap is improved with gender data.
(c) Effect of transfer head: better transfer from gender (1000 samples) to race (50 samples).
(d) Accuracy graph for transferring from gender (1000 samples) to race (50 samples).
Figure 4: Effect of the fairness/transfer heads on the UCI data. The shaded areas show the standard error of the mean across trials. Note that the head weight (x-axis) starts from a nonzero value.

We now explore how and when our proposed modeling approach in Section 5 facilitates the transfer of fairness from the source to the target domain on two real-world datasets. Note, we use these datasets exclusively for understanding our theory and model, and not as a comment on when or if the proposed tasks and their application are appropriate, as in Agarwal et al. (2018).

Dataset 1: The UCI Adult dataset contains census information on over 40,000 adults from the 1994 Census, with the task of determining whether income is above or below $50,000. We focus on two sensitive attributes: binary-valued gender, and race, converted to binary values ['white', 'non-white'] as done by Madras et al. (2018).

Dataset 2: As in Agarwal et al. (2018), we use ProPublica's COMPAS recidivism data to try to predict recidivism for over 10,000 defendants based on age, gender, demographics, prior crime count, etc. We again focus on two sensitive attributes: gender and race (binarized to ['white', 'non-white']).

Experiment Setup

For both datasets, cross-validation is used to choose the hyper-parameters. Comparable baseline accuracy is achieved on both datasets (see Appendix D for details) using an embedding layer for categorical features, a single shared hidden layer, and the Adagrad optimizer. We perform multiple runs for each set of experiments and average over the results.

Sparsity Issues and Natural Transfer

We examine the effectiveness of just the fairness heads in the proposed model. The amount of gender-balanced data created for the fairness head is varied to observe how applying the fairness head affects the FPR difference across genders (i.e., the FPR difference between "Female" and "Male" examples). Figure 3(a) shows that the fairness head works as expected: with sufficient data and a large enough weight, the fairness head is able to improve the FPR gap across genders. Further, we find that with very few examples on which to apply the fairness head, the gender FPR gap does not close. This aligns with previous results in Beutel et al. (2017); Madras et al. (2018); Beutel et al. (2019).

Second, we examine how running the fairness head on gender affects the FPR gap across race. As shown in Figure 3(b), there is a natural transfer of equal opportunity from gender to race – applying a fairness loss with respect to gender also improves the fairness of the model with respect to race. This highlights that sometimes there is a natural transfer of equal opportunity, where improving the FPR gap with respect to gender has general value and no explicit transfer optimization is needed (similar to the transfer questions posed previously by Madras et al. (2018) and Gupta et al. (2018)).

Effectiveness of Transfer Head

We now explore how adding the transfer head can further improve equality of opportunity in the target domain. We compare four different model arrangements: (1) Source Only: we only add a fairness head for the source domain; (2) Target Only: we only add a fairness head for the target domain; (3) Source+Target: we add two fairness heads, one for the source and one for the target; (4) Transfer: we include three heads – both source and target fairness heads as well as the transfer head for equality of opportunity.

Experiment setting: As in a typical transfer learning setting, we focus on the case where we observe a large number of samples in the source domain (e.g., 1000 for each of race "white" and "non-white") but a smaller sample size in the target domain (e.g., 100 for each of gender "male" and "female"), and likewise for transfer from gender to race. We explore equality of opportunity with respect to FPR in the target domain as we vary the weight on the fairness and transfer heads.

Results: Figure 3(c) shows that including the transfer head results in a better equal opportunity transfer, compared to the same setting without transfer (Figure 3(b)). Table 2 summarizes the full results on both datasets. We can see that including both the fairness heads and the transfer head consistently gives the best improvement in equal opportunity (FPR difference) in almost all cases.

Effect of Target Sample Size

Last, we consider how the amount of data from the target domain affects our ability to improve equal opportunity there, as sample efficiency is a core challenge.

Experiment setting: We follow a similar experimental procedure as before with two modifications. First, we vary the number of samples we observe for each sensitive group in the target domain, and examine the efficacy of the four approaches depending on the amount of data available for debiasing in the target domain. Second, this analysis is performed both for transferring from race (source) to gender (target) and from gender (source) to race (target).

Results: Table 2 summarizes the results. Applying the fairness and transfer heads to the large amount of source data closes the FPR gap in the target domain. Increasing the amount of data in the target domain significantly helps the performance of the “Target Only” and the “Source+Target” models. This is intuitive since directly debiasing in the target domain is feasible with sufficient data. With sufficient data, the results converge to be approximately equivalent to the transfer model.

These experiments show that the transfer model is effective in decreasing the FPR gap in the target domain and is more sample efficient than previous methods.

Smallest FPR difference achieved on Target (FPR-diff ± std. dev.)
Source to Target | #Target Samples | Source only | Target only | Source + Target | With Transfer Head
Dataset 1 | Gender to Race | 50
Dataset 1 | Race to Gender | 50
Dataset 2 | Gender to Race | 50
Dataset 2 | Race to Gender | 50
Table 2: Comparison between the proposed model and the baselines. The numbers in bold indicate the smallest FPR difference achieved in the target domain for varying numbers of target samples.

7 Conclusion

In this paper we provide the first theoretical examination of transfer of machine learning fairness across domains. We adopt a general formulation of domain adaptation for fairness that covers a wide variety of fairness challenges, from proxies of sensitive attributes, to applying models in unanticipated settings. Within this general formulation, we have provided theoretical bounds on the transfer of fairness for equal opportunity and equalized odds using both VC-dimension and Rademacher Complexity. Based on this theory, we developed a new modeling approach to transfer fairness to a given target domain. In experiments we validate our theoretical results and demonstrate that our modeling approach is more sample efficient in improving fairness metrics in a target domain.


  • Agarwal et al. [2018] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 60–69, 2018.
  • Ben-David et al. [2007] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in neural information processing systems, pages 137–144, 2007.
  • Ben-David et al. [2010] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine learning, 79(1-2):151–175, 2010.
  • Beutel et al. [2017] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. Proceedings of the Conference on Fairness, Accountability and Transparency, 2017.
  • Beutel et al. [2019] A. Beutel, J. Chen, T. Doshi, H. Qian, A. Woodruff, C. Luu, P. Kreitmann, J. Bischof, and E. H. Chi. Putting fairness principles into practice: Challenges, metrics, and improvements. Artificial Intelligence, Ethics, and Society, 2019.
  • Bousmalis et al. [2016] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 343–351, 2016.
  • Chen et al. [2018] I. Chen, F. D. Johansson, and D. Sontag. Why is my classifier discriminatory? arXiv preprint arXiv:1805.12002, 2018.
  • Crammer et al. [2008] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. Journal of Machine Learning Research, 9(Aug):1757–1774, 2008.
  • Dixon et al. [2018] L. Dixon, J. Li, J. Sorensen, N. Thain, and L. Vasserman. Measuring and mitigating unintended bias in text classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018.
  • Edwards and Storkey [2016] H. Edwards and A. J. Storkey. Censoring representations with an adversary. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
  • Ganin et al. [2016] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • Goh et al. [2016] G. Goh, A. Cotter, M. Gupta, and M. P. Friedlander. Satisfying real-world goals with dataset constraints. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2415–2423. Curran Associates, Inc., 2016.
  • Gretton et al. [2012] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 2012.
  • Gupta et al. [2018] M. R. Gupta, A. Cotter, M. M. Fard, and S. Wang. Proxy fairness. CoRR, abs/1806.11212, 2018.
  • Hardt et al. [2016] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
  • Kallus and Zhou [2018] N. Kallus and A. Zhou. Residual unfairness in fair machine learning from prejudiced data. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 2444–2453, 2018.
  • Lan and Huan [2017] C. Lan and J. Huan. Discriminatory transfer. CoRR, 2017.
  • Li et al. [2018] Y. Li, T. Baldwin, and T. Cohn. Towards robust and privacy-preserving text representations. arXiv preprint arXiv:1805.06093, 2018.
  • Long et al. [2015] M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, 2015.
  • Louizos et al. [2016] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. S. Zemel. The variational fair autoencoder. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
  • Madras et al. [2018] D. Madras, E. Creager, T. Pitassi, and R. Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.
  • Mansour et al. [2009] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. COLT, 2009.
  • Pan et al. [2010] S. J. Pan, Q. Yang, et al. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
  • Sculley et al. [2015] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison. Hidden technical debt in machine learning systems. In Advances in neural information processing systems, pages 2503–2511, 2015.
  • Tsipras et al. [2018] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. There is no free lunch in adversarial robustness (but there are unexpected benefits). arXiv preprint arXiv:1805.12152, 2018.
  • Weiss et al. [2016] K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 2016.
  • Zafar et al. [2017] M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, pages 962–970, 2017.
  • Zhang et al. [2018] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. CoRR, abs/1801.07593, 2018.

Appendix A Rademacher Complexity

We provide additional bounds that depend on Rademacher Complexity, based on the following definition of the data-driven empirical Rademacher Complexity.

Definition 1.

Given a hypothesis space $\mathcal{H}$ and a sample $S = \{x_1, \dots, x_m\}$, the empirical Rademacher Complexity of $\mathcal{H}$ is defined as

$$\hat{\mathcal{R}}_S(\mathcal{H}) = \mathbb{E}_{\sigma}\left[\sup_{h \in \mathcal{H}} \left|\frac{2}{m} \sum_{i=1}^{m} \sigma_i h(x_i)\right|\right].$$

The expectation is taken over $\sigma = (\sigma_1, \dots, \sigma_m)$, where the $\sigma_i \in \{-1, +1\}$ are uniform independent random variables. The Rademacher Complexity $\mathcal{R}_m(\mathcal{H})$ of a hypothesis space is defined as the expectation of $\hat{\mathcal{R}}_S(\mathcal{H})$ over all sample sets $S$ of size $m$.


Rademacher Complexity measures the ability of a hypothesis space to fit random noise, and the empirical Rademacher Complexity lets us estimate it from a finite sample of data. Rademacher Complexity can yield tighter bounds than VC-dimension, especially when analyzing neural network models.
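As a concrete illustration (not from the paper), the empirical Rademacher Complexity of a small finite hypothesis class can be estimated by Monte Carlo: draw random sign vectors and take the supremum as a max over the class's prediction vectors. The function name and the toy classes below are our own.

```python
import numpy as np
from itertools import product

def empirical_rademacher(predictions, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher Complexity.

    `predictions` is an (n_hypotheses, m) matrix of {-1, +1} predictions
    of each hypothesis on the m sample points; the supremum over the
    hypothesis space is taken as a max over its rows.
    """
    rng = np.random.default_rng(seed)
    m = predictions.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)  # uniform Rademacher variables
        total += (2.0 / m) * np.max(np.abs(predictions @ sigma))
    return total / n_draws

# A class realizing every sign pattern on 4 points fits any noise vector:
H_full = np.array(list(product([-1.0, 1.0], repeat=4)))
# A single constant hypothesis cannot:
H_const = np.array([[1.0, 1.0, 1.0, 1.0]])
r_full = empirical_rademacher(H_full)    # exactly 2: some row matches sigma
r_const = empirical_rademacher(H_const)  # much smaller (about 0.75)
```

The gap between the two estimates reflects exactly what the definition measures: the richer class can correlate with every noise vector, while the trivial class cannot.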

When transitioning to Rademacher Complexity we need to change the binary labels from $\{0, 1\}$ to $\{-1, +1\}$. This means that the error of a hypothesis $h$ with respect to a labeling function $f$ on a distribution $\mathcal{D}$ is defined as

$$\epsilon(h) = \mathbb{E}_{x \sim \mathcal{D}}\left[\frac{|h(x) - f(x)|}{2}\right].$$

Additionally, we need new definitions of the equal opportunity and equalized odds distances over the new binary group membership. Writing $\epsilon_{A_a, Y_y}(h)$ for the error of $h$ on the subspace with group membership $a$ and label $y$, the equal opportunity distance is defined as

$$\Gamma_{EOpp}(h) = \left|\epsilon_{A_0, Y_y}(h) - \epsilon_{A_1, Y_y}(h)\right|$$

for the label value $y$ of interest, while the equalized odds distance sums this gap over both label values:

$$\Gamma_{EOdd}(h) = \sum_{y \in \{0, 1\}} \left|\epsilon_{A_0, Y_y}(h) - \epsilon_{A_1, Y_y}(h)\right|.$$
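To make these distances concrete, here is a small sketch computing both quantities from $\{-1, +1\}$ labels, predictions, and a binary group attribute. The function names, and the choice of summing the two per-label gaps for equalized odds, are our own assumptions rather than notation from the paper.

```python
import numpy as np

def subgroup_error(y_true, y_pred, mask):
    """Mean 0/1 error restricted to the examples selected by `mask`;
    labels and predictions are in {-1, +1}."""
    return float(np.mean(y_true[mask] != y_pred[mask]))

def fairness_distances(y_true, y_pred, group):
    """Equal opportunity and equalized odds distances (hypothetical
    helper; equalized odds taken as the sum of the per-label gaps)."""
    gap = {}
    for y in (-1, 1):
        e0 = subgroup_error(y_true, y_pred, (group == 0) & (y_true == y))
        e1 = subgroup_error(y_true, y_pred, (group == 1) & (y_true == y))
        gap[y] = abs(e0 - e1)
    return gap[1], gap[1] + gap[-1]  # (equal opportunity, equalized odds)

y_true = np.array([1, 1, 1, 1, -1, -1, -1, -1])
group  = np.array([0, 0, 1, 1,  0,  0,  1,  1])
y_pred = np.array([1, -1, 1, 1, -1, -1, 1, -1])
d_eopp, d_eodd = fairness_distances(y_true, y_pred, group)
# group 0 errs on one of its two positives, group 1 on none: d_eopp = 0.5
```

Each distance is zero exactly when the two groups incur the same error rate on the relevant label subspace(s), matching the definitions above.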

Using these new definitions, Theorem 3 provides a Rademacher Complexity bound on the equal opportunity distance in the target space. It closely resembles the VC-dimension bound in Theorem 1.

Theorem 3.

Let $\mathcal{H}$ be a symmetric hypothesis space. If samples of size $m$ are drawn from each of the source and target subspace distributions over the two protected groups (restricted to the label value of interest), then for any $\delta > 0$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ the distance from equal opportunity in the target space is bounded by the corresponding distance in the source space plus empirical subspace-divergence, Rademacher Complexity, and ideal-joint-hypothesis error terms, where $\lambda$ denotes the combined error of the ideal joint hypothesis over the relevant subspaces.

The proof follows logic similar to the sketch given for Theorem 1, with the additional step of using a modification of Corollary 7 of Mansour et al. [2009].

Similarly, Theorem 4 provides a Rademacher Complexity bound on the equalized odds distance in the target space.

Theorem 4.

Let $\mathcal{H}$ be a symmetric hypothesis space. If samples of size $m$ are drawn from each of the source and target subspace distributions over the two protected groups and both label values, then for any $\delta > 0$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ the distance from equalized odds in the target space is bounded by the corresponding distance in the source space plus empirical subspace-divergence, Rademacher Complexity, and ideal-joint-hypothesis error terms, where the $\lambda$ terms denote the combined errors of the ideal joint hypotheses over the relevant subspaces for each label value.

Given either the Rademacher Complexity bounds or the VC-dimension bounds, the implications stay the same: for a successful transfer of fairness, the two (or four) subspaces should be close across the source and target domains, and there should be a hypothesis in the hypothesis space that performs well over all of the relevant subspaces.
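Checking this closeness in practice requires estimating the divergence between subspace samples. A common proxy (this sketch is ours; it substitutes a simple logistic domain classifier for the minimization over the hypothesis space) is to train a classifier to distinguish the two samples and convert its training error into a distance: indistinguishable samples give a value near zero, easily separated samples a value near two.

```python
import numpy as np

def proxy_divergence(xs, xt, n_steps=500, lr=0.1, seed=0):
    """Proxy for the empirical divergence between samples `xs` and `xt`:
    2 * (1 - 2 * err), where err is the training error of a logistic
    domain classifier trained to separate the two samples."""
    rng = np.random.default_rng(seed)
    X = np.vstack([xs, xt]).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(n_steps):  # batch gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    err = float(np.mean((X @ w > 0) != y))
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(1)
same = proxy_divergence(rng.normal(0, 1, (200, 1)), rng.normal(0, 1, (200, 1)))
far = proxy_divergence(rng.normal(0, 1, (200, 1)), rng.normal(5, 1, (200, 1)))
# `far` is close to 2 (easily separable); `same` is far smaller
```

In a transfer setting one would apply this separately to each protected-group (and, for equalized odds, each label) subspace pair across the source and target domains, and expect fairness to transfer only when all the resulting values are small.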

Appendix B Proofs

Lemma 1.

(From Ben-David et al. [2010]) For any hypotheses $h, h' \in \mathcal{H}$,

$$\left|\epsilon_S(h, h') - \epsilon_T(h, h')\right| \le \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T).$$

Lemma 2.

(From Ben-David et al. [2007], Crammer et al. [2008]) For any labeling functions $f_1$, $f_2$, and $f_3$, we have

$$\epsilon(f_1, f_2) \le \epsilon(f_1, f_3) + \epsilon(f_2, f_3).$$

B.1 VC-dimension bounds

Lemma 3.

(From Ben-David et al. [2010]) Let $\mathcal{H}$ be a hypothesis space on $\mathcal{X}$ with VC-dimension $d$. If $\mathcal{U}$ and $\mathcal{U}'$ are samples of size $m$ from $\mathcal{D}$ and $\mathcal{D}'$ respectively and $\hat{d}_{\mathcal{H}}(\mathcal{U}, \mathcal{U}')$ is the empirical $\mathcal{H}$-divergence between the samples, then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,

$$d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') \le \hat{d}_{\mathcal{H}}(\mathcal{U}, \mathcal{U}') + 4\sqrt{\frac{d \log(2m) + \log(2/\delta)}{m}}.$$

Theorem 1.

Let $\mathcal{H}$ be a symmetric hypothesis space of VC dimension $d$. If samples of size $m$ are drawn from each of the source and target subspace distributions over the two protected groups (restricted to the label value of interest), then for any $\delta > 0$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ the distance from equal opportunity in the target space is bounded by the corresponding distance in the source space plus empirical subspace-divergence, VC-dimension, and ideal-joint-hypothesis error terms, where $\lambda$ denotes the combined error of the ideal joint hypothesis over the relevant subspaces.


Without loss of generality, assume the error term for group 0 is at least that for group 1, so the absolute value in the equal opportunity distance can be dropped. Then we can rewrite the distance as follows:

where the last line follows from the fact that equal opportunity only cares about the error on the false data points.

We now have the tools to find an upper bound on the equal opportunity distance in the target space.