1 Introduction
Much of machine learning research, and especially machine learning fairness, focuses on optimizing a model for a single use case Agarwal et al. (2018); Beutel et al. (2017). However, the reality of machine learning applications is far more chaotic. It is common for models to be used on multiple tasks, frequently different in a myriad of ways from the dataset that they were trained on, often coming at significant cost Sculley et al. (2015). This is especially concerning for machine learning fairness – we want our models to obey strict fairness properties, but we may have far less data on how the models will actually be used. How do we understand our fairness metrics in these more complex environments?
In traditional machine learning, domain adaptation techniques are used when the distribution of training and validation data does not match the target distribution that the model will ultimately be tested against. Therefore, in this paper we ask: if the model is trained to be “fair” on one dataset, will it be “fair” over a different distribution of data? Instead of starting again with this new dataset, can we use the knowledge gained during the original debiasing to more effectively debias in the new space?
It turns out that this framing covers many important cases for machine learning fairness. We will use, as a running example, the task of income prediction, where some decisions will be made based on the person’s predicted income and we want the model to perform “fairly” over a sensitive attribute such as gender. We primarily follow the equality of opportunity Hardt et al. (2016) perspective where we are concerned with one group (broken down by gender or race) having worse accuracy than another. In this setting, there are a myriad of fairness issues that arise that we find domain adaptation can shed light on:
Lacking sensitive features for training: There may be few examples where we know the sensitive attribute. In these cases, a proxy of the sensitive attribute has been used Gupta et al. (2018), or researchers need very sample-efficient techniques Agarwal et al. (2018); Beutel et al. (2017). For distant proxies, researchers have asked how well fairness transfers across attributes Lan and Huan (2017). Here the sensitive attribute differs in the source and target domains.
Data is not representative of application: Dataset augmentation, models offered as an API, or models used in multiple unanticipated settings, are all increasingly common design patterns. Even for machine learning fairness, researchers often believe limited training data is a primary source of fairness issues Chen et al. (2018) and will employ dataset augmentation techniques to try to improve fairness Dixon et al. (2018). How can we best make use of auxiliary data during training and evaluation when it differs in distribution from the real application?
Multiple tasks: In some cases having accurate labels for model training is difficult and instead proxy tasks with more labeled data are used to train the model, e.g., using pretrained image or text models or using income brackets as a proxy for defaulting on a loan. Again we ask: when does satisfying a fairness property on the original task help satisfy that same property on the new task?
Each of these cases are common throughout machine learning but present challenges for fairness. In this work, we explore mapping domain adaptation principles to machine learning fairness. In particular, we offer the following contributions:

Theoretical Bounds:
We provide theoretical bounds on transferring equality of opportunity and equality of odds metrics across domains. Perhaps more importantly, we discuss insights gained from these bounds.

Modeling for Fairness Transfer: We offer a general, theoretically-backed modeling objective that enables transferring fairness across domains.

Empirical validation: We demonstrate when transferring machine learning fairness works successfully, and when it does not, through both synthetic and realistic experiments.
2 Related Work
This work lies at the intersection of traditional domain adaptation and recent work on ML fairness.
Domain Adaptation
Both Pan et al. (2010) and Weiss et al. (2016) provide surveys of current work in transfer learning. One case of transfer learning is domain adaptation, where the task remains the same, but the distribution of features that the model is trained on (the source domain) does not match the distribution that the model is tested against (the target domain).
Ben-David et al. (2007) provide a theoretical analysis of domain adaptation. Ben-David et al. (2010) extend this analysis to provide a theoretical understanding of how much source and target data should be used to successfully transfer knowledge. Mansour et al. (2009) provide theoretical bounds on domain adaptation using Rademacher Complexity analysis. In later research, Ganin et al. (2016) build on this theory, using an adversarial training procedure over latent representations to improve domain adaptation.
Fairness in Machine Learning
A large thread of recent research has studied how to optimize for fairness metrics during model training. Li et al. (2018) empirically show that adversarial learning helps preserve privacy over sensitive attributes. Beutel et al. (2017) focus on using adversarial learning to optimize different fairness metrics, and Madras et al. (2018) provide a theoretical framework for understanding how adversarial learning optimizes these fairness goals. Zhang et al. (2018) use adversarial training over logits rather than hidden representations. Other work has focused on constraint-based optimization of fairness objectives Goh et al. (2016); Agarwal et al. (2018). Tsipras et al. (2018), however, provide a theoretical bound on the accuracy of adversarially robust models, showing that even with infinite data there will still be a trade-off between accuracy and robustness.
Domain Adaptation & Fairness
Despite the prevalence of using one model across multiple domains, in practice little work has studied domain adaptation and transfer learning of fairness metrics. Kallus and Zhou (2018) use covariate shift correction when computing fairness metrics to address bias in label collection. More closely related, Madras et al. (2018) show empirically that their method allows for fair transfer; the transfer learning here corresponds to preserving fairness for a single sensitive attribute but over different tasks. However, Lan and Huan (2017) found empirically that fairness does not transfer well to a new domain: as accuracy increased in the transfer process, fairness decreased in the new domain. It is concerning that these papers show opposing effects. Both papers offer empirical results on the UCI Adult dataset, but neither provides a theoretical understanding of how and when fairness in one domain transfers to another.
3 Problem Formulation
We begin with some notation to make the problem formulation precise. Building on our running example we have two domains: a source domain D_S, a feature distribution influenced by a sensitive attribute A_S (e.g., gender), as well as a target domain D_T influenced by a sensitive attribute A_T (e.g., race). In order for this to be a domain adaptation problem, we assume D_S ≠ D_T. Note, this can be true even if A_S = A_T but the distributions conditioned on A_S and A_T differ. We focus on binary classification tasks with label y ∈ {0, 1}, e.g., income classification, shared over both domains. For this task we can create a classifier by finding a hypothesis h from a hypothesis space H.
Let us assume that we can learn a "fair" classifier for the source domain and task. If we use a small amount of data from the target domain, will the fairness from the source sensitive attribute A_S transfer to the target domain and sensitive attribute A_T? We can define the notion of a "fairness" distance (how far away the classifier is from perfectly fair) in a given domain D as Γ_D(h). Within this formulation we consider two definitions of fairness.
The first distance is equality of opportunity Hardt et al. (2016). A classifier is said to be fair under equality of opportunity if the false positive rates (FPR) over sensitive attributes are equal. In other words, if we have a binary sensitive attribute a ∈ {0, 1}, then equality of opportunity requires that FPR_{a=0} = FPR_{a=1}, where FPR_a = Pr[h(x) = 1 | y = 0, a] gives the rate at which classifier h predicts positive on negatively labeled examples of group a. Thus, how far away a classifier is from equal opportunity (or the fairness distance of equal opportunity) can be defined as

Γ_EO(h) = | FPR_{a=0} − FPR_{a=1} |.

In our running example, Γ_EO(h) in the source domain, where a is gender, is the difference between the likelihood that a low-income man is predicted to be high-income and the likelihood that a low-income woman is predicted to be high-income. A symmetric definition and set of analyses can be made for the false negative rate (FNR).
The second definition of fairness which we consider is equalized odds Hardt et al. (2016). A classifier is said to be fair under equalized odds if both the FPR and FNR over the sensitive attribute are equal: FPR_{a=0} = FPR_{a=1} and FNR_{a=0} = FNR_{a=1}. Similar to equal opportunity, we define the fairness distance of equalized odds as

Γ_EOdds(h) = | FPR_{a=0} − FPR_{a=1} | + | FNR_{a=0} − FNR_{a=1} |.

Again using our running example, the distance of equalized odds in the source domain is given by the difference of expected FPRs between females and males (as above), plus the difference of expected FNRs (high-income predicted to be low-income) between females and males.
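To make the two distances concrete, the following sketch (our own illustration, not code from the paper; all function names are ours) computes them from binary predictions:

```python
def fpr(preds, labels, mask):
    # False positive rate over the examples selected by `mask`.
    neg = [p for p, y, m in zip(preds, labels, mask) if m and y == 0]
    return sum(neg) / len(neg) if neg else 0.0

def fnr(preds, labels, mask):
    # False negative rate over the examples selected by `mask`.
    pos = [1 - p for p, y, m in zip(preds, labels, mask) if m and y == 1]
    return sum(pos) / len(pos) if pos else 0.0

def eo_distance(preds, labels, group):
    # Equal opportunity distance: |FPR_{a=0} - FPR_{a=1}|.
    g0 = [a == 0 for a in group]
    g1 = [a == 1 for a in group]
    return abs(fpr(preds, labels, g0) - fpr(preds, labels, g1))

def eodds_distance(preds, labels, group):
    # Equalized odds distance adds the FNR gap on top of the FPR gap.
    g0 = [a == 0 for a in group]
    g1 = [a == 1 for a in group]
    return (abs(fpr(preds, labels, g0) - fpr(preds, labels, g1))
            + abs(fnr(preds, labels, g0) - fnr(preds, labels, g1)))
```

Note that a classifier whose errors fall entirely on one group attains the maximal equalized odds distance of 2.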
Given a classifier that has a fairness guarantee in the source domain, the fairness distance in the target domain should be bounded by the fairness distance in the source domain plus some gap Δ:

Γ_T(h) ≤ Γ_S(h) + Δ.    (1)

The key question we hope to answer is: what is Δ?
4 Bounds on Fairness in the Target Domain
To expand inequality (1) we need to start with some definitions. Given a hypothesis space H and a true labeling function f, we can define the error of a hypothesis h as err(h) = E_{x~D}[ |h(x) − f(x)| ], the expected disagreement between the hypothesis and the true label f(x). We can then define the ideal joint hypothesis that minimizes the combined error over both the source and target domains as h* = argmin_{h∈H} ( err_S(h) + err_T(h) ).
Following Ben-David et al. (2010) we define the H-divergence between probability distributions D and D′ as

d_H(D, D′) = 2 sup_{h∈H} | Pr_D[I(h)] − Pr_{D′}[I(h)] |,    (2)

where I(h) is the set for which h is the characteristic function (x ∈ I(h) ⇔ h(x) = 1). We can compute an approximation by finding the hypothesis that yields the largest difference between the samples from D and D′ Ben-David et al. (2007). This divergence can be used to look at the differences in distributions, which is important when moving from a source domain to a target domain.
Additionally, we define the symmetric difference hypothesis space H∆H as the set of hypotheses
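This standard approximation can be sketched as follows (our own minimal illustration restricted to a one-dimensional threshold hypothesis class; the estimate is d̂ = 2(1 − 2ε), where ε is the error of the best domain-discriminating hypothesis):

```python
def proxy_h_divergence(xs_source, xs_target, thresholds=None):
    # Estimate the H-divergence by searching for the threshold classifier
    # h(x) = [x > t] (or its flip) that best separates source from target.
    data = [(x, 0) for x in xs_source] + [(x, 1) for x in xs_target]
    if thresholds is None:
        thresholds = [x for x, _ in data]  # candidate thresholds from data
    best_acc = 0.5  # a random classifier achieves 0.5
    for t in thresholds:
        for flip in (False, True):
            preds = [(x > t) != flip for x, _ in data]
            acc = sum(int(p) == d for p, (_, d) in zip(preds, data)) / len(data)
            best_acc = max(best_acc, acc)
    # error = 1 - best_acc, so d_hat = 2 * (1 - 2 * error) = 2 * (2*acc - 1)
    return 2 * (2 * best_acc - 1)
```

Well-separated domains give the maximal divergence of 2, while identical samples give 0.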
H∆H = { h(x) ⊕ h′(x) : h, h′ ∈ H },    (3)

where ⊕ is the XOR function. The symmetric difference hypothesis space is used to find disagreements between a potential classifier h and a true labeling function f.
Theorem 1.
Let H be a hypothesis space of VC dimension d. If U_S^{a=0}, U_S^{a=1}, U_T^{a=0}, U_T^{a=1} are samples of size m′, each drawn from D_S^{y=0,a=0}, D_S^{y=0,a=1}, D_T^{y=0,a=0}, and D_T^{y=0,a=1} respectively, then for any δ ∈ (0, 1), with probability at least 1 − δ (over the choice of samples), for every h ∈ H (where H is a symmetric hypothesis space) the distance from equal opportunity in the target space is bounded by

Γ_EO^T(h) ≤ Γ_EO^S(h) + Σ_{a∈{0,1}} [ (1/2) d̂_{H∆H}(U_S^{a}, U_T^{a}) + λ_a + 4 sqrt( (2d log(2m′) + log(2/δ)) / m′ ) ],

where λ_a = min_{h∈H} ( err_S^{y=0,a}(h) + err_T^{y=0,a}(h) ).
Using both the definition of the H-divergence and the symmetric difference hypothesis space, Theorem 1 provides a VC-dimension bound on the equal opportunity distance in the target domain given the equal opportunity distance in the source domain. Due to space limitations, full proofs for all theorems can be found in Appendix B.
This theorem provides insights on when domain adaptation for fairness can be used. Firstly, the divergence terms in the bound suggest that 1) the source and target distributions of negatively labeled items that have a sensitive attribute label of 0 should be close, and 2) the source and target distributions of negatively labeled items that have a sensitive attribute label of 1 should be close. In Figure 1, the red quadrants should be close to the red quadrants, and the orange quadrants close to the orange quadrants, across domains. In traditional domain adaptation, ignoring fairness, the entire domains should be close (the entire circle), which means that if there are few minority datapoints then the distance between the minority spaces will be largely ignored. The fairness bound instead puts equal emphasis on both the majority and the minority.
Secondly, the ideal joint hypothesis terms become small when the hypothesis space contains a function that has low error on both the source and target space on the two negative segments in each domain (the red and orange spaces in Figure 1). Since we are looking at equal opportunity, this function only needs to have low error on the negative space for both the majority and minority. Therefore, we can use the trivial all-negative function and these terms go to 0.
Lastly, Theorem 1 depends on the VC-dimension d. Since VC-dimension bounds explode for models like neural networks, we also provide bounds using Rademacher Complexity in Appendix A.
Equalized odds, while similar to equal opportunity, is a stricter fairness constraint. Theorem 2 provides a VC-dimension bound on the distance of equalized odds in the target domain given the source domain.
Theorem 2.
Let H be a hypothesis space of VC dimension d. If U_S^{y,a}, U_T^{y,a} are samples of size m′, each drawn from D_S^{y,a} and D_T^{y,a} for all y ∈ {0, 1} and a ∈ {0, 1}, then for any δ ∈ (0, 1), with probability at least 1 − δ (over the choice of samples), for every h ∈ H (where H is a symmetric hypothesis space) the distance from equalized odds in the target space is bounded by

Γ_EOdds^T(h) ≤ Γ_EOdds^S(h) + Σ_{y∈{0,1}} Σ_{a∈{0,1}} [ (1/2) d̂_{H∆H}(U_S^{y,a}, U_T^{y,a}) + λ_{y,a} + 4 sqrt( (2d log(2m′) + log(2/δ)) / m′ ) ],

where λ_{y,a} = min_{h∈H} ( err_S^{y,a}(h) + err_T^{y,a}(h) ).
These terms suggest that, in order for equalized odds to transfer successfully, 1) the source and target distributions of negatively labeled items for both sensitive attribute labels 0 and 1 should be close, and 2) the source and target distributions of positively labeled items for both sensitive attribute labels 0 and 1 should be close. In other words, all four quadrants of the source should individually be close to the respective four quadrants of the target in Figure 1.
Additionally, the ideal joint hypothesis terms show that there should be a hypothesis that performs well over all of these subspaces. This implication is intuitive given that equalized odds, by definition, wants a classifier to perform well in both the negative and positive space across both groups.
5 Modeling to Transfer Fairness
With this theoretical understanding, how should we change our training? As motivated previously, we consider the case where we have a small amount of labeled data (both labels y and sensitive attributes a) in the target domain and a large amount of labeled data in the source domain.
As shown in the previous section, equality of opportunity will transfer if the respective distributions of source and target are close together, as visually portrayed in Figure 1. Ganin et al. (2016) proved that traditional domain adaptation can be framed as minimizing the divergence between source and target with adversarial training. Louizos et al. (2016); Edwards and Storkey (2016); Beutel et al. (2017); Li et al. (2018) similarly have applied adversarial training to achieve fairness goals, and Madras et al. (2018) proved that equality of odds can be optimized with adversarial training similar to domain adaptation.
We build on this intuition to design a learning objective for transferring equality of opportunity to a target domain. Adversarial training conceptually enables minimizing the divergence terms from Theorem 1, and fairness in the source domain can be optimized using Beutel et al. (2017); Madras et al. (2018) or one of the other myriad of traditional fairness learning objectives. As such, we begin with the following loss:
L = L_y + L_Adv_A + L_Adv_D,    (4)

where L_y is the loss function training the task head over hidden representation h to predict the task label y. To optimize L_Adv_A, an adversarial head tries to predict the sensitive attribute from the source and provides an adversarial loss that includes a negated gradient on h, following Beutel et al. (2017). For transfer, we minimize the divergence terms by including another adversarial loss L_Adv_D, where the adversarial head tries to predict whether a sample comes from the source or target domain. Each of these loss components maps to terms in Theorem 1 as laid out in Table 1.

Table 1: Mapping of loss components to terms in Theorem 1.

Loss Term | Theorem 1 | Adversarial (Eq. 4) | Regularization (Eq. 5)
Fairness head | source fairness distance | L_Adv_A | MMD over sensitive attributes
Transfer head | divergence between domains | L_Adv_D | MMD over source/target membership
Recently, Zhang et al. (2018) used adversarial training on a one-dimensional representation of the data (effectively the model's prediction). From this perspective, we can replace adversarial losses with a wide variety of losses over predictions, such as Zafar et al. (2017); Beutel et al. (2019), which minimize the correlation between the group and the one-dimensional representation of the data. Like previous work, we find these approaches to be more stable and still effective in comparison to adversarial training, despite not being provably optimal. In our experiments we use an MMD loss Gretton et al. (2012); Long et al. (2015); Bousmalis et al. (2016) over predictions:
L = L_y + MMD_A + MMD_D,    (5)

where MMD_A is the MMD regularization over the sensitive attributes in the source domain and MMD_D is the MMD regularization over source/target membership. Again, Table 1 maps the terms in Eq. 5 to those in Theorem 1.
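A minimal MMD penalty over one-dimensional predictions might look like the following (our own sketch with an RBF kernel and a biased estimator; the paper does not give an implementation):

```python
import math

def mmd_rbf(xs, ys, bandwidth=1.0):
    # Biased estimate of MMD^2 = E k(x,x') + E k(y,y') - 2 E k(x,y)
    # over two samples of 1-d model outputs, with an RBF kernel.
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)
```

In training, one such term would compare predictions across the two sensitive groups on source negatives, and another would compare predictions on source versus target samples.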
Care must be taken when performing domain adaptation with regard to fairness. Either multiple transfer heads should be included in the loss for all necessary quadrants (see Figure 1 and Eq. 4), or balanced data, equally representing all necessary quadrants, should be used as in Madras et al. (2018) and Eq. 5. Experiments in this paper use the MMD regularization as in Eq. 5, and balanced data is used for both the fairness head and the transfer heads.
6 Experiments
To better understand the theoretical results presented above, we now present both synthetic and realistic experiments exploring tightness of our theoretical bound as well as the ability to improve the transfer of fairness across domains during model training.
6.1 Synthetic Examples
We show how well the theoretical bounds align with the actual transfer of fairness. A synthetic dataset is used to examine how the distribution distance terms in Eq. (1) affect the fairness distance of equal opportunity in the target domain.
In this synthetic example, we generate data using Gaussian distributions. As we can see in Figure 2(a), the source domain consists of four Gaussians: positively labeled data largely lies above negatively labeled data, the minority group lies to the left of the majority group, and the majority group makes up most of the data. The target domain, like the source domain, consists of majority data generated from the same distributions as in the source. However, in order to understand the transfer of fairness, we shift the minority distributions in the target domain (Figures 2(b), 2(c), and 2(d) show progressively larger shifts). By varying the overlap between these distributions, and their alignment with the source data, we are able to understand the relationship between the divergence terms above and the fairness distance of equal opportunity in the target. For each setting, we train linear classifiers on the source domain and examine the performance in the target domain.
Qualitative Analysis
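The flavor of this setup can be sketched as follows (our own reconstruction; the exact means, variances, and sample counts of the paper's Gaussians are not reproduced here):

```python
import random

def make_domain(shift=0.0, n_major=1000, n_minor=50, seed=0):
    # Each example is (feature x, label y, group a). The majority group
    # (a=0) is fixed; the minority group (a=1) is shifted by `shift`
    # in the target domain to control the cross-domain divergence.
    rng = random.Random(seed)
    data = []
    for _ in range(n_major):  # majority group a=0
        y = rng.randint(0, 1)
        x = rng.gauss(2.0 if y else -2.0, 1.0)
        data.append((x, y, 0))
    for _ in range(n_minor):  # minority group a=1
        y = rng.randint(0, 1)
        x = rng.gauss((2.0 if y else -2.0) + shift, 1.0)
        data.append((x, y, 1))
    return data
```

Calling `make_domain(shift=0.0)` plays the role of the source domain, while increasing `shift` mimics moving from Figure 2(b) toward the sign-flip case of Figure 2(d).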
We see in Fig. 2(b) that when the minority distribution is close across domains, and thus the divergence term is smaller, there is better transfer of fairness from the source to the target domain, seen in the smaller equal opportunity distance. As the distribution distance gets larger, the equal opportunity distance in the target also increases. Consider the worst case of a sign flip for the minority, as shown in Fig. 2(d): the FPR for the majority remains close to 0, while the FPR for the minority is close to 1.
Quantitative Analysis
In Figure 2(e), we compare the derived bound of Eq. (1) with its empirical estimate as we vary the shift of the minority distribution.^1 As shown in Figure 2(e), the theoretical bound on the equal opportunity distance is close to the observed equal opportunity distance when the distance between the negative minority spaces across domains is small. This suggests that minimizing the domain distance terms in Eq. (1) could lead to better equal opportunity transfer.

^1 As in Ben-David et al. (2007), the divergence is estimated by a linear classifier trained to distinguish samples from the two domains. The plot omits the VC term for simplicity, which is relatively small when the sample size is large and the VC-dimension is low.

6.2 Real Data
Figure 3: Effect of the fairness/transfer head on the UCI data. The shaded areas show the standard error of the mean across trials.

We now explore how and when our proposed modeling approach in Section 5 facilitates the transfer of fairness from the source to the target domain on two real-world datasets. Note, we use these datasets exclusively for understanding our theory and model, and not as a comment on when or if the proposed tasks and their application are appropriate, as in Agarwal et al. (2018).
Dataset 1: The UCI Adult dataset (https://archive.ics.uci.edu/ml/datasets/adult) contains census information on over 40,000 adults from the 1994 Census, with the task of determining whether a person's income is above or below $50,000. We focus on two sensitive attributes: binary-valued gender, and race, converted to binary values ['white', 'non-white'] as done by Madras et al. (2018).
Dataset 2: As in Agarwal et al. (2018) we use ProPublica's COMPAS recidivism data (https://github.com/propublica/compas-analysis) to try to predict recidivism for over 10,000 defendants based on age, gender, demographics, prior crime count, etc. We again focus on two sensitive attributes: gender and race (binarized to ['white', 'non-white']).
Experiment Setup
For both datasets, cross-validation is used to choose the hyperparameters. Comparable baseline accuracy is achieved on both datasets (see Appendix D for details) with an embedding layer for categorical features, a single shared hidden layer, and the Adagrad optimizer. We perform multiple runs for each set of experiments and average over the results.
Sparsity Issues and Natural Transfer
We examine the effectiveness of just the fairness heads in the proposed model. The amount of gender-balanced data created for the fairness head is varied to observe how applying the fairness head affects the FPR difference across genders (i.e., the FPR difference between "Female" and "Male" examples). Figure 3(a) shows that the fairness head works as expected: with sufficient data and a large enough weight, the fairness head is able to close the FPR gap across genders. Further, we find that with very few examples on which to apply the fairness head, the gender FPR gap does not close. This aligns with previous results found in Beutel et al. (2017); Madras et al. (2018); Beutel et al. (2019).
Second, we examine how running the fairness head on gender affects the FPR gap across race. As shown in Figure 3(b), there is a natural transfer of equal opportunity from gender to race – applying a fairness loss with respect to gender also improves the fairness of the model with respect to race. This highlights that sometimes there is a natural transfer of equal opportunity, presenting general value in improving the FPR gap with respect to gender, and no explicit transfer optimization is needed. (Similar to the transfer questions posed previously by Madras et al. (2018) and Gupta et al. (2018)).
Effectiveness of Transfer Head
We now explore how adding the transfer head can further improve equality of opportunity in the target domain. We compare four different model arrangements: (1) Source Only: We only add a fairness head for the source domain; (2) Target Only: We only add a fairness head for the target domain; (3) Source+Target: We add two fairness heads, one for source and for target; (4) Transfer: We include three heads – both source and target fairness heads as well as the transfer head for equality of opportunity.
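Schematically, the four arrangements differ only in which penalty heads are active. The sketch below is our own pseudocode-level illustration (not the paper's code); `task_loss`, `fairness_penalty`, and `transfer_penalty` stand in for the real heads:

```python
def total_loss(batch, arrangement, w_fair=1.0, w_transfer=1.0,
               task_loss=None, fairness_penalty=None, transfer_penalty=None):
    # Combine the task loss with whichever fairness/transfer heads the
    # chosen arrangement activates.
    loss = task_loss(batch)
    if arrangement in ("source_only", "source+target", "transfer"):
        loss += w_fair * fairness_penalty(batch, domain="source")
    if arrangement in ("target_only", "source+target", "transfer"):
        loss += w_fair * fairness_penalty(batch, domain="target")
    if arrangement == "transfer":
        loss += w_transfer * transfer_penalty(batch)
    return loss
```

The "transfer" arrangement is the only one that pays the transfer-head penalty in addition to both fairness heads.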
Experiment setting: As in a typical transfer learning setting, we focus on the case where we observe a large number of samples in the source domain (e.g., 1000 for each of race "white" and "non-white") but a smaller sample size in the target domain (e.g., 100 for each of gender "male" and "female"), and likewise when transferring from gender to race. We explore equality of opportunity with respect to FPR in the target domain, as we vary the weight on the fairness and transfer heads.
Results: Figure 3(c) shows that including the transfer head results in a better equal opportunity transfer, compared to the same setting without transfer (Figure 3(b)). Table 2 summarizes the full results on both datasets. We can see that including both the fairness heads and the transfer head consistently gives the best improvement in equal opportunity (FPR difference) in almost all cases.
Effect of Target Sample Size
Last, we consider how the amount of data from the target domain affects our ability to improve equal opportunity there, as sample efficiency is a core challenge.
Experiment setting: We follow a similar experimental procedure as before with two modifications. First, we vary the number of samples we observe for each sensitive group in the target domain over {50, 100, 500, 1000}. We examine the efficacy of the four approaches depending on the amount of data available for debiasing in the target domain. Second, this analysis is performed both for transferring from race (source) to gender (target) and from gender (source) to race (target).
Results: Table 2 summarizes the results. Applying the fairness and transfer heads to the large amount of source data closes the FPR gap in the target domain. Increasing the amount of data in the target domain significantly helps the performance of the “Target Only” and the “Source+Target” models. This is intuitive since directly debiasing in the target domain is feasible with sufficient data. With sufficient data, the results converge to be approximately equivalent to the transfer model.
These experiments show that the transfer model is effective in decreasing the FPR gap in the target domain and is more sample efficient than previous methods.
Table 2: Smallest FPR difference achieved on Target (FPR diff ± std. dev.)

Source to Target | #Target Samples | Source only | Target only | Source + Target | With Transfer Head
Dataset 1, Gender to Race | 50
Dataset 1, Gender to Race | 100
Dataset 1, Gender to Race | 500
Dataset 1, Gender to Race | 1000
Dataset 1, Race to Gender | 50
Dataset 1, Race to Gender | 100
Dataset 1, Race to Gender | 500
Dataset 1, Race to Gender | 1000
Dataset 2, Gender to Race | 50
Dataset 2, Gender to Race | 100
Dataset 2, Gender to Race | 500
Dataset 2, Gender to Race | 1000
Dataset 2, Race to Gender | 50
Dataset 2, Race to Gender | 100
Dataset 2, Race to Gender | 500
Dataset 2, Race to Gender | 1000
7 Conclusion
In this paper we provide the first theoretical examination of transfer of machine learning fairness across domains. We adopt a general formulation of domain adaptation for fairness that covers a wide variety of fairness challenges, from proxies of sensitive attributes, to applying models in unanticipated settings. Within this general formulation, we have provided theoretical bounds on the transfer of fairness for equal opportunity and equalized odds using both VC-dimension and Rademacher Complexity. Based on this theory, we developed a new modeling approach to transfer fairness to a given target domain. In experiments we validate our theoretical results and demonstrate that our modeling approach is more sample efficient in improving fairness metrics in a target domain.
References
 Agarwal et al. [2018] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, pages 60–69, 2018.
 Ben-David et al. [2007] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in neural information processing systems, pages 137–144, 2007.
 Ben-David et al. [2010] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine learning, 79(1-2):151–175, 2010.
 Beutel et al. [2017] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. Proceedings of the Conference on Fairness, Accountability and Transparency, 2017.
 Beutel et al. [2019] A. Beutel, J. Chen, T. Doshi, H. Qian, A. Woodruff, C. Luu, P. Kreitmann, J. Bischof, and E. H. Chi. Putting fairness principles into practice: Challenges, metrics, and improvements. Artificial Intelligence, Ethics, and Society, 2019.
 Bousmalis et al. [2016] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5–10, 2016, Barcelona, Spain, pages 343–351, 2016.
 Chen et al. [2018] I. Chen, F. D. Johansson, and D. Sontag. Why is my classifier discriminatory? arXiv preprint arXiv:1805.12002, 2018.
 Crammer et al. [2008] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. Journal of Machine Learning Research, 9(Aug):1757–1774, 2008.
 Dixon et al. [2018] L. Dixon, J. Li, J. Sorensen, N. Thain, and L. Vasserman. Measuring and mitigating unintended bias in text classification. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2018.
 Edwards and Storkey [2016] H. Edwards and A. J. Storkey. Censoring representations with an adversary. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings, 2016.
 Ganin et al. [2016] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
 Goh et al. [2016] G. Goh, A. Cotter, M. Gupta, and M. P. Friedlander. Satisfying realworld goals with dataset constraints. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2415–2423. Curran Associates, Inc., 2016.
 Gretton et al. [2012] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 2012.
 Gupta et al. [2018] M. R. Gupta, A. Cotter, M. M. Fard, and S. Wang. Proxy fairness. CoRR, abs/1806.11212, 2018. URL http://arxiv.org/abs/1806.11212.

 Hardt et al. [2016] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
 Kallus and Zhou [2018] N. Kallus and A. Zhou. Residual unfairness in fair machine learning from prejudiced data. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, pages 2444–2453, 2018.
 Lan and Huan [2017] C. Lan and J. Huan. Discriminatory transfer. CoRR, 2017. URL http://arxiv.org/abs/1707.00780.
 Li et al. [2018] Y. Li, T. Baldwin, and T. Cohn. Towards robust and privacy-preserving text representations. arXiv preprint arXiv:1805.06093, 2018.
 Long et al. [2015] M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, 2015.

 Louizos et al. [2016] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. S. Zemel. The variational fair autoencoder. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings, 2016.
 Madras et al. [2018] D. Madras, E. Creager, T. Pitassi, and R. Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.
 Mansour et al. [2009] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. COLT, 2009.
 Pan et al. [2010] S. J. Pan, Q. Yang, et al. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
 Sculley et al. [2015] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison. Hidden technical debt in machine learning systems. In Advances in neural information processing systems, pages 2503–2511, 2015.
 Tsipras et al. [2018] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. There is no free lunch in adversarial robustness (but there are unexpected benefits). arXiv preprint arXiv:1805.12152, 2018.
 Weiss et al. [2016] K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 2016.
Zafar et al. [2017] M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, pages 962–970, 2017.
 Zhang et al. [2018] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. CoRR, abs/1801.07593, 2018. URL http://arxiv.org/abs/1801.07593.
Appendix A Rademacher Complexity
We provide additional bounds dependent on Rademacher Complexity, based on the following definition of data-driven empirical Rademacher Complexity.
Definition 1.
Given a hypothesis space $\mathcal{H}$ and a sample $S = \{x_1, \ldots, x_m\}$, the empirical Rademacher Complexity of $\mathcal{H}$ is defined as

$$\hat{\mathfrak{R}}_S(\mathcal{H}) = \mathbb{E}_{\sigma}\left[\sup_{h \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i)\right]. \tag{6}$$

The expectation is taken over $\sigma = (\sigma_1, \ldots, \sigma_m)$, where the $\sigma_i \in \{-1, +1\}$ are uniform independent random variables. The Rademacher Complexity of a hypothesis space is defined as the expectation of $\hat{\mathfrak{R}}_S(\mathcal{H})$ over all sample sets of size $m$.
Rademacher Complexity measures the ability of a hypothesis space to fit random noise. The empirical Rademacher Complexity allows us to estimate the Rademacher Complexity using a finite sample of data. Rademacher Complexity bounds can be tighter than VC-dimension bounds, especially when analyzing neural network models.
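As an illustration of the definition above (not part of the paper's analysis), the empirical Rademacher Complexity of a small finite hypothesis space can be approximated by Monte Carlo: draw random sign vectors $\sigma$ and average the best correlation any hypothesis achieves with them. The toy class of 1-D threshold classifiers below is an assumption made purely for the example.

```python
import numpy as np

def empirical_rademacher(hypothesis_outputs, n_trials=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher Complexity.

    hypothesis_outputs: (n_hypotheses, m) array of {-1, +1} predictions of each
    hypothesis on a fixed sample of m points.
    """
    rng = np.random.default_rng(seed)
    outputs = np.asarray(hypothesis_outputs, dtype=float)
    _, m = outputs.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)  # uniform Rademacher variables
        # sup over the (finite) hypothesis space of (1/m) * sum_i sigma_i h(x_i)
        total += np.max(outputs @ sigma) / m
    return total / n_trials

# Toy hypothesis space: threshold classifiers h_t(x) = sign(x - t) on 1-D data.
x = np.linspace(0.0, 1.0, 20)
thresholds = np.linspace(-0.05, 1.05, 50)
outputs = np.where(x[None, :] >= thresholds[:, None], 1.0, -1.0)
print(round(empirical_rademacher(outputs), 3))
```

The estimate lies strictly between 0 and 1 for this class: thresholds cannot fit arbitrary sign patterns, which is exactly the "ability to fit random noise" the complexity measures.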
When transitioning to Rademacher Complexity we need to change the binary labels from $\{0, 1\}$ to $\{-1, +1\}$. This means that the error of a hypothesis $h$ is defined as

$$\epsilon(h) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\tfrac{1}{2}\left|h(x) - y\right|\right].$$
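A quick sanity check, assuming the error takes the standard form $\mathbb{E}[|h(x) - y|/2]$ under $\{-1, +1\}$ labels: this expression coincides with the usual 0-1 error, since $|h(x) - y|/2$ is 1 exactly when $h(x) \neq y$ and 0 otherwise.

```python
import numpy as np

# Labels and predictions in {-1, +1}.
y = np.array([-1, -1, 1, 1])
h = np.array([-1, 1, 1, -1])

zero_one = np.mean(h != y)           # fraction of misclassified points
signed = np.mean(np.abs(h - y) / 2)  # |h - y| / 2 is 1 exactly when h != y
print(zero_one, signed)
```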
Additionally, we need new definitions of the equal opportunity and equalized odds distances over the new binary group membership. The equal opportunity distance is defined as

$$\Delta_{EO}(h) = \left|\epsilon_{A_0}(h) - \epsilon_{B_0}(h)\right|,$$

while the equalized odds distance is defined as

$$\Delta_{EOdd}(h) = \left|\epsilon_{A_0}(h) - \epsilon_{B_0}(h)\right| + \left|\epsilon_{A_1}(h) - \epsilon_{B_1}(h)\right|,$$

where $\epsilon_{A_y}(h)$ denotes the error of $h$ restricted to examples from group $A$ with label $y$.
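These distances are straightforward to compute empirically. The sketch below is illustrative rather than taken from the paper: it assumes binary 0/1 labels and predictions, a binary group attribute, and follows the convention used later in the proofs that equal opportunity conditions on the false ($y = 0$) data points, while equalized odds conditions on both labels.

```python
import numpy as np

def group_error(y_true, y_pred, group, g, label):
    """Error rate of the predictions restricted to group g and a true label."""
    mask = (group == g) & (y_true == label)
    return np.mean(y_pred[mask] != y_true[mask])

def equal_opportunity_distance(y_true, y_pred, group):
    # Gap in error between groups 0 and 1 on the false (y = 0) points.
    return abs(group_error(y_true, y_pred, group, 0, 0)
               - group_error(y_true, y_pred, group, 1, 0))

def equalized_odds_distance(y_true, y_pred, group):
    # Sum of the per-label error gaps, conditioning on both y = 0 and y = 1.
    return sum(abs(group_error(y_true, y_pred, group, 0, label)
                   - group_error(y_true, y_pred, group, 1, label))
               for label in (0, 1))

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
group = np.array([0, 0, 1, 1, 0, 0, 1, 1])
print(equal_opportunity_distance(y_true, y_pred, group))
print(equalized_odds_distance(y_true, y_pred, group))
```

In this toy example group 1 is misclassified more often under both labels, so the equalized odds distance is the sum of the two gaps.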
Using these new definitions, Theorem 3 provides a Rademacher Complexity bound on the equal opportunity distance in the target space. This closely resembles the VC-dimension bound in Theorem 1.
Theorem 3.
Let $\mathcal{H}$ be a hypothesis space. If $S^S_{A_0}, S^S_{B_0}, S^T_{A_0}, S^T_{B_0}$ are samples of size $m$, each drawn from $\mathcal{D}^S_{A_0}$, $\mathcal{D}^S_{B_0}$, $\mathcal{D}^T_{A_0}$, and $\mathcal{D}^T_{B_0}$ respectively, then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ (where $\mathcal{H}$ is a symmetric hypothesis space) the distance from equal opportunity in the target space is bounded by
where .
The proof follows a logic similar to the sketch given for Theorem 1, with the additional step of using a modification of Corollary 7 of Mansour et al. [2009].
Similarly, Theorem 4 provides a Rademacher Complexity bound of the equalized odds distance in the target space.
Theorem 4.
Let $\mathcal{H}$ be a hypothesis space. If $S^S_{A_0}, S^S_{B_0}, S^S_{A_1}, S^S_{B_1}, S^T_{A_0}, S^T_{B_0}, S^T_{A_1}, S^T_{B_1}$ are samples of size $m$, each drawn from $\mathcal{D}^S_{A_0}$, $\mathcal{D}^S_{B_0}$, $\mathcal{D}^S_{A_1}$, $\mathcal{D}^S_{B_1}$, $\mathcal{D}^T_{A_0}$, $\mathcal{D}^T_{B_0}$, $\mathcal{D}^T_{A_1}$, and $\mathcal{D}^T_{B_1}$ respectively, then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ (where $\mathcal{H}$ is a symmetric hypothesis space) the distance from equalized odds in the target space is bounded by
where , and .
Given either the Rademacher Complexity bounds or the VC-dimension bounds, the implications are the same: for fairness to transfer successfully, the two (or four) subspace domains should be close across the source and target domains, and there should be a hypothesis in the hypothesis space that performs well over all of the relevant subspaces.
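The closeness of domains in these bounds can be estimated in practice with the proxy A-distance of Ben-David et al. [2010]: train a classifier to distinguish the two samples and report $2(1 - 2\,\mathrm{err})$. The sketch below is a minimal numpy-only version using logistic regression fit by gradient descent; evaluating on the training set (rather than a held-out split) is a simplification for brevity.

```python
import numpy as np

def proxy_a_distance(source_X, target_X, epochs=300, lr=0.5):
    """Proxy A-distance of Ben-David et al.: train a domain classifier to
    separate source from target samples and report 2 * (1 - 2 * error)."""
    X = np.vstack([source_X, target_X])
    y = np.concatenate([np.zeros(len(source_X)), np.ones(len(target_X))])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize features
    X = np.hstack([X, np.ones((len(X), 1))])           # bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                            # logistic regression, GD
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    err = np.mean(((X @ w) > 0).astype(float) != y)    # training error
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (200, 2))
near = proxy_a_distance(src, rng.normal(0.1, 1.0, (200, 2)))
far = proxy_a_distance(src, rng.normal(3.0, 1.0, (200, 2)))
print(near < far)
```

Nearly indistinguishable domains give an error close to 0.5 and a proxy A-distance near 0, while easily separated domains approach the maximum of 2, which is exactly when fairness transfer should not be expected to succeed.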
Appendix B Proofs
Lemma 1.
(From Ben-David et al. [2010]) For any hypotheses $h, h' \in \mathcal{H}$,

$$\left|\epsilon_S(h, h') - \epsilon_T(h, h')\right| \le \tfrac{1}{2}\, d_{\mathcal{H} \Delta \mathcal{H}}\!\left(\mathcal{D}_S, \mathcal{D}_T\right).$$
Lemma 2.
B.1 VC-dimension bounds
Lemma 3.
(From Ben-David et al. [2010]) Let $\mathcal{H}$ be a hypothesis space on $\mathcal{X}$ with VC-dimension $d$. If $\mathcal{U}$ and $\mathcal{U}'$ are samples of size $m$ from $\mathcal{D}$ and $\mathcal{D}'$ respectively and $\hat{d}_{\mathcal{H}}(\mathcal{U}, \mathcal{U}')$ is the empirical $\mathcal{H}$-divergence between the samples, then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,

$$d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') \le \hat{d}_{\mathcal{H}}(\mathcal{U}, \mathcal{U}') + 4\sqrt{\frac{d \log(2m) + \log(2/\delta)}{m}}.$$
Theorem 1.
Let $\mathcal{H}$ be a hypothesis space of VC-dimension $d$. If $S^S_{A_0}, S^S_{B_0}, S^T_{A_0}, S^T_{B_0}$ are samples of size $m$ each, drawn from $\mathcal{D}^S_{A_0}$, $\mathcal{D}^S_{B_0}$, $\mathcal{D}^T_{A_0}$, and $\mathcal{D}^T_{B_0}$ respectively, then for any $\delta \in (0, 1)$, with probability at least $1 - \delta$ (over the choice of samples), for every $h \in \mathcal{H}$ (where $\mathcal{H}$ is a symmetric hypothesis space) the distance from equal opportunity in the target space is bounded by
where .
Proof.
Without loss of generality, assume the target-domain error on group $A$ is at least that on group $B$. Then we can rewrite the equal opportunity distance in the target space as follows:
where the last line follows from the fact that equal opportunity only depends on the error on the false ($y = 0$) data points.
We now have the tools to find an upper bound on the equal opportunity distance in the target space.
(7)  