Domain Adaptation meets Individual Fairness. And they get along

05/01/2022
by Debarghya Mukherjee, et al.

Many instances of algorithmic bias are caused by distributional shifts. For example, machine learning (ML) models often perform worse on demographic groups that are underrepresented in the training data. In this paper, we leverage this connection between algorithmic fairness and distribution shifts to show that algorithmic fairness interventions can help ML models overcome distribution shifts, and that domain adaptation methods (for overcoming distribution shifts) can mitigate algorithmic biases. In particular, we show that (i) enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models, and that (ii) it is possible to adapt representation alignment methods for domain adaptation to enforce (individual) fairness. The former is unexpected because IF interventions were not developed with distribution shifts in mind. The latter is also unexpected because representation alignment is not a common approach in the IF literature.


1 Introduction

Algorithmic bias and distribution shifts are often considered separate problems. However, a recent body of empirical work has shown that many instances of algorithmic bias are caused by distribution shifts. Broadly speaking, there are two ways distribution shifts cause algorithmic biases [obermeyer2021algorithmic]:

  1. the model is trained to predict the wrong target;

  2. the model is trained to predict the correct target, but its predictions are inaccurate for demographic groups that are underrepresented in the training data.

From a statistical perspective, the first type of algorithmic bias is caused by posterior drift between the training data and the real world, which leads to a mismatch between the model’s predictions and the real world. This type of algorithmic bias is also known as label choice bias [obermeyer2019dissecting]. The second type of algorithmic bias arises when ML models are trained or evaluated with non-diverse data, causing the models to perform poorly on underserved groups. This type of algorithmic bias is caused by a covariate shift between the training data and the real-world data.

This overlap between the problems of algorithmic bias and distribution shift suggests two questions:

  1. Is it possible to overcome distribution shifts with algorithmic fairness interventions?

  2. Is it possible to mitigate algorithmic biases caused by distribution shifts with domain adaptation methods?

For a concrete example, consider building an ML model to predict a person’s occupation from their biography. For this task, Yurochkin et al. [yurochkin2021sensei] showed that ML models trained on top of pre-trained language models without any algorithmic fairness intervention can be unfair: they can change their prediction (e.g., from attorney to paralegal or vice versa) when the name and gender pronouns in the input biography are changed. This is a violation of individual fairness, in part caused by the underrepresentation of female attorneys in the training (source) data. Consequently, such a model underperforms on female attorneys. Suppose that, in the target dataset, female attorneys are better represented. This is a type of distribution shift known as subpopulation shift in the domain adaptation literature [koh2021wilds]. In this case, enforcing individual fairness will not only result in a fairer model, but can also improve performance in the target domain, i.e., solve the domain adaptation problem.

Now, under the same source and target domains, consider applying a domain adaptation method that matches the distributions of representations across the domains. Assuming the class marginals are the same (this setting corresponds to the domain shift assumption common in the DA literature), i.e., the source and target have the same fraction of attorneys, any differences between the source and the target distribution are due to different fractions of male and female attorneys. Learning a feature (representation) extractor that is invariant to gender pronouns and names will align the two domains and result in a model that is individually fair. The goal of this paper is to precisely characterize in which cases enforcing IF can achieve domain generalization and vice versa. Our contributions can be summarized as follows:

  • We show that methods designed for IF can help ML models adapt/generalize to new domains, i.e., improve the accuracy of the trained ML model on out-of-distribution samples.

  • Conversely, we show that domain adaptation algorithms that align the feature distributions in the source and target domains can be used to improve individual fairness under certain probabilistic conditions on the features.

We verify our theory on the Bios [de2019bias] and the Toxicity [dixon2018measuring] datasets. Specifically, we demonstrate that enforcing IF via the methods of Yurochkin et al. [yurochkin2021sensei] and Petersen et al. [petersen2021post] improves accuracy on the target domain, and DA methods [ganin2016domain, shu2018dirtt, shen2018wasserstein] trained with appropriate source and target domains improve IF.

2 Overcoming Distribution Shift by Enforcing Individual Fairness

The goal of individual fairness is to ensure similar treatment of similar individuals. Dwork et al. [dwork2012fairness] formalize this notion using $L$-Lipschitz continuity of an ML model $f$:

\[ d_{\mathcal{Y}}\big(f(x_1), f(x_2)\big) \;\le\; L\, d_{\mathcal{X}}(x_1, x_2) \tag{1} \]

for all $x_1, x_2 \in \mathcal{X}$. Here, $d_{\mathcal{Y}}$ is the metric on the output space quantifying the similarity of treatment of individuals, and $d_{\mathcal{X}}$ is the metric on the input space quantifying the similarity of individuals.
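To make the definition concrete, the following minimal sketch estimates how often a model violates condition (1) on a batch of input pairs; the metrics `d_x` and `d_y` and the constant `L` are placeholders (in practice `d_x` would be a learned fair metric).

```python
import numpy as np

def lipschitz_violations(model, x1, x2, d_x, d_y, L):
    """Fraction of pairs (x1[i], x2[i]) on which the model violates
    the individual-fairness condition d_y(f(x1), f(x2)) <= L * d_x(x1, x2)."""
    out1, out2 = model(x1), model(x2)
    lhs = d_y(out1, out2)          # similarity of treatment
    rhs = L * d_x(x1, x2)          # similarity of individuals, scaled by L
    return float(np.mean(lhs > rhs))

# Toy usage with Euclidean metrics on inputs and outputs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 3))
    model = lambda x: x @ W                         # a linear "model"
    d = lambda a, b: np.linalg.norm(a - b, axis=1)  # placeholder metric
    x1, x2 = rng.normal(size=(100, 5)), rng.normal(size=(100, 5))
    print(lipschitz_violations(model, x1, x2, d, d, L=2.0))
```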

Algorithms for enforcing IF are similar to algorithms for domain adaptation/generalization. For example, adversarial training/distributionally robust optimization can not only enforce IF [yurochkin2020training, yurochkin2021sensei], but can also be used for training ML models that are robust to distribution shifts [shu2018dirtt, sagawa2019distributionally]. This similarity is more than a mere coincidence: the goal in both enforcing IF and domain adaptation/generalization is ignoring uninformative dissimilarity. In IF, we wish to ignore variation among inputs that is attributed to variation of the sensitive attribute. In domain adaptation/generalization, we wish to ignore variation among inputs that is attributed to the idiosyncrasies of the domains. Mathematically, ignoring uninformative dissimilarity means enforcing invariance/smoothness of the ML model among inputs that are dissimilar in uninformative ways. For example, (1) requires the model to be approximately constant on small $d_{\mathcal{X}}$-balls.

In this section, we exploit this connection between IF and domain adaptation/generalization to show that enforcing IF can improve accuracy in the target domain if the regression function is individually fair. In other words, if the inductive bias from enforcing IF is correct, then enforcing IF improves accuracy in the target domain. To warm up, we consider the transductive (learning) setting before moving on to the inductive setting.

2.1 Warm Up: The Transductive Setting

To keep things simple, we consider adapting an ML model from a source domain to a target domain. We have labeled samples $\{(x_i, y_i)\}_{i=1}^{n_S}$ from the source domain and unlabeled samples $\{x'_j\}_{j=1}^{n_T}$ from the target domain. We assume the regression function $f_0$ in the source and target domains is identical:

\[ y_i = f_0(x_i) + \epsilon_i, \tag{2} \]

where the $\epsilon_i$'s are exogenous error terms with mean zero and variance $\sigma^2$. This is a special case of distribution shift called covariate shift [shimodaira2000improving]. Our goal is to obtain a model that has comparable accuracy on the source and target domains. We leverage the (labeled) source samples, the (unlabeled) target samples, and an inductive bias on the smoothness of the regression function. We encode this inductive bias in a regularizer $\rho$ and solve a regularized risk minimization problem

\[ \hat f \in \arg\min_{f \in \mathcal{F}} \ \frac{1}{n_S}\sum_{i=1}^{n_S} \ell\big(f(x_i), y_i\big) + \lambda\, \rho(f), \tag{3} \]

where $\mathcal{F}$ is the model class, $\ell$ is a loss function, and $\lambda > 0$ is a regularization parameter. In the transductive setting, the regularizer is a function of the vector of model outputs on the source and target inputs: $\rho(f) = \rho(f_S, f_T)$, where $f_S = \big(f(x_1), \ldots, f(x_{n_S})\big)$ (resp. $f_T = \big(f(x'_1), \ldots, f(x'_{n_T})\big)$) is the vector of outputs on the source (resp. target) inputs. Intuitively, the regularizer enforces invariance/smoothness of the model outputs across the source and target inputs.

A concrete example of such a regularizer is the graph Laplacian regularizer. A graph Laplacian regularizer is based on a symmetric similarity kernel $K$ on the input space $\mathcal{X}$. For example, Petersen et al. [petersen2021post] take the kernel to be a decreasing function of a fair metric that is learned from data [mukherjee2020Two], e.g., a metric in which the distance between male and female biographies with similar relevant content is small. In domain adaptation, a similar intuition can be applied. For example, suppose the source train data consist of Poodle dogs and Persian cats (the task is to distinguish cats and dogs), and the target data consist of Dalmatians and Siamese cats [santurkar2020breeds]. Then, a meaningful metric for constructing the kernel should assign small distances to different breeds of the same animal species.

Given the kernel, we construct the similarity matrix $W$ with entries $W_{ij} = K(x_i, x_j)$. Note that here we are considering all the source and target covariates together. Based on the similarity matrix, the (unnormalized) Laplacian matrix is defined as:

\[ L = D - W, \tag{4} \]

where $D$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$, which is often called the degree of observation $i$. There are other ways of defining the Laplacian (e.g., the normalized variants $D^{-1/2}(D - W)D^{-1/2}$ or $D^{-1}(D - W)$), which would lead to similar conclusions, but we stick to (4) for ease of exposition.

Based on the Laplacian matrix $L$, we define the graph Laplacian regularizer as:

\[ \rho(f) \;=\; \mathbf{f}^\top L\, \mathbf{f} \;=\; \tfrac{1}{2}\sum_{i, j} W_{ij}\big(f(x_i) - f(x_j)\big)^2, \]

where $\mathbf{f} = (f_S, f_T)$ is the vector of model outputs on the pooled source and target inputs. The above regularizer enforces that if $W_{ij}$ is large for a pair $(x_i, x_j)$ (i.e., they are similar), then $f(x_i)$ must be close to $f(x_j)$. As mentioned earlier, for individual fairness, $K$ is chosen to be a monotonically decreasing function of the fair metric, which ensures that $f(x_i)$ and $f(x_j)$ are close to each other when $x_i$ is close to $x_j$ with respect to the fair metric (for more details, see Petersen et al. [petersen2021post]). Recently, Lahoti et al. [lahoti2019ifair], Kang et al. [kang2020inform], and Petersen et al. [petersen2021post] used the graph Laplacian regularizer to post-process ML models so that they are individually fair. It is also widely used in semi-supervised learning to leverage unlabeled samples [chapelle2006Semisupervised].
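The construction above is easy to implement. The sketch below is a minimal NumPy illustration: a Gaussian kernel stands in for a kernel that decreases in the fair metric, and the helper names are illustrative. It builds the similarity matrix, the Laplacian of (4), the regularizer $\mathbf{f}^\top L \mathbf{f}$, and, for a linear model class with squared loss, the closed-form solution of the regularized objective (3).

```python
import numpy as np

def graph_laplacian(X, dist, bandwidth=1.0):
    """Similarity matrix W from a kernel that decreases in the metric `dist`,
    and the unnormalized Laplacian L = D - W of (4)."""
    n = X.shape[0]
    D2 = np.array([[dist(X[i], X[j]) for j in range(n)] for i in range(n)])
    W = np.exp(-(D2 ** 2) / (2 * bandwidth ** 2))
    L = np.diag(W.sum(axis=1)) - W
    return W, L

def laplacian_regularizer(f_vals, L):
    """rho(f) = f^T L f: small when similar inputs receive similar outputs."""
    return float(f_vals @ L @ f_vals)

def fit_linear_transductive(X_s, y_s, X_all, L, lam):
    """Closed-form minimizer of (3) for a linear model f(x) = x @ beta with
    squared loss and the Laplacian regularizer rho(f) = (X_all beta)^T L (X_all beta)."""
    n_s = X_s.shape[0]
    A = X_s.T @ X_s / n_s + lam * X_all.T @ L @ X_all
    b = X_s.T @ y_s / n_s
    return np.linalg.solve(A, b)

# Example usage on pooled source (labeled) and target (unlabeled) inputs.
rng = np.random.default_rng(0)
X_s, y_s = rng.normal(size=(30, 4)), rng.normal(size=30)
X_t = rng.normal(loc=0.5, size=(20, 4))
X_all = np.vstack([X_s, X_t])
_, L = graph_laplacian(X_all, dist=lambda a, b: np.linalg.norm(a - b))
beta = fit_linear_transductive(X_s, y_s, X_all, L, lam=0.1)
```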

We focus on problems in which the model class is mis-specified, i.e., $f_0 \notin \mathcal{F}$. If the model is well-specified (i.e., $f_0 \in \mathcal{F}$), the optimal prediction rules in the training and target domains are identical (both are $f_0$). So, it is possible to learn the optimal prediction rule for the target domain from the training domain (e.g., by empirical risk minimization (ERM)), and there is no need to adapt models trained in the source domain to the target. On the other hand, if the model is mis-specified, the transfer learning task is non-trivial because the optimal prediction rule within the model class depends on the distribution of the inputs (which differs between the training and target domains). We focus on the non-trivial case in this section.

Now, we show that, as long as $f_0$ satisfies the smoothness structure enforced by the regularizer, $\hat f$ from (3) remains accurate at the target inputs $x'_1, \ldots, x'_{n_T}$. First, we state our assumptions on the loss function $\ell$ and the regularizer $\rho$.

We assume that the regression function $f_0$ is smooth with respect to the penalty $\rho$, i.e., $\rho(f_{0,S}, f_{0,T}) \le \rho_0$ for some small $\rho_0 > 0$.

This is an assumption that the smoothness structure enforced by the regularizer is in agreement with the regression function $f_0$.

We assume that $\rho$ is $\mu$-strongly convex with respect to the model outputs on the target inputs and $M$-strongly smooth. More specifically, for any $f_S$ and any $f_T, g_T$,

\[ \frac{\mu}{2}\,\|f_T - g_T\|_2^2 \;\le\; \rho(f_S, f_T) - \rho(f_S, g_T) - \big\langle \partial_2 \rho(f_S, g_T),\, f_T - g_T \big\rangle \;\le\; \frac{M}{2}\,\|f_T - g_T\|_2^2. \]

This is a regularity assumption on the regularizer that ensures the extrapolation map

\[ E(f_S) \;\in\; \arg\min_{f_T} \ \rho(f_S, f_T) \tag{5} \]

is well-behaved. Intuitively, the extrapolation map extrapolates (hence its name) model outputs on the source domain to the target domain in the smoothest possible way.

We now show that the graph Laplacian regularizer satisfies Assumption 2.1. As $\rho(\mathbf{f}) = \mathbf{f}^\top L\, \mathbf{f}$ is a quadratic function of $\mathbf{f}$, it is immediate that $\nabla^2 \rho = 2L$. Therefore, the strong convexity and smoothness of $\rho$ depend on the behavior of the maximum and minimum eigenvalues of $L$. The maximum eigenvalue of $L$ is bounded above for the fixed design, which plays the role of $M$ in Assumption 2.1. For the lower bound, we note that we only assume strong convexity with respect to the target samples, fixing the source samples. If we divide the whole Laplacian matrix into four blocks,

\[ L \;=\; \begin{pmatrix} L_{SS} & L_{ST} \\ L_{TS} & L_{TT} \end{pmatrix}, \]

then the value of the regularizer in terms of these four blocks is

\[ \rho(f_S, f_T) \;=\; f_S^\top L_{SS}\, f_S + 2\, f_T^\top L_{TS}\, f_S + f_T^\top L_{TT}\, f_T. \]

Therefore, the Hessian of $\rho$ with respect to the model outputs in the target domain is $2 L_{TT}$, whose minimum eigenvalue is bounded away from 0 as long as the graph is connected, i.e., source inputs have a degree of similarity with target inputs. Therefore, the graph Laplacian regularizer satisfies Assumption 2.1 and consequently can be used for domain adaptation. Next, we state our assumptions on the loss function:
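The block argument can be checked numerically. The sketch below (reusing `graph_laplacian` from the previous snippet; names are illustrative) extracts the blocks of $L$, computes the extrapolation map for the quadratic regularizer, which reduces to $E(f_S) = -L_{TT}^{-1} L_{TS}\, f_S$, and verifies that the minimum eigenvalue of $L_{TT}$ is positive when the similarity graph is connected.

```python
import numpy as np

def laplacian_blocks(L, n_s):
    """Split the pooled Laplacian into source/target blocks (SS, ST, TS, TT)."""
    return L[:n_s, :n_s], L[:n_s, n_s:], L[n_s:, :n_s], L[n_s:, n_s:]

def extrapolation_map(L, n_s, f_source):
    """Smoothest extension of source outputs to the target inputs, i.e., the
    minimizer of rho(f_S, f_T) over f_T: E(f_S) = -L_TT^{-1} L_TS f_S.
    Requires L_TT to be invertible, which holds when the graph is connected."""
    _, _, L_TS, L_TT = laplacian_blocks(L, n_s)
    return -np.linalg.solve(L_TT, L_TS @ f_source)

def target_block_min_eig(L, n_s):
    """Minimum eigenvalue of L_TT; positive whenever the graph is connected."""
    return float(np.linalg.eigvalsh(L[n_s:, n_s:]).min())

# Usage (continuing the previous example):
#   print(target_block_min_eig(L, n_s=X_s.shape[0]))
#   f_target = extrapolation_map(L, n_s=X_s.shape[0], f_source=X_s @ beta)
```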

The loss function $\ell(y, y')$ satisfies $\ell(y, y') \ge 0$, with $\ell(y, y') = 0$ if and only if $y = y'$. Furthermore, it is $m_\ell$-strongly convex and $M_\ell$-strongly smooth in its first argument, i.e.,

\[ \frac{m_\ell}{2}\,(y_1 - y_2)^2 \;\le\; \ell(y_1, y) - \ell(y_2, y) - \partial_1 \ell(y_2, y)\,(y_1 - y_2) \;\le\; \frac{M_\ell}{2}\,(y_1 - y_2)^2. \]

Assumption 2.1 is standard in learning theory and provides control over the curvature of the loss function.

Suppose $\hat f$ is the estimated function obtained from (3). Under Assumption 2.1 on the loss function and Assumptions 2.1 and 2.1 on the regularizer, we have the following bound on the risk in the target domain:

(6)

for some constants $c_1, c_2$ (which depend on the loss and the regularizer through the ratio of the smoothness and strong convexity constants) explicitly defined in the proof (see (19) and (20) in the appendix).

We note that the right side of (6) does not depend on the (unobserved) $y$'s in the target domain. Intuitively, Theorem 2.1 guarantees the accuracy of $\hat f$ on the inputs from the target domain as long as the following conditions hold.

  1. The model class is rich enough to include an $f$ that is not only accurate on the training domain, but also satisfies the smoothness/invariance conditions enforced by the regularizer. This implies the first term on the right side of (6) is small.

  2. The exact relation between inputs and outputs encoded in $f_0$ satisfies the smoothness structure enforced by the regularizer. This implies the second term on the right side of (6) is small.

If the model is correctly specified ($f_0 \in \mathcal{F}$) and the regression function perfectly satisfies the smoothness conditions enforced by the regularizer ($\rho(f_{0,S}, f_{0,T}) = 0$), then the bias term vanishes. In other words, Theorem 2.1 is adaptive to correctly specified model classes.

Proof Sketch of Theorem 2.1.

To keep things simple, we focus on the case in which the loss function is quadratic ($\ell(y, y') = (y - y')^2$). In this case, the risk of the trained model in the target domain is $\frac{1}{n_T}\,\|\hat f_T - f_{0,T}\|_2^2$, and we have

\[ \|\hat f_T - f_{0,T}\|_2 \;\le\; \|\hat f_T - E(\hat f_S)\|_2 \;+\; \|E(\hat f_S) - E(f_{0,S})\|_2 \;+\; \|E(f_{0,S}) - f_{0,T}\|_2. \tag{7} \]

The first term depends on the smoothness of the model outputs across the source and target domains: it measures the discrepancy between the model outputs in the target domain, $\hat f_T$, and the smoothest extrapolation $E(\hat f_S)$ of the model outputs in the source domain to the target domain. Similarly, the third term depends on the smoothness of the regression function $f_0$ (across the source and target domains). In Appendix B.1, we show that it is possible to bound these two terms in terms of $\rho(\hat f)$ and $\rho(f_0)$, respectively.

It remains to bound the second term in (7). For this, we rely on the stability of the extrapolation map (5). Intuitively, the extrapolation operation is similar to a projection onto smooth functions, so the second term satisfies $\|E(\hat f_S) - E(f_{0,S})\|_2 \lesssim \|\hat f_S - f_{0,S}\|_2$.

We defer the details to Appendix B.1. ∎

2.2 The Inductive Setting

We now consider the inductive setting. Previously, in Section 2.1, we focused on the accuracy of the fitted model on the inputs from the target domain. In the inductive setting, we instead consider the expected loss of $\hat f$ at a new (previously unseen) input point in the target domain. We consider a problem setup similar to that in Section 2.1: the labeled samples from the source domain are independently drawn from the source distribution $P$, while the unlabeled samples from the target domain are independently drawn from (the marginal of) the target distribution $Q$. We also assume the covariate shift condition (2). The method remains the same as before: we learn $\hat f$ from (3).

The main difference between the inductive and transductive settings is in the population version of the regularizer. In the transductive setting, we are only concerned with the outputs of the ML model at the inputs in the source and target domains; thus, the population version of the regularizer remains a function of (the vector of) model outputs at those inputs. In the inductive setting, we are also concerned with the output of the ML model at previously unseen points; thus, we consider the regularizer as a functional (i.e., a higher-order function) of the model's behavior on the source and target distributions (the two arguments correspond to $f_S$ and $f_T$ in the transductive case). For example, the population version of the graph Laplacian regularizer (in the inductive setting) is

\[ \rho(f) \;=\; \mathbb{E}\big[ K(X, X')\,\big(f(X) - f(X')\big)^2 \big], \]

where $X \sim P$ and $X' \sim Q$. The population version of (3) in the inductive setting is

\[ f^* \in \arg\min_{f \in \mathcal{F}} \ \mathbb{E}_{(X, Y) \sim P}\big[\ell\big(f(X), Y\big)\big] + \lambda\, \rho(f). \tag{8} \]
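Assuming the population regularizer takes the pairwise-expectation form above, a simple Monte Carlo estimate from finite source and target samples is sketched below; `kernel` is a placeholder for the kernel built from the fair metric.

```python
import numpy as np

def population_laplacian_regularizer(f, X_source, X_target, kernel):
    """Monte Carlo estimate of E[K(X, X') (f(X) - f(X'))^2]
    with X drawn from the source and X' from the target."""
    fs, ft = f(X_source), f(X_target)
    K = kernel(X_source, X_target)                  # (n_S, n_T) kernel matrix
    diffs = (fs[:, None] - ft[None, :]) ** 2        # all source/target output gaps
    return float((K * diffs).mean())
```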

Now, we establish the analogue of Theorem 2.1 in the inductive setting. First, we state our assumptions on the problem.

The function $f_0$ satisfies $\rho(f_0) \le \rho_0$ for some small $\rho_0 > 0$. The (population) regularizer satisfies a strong convexity condition with respect to its second coordinate, as well as a Lipschitz condition on the partial derivative of $\rho$ with respect to the second coordinate, i.e., for any two functions $g_1, g_2$, the derivative changes by at most a constant multiple of their distance, for some constants $\mu, M > 0$. Here, the partial derivative refers to the Gateaux derivative of $\rho$ with respect to the second coordinate along a given direction.

Assumptions 2.2 and 2.2 are analogues of Assumptions 2.1 and 2.1 in the inductive setting. In fact, it is possible to show that Assumptions 2.2 and 2.2 imply Assumptions 2.1 and 2.1 with high probability by appealing to (uniform) laws of large numbers (see Appendix D). The following theorem provides a bound on the population estimation error of $f^*$ on the target domain:

Under Assumptions 2.1, 2.2, and 2.2, we have:

for some constants defined in the proof.

The bound obtained in Theorem 2.2 is comparable to (6): the right side does not depend on the distribution of the responses in the target domain. The second term reflects the aptness of the regularizer $\rho$, i.e., how well it captures the smoothness of $f_0$ over the domains. Similar to (6), we note that the bound in Theorem 2.2 is adaptive to correctly specified model classes.

To wrap up, we compare our theoretical results to other theoretical results on domain adaptation. There is a long line of work started by Ben-David et al. [ben-david2010theory] on out-of-distribution accuracy of ML models [mansour2009domain, ganin2016domain, saito2018maximum, zhang2019bridging, zhang2020localized]. Such bounds are usually of the form

\[ \mathrm{err}_Q(f) \;\le\; \mathrm{err}_P(f) + d(P, Q), \tag{9} \]

for any $f \in \mathcal{F}$, where $d(P, Q)$ is a measure of discrepancy between the source and target domains. For example, Zhang et al. [zhang2019bridging] show (9) with $d(P, Q)$ taken to be their margin disparity discrepancy.

A key feature of these bounds is that it is possible to evaluate the right side of the bounds with unlabeled samples from the target domain (and labeled samples from the source domain). Compared to our bounds, there are two main differences:

  1. Equation (9) applies to any $f \in \mathcal{F}$ (while our bound only applies to the specific $f^*$ from (8)). Although this uniform applicability is practically desirable (because it allows practitioners to evaluate the bound a posteriori to estimate the out-of-distribution accuracy of the trained model), it is theoretically undesirable because it precludes the bounds from adapting to correct specification of the model class.

  2. The uniform applicability of the bound (to any $f \in \mathcal{F}$) also precludes (9) from capturing the effects of the regularizer.

2.3 Extension to Domain Generalization

In this subsection, we further extend our results to the domain generalization setup, i.e., when we have no observations from the target domain. In the previous domain adaptation setup, when we had access to unlabeled data from the target domain, we used a suitable regularizer to extrapolate the prediction performance from the source domain to the target domain. However, when we do not have unlabeled data from the target domain, we need to alter the regularizer appropriately, so that we have some uniform guarantee over all domains in the vicinity of the source domain. Here is an example of a regularizer that seeks to improve domain generalization:

\[ \rho(f) \;=\; \sup_{T \in \mathcal{T}_\epsilon}\ \mathbb{E}_{X \sim P}\Big[\big(f(T(X)) - f(X)\big)^2\Big], \qquad \mathcal{T}_\epsilon \;=\; \Big\{T \,:\, \mathbb{E}_{X \sim P}\big[\|T(X) - X\|_2^2\big] \le \epsilon^2 \Big\}. \tag{10} \]

This regularizer is similar to the SenSeI regularizer originally proposed and studied by Yurochkin et al. [yurochkin2021sensei] for enforcing individual fairness. In fact, (10) is exactly the Monge form of the SenSeI regularizer. Note that we can further generalize this regularizer by incorporating a general loss function in the first equation or a general metric in the second equation; however, as this does not add anything to the underlying intuition, we confine ourselves to the $\ell_2$ metric here. Next, we present our theoretical findings with respect to this regularizer. To this end, we define the set of transformations $\mathcal{T}_\epsilon$ as in (10) and the corresponding set of measures $\mathcal{Q} = \{T_{\#}P : T \in \mathcal{T}_\epsilon\}$, i.e., the pushforwards of the source distribution under these transformations. We show that it is possible to generalize the performance of the estimator obtained in (3) uniformly over the measures in $\mathcal{Q}$. As mentioned previously, we only work with the quadratic loss function, but our result can be extended to general loss functions. The following theorem establishes a uniform bound on the estimation error of the population function obtained from (3) with the regularizer as defined in (10): the population estimator satisfies the following bound on the estimation error:

The bound obtained above is the same as the one obtained in Theorem 2.2 (up to constants) and has an analogous interpretation: it consists of the minimum training error achieved over $\mathcal{F}$ and the smoothness of $f_0$ quantified in terms of the regularizer. Moreover, the bound holds uniformly over all the domains in $\mathcal{Q}$, i.e., the performance of the estimator can be extrapolated to all the domains in $\mathcal{Q}$, provided that $\rho(f_0)$ is small.
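For intuition, here is a rough PyTorch-style sketch of how a regularizer of the form (10) can be approximated in practice: instead of the exact supremum over transport maps, it runs a few gradient-ascent steps on an input perturbation, with a penalty for leaving the fair-metric ball. This is a penalty-based sketch, not the exact SenSeI procedure; `fair_dist`, the number of ascent steps, and the penalty weight are illustrative.

```python
import torch

def dg_regularizer(model, x, fair_dist, eps=0.1, steps=5, step_size=0.05, penalty=10.0):
    """Approximate sup over perturbed inputs x' near x (in the fair metric) of
    (f(x') - f(x))^2, via projected-free gradient ascent with a distance penalty."""
    f_x = model(x).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_adv = x + delta
        gain = ((model(x_adv) - f_x) ** 2).mean()                       # output change
        cost = penalty * torch.clamp(fair_dist(x_adv, x) - eps, min=0).mean()  # ball constraint
        grad, = torch.autograd.grad(gain - cost, delta)
        delta = (delta + step_size * grad).detach().requires_grad_(True)
    return ((model(x + delta) - f_x) ** 2).mean()
```

In training, this term would be added to the empirical risk with a weight $\lambda$, as in (3).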

2.4 Empirical Results

We verify our theoretical findings empirically. Our goal is to improve performance under distribution shifts using individual fairness methods. We consider SenSeI [yurochkin2021sensei], Sensitive Subspace Robustness (SenSR) [yurochkin2020training], Counterfactual Logit Pairing (CLP) [garg2018counterfactual], and GLIF [petersen2021post]. GLIF, similar to domain adaptation methods, requires unlabeled samples from the target. The other methods only utilize the source data, as in the domain generalization scenario. Our theory establishes guarantees on the target domain performance for SenSeI (Section 2.3) and GLIF (Section 2.1).

Table 1: Enforcing domain generalization using individual fairness methods. Means and standard deviations over 10 trials. Columns: Bios (BA, Worst per-gender accuracy) and Toxicity (BA, TNR by annotations, TNR by identity tokens); rows: Baseline, GLIF, SenSeI, SenSR, CLP.

Datasets and Metrics

We experiment with two textual datasets, Toxicity [dixon2018measuring] and Bios [de2019bias]. In Toxicity, the goal is to identify toxic comments. This dataset has been considered by both the domain generalization community [koh2021wilds, zhai2021doro, creager2021environment] (under the name Civil Comments) as well as the individual fairness community [garg2018counterfactual, yurochkin2021sensei, petersen2021post]. The key difference between the two communities is in the evaluation metrics. In domain generalization, it is common to consider performance on underrepresented groups (or simply worst-group performance). In individual fairness, a common metric of choice is prediction consistency, i.e., the fraction of test samples whose predictions remain unchanged under certain modifications of the inputs that preserve similarity from the fairness standpoint.

In Toxicity, the group memberships can be defined either with respect to human annotations provided with the dataset, or with respect to the presence of certain identity tokens. Both groupings aim at highlighting comments that refer to identities that are subject to online harassment. To quantify domain generalization, we evaluate the average per-group true negative rate (TNR, i.e., accuracy on non-toxic comments), where each group is weighted equally. We choose TNR because underrepresented groups tend to have a larger fraction of toxic comments in the training data and are thus spuriously associated with toxicity by the model, yielding poor TNR. This is similar to how the background is spurious in the popular Waterbirds domain generalization benchmark [sagawa2019distributionally]. We weight each group equally to make sure that performance on underrepresented groups is factored in (a more robust alternative to worst-group performance). We consider both groupings, i.e., TNR (Annotations) and TNR (Identity tokens).

In Bios, the task is to predict the occupation of a person from their biography. This dataset has mostly been studied in the fairness literature [de2019bias, romanov2019what, prost2019debiasing, yurochkin2021sensei], but it can also be considered from the domain generalization perspective. Many of the occupations in the dataset exhibit large gender imbalances associated with historical biases, e.g., most nurses are female and most attorneys are male. Thus, gender pronouns and names can introduce spurious relations with the occupation prediction. To quantify this effect from the domain generalization perspective, we report the average over occupations of the worst per-gender accuracy (Worst per gender). Since both datasets are class-imbalanced, we also report the balanced (by class) test accuracy (BA) on the source domain to ensure that in-distribution performance remains reasonable.
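One plausible implementation of these evaluation metrics is sketched below (illustrative helper functions; exact handling of edge cases such as empty groups may differ from the evaluation used for the reported numbers).

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Average of per-class accuracies (recalls)."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

def worst_per_gender_accuracy(y_true, y_pred, gender):
    """Average over occupations of the worse of the per-gender accuracies.
    Assumes every (occupation, gender) cell is non-empty."""
    accs = []
    for c in np.unique(y_true):
        per_gender = [np.mean(y_pred[(y_true == c) & (gender == g)] == c)
                      for g in np.unique(gender)]
        accs.append(min(per_gender))
    return float(np.mean(accs))

def average_group_tnr(y_true, y_pred, group):
    """True negative (non-toxic) rate averaged over groups, each weighted equally."""
    tnrs = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 0)      # non-toxic comments in group g
        tnrs.append(np.mean(y_pred[mask] == 0))
    return float(np.mean(tnrs))
```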

Results

In Table 1, we compare methods for enforcing individual fairness with an ERM baseline. Individual fairness methods require a fair metric that encodes that changes in identity tokens result in similar comments in Toxicity, and that changes in gender pronouns and names result in similar biographies in Bios (except for CLP, which instead uses this intuition for data augmentation). We obtained the fair metric as in the original studies of the corresponding methods. We observe that individual fairness methods consistently improve the domain generalization metrics, supporting our theoretical findings. They also tend to maintain reasonable in-distribution performance, supporting their applicability in practical use cases where both in- and out-of-distribution performance are important.

3 Individual Fairness via Domain Adaptation

In the previous section, we established that it is possible to use IF regularizers for domain adaptation problems provided that the true underlying signal satisfies some smoothness conditions. In this section, we investigate the opposite direction, i.e., whether the techniques employed for domain adaptation can be leveraged to enforce individual fairness. Typical methods for domain adaptation primarily aim at finding a representation $\varphi(X)$ of the input sample $X$ such that the source and the target distributions of $\varphi(X)$ are aligned. In other words, the goal is to make it hard to distinguish $\varphi(X_S)$ from $\varphi(X_T)$. For example, Ganin et al. [ganin2016domain] proposed the Domain Adversarial Neural Network (DANN) for learning $\varphi$ such that a discriminator fails to distinguish between $\varphi(X_S)$ and $\varphi(X_T)$. Shu et al. [shu2018dirtt] assume that the target distribution is clustered with respect to the classes, so that the optimal classifier should pass through the low-density region. To promote this condition, they modify the previous objective [ganin2016domain] with additional regularizers to ensure that the final classifier (which is built on top of $\varphi$) has low entropy on the target and is also locally Lipschitz. Sun et al. [sun2016return] learn a linear transformation of the source distribution (later extended to non-linear transformations [sun2016deep]), such that the first two moments of the transformed representations are the same under the source and target distributions. Shen et al. [shen2018wasserstein] learn domain-invariant representations by minimizing the Wasserstein distance between the distributions of source and target representations induced by $\varphi$.
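As a minimal illustration of representation alignment, the sketch below penalizes differences between the first two moments of the source and target representations, in the spirit of Sun et al. [sun2016return, sun2016deep]; the adversarial (DANN) and Wasserstein (WDA) variants replace this penalty with a discriminator or a critic.

```python
import torch

def moment_alignment_loss(z_source, z_target):
    """Penalty on the difference between the first two moments of the
    source and target representation distributions (CORAL-style)."""
    mu_s, mu_t = z_source.mean(0), z_target.mean(0)
    zs, zt = z_source - mu_s, z_target - mu_t
    cov_s = zs.T @ zs / (z_source.shape[0] - 1)
    cov_t = zt.T @ zt / (z_target.shape[0] - 1)
    return ((mu_s - mu_t) ** 2).sum() + ((cov_s - cov_t) ** 2).sum()
```

During training, this loss would be added to the classification loss on the labeled domain, so that the learned representation is both predictive and aligned across domains.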

A common underlying theme of all of the above methods is to find a representation $\varphi$ that has a similar distribution on both the source and the target. In this section, we show that learning this domain-invariant map indeed enforces individual fairness under a suitable choice of domains. We demonstrate this with the following factor model: suppose we want to achieve individual fairness with respect to a binary protected attribute (say, sex). We define the two domains as the two groups corresponding to the protected attribute, e.g., the source domain may consist of all observations corresponding to males and the target domain may consist of all observations corresponding to females. We assume that the covariates follow a factor model structure:

\[ X \;=\; A z_{\mathrm{rel}} + B z_{\mathrm{prot}} + w \]

for three independent random variables $z_{\mathrm{rel}}$, $z_{\mathrm{prot}}$, and $w$, where $z_{\mathrm{rel}}$ denotes the relevant attributes, $z_{\mathrm{prot}}$ denotes the protected attribute, and $w$ is the noise. Therefore, according to our design:

\[ X_S \;\overset{d}{=}\; A z_{\mathrm{rel}} + B z^{S}_{\mathrm{prot}} + w, \tag{11} \]
\[ X_T \;\overset{d}{=}\; A z_{\mathrm{rel}} + B z^{T}_{\mathrm{prot}} + w, \tag{12} \]

where $z^{S}_{\mathrm{prot}} \neq z^{T}_{\mathrm{prot}}$ are the two values of the protected attribute in the source and target domains. In the following theorem, we establish that if we estimate a linear transformation $V \in \mathbb{R}^{k \times d}$ (with $d$ being the ambient dimension of $X$) such that $V X_S$ and $V X_T$ have the same distribution, then $V B = 0$. Therefore, $V$ ignores the direction corresponding to the protected attribute, and consequently $V X$ is an individually fair representation. Suppose the source and target distributions satisfy (11) and (12). If some linear transformation $V$ satisfies $V X_S \overset{d}{=} V X_T$, then $V B = 0$.

This theorem implies that any classifier built on top of the linear representation $V X$ will be individually fair, because $V x_1 = V x_2$ for any pair $x_1, x_2$ that share the relevant attributes $z_{\mathrm{rel}}$ (and differ only in the protected attribute). The proof of the theorem can be found in the Appendix. The above theorem constitutes an example of how domain adaptation methods can be adapted to enforce individual fairness when the covariates follow a factor structure.
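A small simulation illustrates the setting of (11) and (12), assuming for illustration that the protected attribute takes the values $\pm 1$ in the two domains: a linear map whose rows are orthogonal to $B$ ignores the protected direction and, as a consequence, aligns the two domains.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 3
A = rng.normal(size=(d, k))          # loadings of the relevant attributes
B = rng.normal(size=(d, 1))          # loading of the (binary) protected attribute

def sample(n, z_prot):
    z_rel = rng.normal(size=(n, k))
    noise = 0.1 * rng.normal(size=(n, d))
    return z_rel @ A.T + z_prot * B.T + noise

X_source, X_target = sample(5000, +1.0), sample(5000, -1.0)   # e.g. male / female groups

# A linear map that ignores the protected direction: project onto the
# orthogonal complement of span(B).  By construction V @ B = 0.
Q, _ = np.linalg.qr(np.hstack([B, rng.normal(size=(d, d - 1))]))
V = Q[:, 1:].T                                                 # (d-1) x d, rows orthogonal to B

print(np.abs(V @ B).max())                                     # ~0: protected direction ignored
print(np.abs((V @ X_source.T).mean(1) - (V @ X_target.T).mean(1)).max())  # aligned means
```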

Table 2: Enforcing individual fairness using domain adaptation methods. Means and standard deviations over 10 trials. Columns: Bios (BA, PC) and Toxicity (BA, PC); rows: Baseline, DANN, VADA, WDA, SenSeI.

3.1 Empirical Results

In the experiments, our goal is to train individually fair models using methods popularized in the domain adaptation (DA) literature. We experiment with DANN [ganin2016domain], VADA [shu2018dirtt], and a variation of the Wasserstein-based DA (WDA) [shen2018wasserstein] discussed in Section 3. We present additional experimental details in Appendix E.

Datasets and Metrics

We consider the same two datasets as in our domain generalization experiments in Section 2.4. We use prediction consistency (PC) to quantify individual fairness, following prior works studying these datasets [yurochkin2021sensei, petersen2021post]. For the Toxicity dataset, we modify identity tokens in the test comments and compute prediction consistency with respect to all 50 identity tokens [dixon2018measuring]. A pair of comments that differ only in an identity token, e.g., “gay” vs. “straight”, is intuitively similar and should be assigned the same prediction to satisfy individual fairness. For the Bios dataset, we consider prediction consistency with respect to changes in gender pronouns and names. Such changes result in biographies that should be treated similarly.
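A minimal sketch of the prediction consistency metric (assuming a `model` that maps a batch of texts to predicted labels and precomputed perturbed versions of each input) is given below.

```python
import numpy as np

def prediction_consistency(model, originals, perturbed_sets):
    """Fraction of test points whose prediction is unchanged under every
    fairness-preserving modification (e.g., swapped identity tokens or pronouns)."""
    base = model(originals)
    consistent = np.ones(len(originals), dtype=bool)
    for perturbed in perturbed_sets:           # one array of modified inputs per perturbation
        consistent &= (model(perturbed) == base)
    return float(consistent.mean())
```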

In these experiments, we have one labeled training dataset, rather than the labeled source and unlabeled target datasets typical for the DA setting. As shown in Section 3, the key idea behind achieving individual fairness using DA techniques is to split the available training data into source and target domains such that aligning their representations serves the fairness goals. To this end, in the Bios dataset we split the training data into all-male and all-female biographies, and in the Toxicity dataset we split it into a domain with comments containing any of the aforementioned 50 identity tokens and a domain with comments without any identity tokens. The ERM baseline is trained on the complete training dataset without any splitting.
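For Toxicity, such a fairness-motivated split can be as simple as the following token-matching sketch (illustrative only; the dataset also provides human identity annotations that could be used instead).

```python
def split_by_identity_tokens(comments, labels, identity_tokens):
    """Split training data into a domain without identity tokens and a domain
    whose comments contain at least one identity token, for DA-style alignment."""
    has_token = [any(tok in c.lower() for tok in identity_tokens) for c in comments]
    no_token_domain = [(c, y) for c, y, h in zip(comments, labels, has_token) if not h]
    token_domain = [(c, y) for c, y, h in zip(comments, labels, has_token) if h]
    return no_token_domain, token_domain
```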

Results

We summarize the results in Table 2. Among the considered DA methods, WDA achieves the best individual fairness improvements in terms of prediction consistency, while maintaining good balanced accuracy (BA). Compared to SenSeI, a method designed for training individually fair models, the prediction consistency of the DA methods is worse; however, the subject-matter understanding required to apply them is milder. Individual fairness methods require a problem-specific fair metric, which can be learned from data, but even then the user has to define, e.g., groups of comparable samples [mukherjee2020Two]. The domain adaptation approach only requires a fairness-related splitting of the training data. In our experiments, we adopted straightforward data splitting strategies and demonstrated improvements over the baseline. More sophisticated data splitting approaches may help achieve further individual fairness improvements.


4 Conclusion

In this paper, we showed that algorithms for enforcing individual fairness can help ML models adapt/generalize to new domains and vice versa. Viewed through the lens of algorithmic fairness, the results in Section 2 show that enforcing individual fairness can mitigate algorithmic biases caused by covariate shifts as long as the regression function satisfies individual fairness. This complements the recent results by Maity et al. [maity2021does], which show that enforcing group fairness can mitigate algorithmic biases caused by subpopulation shifts. On the other hand, compared to existing results on out-of-distribution accuracy of ML models, the results in Section 2 demonstrate the importance of inductive biases in helping models adapt to new domains.

Turning to the results in Section 3, our results establish a probabilistic connection between domain adaptation and individual fairness. As we saw, it is possible to enforce individual fairness by aligning the distributions of the features under a factor model. This factor model is implicit in some prior works on algorithmic fairness [bolukbasi2016man, mukherjee2020Two], but we are not aware of any results that show it is possible to enforce individual fairness using domain adaptation techniques.

Appendix A Appendix

A.1 Proof of Theorem 2.1

For the proof of this theorem, we need a few auxiliary lemmas, which we state below. Define the extrapolation map $E$ as in (5):

Then, under our assumptions on $\rho$:

  • $E$ is Lipschitz, with a Lipschitz constant determined by the strong convexity and smoothness constants of $\rho$.

  • For any vector $v$ we have:

Under Assumption 2.1 we have:

for any function $f$. Furthermore, if $\partial_1\rho$ and $\partial_2\rho$ denote the first and second partial derivatives of $\rho$, respectively, then we have:

The proof of Lemma A.1 can be found in Subsection B.1 and the proof of Lemma A.1 can be found in Section B.2. For the rest of the proof, we introduce some notation for ease of presentation: for any two vectors of the same dimension, we use $\rho$ or its partial derivatives to denote the coordinate-wise sum. From the strong smoothness condition on $\rho$ we have:

(13)

We can further bound the first term on the RHS of the above equation as follows:

(14)

Combining the bounds in (13) and (14), we obtain:

(15)

The term can be bounded directly by Lemma A.1 as:

(16)
(17)

To bound the next term, we have:

The bound on the remaining term follows from a similar line of argument: