Identifying and Correcting Label Bias in Machine Learning

01/15/2019 ∙ by Heinrich Jiang, et al.

Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.




1 Introduction

Machine learning has become widely adopted in a variety of real-world applications that significantly affect people’s lives (Guimaraes and Tofighi, 2018; Guegan and Hassani, 2018). Fairness in these algorithmic decision-making systems has thus become an increasingly important concern: It has been shown that without appropriate intervention during training or evaluation, models can be biased against certain groups (Angwin et al., 2016; Hardt et al., 2016). This is due to the fact that the data used to train these models often contains biases that become reinforced into the model (Bolukbasi et al., 2016). Moreover, it has been shown that simple remedies, such as ignoring the features corresponding to the protected groups, are largely ineffective due to redundant encodings in the data (Pedreshi et al., 2008). In other words, the data can be inherently biased in possibly complex ways, thus making it difficult to achieve fairness.

Research on training fair classifiers has therefore received a great deal of attention. One such approach has focused on developing post-processing steps to enforce fairness on a learned model (Doherty et al., 2012; Feldman, 2015; Hardt et al., 2016). That is, one first trains a machine learning model, resulting in an unfair classifier. The outputs of the classifier are then calibrated to enforce fairness. Although this approach is likely to decrease the bias of the classifier, by decoupling the training from the fairness enforcement, this procedure may not lead to the best trade-off between fairness and accuracy. Accordingly, recent work has proposed to incorporate fairness into the training algorithm itself, framing the problem as a constrained optimization problem and subsequently applying the method of Lagrange multipliers to transform the constraints to penalties (Zafar et al., 2015; Goh et al., 2016; Cotter et al., 2018b; Agarwal et al., 2018); however such approaches may introduce undesired complexity and lead to more difficult or unstable training (Cotter et al., 2018b, c). Both of these existing methods address the problem of bias by adjusting the machine learning model rather than the data, despite the fact that oftentimes it is the training data itself – i.e., the observed features and corresponding labels – which are biased.

Figure 1: In our approach to training an unbiased, fair classifier, we assume the existence of a true but unknown label function which has been adjusted by a biased process to produce the labels observed in the training data. Our main contribution is providing a procedure that appropriately weights examples in the dataset, and then showing that training on the resulting loss corresponds to training on the original, true, unbiased labels.

In this paper, we provide an approach to machine learning fairness that addresses the underlying data bias problem directly. We introduce a new mathematical framework for fairness in which we assume that there exists an unknown but unbiased ground truth label function and that the labels observed in the data are assigned by an agent who is possibly biased, but otherwise has the intention of being accurate. This assumption is natural in practice and may also be applied to settings where the features themselves are biased and the observed labels were generated by a process depending on the features (i.e., situations where there is bias in both the features and labels).

Based on this mathematical formulation, we show how one may identify the amount of bias in the training data as a closed form expression. Furthermore, our derived form for the bias suggests that its correction may be performed by assigning appropriate weights to each example in the training data. We show, with theoretical guarantees, that training the classifier under the resulting weighted objective leads to an unbiased classifier on the original un-weighted dataset. Notably, many pre-processing approaches and even constrained optimization approaches (e.g. Agarwal et al. (2018)) optimize a loss which possibly modifies the observed labels or features, and doing so may be legally prohibited as it can be interpreted as training on falsified data; see Barocas and Selbst (2016) (more details about this can be found in Section 6). In contrast, our method does not modify any of the observed labels or features. Rather, we correct for the bias by changing the distribution of the sample points via re-weighting the dataset.

Our resulting method is general and can be applied to various notions of fairness, including demographic parity, equal opportunity, equalized odds, and disparate impact. Moreover, the method is practical and simple to tune: With the appropriate example weights, any off-the-shelf classification procedure can be used on the weighted dataset to learn a fair classifier. Experimentally, we show that on standard fairness benchmark datasets and under a variety of fairness notions our method can outperform previous approaches to fair classification.

2 Background

In this section, we introduce our framework for machine learning fairness, which explicitly assumes an unknown and unbiased ground truth label function. We additionally introduce notation and definitions used in the subsequent presentation of our method.

2.1 Biased and Unbiased Labels

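Consider a data domain $\mathcal{X}$ and an associated data distribution $\mathcal{P}$. An element $x \in \mathcal{X}$ may be interpreted as a feature vector associated with a specific example. We let $\mathcal{Y} := \{0, 1\}$ be the labels, considering the binary classification setting, although our method may be readily generalized to other settings.

We assume the existence of an unbiased, ground truth label function $y_{\text{true}} : \mathcal{X} \to [0, 1]$. Although $y_{\text{true}}$ is the assumed ground truth, in general we do not have access to it. Rather, our dataset has labels generated based on a biased label function $y_{\text{bias}} : \mathcal{X} \to [0, 1]$. Accordingly, we assume that our data is drawn as follows:

$$(x, y) \sim \mathcal{D} \;\equiv\; x \sim \mathcal{P},\;\; y \sim \text{Bernoulli}(y_{\text{bias}}(x)),$$

and we assume access to a finite sample $\mathcal{D}_{[n]} := \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ drawn from $\mathcal{D}$.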
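The data-generating assumption can be made concrete with a small simulation. The sketch below is illustrative only: the names `sample_dataset`, `sample_x`, `y_true`, and `y_bias`, the two-component feature vector, and the particular form of the bias are all hypothetical choices, not taken from the paper. It draws features from a toy distribution and then draws each observed label from a Bernoulli variable whose parameter is the biased label function.

```python
import random

def sample_dataset(n, sample_x, y_bias, seed=0):
    """Draw n pairs (x, y): x from the feature distribution, y ~ Bernoulli(y_bias(x))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sample_x(rng)
        y = 1 if rng.random() < y_bias(x) else 0
        data.append((x, y))
    return data

# Hypothetical toy setup: x = (score in [0, 1], group bit).
def sample_x(rng):
    return (rng.random(), rng.randrange(2))

def y_true(x):
    # Assumed unbiased label function: depends on the score only.
    return x[0]

def y_bias(x):
    # Assumed biased label function: lowers the positive probability inside the group.
    score, in_group = x
    return 0.7 * score if in_group else score

data = sample_dataset(20000, sample_x, y_bias)
```

Only `data` would be visible to a learner in this setting; `y_true` is written out here purely so the toy bias has something to distort.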

In a machine learning context, our directive is to use the dataset $\mathcal{D}$ to recover the unbiased, true label function $y_{\text{true}}$. In general, the relationship between the desired $y_{\text{true}}$ and the observed $y_{\text{bias}}$ is unknown. Without additional assumptions, it is difficult to learn a machine learning model to fit $y_{\text{true}}$. We will attack this problem in the following sections by proposing a minimal assumption on the relationship between $y_{\text{true}}$ and $y_{\text{bias}}$. The assumption will allow us to derive an expression for $y_{\text{true}}$ in terms of $y_{\text{bias}}$, and the form of this expression will immediately imply that correction of the label bias may be done by appropriately re-weighting the data.

We note that our proposed perspective on the problem of learning a fair machine learning model is conceptually different from previous ones. While previous perspectives propose to train on the observed, biased labels and only enforce fairness as a penalty or as a post-processing step to the learning process, we take a more direct approach. Training on biased data can be inherently misguided, and thus we believe that our proposed perspective may be more appropriate and better aligned with the directives associated with machine learning fairness.

2.2 Notions of Bias

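We now discuss precise ways in which $y_{\text{bias}}$ can be biased. We describe a number of accepted notions of fairness; i.e., what it means for an arbitrary label function or machine learning model $h : \mathcal{X} \to [0, 1]$ to be biased (unfair) or unbiased (fair).

We will define the notions of fairness in terms of a constraint function $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. Many of the common notions of fairness may be expressed or approximated as linear constraints on $h$ (introduced previously by Cotter et al. (2018c); Goh et al. (2016)). That is, they are of the form

$$\mathbb{E}_{x \sim \mathcal{P}}\left[\langle h(x), c(x) \rangle\right] = 0,$$

where $\langle h(x), c(x) \rangle := \sum_{y \in \mathcal{Y}} h(y|x)\, c(x, y)$ and we use the shorthand $h(y|x)$ to denote the probability of sampling $y$ from a Bernoulli random variable with $p = h(x)$; i.e., $h(1|x) := h(x)$ and $h(0|x) := 1 - h(x)$. Therefore, a label function $h$ is unbiased with respect to the constraint function $c$ if $\mathbb{E}_{x \sim \mathcal{P}}[\langle h(x), c(x) \rangle] = 0$. If $h$ is biased, the degree of bias (positive or negative) is given by $\mathbb{E}_{x \sim \mathcal{P}}[\langle h(x), c(x) \rangle]$.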

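We define the notions of fairness with respect to a protected group $\mathcal{G}$, and thus assume access to an indicator function $g(x) = 1[x \in \mathcal{G}]$. We use $Z_{\mathcal{G}} := \mathbb{E}_{x \sim \mathcal{P}}[g(x)]$ to denote the probability of a sample drawn from $\mathcal{P}$ to be in $\mathcal{G}$. We use $P_{\mathcal{X}} := \mathbb{E}_{x \sim \mathcal{P}}[y_{\text{true}}(x)]$ to denote the proportion of $\mathcal{X}$ which is positively labelled and $P_{\mathcal{G}} := \mathbb{E}_{x \sim \mathcal{P}}[g(x) \cdot y_{\text{true}}(x)]$ to denote the proportion of $\mathcal{X}$ which is positively labelled and in $\mathcal{G}$. We now give some concrete examples of accepted notions of constraint functions:
Demographic parity (Dwork et al., 2012): A fair classifier $h$ should make positive predictions on $\mathcal{G}$ at the same rate as on all of $\mathcal{X}$. The constraint function may be expressed as $c(x, 0) = 0$, $c(x, 1) = g(x)/Z_{\mathcal{G}} - 1$.
Disparate impact (Feldman et al., 2015): This is identical to demographic parity, only that, in addition, during inference the classifier does not have access to the features of $x$ indicating whether it belongs to the protected group.
Equal opportunity (Hardt et al., 2016): A fair classifier $h$ should have equal true positive rates on $\mathcal{G}$ as on all of $\mathcal{X}$. The constraint may be expressed as $c(x, 0) = 0$, $c(x, 1) = g(x)\, y_{\text{true}}(x)/P_{\mathcal{G}} - y_{\text{true}}(x)/P_{\mathcal{X}}$.
Equalized odds (Hardt et al., 2016): A fair classifier $h$ should have equal true positive and false positive rates on $\mathcal{G}$ as on all of $\mathcal{X}$. In addition to the constraint associated with equal opportunity, this notion applies an additional constraint with $c(x, 0) = 0$, $c(x, 1) = g(x)(1 - y_{\text{true}}(x))/(Z_{\mathcal{G}} - P_{\mathcal{G}}) - (1 - y_{\text{true}}(x))/(1 - P_{\mathcal{X}})$.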
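To illustrate how such constraints are evaluated on a finite sample, the following sketch computes the demographic parity gap and the equal opportunity gap for hard 0/1 predictions. The function names are hypothetical. The equal opportunity function uses whatever labels are passed in (in practice the observed ones) because the unbiased labels are not available, and both functions assume the group and the positively labelled subset are non-empty.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the examples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected)

def demographic_parity_violation(preds, group):
    """|P(pred = 1 | group) - P(pred = 1)|."""
    everyone = [True] * len(preds)
    return abs(positive_rate(preds, group) - positive_rate(preds, everyone))

def equal_opportunity_violation(preds, labels, group):
    """|TPR inside the group - TPR over all examples|, using the given labels."""
    pos_all = [y == 1 for y in labels]
    pos_grp = [y == 1 and bool(g) for y, g in zip(labels, group)]
    return abs(positive_rate(preds, pos_grp) - positive_rate(preds, pos_all))

preds = [1, 0, 0, 0, 1, 1, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
dp_gap = demographic_parity_violation(preds, group)         # 0.25 vs 0.5 overall
eo_gap = equal_opportunity_violation(preds, labels, group)  # 0.5 vs 0.75 overall
```

In this toy example the group receives positive predictions at rate 0.25 against 0.5 overall, and has a true positive rate of 0.5 against 0.75 overall, so both gaps equal 0.25.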

In practice, there are often multiple fairness constraints $\{c_k\}_{k=1}^{K}$ associated with multiple protected groups $\{\mathcal{G}_k\}_{k=1}^{K}$. Accordingly, our subsequent results assume multiple fairness constraints and protected groups, and the protected groups may have overlapping samples.

3 Modeling How Bias Arises in Data

We now introduce our underlying mathematical framework to understand bias in the data, by providing the relationship between $y_{\text{bias}}$ and $y_{\text{true}}$ (Assumption 1 and Proposition 1). This will allow us to derive a closed form expression for $y_{\text{true}}$ in terms of $y_{\text{bias}}$ (Corollary 1). In Section 4 we will show how this expression leads to a simple weighting procedure that uses data with biased labels to train a classifier with respect to the true, unbiased labels.

We begin with an assumption on the relationship between the observed $y_{\text{bias}}$ and the underlying $y_{\text{true}}$.

Assumption 1.

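Suppose that our fairness constraints are $c_1, \ldots, c_K$, with respect to which $y_{\text{true}}$ is unbiased (i.e. $\mathbb{E}_{x \sim \mathcal{P}}[\langle y_{\text{true}}(x), c_k(x) \rangle] = 0$ for $k \in [K]$). We assume that there exist $\epsilon_1, \ldots, \epsilon_K \in \mathbb{R}$ such that the observed, biased label function $y_{\text{bias}}$ is the solution of the following constrained optimization problem:

$$\arg\min_{\hat{y} : \mathcal{X} \to [0, 1]} \; \mathbb{E}_{x \sim \mathcal{P}}\left[D_{\mathrm{KL}}\big(\hat{y}(x) \,\|\, y_{\text{true}}(x)\big)\right] \quad \text{s.t.} \quad \mathbb{E}_{x \sim \mathcal{P}}\left[\langle \hat{y}(x), c_k(x) \rangle\right] = \epsilon_k \;\; \text{for } k = 1, \ldots, K,$$

where we use $D_{\mathrm{KL}}$ to denote the KL-divergence.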

In other words, we assume that $y_{\text{bias}}$ is the label function closest to $y_{\text{true}}$ while achieving some amount of bias, where proximity to $y_{\text{true}}$ is given by the KL-divergence. This is a reasonable assumption in practice, where the observed data may be the result of manual labelling done by actors (e.g. human decision-makers) who strive to provide an accurate label while being affected by (potentially unconscious) biases; or in cases where the observed labels correspond to a process (e.g. results of a written exam) devised to be accurate and fair, but which is nevertheless affected by inherent biases.

We use the KL-divergence to impose this desire to have an accurate labelling. In general, a different divergence may be chosen. However, in our case, the choice of a KL-divergence allows us to derive the following proposition, which provides a closed-form expression for the observed $y_{\text{bias}}$. The derivation of Proposition 1 from Assumption 1 is standard and has appeared in previous works; e.g. Friedlander and Gupta (2006); Botev and Kroese (2011). For completeness, we include the proof in the Appendix.

Proposition 1.

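Suppose that Assumption 1 holds. Then $y_{\text{bias}}$ satisfies the following for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$:

$$y_{\text{bias}}(y|x) \;\propto\; y_{\text{true}}(y|x) \cdot \exp\left\{-\sum_{k=1}^{K} \lambda_k \cdot c_k(x, y)\right\}$$

for some $\lambda_1, \ldots, \lambda_K \in \mathbb{R}$.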
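The exponential form can be motivated by a short Lagrangian argument. The sketch below is not the paper's Appendix proof; it is the standard stationarity calculation under assumed notation: $\hat{y}$ for the candidate label function, $y_{\text{true}}$ for the unbiased one, $c_k$ and $\epsilon_k$ for the constraints and their levels, $\lambda_k$ for the constraint multipliers, and $\mu(x)$ for a multiplier enforcing normalization over $y$ at each $x$.

```latex
\begin{align*}
\mathcal{L}(\hat{y}, \lambda, \mu)
  &= \mathbb{E}_{x \sim \mathcal{P}}\Big[\sum_{y} \hat{y}(y|x) \log \frac{\hat{y}(y|x)}{y_{\text{true}}(y|x)}\Big]
   + \sum_{k=1}^{K} \lambda_k \Big(\mathbb{E}_{x \sim \mathcal{P}}\Big[\sum_{y} \hat{y}(y|x)\, c_k(x, y)\Big] - \epsilon_k\Big) \\
  &\quad + \mathbb{E}_{x \sim \mathcal{P}}\Big[\mu(x)\Big(\sum_{y} \hat{y}(y|x) - 1\Big)\Big], \\
0 &= \log \frac{\hat{y}(y|x)}{y_{\text{true}}(y|x)} + 1 + \sum_{k=1}^{K} \lambda_k c_k(x, y) + \mu(x)
   \qquad \text{(stationarity in } \hat{y}(y|x)\text{)}, \\
\hat{y}(y|x) &= y_{\text{true}}(y|x)\, \exp\Big\{-\sum_{k=1}^{K} \lambda_k c_k(x, y)\Big\}\, e^{-1-\mu(x)}
  \;\propto\; y_{\text{true}}(y|x)\, \exp\Big\{-\sum_{k=1}^{K} \lambda_k c_k(x, y)\Big\}.
\end{align*}
```

The factor $e^{-1-\mu(x)}$ depends only on $x$, so it is absorbed into the normalization over $y$. Since the objective is convex in $\hat{y}$ and the constraints are linear, stationarity characterizes the minimizer whenever the constraints are feasible.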

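Given this form of $y_{\text{bias}}$ in terms of $y_{\text{true}}$, we can immediately deduce the form of $y_{\text{true}}$ in terms of $y_{\text{bias}}$:

Corollary 1.

Suppose that Assumption 1 holds. The unbiased label function $y_{\text{true}}$ is of the form

$$y_{\text{true}}(y|x) \;\propto\; y_{\text{bias}}(y|x) \cdot \exp\left\{\sum_{k=1}^{K} \lambda_k \cdot c_k(x, y)\right\}$$

for some $\lambda_1, \ldots, \lambda_K \in \mathbb{R}$.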
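A quick numerical check makes the relationship between Proposition 1 and Corollary 1 tangible: tilting a label probability by the exponential factor and renormalizing, then tilting back with the opposite sign, returns the original value. The helper `tilt` and the specific coefficient and constraint values below are hypothetical and chosen only for illustration at a single point $x$.

```python
import math

def tilt(p1, lams, c1s, c0s, sign):
    """Multiply P(y|x) by exp(sign * sum_k lam_k * c_k(x, y)) and renormalise.

    p1 is P(y = 1 | x); c1s[k] and c0s[k] are c_k(x, 1) and c_k(x, 0).
    Returns the tilted P(y = 1 | x).
    """
    s1 = sum(l * c for l, c in zip(lams, c1s))
    s0 = sum(l * c for l, c in zip(lams, c0s))
    a = p1 * math.exp(sign * s1)
    b = (1.0 - p1) * math.exp(sign * s0)
    return a / (a + b)

y_true_at_x = 0.6
lams = [0.8]   # hypothetical coefficient
c1s = [1.0]    # hypothetical value of c_1(x, 1)
c0s = [0.0]    # c_1(x, 0) = 0, as in the fairness notions listed earlier

y_bias_at_x = tilt(y_true_at_x, lams, c1s, c0s, sign=-1)  # Proposition 1 direction
recovered = tilt(y_bias_at_x, lams, c1s, c0s, sign=+1)    # Corollary 1 direction
```

With a positive coefficient and a positive constraint value on the positive label, the biased probability falls below the true one, and the round trip recovers the true value up to floating-point error.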

We note that previous approaches to learning fair classifiers often formulate a constrained optimization problem similar to that appearing in Assumption 1 (i.e., maximize the accuracy or log-likelihood of a classifier subject to linear constraints) and subsequently solve it, usually via the method of Lagrange multipliers which translates the constraints to penalties on the training loss. In our approach, rather than using the constrained optimization problem to formulate a machine learning objective, we use it to express the relationship between true (unbiased) and observed (biased) labels. Furthermore, rather than training with respect to the biased labels, our approach aims to recover the true underlying labels. As we will show in the following sections, this may be done by simply optimizing the training loss on a re-weighting of the dataset. In contrast, the penalties associated with Lagrangian approaches can often be cumbersome: The original, non-differentiable, fairness constraints must be relaxed or approximated before conversion to penalties. Even then, the derivatives of these approximations may be near-zero for large regions of the domain, causing difficulties during training.

4 Learning Unbiased Labels

We have derived a closed form expression for the true, unbiased label function $y_{\text{true}}$ in terms of the observed label function $y_{\text{bias}}$, coefficients $\lambda_1, \ldots, \lambda_K$, and constraint functions $c_1, \ldots, c_K$. In this section, we elaborate on how one may learn a machine learning model $h$ to fit $y_{\text{true}}$, given access to a dataset $\mathcal{D}$ with labels sampled according to $y_{\text{bias}}$. We begin by restricting ourselves to constraints $c_1, \ldots, c_K$ associated with demographic parity, allowing us to have full knowledge of these constraint functions. In Section 4.3 we will show how the same method may be extended to general notions of fairness.

With knowledge of the functions $c_1, \ldots, c_K$, it remains to determine the coefficients $\lambda_1, \ldots, \lambda_K$ (which give us a closed form expression for the dataset weights) as well as the classifier $h$. For simplicity, we present our method by first showing how a classifier $h$ may be learned assuming knowledge of the coefficients $\lambda_1, \ldots, \lambda_K$ (Section 4.1). We subsequently show how the coefficients themselves may be learned, thus allowing our algorithm to be used in a general setting (Section 4.2). Finally, we describe how to extend to more general notions of fairness (Section 4.3).

4.1 Learning $h$ Given $\lambda_1, \ldots, \lambda_K$

Although we have the closed form expression for the true label function, in practice we do not have access to the values $y_{\text{bias}}(y|x)$ but rather only access to data points with labels sampled from $y_{\text{bias}}(y|x)$. We propose the weighting technique to train $h$ on labels based on $y_{\text{true}}$. (See the Appendix for an alternative to the weighting technique: the sampling technique, based on a coin-flip.) The weighting technique weights an example $(x, y)$ by the weight $w(x, y) = \tilde{w}(x, y) / \sum_{y' \in \mathcal{Y}} \tilde{w}(x, y')$, where

$$\tilde{w}(x, y') = \exp\left\{\sum_{k=1}^{K} \lambda_k \cdot c_k(x, y')\right\}.$$

We have the following theorem, which states that training a classifier on examples with biased labels weighted by $w(x, y)$ is equivalent to training a classifier on examples labelled according to the true, unbiased labels.

Theorem 1.

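For any loss function $\ell$, training a classifier $h$ on the weighted objective $\mathbb{E}_{x \sim \mathcal{P},\, y \sim y_{\text{bias}}(x)}[w(x, y) \cdot \ell(h(x), y)]$ is equivalent to training the classifier on the objective $\mathbb{E}_{x \sim \tilde{\mathcal{P}},\, y \sim y_{\text{true}}(x)}[\ell(h(x), y)]$ with respect to the underlying, true labels, for some distribution $\tilde{\mathcal{P}}$ over $\mathcal{X}$.

Proof. For a given $x$ and for any $y \in \mathcal{Y}$, due to Corollary 1 we have

$$w(x, y)\, y_{\text{bias}}(y|x) = \phi(x)\, y_{\text{true}}(y|x),$$

where $\phi(x) = \sum_{y' \in \mathcal{Y}} w(x, y')\, y_{\text{bias}}(y'|x)$ depends only on $x$. Therefore, letting $\tilde{\mathcal{P}}$ denote the feature distribution $\tilde{\mathcal{P}}(x) \propto \phi(x)\, \mathcal{P}(x)$, we have

$$\mathbb{E}_{x \sim \mathcal{P},\, y \sim y_{\text{bias}}(x)}\left[w(x, y) \cdot \ell(h(x), y)\right] = C \cdot \mathbb{E}_{x \sim \tilde{\mathcal{P}},\, y \sim y_{\text{true}}(x)}\left[\ell(h(x), y)\right],$$

where $C = \mathbb{E}_{x \sim \mathcal{P}}[\phi(x)]$, and this completes the proof. ∎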

Theorem 1 is a core contribution of our work. It states that the bias in observed labels may be corrected in a simple and straightforward way: just re-weight the training examples. We note that Theorem 1 suggests that when we re-weight the training examples, we trade off the ability to train on unbiased labels for training on a slightly different distribution $\tilde{\mathcal{P}}$ over features $x$. In Section 5, we will show that given some mild conditions, the change in feature distribution does not affect the bias of the final learned classifier. Therefore, in these cases, training with respect to weighted examples with biased labels is equivalent to training with respect to the same examples and the true labels.
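The equivalence in Theorem 1 can be checked numerically on a small discrete domain. The sketch below is a toy verification under assumed values: the three-point domain, the probabilities, the constraint values `c`, the coefficient `lam`, and the classifier `h` are all hypothetical. It builds biased labels from true ones via the exponential tilt, forms the normalized weights, and compares the weighted loss under biased labels with the unweighted loss under true labels on a re-weighted feature distribution.

```python
import math

xs = [0, 1, 2]                                  # toy feature values
P = {0: 0.5, 1: 0.3, 2: 0.2}                    # feature distribution
y_true = {0: 0.2, 1: 0.5, 2: 0.9}               # P(y = 1 | x) under unbiased labels
c = {0: (0.0, 1.0), 1: (0.0, -1.0), 2: (0.0, 1.0)}   # c[x] = (c(x, 0), c(x, 1))
lam = 0.7                                       # single coefficient

def normalise(a0, a1):
    return (a0 / (a0 + a1), a1 / (a0 + a1))

y_bias, w = {}, {}
for x in xs:
    t0, t1 = math.exp(lam * c[x][0]), math.exp(lam * c[x][1])
    # Biased labels: true probabilities divided by the tilt, renormalised.
    y_bias[x] = normalise((1 - y_true[x]) / t0, y_true[x] / t1)
    # Weights: the tilt itself, normalised over the two labels.
    w[x] = normalise(t0, t1)

def loss(pred, y):
    return (pred - y) ** 2

h = {0: 0.1, 1: 0.8, 2: 0.4}                    # an arbitrary classifier

weighted = sum(P[x] * y_bias[x][y] * w[x][y] * loss(h[x], y)
               for x in xs for y in (0, 1))

phi = {x: sum(y_bias[x][y] * w[x][y] for y in (0, 1)) for x in xs}
C = sum(P[x] * phi[x] for x in xs)
P_tilde = {x: P[x] * phi[x] / C for x in xs}    # re-weighted feature distribution
true_probs = {x: (1 - y_true[x], y_true[x]) for x in xs}
unbiased = sum(P_tilde[x] * true_probs[x][y] * loss(h[x], y)
               for x in xs for y in (0, 1))
```

The two objectives agree up to the positive constant `C`, so they share the same minimizer; the only change is that features are drawn from `P_tilde` rather than `P`.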

4.2 Determining the Coefficients

We now continue to describe how to learn the coefficients $\lambda_1, \ldots, \lambda_K$. One advantage of our approach is that, in practice, $K$ is often small. Thus, we propose to iteratively learn the coefficients so that the final classifier satisfies the desired fairness constraints either on the training data or on a validation set. We first discuss how to do this for demographic parity and will discuss extensions to other notions of fairness in Section 4.3. See the full pseudocode for learning $h$ and $\lambda_1, \ldots, \lambda_K$ in Algorithm 1.

Intuitively, the idea is that if the positive prediction rate for a protected class $\mathcal{G}$ is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of $\mathcal{G}$ and decrease the weights of the negatively labeled examples of $\mathcal{G}$, then this will encourage the classifier to increase its accuracy on the positively labeled examples in $\mathcal{G}$, while the accuracy on the negatively labeled examples of $\mathcal{G}$ may fall. Either of these two events will cause the positive prediction rate on $\mathcal{G}$ to increase, and thus bring $h$ closer to the true, unbiased label function.

Accordingly, Algorithm 1 works by iteratively performing the following steps: (1) evaluate the demographic parity constraints; (2) update the coefficients by subtracting the respective constraint violation multiplied by a fixed step-size; (3) compute the weights for each sample based on these multipliers using the closed-form provided by Proposition 1; and (4) retrain the classifier given these weights.

Algorithm 1 takes in a classification procedure $H$, which given a dataset $\mathcal{D}_{[n]}$ and weights $\{w_i\}_{i=1}^{n}$, outputs a classifier. In practice, $H$ can be any training procedure which minimizes a weighted loss function over some parametric function class (e.g. logistic regression).

Our resulting algorithm simultaneously minimizes the weighted loss and maximizes fairness via learning the coefficients, which may be interpreted as competing goals with different objective functions. Thus, it is a form of a non-zero-sum two-player game. The use of non-zero-sum two-player games in fairness was first proposed in Cotter et al. (2018b) for the Lagrangian approach.

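  Inputs: Learning rate $\eta$, number of loops $T$, training data $\mathcal{D}_{[n]} = \{(x_i, y_i)\}_{i=1}^{n}$, classification procedure $H$, constraints $c_1, \ldots, c_K$ corresponding to protected groups $\mathcal{G}_1, \ldots, \mathcal{G}_K$.
  Initialize $\lambda_1, \ldots, \lambda_K$ to $0$ and $w_1 = w_2 = \cdots = w_n = 1$.
  Let $h := H(\mathcal{D}_{[n]}, \{w_i\}_{i=1}^{n})$.
  for $t = 1, \ldots, T$ do
     Let $\Delta_k := \mathbb{E}_{x \sim \mathcal{D}_{[n]}}[\langle h(x), c_k(x) \rangle]$ for $k \in [K]$.
     Update $\lambda_k := \lambda_k - \eta \cdot \Delta_k$ for $k \in [K]$.
     Let $\tilde{w}_i := \exp\left(\sum_{k=1}^{K} \lambda_k \cdot c_k(x_i, 1)\right)$ for $i \in [n]$.
     Let $w_i := \tilde{w}_i / (1 + \tilde{w}_i)$ if $y_i = 1$, otherwise $w_i := 1 / (1 + \tilde{w}_i)$, for $i \in [n]$.
     Update $h := H(\mathcal{D}_{[n]}, \{w_i\}_{i=1}^{n})$.
  end for
  Return $h$.
Algorithm 1 Training a fair classifier for Demographic Parity, Disparate Impact, or Equal Opportunity.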
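The four steps described for Algorithm 1 (evaluate the constraints, update the coefficients, recompute the weights, retrain) can be sketched in plain Python for demographic parity. This is a minimal illustration, not the authors' implementation. The gradient-descent logistic regression stands in for the classification procedure, the weights assume the demographic parity constraint with $c_k(x, 1) = g_k(x)/Z_k - 1$ and $c_k(x, 0) = 0$, and every function name, the toy data, and the hyperparameter values are hypothetical.

```python
import math
import random

def fit_weighted_logreg(X, y, w, steps=200, lr=0.5):
    """Stand-in classification procedure: weighted logistic regression fitted
    by plain gradient descent. Returns [bias, coef_1, ..., coef_d]."""
    theta = [0.0] * (len(X[0]) + 1)
    total = sum(w)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for xi, yi, wi in zip(X, y, w):
            z = theta[0] + sum(t * v for t, v in zip(theta[1:], xi))
            err = wi * (1.0 / (1.0 + math.exp(-z)) - yi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta

def predict(theta, xi):
    return 1 if theta[0] + sum(t * v for t, v in zip(theta[1:], xi)) > 0 else 0

def dp_violations(preds, groups):
    """Delta_k = mean_i preds_i * (g_k(x_i) / Z_k - 1): the positive prediction
    rate inside group k minus the overall rate. Assumes non-empty groups."""
    n = len(preds)
    deltas = []
    for g in groups:
        z = sum(g) / n
        deltas.append(sum(p * (gi / z - 1.0) for p, gi in zip(preds, g)) / n)
    return deltas

def example_weights(lam, groups, y):
    """Normalised weights with c_k(x, 1) = g_k(x) / Z_k - 1 and c_k(x, 0) = 0."""
    n = len(y)
    zs = [sum(g) / n for g in groups]
    weights = []
    for i in range(n):
        s = sum(l * (g[i] / z - 1.0) for l, g, z in zip(lam, groups, zs))
        wt = math.exp(s)
        weights.append(wt / (1.0 + wt) if y[i] == 1 else 1.0 / (1.0 + wt))
    return weights

def train_fair(X, y, groups, eta=1.0, loops=5):
    lam = [0.0] * len(groups)
    w = [1.0] * len(y)
    theta = fit_weighted_logreg(X, y, w)
    for _ in range(loops):
        preds = [predict(theta, xi) for xi in X]
        deltas = dp_violations(preds, groups)              # (1) evaluate constraints
        lam = [l - eta * d for l, d in zip(lam, deltas)]   # (2) update coefficients
        w = example_weights(lam, groups, y)                # (3) recompute weights
        theta = fit_weighted_logreg(X, y, w)               # (4) retrain
    return theta, lam

# Toy data: the group needs a higher score to receive a positive label.
rng = random.Random(0)
X, y, in_group = [], [], []
for _ in range(200):
    g = rng.randrange(2)
    s = rng.random()
    X.append((s - 0.5, float(g)))
    y.append(1 if s > (0.8 if g else 0.5) else 0)
    in_group.append(g)
theta, lam = train_fair(X, y, [in_group])
```

Any trainer that accepts per-example weights could replace `fit_weighted_logreg`; the loop itself only needs predictions and group membership.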

4.3 Extension to Other Notions of Fairness

The initial restriction to demographic parity was made so that the values of the constraint functions $c_1, \ldots, c_K$ on any $x \in \mathcal{X}$, $y \in \mathcal{Y}$ would be known. We note that Algorithm 1 works for disparate impact as well: the only change would be that the classifier does not have access to the protected attributes. However, in other notions of fairness, such as equal opportunity or equalized odds, the constraint functions depend on $y_{\text{true}}$, which is unknown.

For these cases, we propose to apply the same technique of iteratively re-weighting the loss to achieve the desired fairness notion, with the weights $w(x, y)$ on each example determined only by the protected attribute $g(x)$ and the observed label $y \in \mathcal{Y}$. This is equivalent to using Theorem 1 to derive the same procedure presented in Algorithm 1, but approximating the unknown constraint function $c(x, y)$ as a piece-wise constant function $d(g(x), y)$, where $d : \{0, 1\} \times \mathcal{Y} \to \mathbb{R}$ is unknown. Although we do not have access to $d$, we may treat $d(g(x), y)$ as an additional set of parameters, one for each protected group attribute $g(x) \in \{0, 1\}$ and each label $y \in \mathcal{Y}$. These additional parameters may be learned in the same way the coefficients $\lambda_k$ are learned. In some cases, their values may be wrapped into the unknown coefficients. For example, for equal opportunity, there is in fact no need for any additional parameters. On the other hand, for equalized odds, the unknown values for $P_{\mathcal{X}}$ and $P_{\mathcal{G}}$ are instead treated as unknown coefficients; i.e., separate coefficients for positively and negatively labelled points. See the Appendix for further details on these and more general constraints.

5 Theoretical Analysis

In this section, we provide theoretical guarantees on a learned classifier using the weighting technique. We show that with the coefficients $\lambda_1, \ldots, \lambda_K$ that satisfy Proposition 1, training on the re-weighted dataset leads to finite-sample non-parametric rates of consistency on the estimation error, provided the classifier has sufficient flexibility.

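We need to make the following regularity assumption on the data distribution, which assumes that the data is supported on a compact set in $\mathbb{R}^D$ and $y_{\text{bias}}$ is smooth (i.e. Lipschitz).

Assumption 2.

$\mathcal{X}$ is a compact set over $\mathbb{R}^D$ and both $y_{\text{bias}}(x)$ and $y_{\text{true}}(x)$ are $L$-Lipschitz (i.e. $|y_{\text{bias}}(x) - y_{\text{bias}}(x')| \le L \cdot |x - x'|$).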

We now give the result. The proof is technically involved and is deferred to the Appendix due to space.

Theorem 2 (Rates of Consistency).

Let . Let be a sample drawn from . Suppose that Assumptions 1 and 2 hold. Let be the set of all -Lipschitz functions mapping to . Suppose that the constraints are and the corresponding coefficients satisfy Proposition 1 where for and some . Let be the optimal function in under the weighted mean square error objective, where the weights satisfy Proposition 1. Then there exists depending on such that for sufficiently large depending on , we have with probability at least :

where .

Thus, with the appropriate values of $\lambda_1, \ldots, \lambda_K$ given by Proposition 1, we see that training with the weighted dataset based on these values will guarantee that the final classifier will be close to $y_{\text{true}}$. However, the above rate has a dependence on the dimension $D$, which may be unattractive in high-dimensional settings. If the data lies on a $d$-dimensional submanifold, then Theorem 3 below says that without any changes to the procedure, we will enjoy a rate that depends on the manifold dimension and is independent of the ambient dimension. Interestingly, these rates are attained without knowledge of the manifold or its dimension.

Theorem 3 (Rates on Manifolds).

Suppose that all of the conditions of Theorem 2 hold and that in addition, is a -dimensional Riemannian submanifold of with finite volume and finite condition number. Then there exists depending on such that for sufficiently large depending on , we have with probability at least :

6 Related Work

Work in fair classification can be categorized into three approaches: post-processing of the outputs, the Lagrangian approach of transforming constraints to penalties, and pre-processing training data.

Post-processing: One approach to fairness is to perform a post-processing of the classifier outputs. Examples of previous work in this direction include Doherty et al. (2012); Feldman (2015); Hardt et al. (2016). However, this approach of calibrating the outputs to encourage fairness has limited flexibility. Pleiss et al. (2017) showed that a deterministic solution is only compatible with a single error constraint and thus cannot be applied to fairness notions such as equalized odds. Moreover, decoupling the training and calibration can lead to models with poor accuracy trade-off. In fact Woodworth et al. (2017) showed that in certain cases, post-processing can be provably suboptimal. Other works discussing the incompatibility of fairness notions include Chouldechova (2017); Kleinberg et al. (2016).

Lagrangian Approach: There has been much recent work done on enforcing fairness by transforming the constrained optimization problem via the method of Lagrange multipliers. Some works (Zafar et al., 2015; Goh et al., 2016) apply this to the convex setting. In the non-convex case, there is work which frames the constrained optimization problem as a two-player game (Kearns et al., 2017; Agarwal et al., 2018; Cotter et al., 2018b). Related approaches include Edwards and Storkey (2015); Corbett-Davies et al. (2017); Narasimhan (2018). There is also recent work similar in spirit which encourages fairness by adding penalties to the objective; e.g., Donini et al. (2018) study this for kernel methods and Komiyama et al. (2018) for linear models. However, the fairness constraints are often irregular and have to be relaxed in order to optimize. Notably, our method does not use the constraints directly in the model loss, and thus does not require them to be relaxed. Moreover, these approaches typically are not readily applicable to equality constraints as feasibility challenges can arise; thus, there is the added challenge of determining appropriate slack during training. Finally, the training can be difficult, as Cotter et al. (2018c) have shown that the Lagrangian may not even have a solution to converge to.

When the classification loss and the relaxed constraints have the same form (e.g. a hinge loss as in Eban et al. (2017)), the resulting Lagrangian may be rewritten as a cost-sensitive classification, explicitly pointed out in Agarwal et al. (2018), who show that the Lagrangian method reduces to solving an objective of the form $\sum_{i=1}^{n} w_i \cdot 1[h(x_i) \neq y_i']$ for some non-negative weights $w_i$. In this setting, $y_i'$ may not necessarily be the true label, which may occur for example in demographic parity when the goal is to predict more positively within a protected group and thus may be penalized for predicting correctly on negative examples. While this may be a reasonable approach to achieving fairness, it could be interpreted as training a weighted loss on modified labels, which may be legally prohibited (Barocas and Selbst, 2016). Our approach is a non-negative re-weighting of the original loss (i.e., does not modify the observed labels) and is thus simpler and more aligned with legal standards.

Pre-processing: This approach has primarily involved massaging the data to remove bias. Examples include Calders et al. (2009); Kamiran and Calders (2009); Žliobaite et al. (2011); Kamiran and Calders (2012); Zemel et al. (2013); Fish et al. (2015); Feldman et al. (2015); Beutel et al. (2017). Many of these approaches involve changing the labels and features of the training set, which may have legal implications since it is a form of training on falsified data (Barocas and Selbst, 2016). Moreover, these approaches typically do not perform as well as the state-of-art and have thus far come with few theoretical guarantees (Krasanakis et al., 2018). In contrast, our approach does not modify the training data and only re-weights the importance of certain sensitive groups. Our approach is also notably based on a mathematically grounded formulation of how the bias arises in the data.

7 Experiments

Dataset       Metric      Unc. Err.   Unc. Vio.   Cal. Err.   Cal. Vio.   Lagr. Err.  Lagr. Vio.  Our Err.    Our Vio.
Bank          Dem. Par.   9.41%       .0349       9.70%       .0068       10.46%      .0126       9.63%       .0056*
Bank          Eq. Opp.    9.41%       .1452       9.55%       .0506       9.86%       .1237       9.48%       .0431*
Bank          Eq. Odds    9.41%       .1452       N/A         N/A         9.61%       .0879       9.50%       .0376*
Bank          Disp. Imp.  9.41%       .0304       N/A         N/A         10.44%      .0135       9.89%       .0063*
COMPAS        Dem. Par.   31.49%      .2045       32.53%      .0201       40.16%      .0495       35.44%      .0155*
COMPAS        Eq. Opp.    31.49%      .2373       31.63%      .0256*      36.92%      .1141       33.63%      .0774
COMPAS        Eq. Odds    31.49%      .2373       N/A         N/A         42.69%      .0566*      35.06%      .0663
COMPAS        Disp. Imp.  31.21%      .1362       N/A         N/A         40.35%      .0499       42.64%      .0256*
Communities   Dem. Par.   11.62%      .4211       32.06%      .0653       28.46%      .0519       30.06%      .0107*
Communities   Eq. Opp.    11.62%      .5513       17.64%      .0584*      28.45%      .0897       26.85%      .0833
Communities   Eq. Odds    11.62%      .5513       N/A         N/A         28.46%      .0962       26.65%      .0769*
Communities   Disp. Imp.  14.83%      .3960       N/A         N/A         28.26%      .0557       30.26%      .0073*
German Stat.  Dem. Par.   24.85%      .0766       24.85%      .0346       25.45%      .0410       25.15%      .0137*
German Stat.  Eq. Opp.    24.85%      .1120       24.54%      .0922       27.27%      .0757       25.45%      .0662*
German Stat.  Eq. Odds    24.85%      .1120       N/A         N/A         34.24%      .1318       25.45%      .1099*
German Stat.  Disp. Imp.  24.85%      .0608       N/A         N/A         27.57%      .0468       25.15%      .0156*
Adult         Dem. Par.   14.15%      .1173       16.60%      .0129       20.47%      .0198       16.51%      .0037*
Adult         Eq. Opp.    14.15%      .1195       14.43%      .0170       19.67%      .0374       14.46%      .0092*
Adult         Eq. Odds    14.15%      .1195       N/A         N/A         19.04%      .0160*      14.58%      .0221
Adult         Disp. Imp.  14.19%      .1108       N/A         N/A         20.48%      .0199*      17.37%      .0334
Table 1: Experiment Results: Benchmark Fairness Tasks. Each row corresponds to a dataset and fairness notion. We show the error (Err.) and fairness violation (Vio.) of training with no constraints (Unc.), with post-processing calibration (Cal.), the Lagrangian approach (Lagr.), and our method. The lowest fairness violation in each row is marked with an asterisk. All reported numbers are evaluated on the test set.

7.1 Datasets

Bank Marketing (Lichman et al., 2013) ( examples). The data is based on a direct marketing campaign of a banking institution. The task is to predict whether someone will subscribe to a bank product. We use age as a protected attribute: protected groups are determined based on uniform age quantiles.

Communities and Crime (Lichman et al., 2013) ( examples). Each datapoint represents a community and the task is to predict whether a community has high (above the -th percentile) crime rate. We pre-process the data consistent with previous works, e.g. Cotter et al. (2018a) and form the protected group based on race in the same way as done in Cotter et al. (2018a). We use four race features as real-valued protected attributes corresponding to percentage of White, Black, Asian and Hispanic. We threshold each at the median to form protected groups.

ProPublica’s COMPAS (ProPublica, 2018) Recidivism data ( examples). The task is to predict recidivism based on criminal history, jail and prison time, demographics, and risk scores. The protected groups are two race-based (Black, White) and two gender-based (Male, Female).

German Statlog Credit Data (Lichman et al., 2013) ( examples). The task is to predict whether an individual is a good or bad credit risk given attributes related to the individual’s financial situation. We form two protected groups based on an age cutoff of years.

Adult (Lichman et al., 2013) ( examples). The task is to predict whether the person’s income is more than $50k per year. We use protected groups based on gender (Male and Female) and race (Black and White). We follow an identical procedure to Zafar et al. (2015); Goh et al. (2016) to pre-process the dataset.

7.2 Baselines

For all of the methods except for the Lagrangian, we train using Scikit-Learn’s Logistic Regression (Pedregosa et al., 2011) with default hyperparameter settings. We test our method against the unconstrained baseline, post-processing calibration, and the Lagrangian approach with hinge relaxation of the constraints. For all of the methods, we fix the hyperparameters across all experiments. For implementation details and hyperparameter settings, see the Appendix.

7.3 Fairness Notions

For each dataset and method, we evaluate our procedures with respect to demographic parity, equal opportunity, equalized odds, and disparate impact. As discussed earlier, the post-processing calibration method cannot be readily applied to disparate impact or equalized odds (without added complexity and randomized classifiers) so we do not show these results.

7.4 Results

Figure 2: Results as $\lambda$ changes: We show test error and fairness violations for demographic parity on Adult as the weightings change. We take the optimal $\lambda = \lambda^*$ found by Algorithm 1. Then for each value $x$ on the $x$-axis, we train a classifier with data weights based on the setting $\lambda = x \cdot \lambda^*$ and plot the error and violations. We see that indeed, when $x = 1$, we train based on the $\lambda$ found by Algorithm 1 and thus get the lowest fairness violation. On the other hand, $x = 0$ corresponds to training on the unweighted dataset and gives us the lowest prediction error. Analogous charts for the rest of the datasets can be found in the Appendix.

We present the results in Table 1. We see that our method consistently leads to fairer classifiers, often yielding the classifier with the lowest test violation out of all methods. We also include test error rates in the results. Although the primary objective of these algorithms is to yield a fair classifier, we find that our method is able to find reasonable trade-offs between fairness and accuracy. Our method often achieves predictive error that is better than or comparable to that of the other fair classification methods (see Figure 2 for more insight into the trade-offs found by our algorithm).

The results in Table 1 also highlight the disadvantages of existing methods for training fair classifiers. Although the calibration method is an improvement over an unconstrained model, it is often unable to find the classifier with the lowest bias.

We also find that the Lagrangian approach does not consistently provide fair classifiers. As noted in previous work (Cotter et al., 2018c; Goh et al., 2016), constrained optimization can be inherently unstable, or can require a certain amount of slack in the objective, because the constraints are typically relaxed to make gradient-based training possible and to ensure feasibility. Moreover, due to the added complexity of this method, it can overfit and generalize poorly with respect to fairness, as noted by Cotter et al. (2018a). Accordingly, we find that the Lagrangian method often yields poor trade-offs between fairness and accuracy, at times yielding classifiers with both worse accuracy and more bias.

8 MNIST with Label Bias

We now investigate the practicality of our method on a larger dataset. We take the MNIST dataset under the standard train/test split and then randomly select of the training data points and change their label to , yielding a biased set of labels. On such a dataset, our method should be able to find appropriate weights so that training on the weighted dataset roughly corresponds to training on the true labels. To this end, we train a classifier with a demographic-parity-like constraint on the predictions of digit ; i.e., we encourage a classifier to predict the digit at a rate of , the rate appearing in the true labels. We compare to the same baseline methods as before. See the Appendix for further experimental details.
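The label corruption described above can be sketched as follows. This is a minimal illustration only: `inject_label_bias` is a hypothetical helper, the stand-in labels replace the real MNIST labels, and the `fraction` and `target` values are placeholders rather than the settings used in the experiment.

```python
import random


def inject_label_bias(labels, fraction, target, rng):
    """Return a copy of `labels` in which a random `fraction` of the
    entries has been overwritten with the label `target`."""
    biased = list(labels)
    n_flip = int(round(fraction * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        biased[i] = target
    return biased


rng = random.Random(0)
true_labels = [i % 10 for i in range(1000)]  # stand-in for digit labels 0..9
biased_labels = inject_label_bias(true_labels, fraction=0.3, target=7, rng=rng)

# Some selected points already carried the target label, so fewer than
# fraction * n labels actually change.
changed = sum(t != b for t, b in zip(true_labels, biased_labels))
rate_true = true_labels.count(7) / len(true_labels)
rate_biased = biased_labels.count(7) / len(biased_labels)
```

The corrupted labels over-represent the target digit, which is why a constraint on the prediction rate of that digit is a natural way to steer training back toward the true labels.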

Method | Test Accuracy
Trained on True Labels | 97.85%
Unconstrained | 88.18%
Calibration | 89.79%
Lagrangian | 94.05%
Our Method | 96.16%

Table 2: MNIST with Label Bias

We present the results in Table 2. We report test set accuracy computed with respect to the true labels. We find that our method is the only one that is able to approach the accuracy of a classifier trained with respect to the true labels. Compared to the unconstrained baseline or calibration, our method reduces the error rate by over half. Even compared to the next best method (the Lagrangian), our proposed technique reduces the error rate by roughly a third (from 5.95% to 3.84%). These results give further evidence of the ability of our method to effectively train on the underlying, true labels despite only observing biased labels.
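The relative error reductions can be recomputed directly from the accuracies in Table 2. The short script below is only a worked check of that arithmetic; the helper name is hypothetical.

```python
# Test accuracies (in percent) copied from Table 2.
accuracy = {
    "Trained on True Labels": 97.85,
    "Unconstrained": 88.18,
    "Calibration": 89.79,
    "Lagrangian": 94.05,
    "Our Method": 96.16,
}

# Error rate is the complement of accuracy.
error = {method: 100.0 - acc for method, acc in accuracy.items()}


def relative_error_reduction(baseline, method="Our Method"):
    """Fraction of the baseline's error removed by `method`."""
    return (error[baseline] - error[method]) / error[baseline]


reductions = {
    baseline: relative_error_reduction(baseline)
    for baseline in ("Unconstrained", "Calibration", "Lagrangian")
}
```

The reductions come out to about 0.68 against the unconstrained baseline, 0.62 against calibration, and 0.35 against the Lagrangian approach.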

9 Conclusion

We presented a new framework to model how bias can arise in a dataset, assuming that there exists an unbiased ground truth. Our method for correcting for this bias is based on re-weighting the training examples. Given the appropriate weights, we showed with finite-sample guarantees that the learned classifier will be approximately unbiased. We gave practical procedures which approximate these weights and showed that the resulting algorithm leads to fair classifiers in a variety of settings.


We thank Maya Gupta, Andrew Cotter and Harikrishna Narasimhan for many insightful discussions and suggestions as well as Corinna Cortes for helpful comments.


Appendix A Sampling Technique

We present an alternative to the weighting technique. For the sampling technique, we note that the distribution corresponds to the conditional distribution,

where is a random variable sampled from and is a random variable sampled from the distribution . Therefore, in our training procedure for , given a data point , where is sampled according to (i.e., ), we sample a value from the random variable , and train on if and only if . This procedure corresponds to training on data points with sampled according to the true, unbiased label function .

The sampling technique ignores or skips data points when (i.e., when the sample from does not match the observed label). In cases where the cardinality of the labels is large, this technique may ignore a large number of examples, hampering training. For this reason, the weighting technique may be more practical.
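The accept-or-skip step, and the reason it becomes wasteful with many labels, can be illustrated with a small simulation. This is a sketch under stated assumptions: `label_dist` is a hypothetical callable returning a distribution over labels for a given input, and the uniform distribution used here is only a stand-in for the distribution defined in this appendix.

```python
import random


def sample_label(probs, rng):
    """Draw one label from a dict mapping label -> probability."""
    labels = list(probs)
    return rng.choices(labels, weights=[probs[l] for l in labels], k=1)[0]


def rejection_filter(dataset, label_dist, rng):
    """Keep (x, y) only when a fresh draw from label_dist(x) equals y."""
    kept = []
    for x, y in dataset:
        y_prime = sample_label(label_dist(x), rng)
        if y_prime == y:
            kept.append((x, y))
    return kept


def uniform_over(num_labels):
    """Stand-in distribution (an assumption): uniform over the labels."""
    return lambda x: {label: 1.0 / num_labels for label in range(num_labels)}


rng = random.Random(0)
data_2 = [(i, i % 2) for i in range(2000)]    # 2 possible labels
data_50 = [(i, i % 50) for i in range(2000)]  # 50 possible labels

kept_2 = rejection_filter(data_2, uniform_over(2), rng)
kept_50 = rejection_filter(data_50, uniform_over(50), rng)

fraction_kept_2 = len(kept_2) / len(data_2)
fraction_kept_50 = len(kept_50) / len(data_50)
```

Under the uniform stand-in, roughly half of the binary-labeled points survive, while only a few percent of the 50-label points do. Re-weighting avoids this loss because every example still contributes to training.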

Appendix B Algorithms for Other Notions of Fairness

Equal Opportunity: Algorithm 1 can be used directly by replacing the demographic parity constraints with equal opportunity constraints. Recall that in equal opportunity, the goal is for the positive prediction rate on the positive examples of the protected group to match that on the positive examples of the overall population. If the positive prediction rate on the protected group’s positive examples is lower than the overall rate, then Algorithm 1 will up-weight the examples of the protected group that are positively labeled. This encourages the classifier to be more accurate on those positively labeled examples, which in other words means that it is encouraged to increase its positive prediction rate on these examples, thus leading to a classifier satisfying equal opportunity. In this way, the same intuitions supporting the application of Algorithm 1 to demographic parity or disparate impact also support its application to equal opportunity. We note that in practice, we do not have access to the true label function, so we approximate the constraint violation using the observed labels as .

  Inputs: Learning rate , number of loops , training data , classification procedure . True positive rate constraints and false positive rate constraints respectively corresponding to protected groups .
  Initialize to and . Let
  for  do
     Let for and .
     Update for and .
      for .
      for .
     Let if , otherwise for
  end for
Algorithm 2 Training a fair classifier for Equalized Odds.

Equalized Odds: Recall that equalized odds requires that the conditions for equal opportunity (regarding the true positive rate) be satisfied and that, in addition, the false positive rate for each protected group match the overall false positive rate. Thus, as before, for each true positive rate constraint, we see that if the examples of a protected group have a lower true positive rate than the overall population, then up-weighting the positively labeled examples in that group will encourage the classifier to increase its accuracy on them, thus increasing the true positive rate on the group. Likewise, if the examples of a protected group have a higher false positive rate than the overall population, then up-weighting the negatively labeled examples of that group will encourage the classifier to be more accurate on them, thus decreasing the false positive rate on the group. This forms the intuition behind Algorithm 2. We again approximate the constraint violation using the observed labels as for .
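The intuition in this paragraph can be sketched as one multiplier update followed by a re-weighting of the examples. This is not a reproduction of Algorithm 2: the additive update, the exponential weight form, and all names below are assumptions made for illustration, and the rates are computed from the observed labels.

```python
import math


def rate(preds, labels, groups, label_value, group=None):
    """Positive-prediction rate among examples with the given observed
    label, optionally restricted to one group."""
    selected = [p for p, y, g in zip(preds, labels, groups)
                if y == label_value and (group is None or g == group)]
    return sum(selected) / len(selected)  # assumes a non-empty selection


def update_multipliers(lam_tp, lam_fp, preds, labels, groups, eta):
    """One assumed update step, applied in place for each tracked group."""
    for g in lam_tp:
        # Group TPR below the overall TPR -> raise the multiplier for positives.
        lam_tp[g] += eta * (rate(preds, labels, groups, 1)
                            - rate(preds, labels, groups, 1, g))
        # Group FPR above the overall FPR -> raise the multiplier for negatives.
        lam_fp[g] += eta * (rate(preds, labels, groups, 0, g)
                            - rate(preds, labels, groups, 0))
    return lam_tp, lam_fp


def example_weights(lam_tp, lam_fp, labels, groups):
    """Assumed exponential weights; untracked groups keep weight 1."""
    return [math.exp(lam_tp.get(g, 0.0)) if y == 1 else math.exp(lam_fp.get(g, 0.0))
            for y, g in zip(labels, groups)]


# Toy data (hypothetical): group "a" is the protected group.
preds = [0, 1, 1, 1, 1, 1, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

lam_tp, lam_fp = update_multipliers({"a": 0.0}, {"a": 0.0},
                                    preds, labels, groups, eta=1.0)
weights = example_weights(lam_tp, lam_fp, labels, groups)
```

In this toy example group "a" has a lower true positive rate and a higher false positive rate than the overall population, so both its positively and negatively labeled examples receive weights above 1, while group "b" keeps weight 1. A full procedure would retrain the classifier on the new weights and repeat.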

More general constraints: It is clear that our strategy can be further extended to any constraint that can be expressed as a function of the true positive rate and false positive rate over any subsets (i.e. protected groups) of the data. Examples that arise in practice include equal accuracy constraints, where the accuracy of certain subsets of the data must be approximately the same in order to not disadvantage certain groups, and high confidence samples, where there are a number of samples which the classifier ought to predict correctly and thus appropriate weighting can enforce that the classifier achieves high accuracy on these examples.

Appendix C Proof of Proposition 1

Proof of Proposition 1.

The constrained optimization problem stated in Assumption 1 is a convex optimization problem with linear constraints. We may use the Lagrangian method to transform it into the following min-max problem:

where is defined as

Note that the KL-divergence may be written as an inner product:

Therefore, we have

In terms of , this is a classic convex optimization problem (Botev and Kroese, 2011). Its optimum is a Boltzmann distribution of the following form:

The desired claim immediately follows. ∎
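The key step of the proof rests on a standard fact: over probability distributions on a finite set, the minimizer of a KL-divergence plus a linear term is a Boltzmann (exponentially tilted) distribution. The numerical check below illustrates this in generic notation; the symbols `p`, `q`, and `f` are placeholders introduced here, not the symbols of Proposition 1.

```python
import math
import random


def kl(p, q):
    """KL(p || q) written as the inner product <p, log p - log q>."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def objective(p, q, f):
    """KL(p || q) + <p, f>, the quantity being minimized over p."""
    return kl(p, q) + sum(pi * fi for pi, fi in zip(p, f))


def boltzmann(q, f):
    """Candidate minimizer: p_i proportional to q_i * exp(-f_i)."""
    unnormalized = [qi * math.exp(-fi) for qi, fi in zip(q, f)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


q = [0.5, 0.3, 0.2]   # reference distribution (placeholder values)
f = [1.0, -0.5, 0.2]  # linear penalty term (placeholder values)

p_star = boltzmann(q, f)
best = objective(p_star, q, f)

# The optimal value has the closed form -log(sum_i q_i * exp(-f_i)).
closed_form = -math.log(sum(qi * math.exp(-fi) for qi, fi in zip(q, f)))

# Compare against many random distributions on the same support.
rng = random.Random(0)
random_values = []
for _ in range(1000):
    raw = [rng.random() + 1e-9 for _ in q]
    total = sum(raw)
    random_values.append(objective([r / total for r in raw], q, f))
min_random = min(random_values)
```

No randomly drawn distribution attains a lower objective than the Boltzmann form, which matches the closed-form optimum.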

Appendix D Proof of Theorem 2

Proof of Theorem 2.

The corresponding weight for each sample is if and if , where

. Then, we have by Proposition 1 that


Let us denote the weight for example . Suppose that is the optimal learner on the re-weighted objective. That is,

where and . Let us partition into a grid of -dimensional hypercubes with diameter , and let this collection be . For each , let us denote the center of as . Define

where .

We now show that . We have

where the first inequality holds by smoothness of ; the second inequality holds because the value of for each will be the same assuming that is chosen sufficiently small to not allow examples from different protected attributes to be in the same and then applying Bernstein’s concentration inequality so that this holds with probability at least for some constant ; finally the last inequality holds for sufficiently small. Similarly, we can show that , as desired.

It is clear that .

We now bound the amount can deviate from at on average. Let . Then, we have

because otherwise,

contradicting the fact that minimizes .

We thus have

where the last inequality follows by lower bounding in terms of .


By the smoothness of