1 Introduction
Machine learning has become widely adopted in a variety of real-world applications that significantly affect people’s lives (Guimaraes and Tofighi, 2018; Guegan and Hassani, 2018). Fairness in these algorithmic decision-making systems has thus become an increasingly important concern: it has been shown that, without appropriate intervention during training or evaluation, models can be biased against certain groups (Angwin et al., 2016; Hardt et al., 2016). This is because the data used to train these models often contains biases that become reinforced in the model (Bolukbasi et al., 2016). Moreover, it has been shown that simple remedies, such as ignoring the features corresponding to the protected groups, are largely ineffective due to redundant encodings in the data (Pedreshi et al., 2008). In other words, the data can be inherently biased in possibly complex ways, thus making it difficult to achieve fairness.
Research on training fair classifiers has therefore received a great deal of attention. One such approach has focused on developing postprocessing steps to enforce fairness on a learned model (Doherty et al., 2012; Feldman, 2015; Hardt et al., 2016). That is, one first trains a machine learning model, resulting in an unfair classifier. The outputs of the classifier are then calibrated to enforce fairness. Although this approach is likely to decrease the bias of the classifier, decoupling the training from the fairness enforcement means that it may not lead to the best tradeoff between fairness and accuracy. Accordingly, recent work has proposed to incorporate fairness into the training algorithm itself, framing the problem as a constrained optimization problem and subsequently applying the method of Lagrange multipliers to transform the constraints into penalties (Zafar et al., 2015; Goh et al., 2016; Cotter et al., 2018b; Agarwal et al., 2018); however, such approaches may introduce undesired complexity and lead to more difficult or unstable training (Cotter et al., 2018b, c). Both of these existing methods address the problem of bias by adjusting the machine learning model rather than the data, despite the fact that it is often the training data itself (i.e., the observed features and corresponding labels) that is biased.
In this paper, we provide an approach to machine learning fairness that addresses the underlying data bias problem directly. We introduce a new mathematical framework for fairness in which we assume that there exists an unknown but unbiased ground truth label function and that the labels observed in the data are assigned by an agent who is possibly biased, but otherwise has the intention of being accurate. This assumption is natural in practice and may also be applied to settings where the features themselves are biased and the observed labels were generated by a process depending on the features (i.e., situations where there is bias in both the features and the labels).
Based on this mathematical formulation, we show how one may identify the amount of bias in the training data as a closed form expression. Furthermore, our derived form for the bias suggests that its correction may be performed by assigning appropriate weights to each example in the training data. We show, with theoretical guarantees, that training the classifier under the resulting weighted objective leads to an unbiased classifier on the original unweighted dataset. Notably, many preprocessing approaches and even constrained optimization approaches (e.g. Agarwal et al. (2018)) optimize a loss which possibly modifies the observed labels or features, and doing so may be legally prohibited as it can be interpreted as training on falsified data; see Barocas and Selbst (2016) (more details about this can be found in Section 6). In contrast, our method does not modify any of the observed labels or features. Rather, we correct for the bias by changing the distribution of the sample points via reweighting the dataset.
Our resulting method is general and can be applied to various notions of fairness, including demographic parity, equal opportunity, equalized odds, and disparate impact. Moreover, the method is practical and simple to tune: with the appropriate example weights, any off-the-shelf classification procedure can be used on the weighted dataset to learn a fair classifier. Experimentally, we show that on standard fairness benchmark datasets and under a variety of fairness notions our method can outperform previous approaches to fair classification.
2 Background
In this section, we introduce our framework for machine learning fairness, which explicitly assumes an unknown and unbiased ground truth label function. We additionally introduce notation and definitions used in the subsequent presentation of our method.
2.1 Biased and Unbiased Labels
Consider a data domain $\mathcal{X}$ and an associated data distribution $\mathcal{P}$. An element $x \in \mathcal{X}$ may be interpreted as a feature vector associated with a specific example. We let $\mathcal{Y} := \{0, 1\}$ be the labels, considering the binary classification setting, although our method may be readily generalized to other settings.
We assume the existence of an unbiased, ground truth label function $y_{\text{true}} : \mathcal{X} \to [0, 1]$. Although $y_{\text{true}}$ is the assumed ground truth, in general we do not have access to it. Rather, our dataset has labels generated based on a biased label function $y_{\text{bias}} : \mathcal{X} \to [0, 1]$. Accordingly, we assume that our data is drawn as follows:
$$(x, y) \sim \mathcal{D} \;\equiv\; x \sim \mathcal{P},\; y \sim \mathrm{Bernoulli}(y_{\text{bias}}(x)),$$
and we assume access to a finite sample $\mathcal{D}_{[n]} := \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ drawn from $\mathcal{D}$.
In a machine learning context, our directive is to use the dataset $\mathcal{D}$ to recover the unbiased, true label function $y_{\text{true}}$. In general, the relationship between the desired $y_{\text{true}}$ and the observed $y_{\text{bias}}$ is unknown. Without additional assumptions, it is difficult to learn a machine learning model to fit $y_{\text{true}}$. We will attack this problem in the following sections by proposing a minimal assumption on the relationship between $y_{\text{true}}$ and $y_{\text{bias}}$. The assumption will allow us to derive an expression for $y_{\text{true}}$ in terms of $y_{\text{bias}}$, and the form of this expression will immediately imply that correction of the label bias may be done by appropriately reweighting the data.
We note that our proposed perspective on the problem of learning a fair machine learning model is conceptually different from previous ones. While previous perspectives propose to train on the observed, biased labels and only enforce fairness as a penalty or as a postprocessing step to the learning process, we take a more direct approach. Training on biased data can be inherently misguided, and thus we believe that our proposed perspective may be more appropriate and better aligned with the directives associated with machine learning fairness.
2.2 Notions of Bias
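We now discuss precise ways in which $y_{\text{bias}}$ can be biased. We describe a number of accepted notions of fairness; i.e., what it means for an arbitrary label function or machine learning model $h : \mathcal{X} \to [0, 1]$ to be biased (unfair) or unbiased (fair).
We will define the notions of fairness in terms of a constraint function $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. Many of the common notions of fairness may be expressed or approximated as linear constraints on $h$ (introduced previously by Cotter et al. (2018c); Goh et al. (2016)). That is, they are of the form
$$\mathbb{E}_{x \sim \mathcal{P}}\big[\langle h(x), c(x) \rangle\big] = 0,$$
where $\langle h(x), c(x) \rangle := \sum_{y \in \mathcal{Y}} h(y|x)\, c(x, y)$ and we use the shorthand $h(y|x)$ to denote the probability of sampling $y$ from a Bernoulli random variable with $p = h(x)$; i.e., $h(1|x) := h(x)$ and $h(0|x) := 1 - h(x)$. Therefore, a label function $h$ is unbiased with respect to the constraint function $c$ if $\mathbb{E}_{x \sim \mathcal{P}}[\langle h(x), c(x) \rangle] = 0$. If $h$ is biased, the degree of bias (positive or negative) is given by $\mathbb{E}_{x \sim \mathcal{P}}[\langle h(x), c(x) \rangle]$.
We define the notions of fairness with respect to a protected group $\mathcal{G}$, and thus assume access to an indicator function $g(x) = \mathbb{1}[x \in \mathcal{G}]$.
We use $Z_{\mathcal{G}} := \mathbb{E}_{x \sim \mathcal{P}}[g(x)]$ to denote the probability of a sample drawn from $\mathcal{P}$ to be in $\mathcal{G}$.
We use $P_{\mathcal{X}} := \mathbb{E}_{x \sim \mathcal{P}}[y_{\text{true}}(x)]$ to denote the proportion of $\mathcal{X}$ which is positively labelled and $P_{\mathcal{G}} := \mathbb{E}_{x \sim \mathcal{P}}[g(x) \cdot y_{\text{true}}(x)]$ to denote the proportion of $\mathcal{X}$ which is positively labelled and in $\mathcal{G}$. We now give some concrete examples of accepted notions of constraint functions:
Demographic parity (Dwork et al., 2012): A fair classifier $h$ should make positive predictions on $\mathcal{G}$ at the same rate as on all of $\mathcal{X}$. The constraint function may be expressed as $c(x, 0) = 0$, $c(x, 1) = g(x)/Z_{\mathcal{G}} - 1$.
Disparate impact (Feldman et al., 2015): This is identical to demographic parity, except that, in addition, during inference the classifier does not have access to the features of $x$ indicating whether it belongs to the protected group.
Equal opportunity (Hardt et al., 2016): A fair classifier $h$ should have equal true positive rates on $\mathcal{G}$ as on all of $\mathcal{X}$. The constraint may be expressed as $c(x, 0) = 0$, $c(x, 1) = g(x)\, y_{\text{true}}(x)/P_{\mathcal{G}} - y_{\text{true}}(x)/P_{\mathcal{X}}$.
Equalized odds (Hardt et al., 2016): A fair classifier $h$ should have equal true positive and false positive rates on $\mathcal{G}$ as on all of $\mathcal{X}$. In addition to the constraint associated with equal opportunity, this notion applies an additional constraint with $c(x, 0) = 0$, $c(x, 1) = g(x)\,(1 - y_{\text{true}}(x))/(Z_{\mathcal{G}} - P_{\mathcal{G}}) - (1 - y_{\text{true}}(x))/(1 - P_{\mathcal{X}})$.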
In practice, there are often multiple fairness constraints $\{c_k\}_{k=1}^{K}$ associated with multiple protected groups $\{\mathcal{G}_k\}_{k=1}^{K}$. Our subsequent results therefore assume multiple fairness constraints and protected groups, and allow the protected groups to have overlapping samples.
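As a rough illustration of how the degree of bias could be measured empirically, the sketch below evaluates the sample average of $h(x)\,c(x, 1)$ for demographic parity and equal opportunity. It assumes constraint functions with $c(x, 0) = 0$, hard 0/1 true labels, and a tiny hand-made dataset; the names `violation`, `demographic_parity_c1`, and `equal_opportunity_c1` are hypothetical and not part of the paper.

```python
def violation(points, c1):
    """Empirical E_x[<h(x), c(x)>] when c(x, 0) = 0: the mean of h(x) * c(x, 1)."""
    return sum(p["h"] * c1(p) for p in points) / len(points)

# Each point: group indicator g(x), (assumed known) true label, classifier output h(x).
points = [
    {"g": 1, "y_true": 1, "h": 1.0},
    {"g": 1, "y_true": 1, "h": 0.0},
    {"g": 1, "y_true": 0, "h": 0.0},
    {"g": 1, "y_true": 0, "h": 0.0},
    {"g": 0, "y_true": 1, "h": 1.0},
    {"g": 0, "y_true": 1, "h": 1.0},
    {"g": 0, "y_true": 0, "h": 0.0},
    {"g": 0, "y_true": 0, "h": 0.0},
]

n = len(points)
z_g = sum(p["g"] for p in points) / n                # fraction of samples in the group
p_x = sum(p["y_true"] for p in points) / n           # fraction labelled positive
p_g = sum(p["g"] * p["y_true"] for p in points) / n  # fraction positive and in the group

def demographic_parity_c1(p):
    # c(x, 1) = g(x) / Z_G - 1
    return p["g"] / z_g - 1.0

def equal_opportunity_c1(p):
    # c(x, 1) = g(x) * y_true(x) / P_G - y_true(x) / P_X
    return p["g"] * p["y_true"] / p_g - p["y_true"] / p_x

dp = violation(points, demographic_parity_c1)
eo = violation(points, equal_opportunity_c1)
print(dp, eo)
```

Here the group's positive prediction rate is 0.25 against 0.375 overall, and its true positive rate is 0.5 against 0.75 overall, so both violations come out negative.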
3 Modeling How Bias Arises in Data
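We now introduce our underlying mathematical framework to understand bias in the data, by providing the relationship between $y_{\text{bias}}$ and $y_{\text{true}}$ (Assumption 1 and Proposition 1). This will allow us to derive a closed-form expression for $y_{\text{true}}$ in terms of $y_{\text{bias}}$ (Corollary 1). In Section 4 we will show how this expression leads to a simple weighting procedure that uses data with biased labels to train a classifier with respect to the true, unbiased labels.
We begin with an assumption on the relationship between the observed $y_{\text{bias}}$ and the underlying $y_{\text{true}}$.
Assumption 1.
Suppose that our fairness constraints are $c_1, \dots, c_K$, with respect to which $y_{\text{true}}$ is unbiased (i.e. $\mathbb{E}_{x \sim \mathcal{P}}[\langle y_{\text{true}}(x), c_k(x) \rangle] = 0$ for $k \in [K]$). We assume that there exist $\epsilon_1, \dots, \epsilon_K \in \mathbb{R}$ such that the observed, biased label function $y_{\text{bias}}$ is the solution of the following constrained optimization problem:
$$\arg\min_{\hat{y} : \mathcal{X} \to [0, 1]} \; \mathbb{E}_{x \sim \mathcal{P}}\big[D_{\mathrm{KL}}(\hat{y}(x) \,\|\, y_{\text{true}}(x))\big] \quad \text{s.t.} \quad \mathbb{E}_{x \sim \mathcal{P}}\big[\langle \hat{y}(x), c_k(x) \rangle\big] = \epsilon_k \;\text{ for } k = 1, \dots, K,$$
where we use $D_{\mathrm{KL}}$ to denote the KL-divergence.
In other words, we assume that $y_{\text{bias}}$ is the label function closest to $y_{\text{true}}$ while achieving some amount of bias, where proximity to $y_{\text{true}}$ is given by the KL-divergence. This is a reasonable assumption in practice, where the observed data may be the result of manual labelling done by actors (e.g. human decision-makers) who strive to provide an accurate label while being affected by (potentially unconscious) biases; or in cases where the observed labels correspond to a process (e.g. results of a written exam) devised to be accurate and fair, but which is nevertheless affected by inherent biases.
We use the KL-divergence to impose this desire to have an accurate labelling. In general, a different divergence may be chosen. However, in our case, the choice of a KL-divergence allows us to derive the following proposition, which provides a closed-form expression for the observed $y_{\text{bias}}$. The derivation of Proposition 1 from Assumption 1 is standard and has appeared in previous works; e.g. Friedlander and Gupta (2006); Botev and Kroese (2011). For completeness, we include the proof in the Appendix.
Proposition 1.
Suppose that Assumption 1 holds. Then $y_{\text{bias}}$ satisfies the following for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$:
$$y_{\text{bias}}(y|x) \propto y_{\text{true}}(y|x) \cdot \exp\Big\{-\sum_{k=1}^{K} \lambda_k \cdot c_k(x, y)\Big\}$$
for some $\lambda_1, \dots, \lambda_K \in \mathbb{R}$.
Given this form of $y_{\text{bias}}$ in terms of $y_{\text{true}}$, we can immediately deduce the form of $y_{\text{true}}$ in terms of $y_{\text{bias}}$:
Corollary 1.
Suppose that Assumption 1 holds. The unbiased label function $y_{\text{true}}$ is of the form
$$y_{\text{true}}(y|x) \propto y_{\text{bias}}(y|x) \cdot \exp\Big\{\sum_{k=1}^{K} \lambda_k \cdot c_k(x, y)\Big\}$$
for some $\lambda_1, \dots, \lambda_K \in \mathbb{R}$.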
We note that previous approaches to learning fair classifiers often formulate a constrained optimization problem similar to that appearing in Assumption 1 (i.e., maximize the accuracy or log-likelihood of a classifier subject to linear constraints) and subsequently solve it, usually via the method of Lagrange multipliers, which translates the constraints to penalties on the training loss. In our approach, rather than using the constrained optimization problem to formulate a machine learning objective, we use it to express the relationship between true (unbiased) and observed (biased) labels. Furthermore, rather than training with respect to the biased labels, our approach aims to recover the true underlying labels. As we will show in the following sections, this may be done by simply optimizing the training loss on a reweighting of the dataset. In contrast, the penalties associated with Lagrangian approaches can often be cumbersome: the original, non-differentiable fairness constraints must be relaxed or approximated before conversion to penalties. Even then, the derivatives of these approximations may be near-zero for large regions of the domain, causing difficulties during training.
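To make the relationship between true and observed labels concrete, here is a minimal numerical sketch at a single feature vector. It assumes the exponential-tilting form implied by minimizing a KL-divergence under a linear constraint (biased labels proportional to the true label probability times $\exp\{-\lambda c(x, y)\}$, inverted with the opposite sign), a single constraint, and made-up values for the coefficient and the constraint function. The helper `tilt` is hypothetical.

```python
import math

def tilt(p_pos, lam, c_pos, c_neg=0.0, sign=-1.0):
    """Positive-label probability proportional to p(y|x) * exp(sign * lam * c(x, y))."""
    pos = p_pos * math.exp(sign * lam * c_pos)
    neg = (1.0 - p_pos) * math.exp(sign * lam * c_neg)
    return pos / (pos + neg)

y_true = 0.5        # assumed true probability of a positive label at this x
lam = math.log(2)   # hypothetical coefficient for the single constraint
c_pos = 1.0         # assumed value of c(x, 1); c(x, 0) = 0

# Biasing step: tilt the true labels with exp(-lam * c).
y_bias = tilt(y_true, lam, c_pos, sign=-1.0)

# Correction step: tilt the biased labels back with exp(+lam * c).
recovered = tilt(y_bias, lam, c_pos, sign=+1.0)

print(y_bias, recovered)
```

With these values the biased positive-label probability drops from 1/2 to 1/3, and applying the inverse tilt recovers 1/2.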
4 Learning Unbiased Labels
We have derived a closed-form expression for the true, unbiased label function $y_{\text{true}}$ in terms of the observed label function $y_{\text{bias}}$, coefficients $\lambda_1, \dots, \lambda_K$, and constraint functions $c_1, \dots, c_K$. In this section, we elaborate on how one may learn a machine learning model $h$ to fit $y_{\text{true}}$, given access to a dataset $\mathcal{D}$ with labels sampled according to $y_{\text{bias}}$. We begin by restricting ourselves to constraints $c_1, \dots, c_K$ associated with demographic parity, allowing us to have full knowledge of these constraint functions. In Section 4.3 we will show how the same method may be extended to general notions of fairness.
With knowledge of the functions $c_1, \dots, c_K$, it remains to determine the coefficients $\lambda_1, \dots, \lambda_K$ (which give us a closed-form expression for the dataset weights) as well as the classifier $h$. For simplicity, we present our method by first showing how a classifier $h$ may be learned assuming knowledge of the coefficients $\lambda_1, \dots, \lambda_K$ (Section 4.1). We subsequently show how the coefficients themselves may be learned, thus allowing our algorithm to be used in the general setting (Section 4.2). Finally, we describe how to extend to more general notions of fairness (Section 4.3).
4.1 Learning $h$ Given $\lambda_1, \dots, \lambda_K$
Although we have the closed-form expression $y_{\text{true}}(y|x) \propto y_{\text{bias}}(y|x) \exp\{\sum_{k=1}^{K} \lambda_k c_k(x, y)\}$ for the true label function, in practice we do not have access to the values $y_{\text{bias}}(y|x)$ but rather only access to data points with labels sampled from $y_{\text{bias}}(y|x)$. We propose the weighting technique to train $h$ on labels based on $y_{\text{true}}$. (See the Appendix for an alternative to the weighting technique: the sampling technique, based on a coin flip.) The weighting technique weights an example $(x, y)$ by the weight $w(x, y) = \tilde{w}(x, y) / \sum_{y' \in \mathcal{Y}} \tilde{w}(x, y')$, where
$$\tilde{w}(x, y') = \exp\Big\{\sum_{k=1}^{K} \lambda_k c_k(x, y')\Big\}.$$
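We have the following theorem, which states that training a classifier on examples with biased labels weighted by $w(x, y)$ is equivalent to training a classifier on examples labelled according to the true, unbiased labels.
Theorem 1.
For any loss function $\ell$, training a classifier $h$ on the weighted objective $\mathbb{E}_{x \sim \mathcal{P},\, y \sim y_{\text{bias}}(x)}[w(x, y) \cdot \ell(h(x), y)]$ is equivalent to training the classifier on the objective $\mathbb{E}_{x \sim \tilde{\mathcal{P}},\, y \sim y_{\text{true}}(x)}[\ell(h(x), y)]$ with respect to the underlying, true labels, for some distribution $\tilde{\mathcal{P}}$ over $\mathcal{X}$.
Proof.
For a given $x$ and for any $y \in \mathcal{Y}$, due to Corollary 1 we have,
$$w(x, y)\, y_{\text{bias}}(y|x) = \phi(x)\, y_{\text{true}}(y|x), \qquad (1)$$
where $\phi(x) = \sum_{y' \in \mathcal{Y}} w(x, y')\, y_{\text{bias}}(y'|x)$ depends only on $x$. Therefore, letting $\tilde{\mathcal{P}}$ denote the feature distribution $\tilde{\mathcal{P}}(x) \propto \phi(x)\, \mathcal{P}(x)$, we have,
$$\mathbb{E}_{x \sim \mathcal{P},\, y \sim y_{\text{bias}}(x)}[w(x, y) \cdot \ell(h(x), y)] = C \cdot \mathbb{E}_{x \sim \tilde{\mathcal{P}},\, y \sim y_{\text{true}}(x)}[\ell(h(x), y)], \qquad (2)$$
where $C = \mathbb{E}_{x \sim \mathcal{P}}[\phi(x)]$, and this completes the proof. ∎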
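The equivalence in Theorem 1 can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: three equally likely feature values, a single constraint with made-up values, biased labels generated by exponential tilting of the true labels, weights proportional to $\exp\{\lambda c(x, y)\}$ and normalised over the two labels, and a squared loss. All names and numbers are hypothetical.

```python
import math

def biased_prob(y_true, lam, c):
    """Positive-label probability after tilting the true labels by exp(-lam * c(x, y))."""
    neg = (1.0 - y_true) * math.exp(-lam * c[0])
    pos = y_true * math.exp(-lam * c[1])
    return pos / (pos + neg)

def weights(lam, c):
    """w(x, y) = exp(lam * c(x, y)) / sum over y' of exp(lam * c(x, y'))."""
    unnorm = [math.exp(lam * c_y) for c_y in c]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sq_loss(pred, y):
    return (pred - y) ** 2

# (true positive probability, (c(x, 0), c(x, 1)), classifier output h(x))
examples = [(0.7, (0.0, 1.0), 0.6), (0.4, (0.0, -1.0), 0.5), (0.9, (0.0, 1.0), 0.8)]
lam = 0.8

weighted_terms, phis, true_risks = [], [], []
for y_true, c, h in examples:
    y_b = biased_prob(y_true, lam, c)
    p_bias = [1.0 - y_b, y_b]
    p_true = [1.0 - y_true, y_true]
    w = weights(lam, c)
    # Expected weighted loss at x under the biased labels.
    weighted_terms.append(sum(w[y] * p_bias[y] * sq_loss(h, y) for y in (0, 1)))
    # phi(x): the per-x factor that only changes the feature distribution.
    phis.append(sum(w[y] * p_bias[y] for y in (0, 1)))
    # Expected unweighted loss at x under the true labels.
    true_risks.append(sum(p_true[y] * sq_loss(h, y) for y in (0, 1)))

lhs = sum(weighted_terms) / len(examples)                          # weighted objective
C = sum(phis) / len(examples)                                      # normalising constant
p_tilde = [phi / sum(phis) for phi in phis]                        # reweighted feature distribution
rhs = C * sum(pt * risk for pt, risk in zip(p_tilde, true_risks))  # true-label objective
print(lhs, rhs)
```

The two objectives agree up to floating-point error, and the only change is that feature values are sampled from the reweighted distribution `p_tilde` rather than uniformly.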
Theorem 1 is a core contribution of our work. It states that the bias in observed labels may be corrected in a simple and straightforward way: just reweight the training examples. We note that Theorem 1 suggests that when we reweight the training examples, we trade off the ability to train on unbiased labels for training on a slightly different distribution $\tilde{\mathcal{P}}$ over features $x$. In Section 5, we will show that given some mild conditions, the change in feature distribution does not affect the bias of the final learned classifier. Therefore, in these cases, training with respect to weighted examples with biased labels is equivalent to training with respect to the same examples and the true labels.
4.2 Determining the Coefficients
We now continue to describe how to learn the coefficients $\lambda_1, \dots, \lambda_K$. One advantage of our approach is that, in practice, $K$ is often small. Thus, we propose to iteratively learn the coefficients so that the final classifier satisfies the desired fairness constraints either on the training data or on a validation set. We first discuss how to do this for demographic parity and will discuss extensions to other notions of fairness in Section 4.3. See the full pseudocode for learning $h$ and $\lambda_1, \dots, \lambda_K$ in Algorithm 1.
Intuitively, the idea is that if the positive prediction rate for a protected class $\mathcal{G}$ is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of $\mathcal{G}$ and decrease the weights of the negatively labeled examples of $\mathcal{G}$, then this will encourage the classifier to increase its accuracy on the positively labeled examples in $\mathcal{G}$, while the accuracy on the negatively labeled examples of $\mathcal{G}$ may fall. Either of these two events will cause the positive prediction rate on $\mathcal{G}$ to increase, and thus bring $h$ closer to the true, unbiased label function.
Accordingly, Algorithm 1 works by iteratively performing the following steps: (1) evaluate the demographic parity constraints; (2) update the coefficients by subtracting the respective constraint violation multiplied by a fixed step size; (3) compute the weights for each sample based on these multipliers using the closed form provided by Proposition 1; and (4) retrain the classifier given these weights.
Algorithm 1 takes in a classification procedure $H$, which, given a dataset $\mathcal{D}_{[n]} := \{(x_i, y_i)\}_{i=1}^{n}$ and weights $\{w_i\}_{i=1}^{n}$, outputs a classifier. In practice, $H$ can be any training procedure which minimizes a weighted loss function over some parametric function class (e.g. logistic regression).
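The four steps above can be sketched as a short loop. This is a toy illustration rather than the paper's Algorithm 1: it assumes a single protected group, the demographic parity constraint $c(x, 1) = g(x)/Z_{\mathcal{G}} - 1$, $c(x, 0) = 0$, weights proportional to $\exp\{\lambda c(x, y)\}$ normalised over the two labels, and a deliberately trivial stand-in for the classification procedure (`fit_group_rates`, a hypothetical helper that predicts the weighted positive rate of each group). The dataset and step size are made up.

```python
import math

def dp_constraint(g, y, z_g):
    """Demographic parity constraint: c(x, 1) = g(x) / Z_G - 1 and c(x, 0) = 0."""
    return (g / z_g - 1.0) if y == 1 else 0.0

def example_weights(data, lam, z_g):
    """Step (3): w(x, y) = exp(lam * c(x, y)) normalised over both labels."""
    out = []
    for g, y in data:
        unnorm = {label: math.exp(lam * dp_constraint(g, label, z_g)) for label in (0, 1)}
        out.append(unnorm[y] / (unnorm[0] + unnorm[1]))
    return out

def fit_group_rates(data, weights):
    """Step (4), toy stand-in for training: weighted positive rate per group."""
    pos = {0: 0.0, 1: 0.0}
    tot = {0: 0.0, 1: 0.0}
    for (g, y), w in zip(data, weights):
        pos[g] += w * y
        tot[g] += w
    return {g: pos[g] / tot[g] for g in (0, 1)}

def dp_violation(data, model, z_g):
    """Step (1): empirical mean of h(x) * c(x, 1)."""
    return sum(model[g] * dp_constraint(g, 1, z_g) for g, _ in data) / len(data)

def reweight_loop(data, step=1.0, rounds=200):
    z_g = sum(g for g, _ in data) / len(data)
    lam = 0.0
    model = fit_group_rates(data, [1.0] * len(data))
    history = [dp_violation(data, model, z_g)]
    for _ in range(rounds):
        lam -= step * history[-1]  # step (2): subtract the scaled violation
        model = fit_group_rates(data, example_weights(data, lam, z_g))
        history.append(dp_violation(data, model, z_g))
    return lam, model, history

# (g, y) pairs: the protected group has a 20% positive rate, everyone else 50%.
data = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 50 + [(0, 0)] * 50
lam, model, history = reweight_loop(data)
print(history[0], history[-1], lam, model)
```

On this toy data the violation starts at -0.15 and shrinks towards zero as the coefficient grows, at which point both groups receive the same predicted positive rate. In a realistic setting `fit_group_rates` would be replaced by any learner that accepts per-example weights.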
Our resulting algorithm simultaneously minimizes the weighted loss and maximizes fairness via learning the coefficients, which may be interpreted as competing goals with different objective functions. Thus, it is a form of a non-zero-sum two-player game. The use of non-zero-sum two-player games in fairness was first proposed in Cotter et al. (2018b) for the Lagrangian approach.
4.3 Extension to Other Notions of Fairness
The initial restriction to demographic parity was made so that the values of the constraint functions $c_1, \dots, c_K$ on any $x \in \mathcal{X}$, $y \in \mathcal{Y}$ would be known. We note that Algorithm 1 works for disparate impact as well: the only change would be that the classifier does not have access to the protected attributes. However, in other notions of fairness, such as equal opportunity or equalized odds, the constraint functions depend on $y_{\text{true}}$, which is unknown.
For these cases, we propose to apply the same technique of iteratively reweighting the loss to achieve the desired fairness notion, with the weights on each example determined only by the protected attribute $g(x)$ and the observed label $y \in \mathcal{Y}$. This is equivalent to using Theorem 1 to derive the same procedure presented in Algorithm 1, but approximating the unknown constraint function $c(x, y)$ as a piecewise constant function $d(g(x), y)$, where $d : \{0, 1\} \times \mathcal{Y} \to \mathbb{R}$ is unknown. Although we do not have access to $d$, we may treat $d(g(x), y)$ as an additional set of parameters, one for each protected group attribute $g(x) \in \{0, 1\}$ and each label $y \in \mathcal{Y}$. These additional parameters may be learned in the same way the coefficients $\lambda_k$ are learned. In some cases, their values may be wrapped into the unknown coefficients. For example, for equal opportunity, there is in fact no need for any additional parameters. On the other hand, for equalized odds, the unknown values for $\lambda_k$ and $d_k$ are instead treated as unknown values for $\lambda_k^{TP}, \lambda_k^{FP}$; i.e., separate coefficients for positively and negatively labelled points. Due to space constraints, see the Appendix for further details on these and more general constraints.
5 Theoretical Analysis
In this section, we provide theoretical guarantees on a learned classifier using the weighting technique. We show that with the coefficients $\lambda_1, \dots, \lambda_K$ that satisfy Proposition 1, training on the reweighted dataset leads to finite-sample non-parametric rates of consistency on the estimation error, provided the classifier has sufficient flexibility.
We need to make the following regularity assumption on the data distribution, which assumes that the data is supported on a compact set in $\mathbb{R}^D$ and that the label function is smooth (i.e. Lipschitz).
Assumption 2.
is a compact set over and both and are Lipschitz (i.e. ).
We now give the result. The proof is technically involved and is deferred to the Appendix due to space.
Theorem 2 (Rates of Consistency).
Let . Let be a sample drawn from . Suppose that Assumptions 1 and 2 hold. Let be the set of all Lipschitz functions mapping to . Suppose that the constraints are and the corresponding coefficients satisfy Proposition 1 where for and some . Let be the optimal function in under the weighted mean square error objective, where the weights satisfy Proposition 1. Then there exists depending on such that for sufficiently large depending on , we have with probability at least :
where .
Thus, with the appropriate values of $\lambda_1, \dots, \lambda_K$ given by Proposition 1, we see that training with the weighted dataset based on these values will guarantee that the final classifier will be close to $y_{\text{true}}$. However, the above rate has a dependence on the dimension $D$, which may be unattractive in high-dimensional settings. If the data lies on a $d$-dimensional submanifold, then Theorem 3 below says that without any changes to the procedure, we will enjoy a rate that depends on the manifold dimension and is independent of the ambient dimension. Interestingly, these rates are attained without knowledge of the manifold or its dimension.
Theorem 3 (Rates on Manifolds).
Suppose that all of the conditions of Theorem 2 hold and that in addition, is a dimensional Riemannian submanifold of with finite volume and finite condition number. Then there exists depending on such that for sufficiently large depending on , we have with probability at least :
6 Related Work
Work in fair classification can be categorized into three approaches: postprocessing of the outputs, the Lagrangian approach of transforming constraints to penalties, and preprocessing training data.
Postprocessing: One approach to fairness is to perform a postprocessing of the classifier outputs. Examples of previous work in this direction include Doherty et al. (2012); Feldman (2015); Hardt et al. (2016). However, this approach of calibrating the outputs to encourage fairness has limited flexibility. Pleiss et al. (2017) showed that a deterministic solution is only compatible with a single error constraint and thus cannot be applied to fairness notions such as equalized odds. Moreover, decoupling the training and calibration can lead to models with poor accuracy tradeoff. In fact Woodworth et al. (2017) showed that in certain cases, postprocessing can be provably suboptimal. Other works discussing the incompatibility of fairness notions include Chouldechova (2017); Kleinberg et al. (2016).
Lagrangian Approach: There has been much recent work done on enforcing fairness by transforming the constrained optimization problem via the method of Lagrange multipliers. Some works (Zafar et al., 2015; Goh et al., 2016) apply this to the convex setting. In the non-convex case, there is work which frames the constrained optimization problem as a two-player game (Kearns et al., 2017; Agarwal et al., 2018; Cotter et al., 2018b). Related approaches include Edwards and Storkey (2015); Corbett-Davies et al. (2017); Narasimhan (2018). There is also recent work similar in spirit which encourages fairness by adding penalties to the objective; e.g., Donini et al. (2018) study this for kernel methods and Komiyama et al. (2018) for linear models. However, the fairness constraints are often irregular and have to be relaxed in order to optimize. Notably, our method does not use the constraints directly in the model loss, and thus does not require them to be relaxed. Moreover, these approaches typically are not readily applicable to equality constraints as feasibility challenges can arise; thus, there is the added challenge of determining appropriate slack during training. Finally, the training can be difficult, as Cotter et al. (2018c) have shown that the Lagrangian may not even have a solution to converge to.
When the classification loss and the relaxed constraints have the same form (e.g. a hinge loss as in Eban et al. (2017)), the resulting Lagrangian may be rewritten as a cost-sensitive classification, as explicitly pointed out in Agarwal et al. (2018), who show that the Lagrangian method reduces to solving a weighted classification objective with nonnegative example weights. In this setting, the label used in the objective may not necessarily be the true label, which may occur for example in demographic parity when the goal is to predict more positively within a protected group, so that the classifier may be penalized for predicting correctly on negative examples. While this may be a reasonable approach to achieving fairness, it could be interpreted as training a weighted loss on modified labels, which may be legally prohibited (Barocas and Selbst, 2016). Our approach is a nonnegative reweighting of the original loss (i.e., does not modify the observed labels) and is thus simpler and more aligned with legal standards.
Preprocessing: This approach has primarily involved massaging the data to remove bias. Examples include Calders et al. (2009); Kamiran and Calders (2009); Žliobaite et al. (2011); Kamiran and Calders (2012); Zemel et al. (2013); Fish et al. (2015); Feldman et al. (2015); Beutel et al. (2017). Many of these approaches involve changing the labels and features of the training set, which may have legal implications since it is a form of training on falsified data (Barocas and Selbst, 2016). Moreover, these approaches typically do not perform as well as the state of the art and have thus far come with few theoretical guarantees (Krasanakis et al., 2018). In contrast, our approach does not modify the training data and only reweights the importance of certain sensitive groups. Our approach is also notably based on a mathematically grounded formulation of how the bias arises in the data.
7 Experiments
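Table 1: Test error (Err.) and fairness violation (Vio.) for the unconstrained baseline (Unc.), postprocessing calibration (Cal.), the Lagrangian approach (Lagr.), and our method (Our).

| Dataset | Metric | Unc. Err. | Unc. Vio. | Cal. Err. | Cal. Vio. | Lagr. Err. | Lagr. Vio. | Our Err. | Our Vio. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bank | Dem. Par. | 9.41% | .0349 | 9.70% | .0068 | 10.46% | .0126 | 9.63% | .0056 |
| Bank | Eq. Opp. | 9.41% | .1452 | 9.55% | .0506 | 9.86% | .1237 | 9.48% | .0431 |
| Bank | Eq. Odds | 9.41% | .1452 | N/A | N/A | 9.61% | .0879 | 9.50% | .0376 |
| Bank | Disp. Imp. | 9.41% | .0304 | N/A | N/A | 10.44% | .0135 | 9.89% | .0063 |
| COMPAS | Dem. Par. | 31.49% | .2045 | 32.53% | .0201 | 40.16% | .0495 | 35.44% | .0155 |
| COMPAS | Eq. Opp. | 31.49% | .2373 | 31.63% | .0256 | 36.92% | .1141 | 33.63% | .0774 |
| COMPAS | Eq. Odds | 31.49% | .2373 | N/A | N/A | 42.69% | .0566 | 35.06% | .0663 |
| COMPAS | Disp. Imp. | 31.21% | .1362 | N/A | N/A | 40.35% | .0499 | 42.64% | .0256 |
| Communities | Dem. Par. | 11.62% | .4211 | 32.06% | .0653 | 28.46% | .0519 | 30.06% | .0107 |
| Communities | Eq. Opp. | 11.62% | .5513 | 17.64% | .0584 | 28.45% | .0897 | 26.85% | .0833 |
| Communities | Eq. Odds | 11.62% | .5513 | N/A | N/A | 28.46% | .0962 | 26.65% | .0769 |
| Communities | Disp. Imp. | 14.83% | .3960 | N/A | N/A | 28.26% | .0557 | 30.26% | .0073 |
| German Stat. | Dem. Par. | 24.85% | .0766 | 24.85% | .0346 | 25.45% | .0410 | 25.15% | .0137 |
| German Stat. | Eq. Opp. | 24.85% | .1120 | 24.54% | .0922 | 27.27% | .0757 | 25.45% | .0662 |
| German Stat. | Eq. Odds | 24.85% | .1120 | N/A | N/A | 34.24% | .1318 | 25.45% | .1099 |
| German Stat. | Disp. Imp. | 24.85% | .0608 | N/A | N/A | 27.57% | .0468 | 25.15% | .0156 |
| Adult | Dem. Par. | 14.15% | .1173 | 16.60% | .0129 | 20.47% | .0198 | 16.51% | .0037 |
| Adult | Eq. Opp. | 14.15% | .1195 | 14.43% | .0170 | 19.67% | .0374 | 14.46% | .0092 |
| Adult | Eq. Odds | 14.15% | .1195 | N/A | N/A | 19.04% | .0160 | 14.58% | .0221 |
| Adult | Disp. Imp. | 14.19% | .1108 | N/A | N/A | 20.48% | .0199 | 17.37% | .0334 |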
7.1 Datasets
Bank Marketing (Lichman et al., 2013) ( examples). The data is based on a direct marketing campaign of a banking institution. The task is to predict whether someone will subscribe to a bank product. We use age as a protected attribute:
protected groups are determined based on uniform age quantiles.
Communities and Crime (Lichman et al., 2013) ( examples). Each datapoint represents a community and the task is to predict whether a community has high (above the th percentile) crime rate. We preprocess the data consistently with previous works, e.g. Cotter et al. (2018a), and form the protected groups based on race in the same way as Cotter et al. (2018a). We use four race features as real-valued protected attributes corresponding to the percentage of White, Black, Asian and Hispanic residents. We threshold each at the median to form protected groups.
ProPublica’s COMPAS (ProPublica, 2018) Recidivism data ( examples). The task is to predict recidivism based on criminal history, jail and prison time, demographics, and risk scores. The protected groups are two racebased (Black, White) and two genderbased (Male, Female).
German Statlog Credit Data (Lichman et al., 2013) ( examples). The task is to predict whether an individual is a good or bad credit risk given attributes related to the individual’s financial situation. We form two protected groups based on an age cutoff of years.
Adult (Lichman et al., 2013) ( examples). The task is to predict whether the person’s income is more than $50k per year. We use protected groups based on gender (Male and Female) and race (Black and White). We follow an identical procedure to Zafar et al. (2015); Goh et al. (2016) to preprocess the dataset.
7.2 Baselines
For all of the methods except for the Lagrangian, we train using scikit-learn’s logistic regression (Pedregosa et al., 2011) with default hyperparameter settings. We test our method against the unconstrained baseline, postprocessing calibration, and the Lagrangian approach with hinge relaxation of the constraints. For all of the methods, we fix the hyperparameters across all experiments. For implementation details and hyperparameter settings, see the Appendix.
7.3 Fairness Notions
For each dataset and method, we evaluate our procedures with respect to demographic parity, equal opportunity, equalized odds, and disparate impact. As discussed earlier, the postprocessing calibration method cannot be readily applied to disparate impact or equalized odds (without added complexity and randomized classifiers) so we do not show these results.
7.4 Results
We present the results in Table 1. We see that our method consistently leads to fairer classifiers, often yielding a classifier with the lowest test violation out of all methods. We also include test error rates in the results. Although the primary objective of these algorithms is to yield a fair classifier, we find that our method is able to find reasonable tradeoffs between fairness and accuracy. Our method often provides either better or comparable predictive error relative to the other fair classification methods (see Figure 2 for more insight into the tradeoffs found by our algorithm).
The results in Table 1 also highlight the disadvantages of existing methods for training fair classifiers. Although the calibration method is an improvement over an unconstrained model, it is often unable to find a classifier with lowest bias.
We also find that the Lagrangian approach does not consistently provide fair classifiers. As noted in previous work (Cotter et al., 2018c; Goh et al., 2016), constrained optimization can be inherently unstable or can require a certain amount of slack in the objective, as the constraints are typically relaxed to make gradient-based training possible and for feasibility purposes. Moreover, due to the added complexity of this method, it can overfit and have poor fairness generalization, as noted in Cotter et al. (2018a). Accordingly, we find that the Lagrangian method often yields poor tradeoffs in fairness and accuracy, at times yielding classifiers with both worse accuracy and more bias.
8 MNIST with Label Bias
We now investigate the practicality of our method on a larger dataset. We take the MNIST dataset under the standard train/test split and then randomly select of the training data points and change their label to , yielding a biased set of labels. On such a dataset, our method should be able to find appropriate weights so that training on the weighted dataset roughly corresponds to training on the true labels. To this end, we train a classifier with a demographic-parity-like constraint on the predictions of digit ; i.e., we encourage a classifier to predict the digit at a rate of , the rate appearing in the true labels. We compare to the same baseline methods as before. See the Appendix for further experimental details.
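Table 2: Test accuracy on MNIST with label bias, computed with respect to the true labels.

| Method | Test Accuracy |
| --- | --- |
| Trained on True Labels | 97.85% |
| Unconstrained | 88.18% |
| Calibration | 89.79% |
| Lagrangian | 94.05% |
| Our Method | 96.16% |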
We present the results in Table 2. We report test set accuracy computed with respect to the true labels. We find that our method is the only one that is able to approach the accuracy of a classifier trained with respect to the true labels. Compared to the unconstrained baseline or calibration, our method reduces the error rate by over half (from 11.82% and 10.21%, respectively, to 3.84%). Even compared to the next best method (the Lagrangian, at 5.95%), our proposed technique reduces the error rate by roughly a third. These results give further evidence of the ability of our method to effectively train on the underlying, true labels despite only observing biased labels.
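The relative error reductions can be recomputed directly from the test accuracies reported in Table 2. The short sketch below does only that arithmetic; the dictionary layout and the helper name `relative_reduction` are illustrative choices.

```python
# Test accuracies (in percent) as reported in Table 2.
accuracy = {
    "Unconstrained": 88.18,
    "Calibration": 89.79,
    "Lagrangian": 94.05,
    "Our Method": 96.16,
}

# Error rate is the complement of accuracy.
error = {name: 100.0 - acc for name, acc in accuracy.items()}

def relative_reduction(baseline):
    """Fraction of the baseline's error removed by switching to 'Our Method'."""
    return 1.0 - error["Our Method"] / error[baseline]

reductions = {name: relative_reduction(name)
              for name in ("Unconstrained", "Calibration", "Lagrangian")}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} lower error")
```

The reductions relative to the unconstrained and calibration baselines both exceed one half, while the reduction relative to the Lagrangian approach is a little over one third.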
9 Conclusion
We presented a new framework to model how bias can arise in a dataset, assuming that there exists an unbiased ground truth. Our method for correcting for this bias is based on reweighting the training examples. Given the appropriate weights, we showed with finite-sample guarantees that the learned classifier will be approximately unbiased. We gave practical procedures which approximate these weights and showed that the resulting algorithm leads to fair classifiers in a variety of settings.
Acknowledgements
We thank Maya Gupta, Andrew Cotter and Harikrishna Narasimhan for many insightful discussions and suggestions as well as Corinna Cortes for helpful comments.
References
 Agarwal et al. (2018) Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. A reductions approach to fair classification. arXiv preprint arXiv:1803.02453, 2018.
 Angwin et al. (2016) Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, May 2016. (Accessed on 07/18/2018).
 Balakrishnan et al. (2013) Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Cluster trees on manifolds. In Advances in Neural Information Processing Systems, pages 2679–2687, 2013.
 Barocas and Selbst (2016) Solon Barocas and Andrew D Selbst. Big data’s disparate impact. Cal. L. Rev., 104:671, 2016.
 Beutel et al. (2017) Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075, 2017.
 Bolukbasi et al. (2016) Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.
 Botev and Kroese (2011) Zdravko I Botev and Dirk P Kroese. The generalized cross entropy method, with applications to probability density estimation. Methodology and Computing in Applied Probability, 13(1):1–27, 2011.
 Calders et al. (2009) Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In Data mining workshops, 2009. ICDMW’09. IEEE international conference on, pages 13–18. IEEE, 2009.
 Chouldechova (2017) Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.
 Corbett-Davies et al. (2017) Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
 Cotter et al. (2018a) Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, and Seungil You. Training well-generalizing classifiers for fairness metrics and other data-dependent constraints. arXiv preprint arXiv:1807.00028, 2018a.
 Cotter et al. (2018b) Andrew Cotter, Heinrich Jiang, and Karthik Sridharan. Two-player games for efficient non-convex constrained optimization. arXiv preprint arXiv:1804.06500, 2018b.
 Cotter et al. (2018c) Andrew Cotter, Heinrich Jiang, Serena Wang, Taman Narayan, Maya Gupta, Seungil You, and Karthik Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals. arXiv preprint arXiv:1809.04198, 2018c.
 Doherty et al. (2012) Neil A Doherty, Anastasia V Kartasheva, and Richard D Phillips. Information effect of entry into credit ratings market: The case of insurers’ ratings. Journal of Financial Economics, 106(2):308–330, 2012.
 Donini et al. (2018) Michele Donini, Luca Oneto, Shai Ben-David, John Shawe-Taylor, and Massimiliano Pontil. Empirical risk minimization under fairness constraints. arXiv preprint arXiv:1802.08626, 2018.
 Dwork et al. (2012) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226. ACM, 2012.
 Eban et al. (2017) Elad Eban, Mariano Schain, Alan Mackey, Ariel Gordon, Ryan Rifkin, and Gal Elidan. Scalable learning of non-decomposable objectives. In Artificial Intelligence and Statistics, pages 832–840, 2017.
 Edwards and Storkey (2015) Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
 Feldman (2015) Michael Feldman. Computational fairness: Preventing machine-learned discrimination. 2015.
 Feldman et al. (2015) Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.
 Fish et al. (2015) Benjamin Fish, Jeremy Kun, and Adám D Lelkes. Fair boosting: a case study. In Workshop on Fairness, Accountability, and Transparency in Machine Learning. Citeseer, 2015.
 Friedlander and Gupta (2006) Michael P Friedlander and Maya R Gupta. On minimizing distortion and relative entropy. IEEE Transactions on Information Theory, 52(1):238–245, 2006.
 Goh et al. (2016) Gabriel Goh, Andrew Cotter, Maya Gupta, and Michael P Friedlander. Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems, pages 2415–2423, 2016.
 Guegan and Hassani (2018) Dominique Guegan and Bertrand Hassani. Regulatory learning: how to supervise machine learning models? an application to credit scoring. The Journal of Finance and Data Science, 2018.
 Guimaraes and Tofighi (2018) Abel Ag Rb Guimaraes and Ghassem Tofighi. Detecting zones and threat on 3d body in security airports using deep learning machine. arXiv preprint arXiv:1802.00565, 2018.
 Hardt et al. (2016) Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
 Jiang (2017) Heinrich Jiang. Density level set estimation on manifolds with dbscan. In International Conference on Machine Learning, pages 1684–1693, 2017.
 Kamiran and Calders (2009) Faisal Kamiran and Toon Calders. Classifying without discriminating. In Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on, pages 1–6. IEEE, 2009.
 Kamiran and Calders (2012) Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
 Kearns et al. (2017) Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144, 2017.
 Kleinberg et al. (2016) Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent tradeoffs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.
 Komiyama et al. (2018) Junpei Komiyama, Akiko Takeda, Junya Honda, and Hajime Shimao. Nonconvex optimization for regression with fairness constraints. In International Conference on Machine Learning, pages 2742–2751, 2018.
 Krasanakis et al. (2018) Emmanouil Krasanakis, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, and Yiannis Kompatsiaris. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pages 853–862. International World Wide Web Conferences Steering Committee, 2018.
 Lichman et al. (2013) Moshe Lichman et al. UCI machine learning repository, 2013.
 Narasimhan (2018) Harikrishna Narasimhan. Learning with complex loss functions and constraints. In International Conference on Artificial Intelligence and Statistics, pages 1646–1654, 2018.
 Niyogi et al. (2008) Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1–3):419–441, 2008.
 Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
 Pedreshi et al. (2008) Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 560–568. ACM, 2008.
 Pleiss et al. (2017) Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.
 ProPublica (2018) ProPublica. Compas recidivism risk score data and analysis, Mar 2018. URL https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis.
 Woodworth et al. (2017) Blake Woodworth, Suriya Gunasekar, Mesrob I Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In Conference on Learning Theory, pages 1920–1953, 2017.
 Zafar et al. (2015) Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259, 2015.
 Zemel et al. (2013) Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
 Žliobaite et al. (2011) Indre Žliobaite, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 992–1001. IEEE, 2011.
Appendix A Sampling Technique
We present an alternative to the weighting technique. For the sampling technique, we note that the distribution corresponds to the conditional distribution,
where is a random variable sampled from and is a random variable sampled from the distribution . Therefore, in our training procedure for , given a data point , where is sampled according to (i.e., ), we sample a value from the random variable , and train on if and only if . This procedure corresponds to training on data points with sampled according to the true, unbiased label function .
The sampling technique ignores or skips data points when (i.e., when the sample from does not match the observed label). In cases where the cardinality of the labels is large, this technique may ignore a large number of examples, hampering training. For this reason, the weighting technique may be more practical.
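To illustrate the tradeoff between the two techniques, the following sketch keeps a data point only when a label drawn from a per-example distribution matches the observed label, and compares the fraction of kept points with the average of the corresponding per-example weights. The array `label_probs` is a hypothetical stand-in for the distribution that the sampling step draws from; here it is filled with random values purely for demonstration, and the function names are introduced only for this example.

```python
import numpy as np

def sample_keep_mask(label_probs, observed_labels, rng):
    """Draw one label per example from its row of `label_probs` (inverse-CDF
    sampling) and keep the example iff the draw equals the observed label."""
    n, k = label_probs.shape
    cdf = np.cumsum(label_probs, axis=1)
    u = rng.random(n)
    draws = (u[:, None] > cdf).sum(axis=1)
    draws = np.minimum(draws, k - 1)  # guard against floating-point rounding
    return draws == observed_labels

def equivalent_weights(label_probs, observed_labels):
    """Probability that each example is kept, usable as a training weight."""
    return label_probs[np.arange(len(observed_labels)), observed_labels]

rng = np.random.default_rng(0)
n, k = 20_000, 10  # assumption: 10 labels, as in a digit task
label_probs = rng.dirichlet(np.ones(k), size=n)  # hypothetical distribution
observed_labels = rng.integers(0, k, size=n)

keep = sample_keep_mask(label_probs, observed_labels, rng)
weights = equivalent_weights(label_probs, observed_labels)
print(f"kept fraction {keep.mean():.3f}, mean weight {weights.mean():.3f}")
```

In expectation the kept fraction equals the mean weight, so the two procedures target the same reweighted objective, but with many labels the sampling variant discards most of the data while the weighting variant uses every example.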
Appendix B Algorithms for Other Notions of Fairness
Equal Opportunity: Algorithm 1 can be directly used by replacing the demographic parity constraints with equal opportunity constraints. Recall that in equal opportunity, the goal is for the positive prediction rate on the positive examples of the protected group to match the overall positive prediction rate on positive examples. If the positive prediction rate for positive examples is less than that of the overall, then Algorithm 1 will upweight the examples of which are positively labeled. This encourages the classifier to be more accurate on the positively labeled examples of ; in other words, it encourages the classifier to increase its positive prediction rate on these examples, thus leading to a classifier satisfying equal opportunity. In this way, the same intuitions supporting the application of Algorithm 1 to demographic parity or disparate impact also support its application to equal opportunity. We note that in practice, we do not have access to the true label function, so we approximate the constraint violation using the observed labels as .
Equalized Odds: Recall that equalized odds requires that the conditions for equal opportunity (regarding the true positive rate) be satisfied and, in addition, that the false positive rate for each protected group match the overall false positive rate. Thus, as before, for each true positive rate constraint, we see that if the examples of have a lower true positive rate than the overall, then upweighting positively labeled examples in will encourage the classifier to increase its accuracy on the positively labeled examples of , thus increasing the true positive rate on . Likewise, if the examples of have a higher false positive rate than the overall, then upweighting the negatively labeled examples of will encourage the classifier to be more accurate on the negatively labeled examples of , thus decreasing the false positive rate on . This forms the intuition behind Algorithm 2. We again approximate the constraint violation using the observed labels as for .
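The constraint violations discussed above can be estimated from observed labels as differences between group-level and overall rates. The following is a minimal sketch under the assumption of binary labels and binary predictions; the function names and the sign convention (group rate minus overall rate) are choices made for this example rather than the exact quantities used by Algorithms 1 and 2.

```python
import numpy as np

def rate(predictions, mask):
    """Positive-prediction rate over the examples selected by `mask`."""
    return float(predictions[mask].mean()) if mask.any() else float("nan")

def tpr_violation(predictions, labels, group):
    """True positive rate on the group minus the overall true positive rate."""
    positives = labels == 1
    return rate(predictions, positives & group) - rate(predictions, positives)

def fpr_violation(predictions, labels, group):
    """False positive rate on the group minus the overall false positive rate."""
    negatives = labels == 0
    return rate(predictions, negatives & group) - rate(predictions, negatives)

# Toy data: observed labels, classifier predictions, protected-group membership.
predictions = np.array([1, 0, 0, 1, 1, 1, 0, 1])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
group = np.array([True, True, True, False, True, True, False, False])

tpr_gap = tpr_violation(predictions, labels, group)
fpr_gap = fpr_violation(predictions, labels, group)
print(f"TPR gap {tpr_gap:+.3f}, FPR gap {fpr_gap:+.3f}")
```

In this toy example the group has a lower true positive rate than the overall (negative gap), which would call for upweighting its positively labeled examples, and a higher false positive rate (positive gap), which would call for upweighting its negatively labeled examples.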
More general constraints: It is clear that our strategy can be further extended to any constraint that can be expressed as a function of the true positive rate and false positive rate over any subsets (i.e. protected groups) of the data. Examples that arise in practice include equal accuracy constraints, where the accuracy of certain subsets of the data must be approximately the same in order to not disadvantage certain groups, and high confidence samples, where there are a number of samples which the classifier ought to predict correctly and thus appropriate weighting can enforce that the classifier achieves high accuracy on these examples.
Appendix C Proof of Proposition 1
Proof of Proposition 1.
The constrained optimization problem stated in Assumption 1 is a convex optimization problem with linear constraints. We may use the Lagrangian method to transform it into the following minmax problem:
where is defined as
Note that the KL-divergence may be written as an inner product:
Therefore, we have
In terms of , this is a classic convex optimization problem (Botev and Kroese, 2011). Its optimum is a Boltzmann distribution of the following form:
The desired claim immediately follows. ∎
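The form of the optimum can be checked numerically on a simplified instance: a single discrete distribution with one linear constraint, rather than the expectation over inputs with several constraints treated in the proof. The sketch below assumes a reference distribution `q`, a constraint vector `c`, and a target value `eps`, all of which are made-up values for illustration; it finds the multiplier by bisection and verifies that feasible perturbations of the Boltzmann-form solution do not decrease the KL divergence.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))

def tilt(q, c, lam):
    """Boltzmann-form distribution proportional to q * exp(lam * c)."""
    w = q * np.exp(lam * c)
    return w / w.sum()

def solve_lambda(q, c, eps, lo=-50.0, hi=50.0, iters=200):
    """Bisection for lam such that <tilt(q, c, lam), c> = eps.
    The left-hand side is nondecreasing in lam."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(q, c, mid) @ c < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up instance: minimize KL(p || q) subject to <p, c> = eps.
q = np.array([0.1, 0.2, 0.3, 0.4])
c = np.array([1.0, -1.0, 0.5, 0.0])
eps = 0.2
lam = solve_lambda(q, c, eps)
p_star = tilt(q, c, lam)

# Feasible perturbations: directions d with sum(d) = 0 and <d, c> = 0.
rng = np.random.default_rng(0)
A = np.vstack([np.ones_like(c), c])
gaps = []
for _ in range(100):
    v = rng.normal(size=c.size)
    d = v - A.T @ np.linalg.solve(A @ A.T, A @ v)  # project onto null space of A
    d /= np.linalg.norm(d)
    gaps.append(kl(p_star + 0.01 * d, q) - kl(p_star, q))
min_gap = min(gaps)
print(f"lambda {lam:.4f}, constraint value {p_star @ c:.4f}, min KL gap {min_gap:.2e}")
```

The sign of the multiplier in the exponent depends on how the Lagrangian is written; the sketch uses a plus sign for convenience, which need not match the convention of the proof.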
Appendix D Proof of Theorem 2
Proof of Theorem 2.
The corresponding weight for each sample is if and if , where
. Then, we have by Proposition 1 that
(3) 
Let us denote the weight for example . Suppose that is the optimal learner on the reweighted objective. That is,
where and . Let us partition into a grid of dimensional hypercubes with diameter , and let this collection be . For each , let us denote the center of as . Define
where .
We now show that . We have
where the first inequality holds by smoothness of ; the second inequality holds because the value of for each will be the same assuming that is chosen sufficiently small to not allow examples from different protected attributes to be in the same and then applying Bernstein’s concentration inequality so that this holds with probability at least for some constant ; finally the last inequality holds for sufficiently small. Similarly, we can show that , as desired.
It is clear that .
We now bound the amount can deviate from at on average. Let . Then, we have
because otherwise,
contradicting the fact that minimizes .
We thus have
where the last inequality follows by lower bounding in terms of .
Thus,