Learning Fair and Interpretable Representations via Linear Orthogonalization

by Yuzi He, et al.

To reduce human error and prejudice, many high-stakes decisions have been turned over to machine algorithms. However, recent research suggests that this does not remove discrimination, and can perpetuate harmful stereotypes. While algorithms have been developed to improve fairness, they typically face at least one of three shortcomings: they are not interpretable, they lose significant accuracy compared to unconstrained equivalents, or they are not transferable across models. To address these issues, we propose a geometric method that removes correlations between data and any number of protected variables. Further, we can control the strength of debiasing through an adjustable parameter to address the trade-off between model accuracy and fairness. The resulting features are interpretable and can be used with many popular models, such as linear regression, random forest and multilayer perceptrons. The resulting predictions are found to be more accurate and fair than several comparable fair AI algorithms across a variety of benchmark datasets. Our work shows that debiasing data is a simple and effective solution toward improving fairness.




1 Introduction

Machine learning (ML) models sift through mountains of data to make decisions on matters big and small: e.g., who should be shown a product recommendation, hired for a job, or given a home loan. Machine inference can systematize decision processes to take into account orders of magnitude more information, produce accurate decisions, and avoid the common pitfalls of human judgment, such as prejudice and blind spots. Moreover, unlike people, machines will never make poor decisions when tired [Danziger, Levav, and Avnaim-Pesso2011], pressed for time or distracted by other matters [Shah, Mullainathan, and Shafir2012, Mani et al.2013].

Recent research suggests, however, that discrimination remains pervasive, even in ML models [Angwin et al.2016, Chouldechova2017, Dressel and Farid2018, O'Neil2016]. For example, a model used to evaluate criminal defendants for recidivism assigned higher risk scores to African Americans than to Caucasians [Angwin et al.2016]. As a result, reformed African American defendants, who would never commit another crime, were deemed by the model to present a higher risk to society (as much as twice as high [Angwin et al.2016, Dressel and Farid2018]) than reformed white defendants, with potentially grave consequences for how they were treated by the justice system.

The emerging field of AI fairness has produced approaches for mitigating harmful model biases [Dwork et al.2012, Chouldechova2017, Chouldechova and Roth2018], such as penalizing unfair inferences for particular models [Dwork et al.2012, Berk et al.2017], or creating representations that do not strongly depend on protected features [Jaiswal et al.2018, Moyer et al.2018, Locatello et al.2019]. These methods, however, lack at least one of three critical attributes: interpretability, accuracy, or generalizability. Interpretability is necessary for understanding the social factors and individual features contributing to discrimination and bias, as well as for improving the transparency and accountability of AI systems. In contrast to black box models, fair models need to be able to explain their decisions. In terms of performance, although models may have to sacrifice some accuracy for fairness [Pierson et al.2017], the trade-off need not be as dramatic as what current methods achieve. The issue of generalizability stems from the specialization of current methods to specific ML models, which makes them hard to apply to other models. For example, while [Zafar et al.2017, Kamiran, Calders, and Pechenizkiy2010, Berk et al.2017] all create different methods for fair ML, each method is specialized to regressions, support vector machines (SVM), or random forests. Similarly, while previous methods create fair latent features for neural networks (NN) [Jaiswal et al.2018, Moyer et al.2018], these methods cannot easily be applied to improve fairness in non-NN models. These fair AI algorithms were not designed to be generalizable, and there do not seem to be adequate meta-algorithms that debias a whole host of ML models.

One might naively expect that we can just create a single fair model and apply it to all datasets. The problem is that model performance varies greatly across datasets. While NNs are critical for, e.g., image recognition [Ciregan, Meier, and Schmidhuber2012], other methods perform better on small data [Olson, Wyner, and Berk2018], especially when the number of dimensions is high and the sample size low [Liu et al.2017]. There is no one-size-fits-all model, and there is no one-size-fits-all debiasing method. Is there an easier way to create fairer predictions than specialized methods for specialized ML models? Chen et al. offer some clues to addressing this fundamental issue in fair AI [Chen, Johansson, and Sontag2018]: by addressing data biases, we can potentially improve fairness across the spectrum of models, and achieve fairness without greatly sacrificing prediction quality.

Following the ideas of Chen et al., we develop a geometric method for debiasing features. Depending on the chosen value of a fairness hyperparameter, these features can be mathematically guaranteed to be uncorrelated with specified sensitive, or protected, features. This method is exceedingly fast, and the debiased features are highly correlated with the original features (average Pearson correlations are between 0.993 and 0.994 across the three datasets studied in this paper). These features are as interpretable as the original features when applied to any model. When applied to linear regression, for example, the coefficients are the same as or similar to the coefficients of the original features when controlling for protected variables (see Methods). These debiased features serve as a fair representation of data that can be used with a number of ML models, such as linear regression, random forest, SVMs, and NNs. Due to the small size of the benchmark data, we do not use our features to train NNs in this paper, because NNs could easily overfit the data. While previous methods have created fair representations [Olfat and Aswani2018, Samadi et al.2018, Jaiswal et al.2018, Moyer et al.2018], these representations are either not interpretable, like PCA components, or their relationship to the original features has not been established. We evaluate the proposed approach on a number of standard benchmark datasets and show that models using these debiased features are more accurate for almost any desired level of fairness.

In the rest of the paper, we first review recent advances in fair AI to highlight the novelty of our method. Next, we describe in the Methods section our methodology to improve data fairness, and the definitions of fairness we use in the paper. In Results, we describe how our method improves fairness in both synthetic data and empirical benchmark data. We compare to several competing methods and demonstrate the advantages of our method. Finally, we summarize our results and discuss future work in the Conclusion section.

2 Related Work

Social scientists use linear regression for data analysis due to its simplicity and interpretability. Interpretability comes from regression coefficients, which specify how the outcome, or response, changes when a feature changes by one unit. However, regression can create unfair outcomes, even when protected features are excluded from the model, because other features may be correlated with them.

To make regression models fair, researchers introduced a loss function that penalizes regression for unfair outcomes [Berk et al.2017]. Similarly, [Zafar et al.2015] created fair logistic regression by introducing fairness constraints that limit the covariance between protected features and the outcome. An alternate method achieved fairness by constraining false positive or false negative rates [Zafar et al.2016]. These works have some limitations. In the first method, protected features are not included in the logistic model with fairness constraints. While this improves privacy, it forces the parameters of the logistic model to take combinations that minimize the correlation with the protected features, which can reduce accuracy when the constraints are strict. The issue with the second method is mainly numerical: the algorithm requires optimizing a convex loss function over a non-convex parameter space. While these models are generally interpretable, the approaches do not transfer to other models, and their accuracy often suffers in comparison to neural methods.

Researchers have explored a variety of methods for learning fair representations of data [Jaiswal et al.2018, Moyer et al.2018, Louizos et al.2015, Xie et al.2017, Zemel et al.2013, Samadi et al.2018, Olfat and Aswani2018]. Some of these works use NNs to embed raw features in a lower-dimensional space, such that the embedding contains information about the outcome variable but little information about the protected feature. Fair logistic models or fair scoring, on the other hand, can be regarded as a one-dimensional embedding of data, which ensures that the predictions, $\hat{y}$, are independent of the protected features. These representation-learning methods are mainly used with NNs, which, while highly accurate, often lack interpretability. Two methods were instead developed to improve the fairness of PCA features [Samadi et al.2018, Olfat and Aswani2018], but, while they can be applied to non-NN ML models, they lack interpretability compared to the original features.

Johndrow and Lum [Johndrow and Lum2017] proposed an algorithm that removes sensitive information about protected groups based on inverse transform sampling. The algorithm transforms individual features such that the transformed features satisfy the marginal distribution. Although this method can guarantee that predictions are fair in a probabilistic sense, it has a critical disadvantage: as the number of protected features increases, the number of protected groups increases exponentially (as $2^{P}$ for $P$ binary protected features). This means that, in order to properly estimate the conditional and marginal distributions of features, one needs an exponentially increasing population size. Our method overcomes these difficulties by using linear algebra as the basis for learning unbiased representations, which allows our algorithm to debias data in time that is linear in the number of entries and the number of features. Moreover, our method is a white box: it is interpretable and can be fully scrutinized, unlike a black box method.

3 Methods

We describe a geometric method for constructing fair interpretable representations. These representations can be used with a variety of ML methods to create fairer and more accurate models of data.

3.1 Fair Interpretable Representations

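We consider tabular data with $N$ entries and $M$ features. The features are vectors in the $N$-dimensional space, denoted as $\mathbf{x}_j \in \mathbb{R}^N$ where $j = 1, \dots, M$, and one of the columns corresponds to the outcome, or target variable $\mathbf{y}$. Among the $M$ features, there are also $P$ protected features, $\mathbf{p}_1, \dots, \mathbf{p}_P$. As a pre-processing step, all features are centered around the mean: $\mathbf{x}_j \leftarrow \mathbf{x}_j - \bar{x}_j \mathbf{1}$, where $\bar{x}_j$ is the mean of the entries of $\mathbf{x}_j$.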

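We describe a procedure to debias the data so as to create linearly fair features. We aim to construct a representation $\mathbf{z}_j$ of a feature $\mathbf{x}_j$ that is uncorrelated with the protected columns $\mathbf{p}_1, \dots, \mathbf{p}_P$, but highly correlated with feature $\mathbf{x}_j$. Recall that the Pearson correlation between a representation $\mathbf{z}$ and any feature $\mathbf{x}$ is defined as

$$\rho(\mathbf{z}, \mathbf{x}) = \frac{E[(\mathbf{z} - \mu_z)(\mathbf{x} - \mu_x)]}{\sigma_z \sigma_x},$$

where $E$ is the expectation, and $\mu$ and $\sigma$ denote the mean and standard deviation of a vector's entries. Because all the features are centered (and we also assume that $\mathbf{z}$ is centered), $\mu_z = \mu_x = 0$, and we have

$$\rho(\mathbf{z}, \mathbf{x}) = \frac{\mathbf{z} \cdot \mathbf{x}}{\|\mathbf{z}\|\,\|\mathbf{x}\|}.$$

Zero correlation between $\mathbf{z}_j$ and the protected columns therefore requires that $\mathbf{z}_j$ lie in the solution space of $\mathbf{p}_k \cdot \mathbf{z} = 0$ for $k = 1, \dots, P$, i.e., in the orthogonal complement of the span of the protected features. Maximizing the correlation between $\mathbf{z}_j$ and $\mathbf{x}_j$ under this constraint is equivalent to projecting $\mathbf{x}_j$ onto this solution space.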
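As a quick numerical check that, for centered vectors, the Pearson correlation reduces to the cosine $\mathbf{z} \cdot \mathbf{x} / (\|\mathbf{z}\|\,\|\mathbf{x}\|)$, the following minimal NumPy sketch (our illustration, not code from the paper; `pearson` and `cosine` are hypothetical helper names) compares the two on random data.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation from its definition: covariance over product of standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / (a.std() * b.std())

def cosine(a, b):
    """Cosine of the angle between two vectors: a . b / (|a| |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
z = 0.5 * x + rng.normal(size=200)

# Center both vectors, as in the pre-processing step.
x_c = x - x.mean()
z_c = z - z.mean()

same = np.isclose(pearson(z_c, x_c), cosine(z_c, x_c))
print(same)  # True
```

Because of this identity, "uncorrelated" and "orthogonal" are interchangeable for centered data, which is what makes a purely geometric construction possible.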

To calculate $\mathbf{z}_j$, we can first create an orthonormal basis $\mathbf{e}_1, \dots, \mathbf{e}_P$ of the space spanned by the protected vectors $\mathbf{p}_1, \dots, \mathbf{p}_P$. We then construct the projector $\Pi = I - \sum_{k=1}^{P} \mathbf{e}_k \mathbf{e}_k^{\top}$. The representation is given as

$$\mathbf{z}_j = \Pi\,\mathbf{x}_j = \mathbf{x}_j - \sum_{k=1}^{P} (\mathbf{e}_k \cdot \mathbf{x}_j)\,\mathbf{e}_k.$$

Using the Gram–Schmidt process, the orthonormal basis can be constructed in $O(NP^2)$ time, and the projection of each feature takes $O(NP)$ time. Given $M$ features, the total time of the algorithm is $O(NP^2 + NMP)$. Therefore our method scales linearly with respect to the size of the data and the number of features. In practice, this is exceedingly fast: for example, the algorithm takes only 90 milliseconds to run on the Adult dataset described below, which has 20K rows and over 100 features.
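The construction can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' released code. `gram_schmidt` and `debias` are hypothetical names, the protected features are passed as the columns of a matrix, and columns that are numerically dependent on earlier ones are dropped from the basis.

```python
import numpy as np

def gram_schmidt(P, tol=1e-10):
    """Orthonormal basis (as columns) for the span of the columns of P.

    Modified Gram-Schmidt; columns that are numerically dependent on
    earlier ones are skipped.  Cost: O(N P^2) for an N x P matrix.
    """
    basis = []
    for v in P.T:
        w = np.array(v, dtype=float)
        for e in basis:
            w -= (e @ w) * e          # remove the component along e
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    if not basis:
        return np.zeros((P.shape[0], 0))
    return np.column_stack(basis)

def debias(X, P):
    """Fair representation z_j = x_j - sum_k (e_k . x_j) e_k for every column x_j of X."""
    Xc = X - X.mean(axis=0)           # center the features
    Pc = P - P.mean(axis=0)           # center the protected features
    E = gram_schmidt(Pc)              # N x P' orthonormal basis of the protected span
    return Xc - E @ (E.T @ Xc)        # O(N M P) projection of all M features

# Toy usage: one binary protected feature, one correlated and one unrelated feature.
rng = np.random.default_rng(0)
N = 500
p = rng.integers(0, 2, size=N).astype(float)
X = np.column_stack([p + rng.normal(size=N), rng.normal(size=N)])
Z = debias(X, p[:, None])
max_cov = np.abs(Z.T @ (p - p.mean())).max()   # ~0 up to floating-point error
```

The feature that is unrelated to `p` is barely changed by the projection, while the correlated one loses exactly its component along the protected direction.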

While the previous discussion was on how to create linearly fair features, one can make linearly fair outcome variables through the same process. In prediction tasks, however, we do not have access to the outcome data. While our method does not guarantee that every model's estimate of the outcome variable, $\hat{\mathbf{y}}$, is fair, we find that it can significantly improve fairness compared to competing methods. Moreover, in the special case of linear regression, it can be shown that the resulting estimate, $\hat{\mathbf{y}}$, is uncorrelated with the protected variables.

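Inevitably, the accuracy of a model using such linearly fair features will drop compared to using the original features, because the solution is more constrained. To address this issue, we introduce a parameter $\alpha \in [0, 1]$, which indicates the fairness level. We define the parameterized latent variable as

$$\mathbf{z}_j(\alpha) = \alpha\,\mathbf{x}_j + (1 - \alpha)\,\Pi\,\mathbf{x}_j.$$

Here, $\alpha = 0$ corresponds to $\mathbf{z}_j = \Pi\,\mathbf{x}_j$, which is strictly orthogonal to the protected features $\mathbf{p}_k$, while $\alpha = 1$ gives $\mathbf{z}_j = \mathbf{x}_j$.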
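A sketch of the fairness-level parameter, assuming the linear interpolation $\mathbf{z}_j(\alpha) = \alpha\,\mathbf{x}_j + (1 - \alpha)\,\Pi\,\mathbf{x}_j$. Here we swap in NumPy's QR factorization (`np.linalg.qr`) for Gram–Schmidt. It yields an orthonormal basis of the same span when the centered protected matrix has full column rank. `debias_alpha` is a hypothetical name. Because $\Pi\,\mathbf{x}_j$ is orthogonal to every protected feature, the covariance between $\mathbf{z}_j(\alpha)$ and a protected feature is exactly $\alpha$ times its covariance with $\mathbf{x}_j$.

```python
import numpy as np

def debias_alpha(X, P, alpha):
    """z_j(alpha) = alpha * x_j + (1 - alpha) * Pi x_j for every column of X.

    alpha = 0 gives the strictly orthogonal representation, alpha = 1 the
    (centered) original features.  Assumes P has full column rank after centering.
    """
    Xc = X - X.mean(axis=0)
    Pc = P - P.mean(axis=0)
    Q, _ = np.linalg.qr(Pc)            # orthonormal basis of span(Pc), N x P
    fair = Xc - Q @ (Q.T @ Xc)         # Pi x_j
    return alpha * Xc + (1.0 - alpha) * fair

rng = np.random.default_rng(1)
N = 1000
p = rng.normal(size=N)
x = p + rng.normal(size=N)             # feature strongly correlated with p
pc = p - p.mean()

covs = {}
for alpha in (0.0, 0.5, 1.0):
    z = debias_alpha(x[:, None], p[:, None], alpha)[:, 0]
    covs[alpha] = z @ pc               # proportional to the covariance with p
```

Sweeping `alpha` between 0 and 1 therefore traces a continuous path from fully decorrelated to raw features, which is how an accuracy-fairness trade-off curve can be produced.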

The protected features can be both real valued and cardinal. The fair representation method can also handle categorical protected features by introducing dummy variables. Specifically, if a variable $c$ has $K$ categories $c_1, \dots, c_K$, we can convert it to $K - 1$ binary variables $d_1, \dots, d_{K-1}$, where $d_k$ is 1 if the variable is category $c_k$, and 0 otherwise. If all $K - 1$ variables are 0, then the category is $c_K$. As a simple example, if a feature has 3 categories, $a$, $b$, and $c$, then the dummy variables would be $d_a$ and $d_b$. If $d_a = 1$, the category is $a$; if $d_b = 1$, the category is $b$; and otherwise it is $c$. The condition of fairness in this case is interpreted as the latent variables having the same mean value in every categorical group.
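The following sketch uses a hypothetical helper `dummy_encode` and NumPy QR in place of Gram–Schmidt. It encodes a three-category protected feature with $K - 1 = 2$ dummy variables, debiases a group-dependent feature against them, and checks the stated interpretation: after projection, the feature has the same (zero) mean in every category. The category offsets are arbitrary values chosen for illustration.

```python
import numpy as np

def dummy_encode(labels):
    """Map K categories to K-1 binary columns; the last (sorted) category is all zeros."""
    categories = sorted(set(labels))
    D = np.array([[1.0 if lab == c else 0.0 for c in categories[:-1]]
                  for lab in labels])
    return D, categories

rng = np.random.default_rng(0)
labels = rng.choice(["a", "b", "c"], size=600)
offset = {"a": 0.0, "b": 1.0, "c": -2.0}
x = np.array([offset[lab] for lab in labels]) + rng.normal(size=600)  # depends on category

D, cats = dummy_encode(labels)
Dc = D - D.mean(axis=0)                 # center the dummy variables
Q, _ = np.linalg.qr(Dc)                 # orthonormal basis of their span
xc = x - x.mean()
z = xc - Q @ (Q.T @ xc)                 # fair representation of x

group_means = {str(c): z[labels == c].mean() for c in cats}
```

Orthogonality to each centered dummy forces the group sums for $a$ and $b$ to vanish, and centering then forces the sum for the reference category $c$ to vanish as well.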

3.2 Fair Models

Using the procedure described above, we can construct a fair representation of every feature, and use the fair features to model the outcome variable. Consider a linear regression model that includes all features, protected features $\mathbf{p}_k$ and non-protected features $\mathbf{x}_j$:

$$\hat{\mathbf{y}} = \sum_{k=1}^{P} \gamma_k\,\mathbf{p}_k + \sum_{j} \beta_j\,\mathbf{x}_j.$$

After transforming the features to fair features $\mathbf{z}_j = \Pi\,\mathbf{x}_j$, the fair regression model reduces to:

$$\hat{\mathbf{y}}' = \sum_{j} \beta'_j\,\mathbf{z}_j.$$

We can prove that $\beta'_j = \beta_j$, but the predicted value $\hat{\mathbf{y}}'$ is uncorrelated with the protected features $\mathbf{p}_k$. For generalized linear models, such as logistic regression, this proof does not hold, but we numerically find that the coefficients are similar.
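The equality of the regression coefficients of the fair features and the coefficients of the original features in the full regression is an instance of the Frisch–Waugh–Lovell theorem. The sketch below checks it numerically on simulated, centered data using ordinary least squares via `np.linalg.lstsq`. The data-generating coefficients are arbitrary choices for illustration. The sketch also checks that the fair predictions are orthogonal to, and hence uncorrelated with, the protected feature.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
p = rng.normal(size=N)                                   # one protected feature
X = np.column_stack([0.8 * p + rng.normal(size=N),       # correlated with p
                     rng.normal(size=N)])                # unrelated to p
y = 1.5 * p + X @ np.array([2.0, -1.0]) + rng.normal(size=N)

# Center everything, as in the pre-processing step, so no intercept is needed.
p = p - p.mean()
X = X - X.mean(axis=0)
y = y - y.mean()

# Full model: y ~ gamma * p + X @ beta.
coef_full, *_ = np.linalg.lstsq(np.column_stack([p, X]), y, rcond=None)
beta = coef_full[1:]

# Fair model: y ~ Z @ beta_fair, with Z = Pi X (the p direction projected out).
e = p / np.linalg.norm(p)
Z = X - np.outer(e, e @ X)
beta_fair, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_hat_fair = Z @ beta_fair
```

The coefficients agree to floating-point precision, while the fair predictions carry no linear information about `p`.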

It is worth stepping back at this point. The fair latent features are close approximations of the original features, so we expect (and, in certain cases, can prove) that the regression coefficients of the fair features are approximately the coefficients of the original features. By this definition, the fair features can be considered almost as interpretable as the original features.

In addition to regression, fair representations could be used with other ML models, such as AdaBoost [Freund and Schapire1997], NuSVM [Chang and Lin2011], random forest [Breiman2001], and multilayer perceptrons [Rosenblatt1961]. The hyperparameters used in our models are shown in the Appendix.

3.3 Measuring Fairness

While there is no consensus on how to measure fairness, researchers have proposed a variety of metrics, some focusing on representations and some on the predicted outcomes [Verma and Rubin2018, Hutchinson and Mitchell2019]. We therefore compare our method to competing methods using the following metrics: Pearson correlation, mutual information, discrimination, calibration, balance of classes, and accuracy of the inferred protected features. Due to space limitations, we leave mutual information out of our analysis in this paper, and do not compare calibration and balance of classes to model accuracy; the results in these omitted cases are similar.

3.3.1 Fairness of Outcomes

One can argue that outcomes are fair if they do not depend on the protected features. If this is the case, a malicious adversary will not be able to guess the protected features from the model's predictions. One way to quantify the dependence is through the Pearson correlation between (real-valued or cardinal) predictions and protected features. For models making binary predictions, fairness can be measured using the mutual information between predictions and the protected features, provided that the protected features are discrete. We find that mutual information and Pearson correlation lead to similar findings, so we focus on Pearson correlation in this paper. Previous work [Zemel et al.2013] has also defined a discrimination metric for binary predictions. Consider a binary protected variable $p \in \{0, 1\}$ and a binary prediction $\hat{y} \in \{0, 1\}$ of an outcome $y$. The metric measures the bias of a binary prediction with respect to a single binary protected feature using the difference of positive rates between the two groups:

$$\mathrm{Disc} = \left| P(\hat{y} = 1 \mid p = 1) - P(\hat{y} = 1 \mid p = 0) \right|.$$
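The following is a minimal pure-Python sketch of the discrimination metric $|P(\hat{y} = 1 \mid p = 1) - P(\hat{y} = 1 \mid p = 0)|$, together with the thresholding at 1/2 used later to binarize predicted probabilities. The function names `binarize` and `discrimination` and the toy scores are ours.

```python
def binarize(probabilities, threshold=0.5):
    """Predicted probabilities above the threshold become 1, all others 0."""
    return [1 if s > threshold else 0 for s in probabilities]

def discrimination(y_pred, protected):
    """|P(y_hat = 1 | p = 1) - P(y_hat = 1 | p = 0)| for binary predictions and groups."""
    positives = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for y_hat, group in zip(y_pred, protected):
        counts[group] += 1
        positives[group] += y_hat
    return abs(positives[1] / counts[1] - positives[0] / counts[0])

scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4, 0.6, 0.1]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = binarize(scores)                 # [1, 1, 0, 1, 0, 0, 1, 0]
print(discrimination(y_pred, protected))  # 0.5
```

Group $p = 1$ receives positive predictions at rate 3/4 and group $p = 0$ at rate 1/4, so the metric is 0.5; a value of 0 means equal positive rates.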


For real-valued predictions ($\hat{y} \in [0, 1]$), Kleinberg, Mullainathan, and Raghavan [Kleinberg, Mullainathan, and Raghavan2016] suggested a more nuanced way to measure fairness:

  • Calibration within groups: Individuals assigned predicted probability $\hat{y} = v$ should have an approximate positive rate of $v$. This should hold for both protected groups ($p = 0$ and $p = 1$).

  • Balance for the negative class: Among individuals with $y = 0$, the mean $\hat{y}$ of group $p = 0$ and group $p = 1$ should be the same.

  • Balance for the positive class: Among individuals with $y = 1$, the mean $\hat{y}$ of group $p = 0$ and group $p = 1$ should be the same.
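The two balance conditions can be computed directly. In this sketch, `class_balance` is a hypothetical function, and the scores and labels are made up. Each balance gap is the difference in mean score between group $p = 1$ and group $p = 0$ among individuals whose true label equals the chosen class. A gap of zero means the condition holds exactly.

```python
def class_balance(scores, labels, protected, outcome):
    """Mean score of group p=1 minus mean score of group p=0, among individuals with y == outcome."""
    def mean_score(group):
        vals = [s for s, y, g in zip(scores, labels, protected)
                if y == outcome and g == group]
        return sum(vals) / len(vals)
    return mean_score(1) - mean_score(0)

scores = [0.9, 0.7, 0.2, 0.4, 0.8, 0.6, 0.1, 0.3]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
negative_gap = class_balance(scores, labels, protected, outcome=0)  # approximately 0.1
positive_gap = class_balance(scores, labels, protected, outcome=1)  # approximately 0.1
```

In this toy example, group $p = 1$ receives scores that are about 0.1 higher in both classes, so both balance conditions are violated by the same amount.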

In some cases, calibration error is difficult to calculate, as it depends on how predictions are binned. In these cases, we can use the log-likelihood of the labels given the real-valued predictions as a proxy for calibration error. By definition, logistic regression maximizes the (log-)likelihood function, assuming the observations are sampled from independent Bernoulli distributions where $P(y_i = 1) = \hat{y}_i$:

$$\log L = \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right].$$

A better (higher) log-likelihood implies that individuals assigned probability $v$ are more likely to have a positive rate close to $v$, which is better calibrated according to Kleinberg, Mullainathan, and Raghavan.
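A pure-Python sketch of the negative of the Bernoulli log-likelihood $\sum_i [y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)]$ used as a calibration proxy. As our own choice, we average over individuals so that datasets of different sizes are comparable. We also clip probabilities away from 0 and 1 to avoid $\log 0$. The function name is hypothetical.

```python
import math

def mean_negative_log_likelihood(probabilities, labels, eps=1e-12):
    """Average Bernoulli negative log-likelihood of binary labels given P(y = 1)."""
    total = 0.0
    for s, y in zip(probabilities, labels):
        s = min(max(s, eps), 1.0 - eps)          # clip to avoid log(0)
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(labels)

# Predictions closer to the observed labels give a lower negative log-likelihood.
sharp = mean_negative_log_likelihood([0.9, 0.1], [1, 0])
hesitant = mean_negative_log_likelihood([0.6, 0.4], [1, 0])
```

Lower values correspond to the "lower" direction on the calibration-error axis of the balance plots.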

3.3.2 Fairness of Representations

Several past studies examined the fairness of representations, arguing that models using fair representations will also make fair predictions. Learned representations are considered fair if they do not reveal any information about the protected features [Jaiswal et al.2018, Moyer et al.2018, Louizos et al.2015, Xie et al.2017, Verma and Rubin2018]. These studies trained a discriminator to predict protected features from the learned representations, using its accuracy as a measure of fairness.

Following this approach, we treat the predicted probabilities as a 1-dimensional representation of data and use the accuracy of the inferred protected features as a measure of fairness. However, this method is not effective in situations where the protected classes are unbalanced. Let us assume the fair representation is $z$ and the protected feature is $p$. For simplicity, we only consider the case of a single binary protected feature. The discriminator infers the protected feature in a Bayesian way, namely,

$$\hat{p} = \arg\max_{p \in \{0, 1\}} P(p \mid z) = \arg\max_{p \in \{0, 1\}} P(z \mid p)\,P(p).$$

In the case where there is a large difference between $P(p = 0)$ and $P(p = 1)$, even if there is useful information in the distribution $P(z \mid p)$, the discriminator will not perform significantly better than the baseline model, the majority class classifier.
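To illustrate, consider a toy model of our own choosing (not from the paper) in which $z \mid p$ is Gaussian with unit variance and mean 0 for $p = 0$ and 0.5 for $p = 1$. With a 90/10 prior in favor of $p = 1$, the posterior practically never favors the minority group, so the Bayes-optimal discriminator behaves like the majority-class classifier even though $P(z \mid p)$ differs between groups. The function names are hypothetical.

```python
import math

def gaussian_pdf(z, mu, sigma=1.0):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def map_discriminator(z, prior1, mu0=0.0, mu1=0.5):
    """Bayes-optimal guess of a binary protected feature p from a 1-D representation z.

    Assumes (hypothetically) that z | p ~ Normal(mu_p, 1) and P(p = 1) = prior1.
    """
    score1 = gaussian_pdf(z, mu1) * prior1
    score0 = gaussian_pdf(z, mu0) * (1 - prior1)
    return 1 if score1 >= score0 else 0

grid = [k / 10 for k in range(-30, 31)]                 # z from -3.0 to 3.0
imbalanced = [map_discriminator(z, prior1=0.9) for z in grid]
balanced = [map_discriminator(z, prior1=0.5) for z in grid]
```

With the imbalanced prior, every guess on the grid is the majority class, so accuracy equals the majority baseline. With a balanced prior, the same likelihoods split the grid at $z = 0.25$.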

4 Results

In this section, we demonstrate how our method can achieve fair classification using synthetic data, and then compare our prediction accuracy and fairness to other fair AI algorithms using benchmark datasets.

4.1 Synthetic Data

We create synthetic biased data using the procedure described in [Zafar et al.2016]. We generate data with one binary protected variable $p$, one binary outcome $y$, and two continuous features, $x_1$ and $x_2$, which are bivariate Gaussian distributed within each group. In Fig. 1, we use color to represent protected feature values (red, blue) and symbols to represent the outcome (o, x). The first observation is that there is an imbalance in the joint distribution of the protected feature and the outcome variable: among blue markers, there are more blue o's than blue x's. We expect that a logistic classifier trained on this data will show similar unbalanced behavior. To demonstrate our method, we choose two different fairness levels, $\alpha = 0$ and $\alpha = 1$. We first transform the two features into their corresponding fair representations and then train logistic classifiers using these fair representations. In Fig. 1, we plot the data using the fair representations and show the classification boundary as a green dashed line. For $\alpha = 0$, the blue and red markers are mixed (less discrimination and bias), but for $\alpha = 1$ (equivalent to raw data), the blue and red markers tend to separate from each other. We can estimate this imbalance by comparing the fraction of blue among individuals predicted as o with the fraction of blue among individuals predicted as x; the larger the difference, the greater the imbalance. Quantitatively, for $\alpha = 0$, 62.7% of o-predictions and 52.9% of x-predictions are blue. For $\alpha = 1$, those fractions are 76.2% and 36.5%. The accuracy of outcome predictions is 0.811 and 0.870 for the fair and original features, respectively, demonstrating that while increasing fairness does sacrifice some accuracy, the loss can be relatively small. Overall, the results suggest that biased data creates biased models, but our method can make fairer models.
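The imbalance measure used in this experiment can be written as a small helper. This pure-Python sketch uses a hypothetical name, `blue_fractions`, and the labels "o" and "x" follow the figure's symbols. It returns the fraction of blue individuals among o-predictions and among x-predictions. A fairer classifier keeps the two fractions closer together. The toy predictions are invented for illustration.

```python
def blue_fractions(predictions, is_blue):
    """Fraction of blue individuals among 'o' predictions and among 'x' predictions."""
    fractions = {}
    for symbol in ("o", "x"):
        group = [b for pred, b in zip(predictions, is_blue) if pred == symbol]
        fractions[symbol] = sum(group) / len(group)
    return fractions

predictions = ["o", "o", "o", "o", "x", "x", "x", "x"]
is_blue = [True, True, True, False, True, False, False, False]
fractions = blue_fractions(predictions, is_blue)
imbalance = fractions["o"] - fractions["x"]
```

Applied to the reported numbers, this difference is about 0.10 for $\alpha = 0$ (62.7% vs. 52.9%) and about 0.40 for $\alpha = 1$ (76.2% vs. 36.5%).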

Figure 1: Fair synthetic data. (a) Raw data ($\alpha = 1$); (b) plot for fairness level $\alpha = 0$. The two features in the data are $x_1$ and $x_2$, and the two classes we want to protect are in red and blue. The two outcome classes are represented by two symbols: o and x.

4.2 Real-World Data

We compare our fair logistic model to state-of-the-art methods on three real-world datasets, which have become benchmarks for fair AI.

The German dataset has 61 features describing 1,000 individuals, with a binary outcome variable denoting whether an individual has a good credit score or not. The protected feature is gender. (https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data))

The COMPAS dataset contains data about 6,172 defendants. The binary outcome variable denotes whether the defendant will recidivate (commit a crime) within two years. The protected feature is race (whether the race is African American or not), and there are nine features in total. (https://github.com/propublica/compas-analysis)

The Adult dataset contains data about 45,222 individuals. The outcome variable is binary, denoting whether an individual earns more than $50,000 per year. The protected feature is age, and there are 104 features in total. (https://archive.ics.uci.edu/ml/datasets/Adult)

Debiased features had mean correlations with the original features of 0.993 (90% quantiles: 0.954–0.999), 0.994 (90% quantiles: 0.980–0.999), and 0.994 (90% quantiles: 0.948–1.000) for the German, COMPAS, and Adult data, respectively. We reserved a portion of the data in the Adult and COMPAS datasets for testing and used the remaining data to perform 5-fold cross validation. This ensured no leakage of information from the training set to the testing set. The German dataset is much smaller than the others, so it was randomly divided into training, validation and testing sets in five folds, with the three sets containing 50%, 20% and 30% of the data, respectively. We measured the performance metrics on the test data.
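Here is one way to implement the German-data protocol. It assumes, on our reading, that each of the five folds is an independent random 50%/20%/30% split of the 1,000 individuals. The seeds and the helper name `random_split` are hypothetical.

```python
import random

def random_split(n, seed, fractions=(0.5, 0.2, 0.3)):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder (30% when n is a multiple of 10)
    return train, val, test

# Five random splits of the 1,000 German-credit individuals.
folds = [random_split(1000, seed) for seed in range(5)]
```

Using a separate `random.Random(seed)` per split keeps each split reproducible without touching the global random state.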

We varied the fairness parameter between 0 and 1 and applied the debiased features to logistic regression, AdaBoost, NuSVM, random forest, and multilayer perceptrons. In practice, one could use a host of commercial ML models and pick the most accurate one given their fairness tolerance.

4.3 Comparison Against State-of-the-Art

We compared our method to several previous fair AI algorithms. For the models proposed by [Zafar et al.2015, Zafar et al.2016], we vary the fairness constraints from perfect fairness to unconstrained. For the “Unified Adversarial Invariance” (UAI) model proposed by [Jaiswal et al.2018], we vary the weight of the fairness term in the loss function from 0 (no fairness) to a very large value chosen per dataset (large values correspond to perfect fairness). The predictions of the UAI model for the German and Adult datasets were provided by the authors. We are interested in (1) how different models trade off between accuracy and fairness and (2) how different metrics of fairness compare to each other.

4.3.1 Fairness Versus Accuracy

We first investigate the trade-offs between prediction accuracy (Acc Y) and fairness, which we measure in three different ways: (1) the Pearson correlation between the protected feature and model predictions, (2) discrimination between the binary protected feature and the binarized predictions (predicted probabilities above 1/2 are given a value of 1, and are otherwise 0), and (3) the accuracy of predicting protected features from the predictions (Acc P). To robustly predict the protected features from the model predictions, we used both a NN with three hidden layers, as in prior work [Jaiswal et al.2018, Moyer et al.2018, Louizos et al.2015, Xie et al.2017, Zemel et al.2013], and a random forest model, and we report the better accuracy of the two. Figures 2, 3 and 4 show the resulting comparisons.

Figure 2: Fairness versus accuracy. Plots show Pearson correlation versus accuracy of predictions (Acc Y) for the German, COMPAS and Adult datasets. For each plot, Zafar2015 stands for “Fair Constraints” [Zafar et al.2015], Zafar2016 stands for “Fairness Beyond Disparate Treatment & Disparate Impact” [Zafar et al.2016] and Jaiswal2018 stands for “Unified Adversarial Invariance” method [Jaiswal et al.2018]. Fair NuSVM, Fair RF, Fair AdaBoost, and Fair MLP results are produced using the fair representations constructed by our proposed method with NuSVM [Chang and Lin2011], random forest [Breiman2001], AdaBoost [Freund and Schapire1997], and multilayer perceptrons [Rosenblatt1961] models, respectively. The results of UAI are not shown for the Adult dataset, since its best accuracy (0.83) lies outside of the boundary of the plot. (Same for Figure 3 and 4.)
Figure 3: Discrimination versus accuracy plots for the three datasets.

The figures show that models using the proposed fair features achieve significantly higher accuracy, for the same degree of fairness, than competing methods. In Fig. 4, we find that Acc P shows little difference from the baseline majority class classifier for the German and Adult datasets; the reason is the effect of class imbalance on the Bayesian discriminator discussed in Section 3.3.2. On the other hand, Acc P for the COMPAS dataset shows a clear trend because the majority baseline is around 0.51, which is consistent with the same argument. For the Adult dataset, fair logistic regression cannot achieve perfect fairness, but the situation is improved by AdaBoost. In other words, no single ML model achieves the greatest accuracy at every level of fairness, but our method allows us to choose suitable models to achieve greater accuracy.

4.3.2 Fairness of Representations

Figure 4: Accuracy of inferring the protected variable from the model's predictions (Acc P) versus the accuracy of predicting the outcome (Acc Y) for the three datasets.
Method                  German Acc Y   German Acc P   Adult Acc Y   Adult Acc P
Maj. Class              0.71           0.80           0.75          0.67
Li (Li2014)*            0.74           0.80           0.76          0.67
VFAE (Louizos2015)*     0.73           0.70           0.81          0.67
Xie (Xie2017)*          0.74           0.80           0.84          0.67
Moyer (Moyer2018)*      0.74           0.60           0.79          0.69
Jaiswal (Jaiswal2018)*  0.78           0.80           0.84          0.67
Fair Logistic           0.74           0.80           0.84          0.67
Fair NuSVM              0.75           0.80           0.85          0.73
Fair AdaBoost           0.75           0.80           0.84          0.67
Fair RF                 0.75           0.80           0.85          0.72
Fair MLP                0.75           0.80           0.85          0.67
Table 1: Accuracy of predicted outcomes (Acc Y) and protected features (Acc P) for the German and Adult datasets. The proposed fair methods (bottom five rows) use $\alpha = 0$. Higher Acc Y indicates better predictions, while Acc P closer to the majority class baseline indicates fairer predictions. Results marked * were reported by [Jaiswal et al.2019]. Best performance is shown in bold.

We further compared our method with previous works on fair representations. As mentioned before, some previous works use NNs to encode the features into a high-dimensional embedding space and then use separately trained discriminators to infer the protected feature and the outcome variable; the accuracy of each inference is reported. Ideally, the accuracy for the outcome should be high and the accuracy of inferring the protected features should be close to the majority class baseline. We set the fairness level to $\alpha = 0$, namely perfect fairness, when comparing to previous works. We show Acc P and Acc Y for various methods in Table 1 and Fig. 4. Our method applied to a logistic model has similar fairness to the best existing methods, but is very fast, easy to understand, and creates more interpretable features.

4.3.3 Balance Versus Calibration

Figure 5: Balance vs. negative log-likelihood (calibration error) for the German, COMPAS and Adult datasets. In the plot, there are two sets of curves for every model: one shows the difference in mean predicted score (between protected classes) for individuals with negative outcomes ($y = 0$), and the other for individuals with positive outcomes ($y = 1$). (These differences are called balance for the negative or positive class by [Kleinberg, Mullainathan, and Raghavan2016].) Fairer models are those in the lower left-hand corner of each plot.

Finally, we use another measure of fairness that captures the degree to which each model makes mistakes. Figure 5 shows delta score (i.e., balance) versus negative log-likelihood (i.e., calibration error). Fairer predictions are located in the lower left hand corner of each figure, meaning that there are fewer differences in outcomes for the different classes. We only compare the logistic model with fair features to the models proposed by Zafar et al. [Zafar et al.2015, Zafar et al.2016], because these models maximize the log-likelihood function (minimize calibration error) when selecting parameters. For all datasets, our method generally achieves greater fairness.

5 Conclusion

We show that our algorithm simultaneously achieves three advances over many previous fair AI algorithms. First, it is interpretable: the features we construct are minimally affected by our fair transform. While this does not mean that models trained on these features are interpretable (they could be black boxes), it does mean that any method used to interpret features could easily be used for these fairer features as well. Second, the features preserve model accuracy: models using these features were more accurate than competing methods when the value of the fairness metric was held fixed. This is in part due to the third advance: our method can be applied to any number of commercial models, since it merely acts as a pre-processing step. Different models have different strengths and weaknesses; while some are more accurate, others are fairer. We can pick and choose particular models that achieve both high fairness and accuracy, whether a linear model like logistic regression or a non-linear model like a multilayer perceptron, as shown in Figs. 2, 3, and 4.

Our method also has limitations. First, we only remove linear correlations between each feature and the protected features. While this works very well in practice and outperforms the state-of-the-art methods we compared against, fairness could be improved further by also removing non-linear correlations. Second, our method could be extended to handle categorical protected variables more directly. In the present method, a categorical protected variable is first converted into a set of binary indicator variables, and correlations with each indicator are then removed. Ideally, a method would instead reduce the mutual information between the features and the categorical variable directly.
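The indicator-variable route for a categorical protected variable can be sketched as follows, assuming one binary column per category and ordinary least-squares residualization; the helper names `one_hot` and `remove_linear_correlation` are hypothetical. The sketch also shows why reducing mutual information would be stronger: removing linear correlation with every indicator amounts to shifting each group's feature values so that all groups share the overall mean, while the within-group spread (and hence any higher-order dependence on the category) is left untouched.

```python
import numpy as np

def one_hot(labels):
    """Binary indicator columns, one per observed category (sorted order)."""
    categories, codes = np.unique(labels, return_inverse=True)
    return np.eye(len(categories))[codes], categories

def remove_linear_correlation(X, S):
    """Remove from each column of X its least-squares fit on the columns of S
    (with an intercept), keeping the original feature means.

    The full set of centered indicators is collinear; lstsq returns a
    minimum-norm solution, so no reference category has to be dropped."""
    X = np.asarray(X, dtype=float)
    S = np.asarray(S, dtype=float)
    S_centered = S - S.mean(axis=0)
    coef, *_ = np.linalg.lstsq(S_centered, X - X.mean(axis=0), rcond=None)
    return X - S_centered @ coef

# Toy data: one feature, three-valued protected variable.
labels = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])
X = np.array([[1.0], [4.0], [7.0], [2.0], [5.0], [9.0], [3.0], [6.0]])
S, categories = one_hot(labels)
X_fair = remove_linear_correlation(X, S)
for c in categories:
    # Every group now has the overall mean, 37 / 8 = 4.625 (up to rounding).
    print(c, X_fair[labels == c].mean())
```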


The authors would like to thank Ayush Jaiswal for providing the code for learning adversarial models and for feedback on results. The authors also thank Daniel Moyer and Greg Ver Steeg for insightful discussions about the approach. This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under Contract No. W911NF-18-C-0011. This research is also based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2017-17071900005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.


Here are the hyperparameters used for modeling the empirical datasets. All models were trained using the sklearn library in Python 3.

Data     Model           Hyperparameters
German   Adaboost        Logit model, penalty, 100 estimators (max)
         MLP             64-node, 3-layer network
         Logit           penalty
         Random Forest   100 trees, max depth
         NuSVM           Radial basis function kernel
COMPAS   Adaboost        Logit model, penalty, 100 estimators (max)
         MLP             64-node, 3-layer network
         Logit           penalty
         Random Forest   100 trees, max depth
         NuSVM           Radial basis function kernel
         UAI             predictor loss weight, decoder loss weight,
                         disentangler loss weight, epochs
Adult    Adaboost        Logit model, penalty, 100 estimators (max)
         MLP             64-node, 3-layer network
         Logit           penalty
         Random Forest   100 trees, max depth
         NuSVM           Radial basis function kernel
Table 2: Hyperparameters for each dataset.
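As a hedged illustration of how the sklearn entries in Table 2 map onto scikit-learn estimators, the sketch below builds one instance of each. Values not listed in Table 2 (the logit regularization strength and penalty type, the random-forest depth limit, and the NuSVM nu) are hypothetical placeholders, as are `max_iter` and `random_state`; reading "64-node, 3-layer network" as three hidden layers of 64 units is also our assumption. NuSVM is mapped to sklearn's `NuSVC`, and UAI is omitted because it is an adversarial model rather than an sklearn estimator.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import NuSVC

# Hypothetical placeholders: these numbers are not given in Table 2.
C = 1.0          # regularization strength of the logit models
PENALTY = "l2"   # penalty type of the logit models
MAX_DEPTH = 5    # random-forest depth limit
NU = 0.5         # NuSVM nu

def make_models(random_state=0):
    """One sklearn estimator per row of Table 2 (UAI excluded)."""
    logit = dict(C=C, penalty=PENALTY, max_iter=1000)
    return {
        # The first positional argument is the base estimator
        # (`estimator` in recent sklearn, `base_estimator` in older releases).
        "Adaboost": AdaBoostClassifier(LogisticRegression(**logit),
                                       n_estimators=100,
                                       random_state=random_state),
        "MLP": MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=500,
                             random_state=random_state),
        "Logit": LogisticRegression(**logit),
        "Random Forest": RandomForestClassifier(n_estimators=100,
                                                max_depth=MAX_DEPTH,
                                                random_state=random_state),
        # probability=True lets the SVM output scores for the balance and
        # log-likelihood measures.
        "NuSVM": NuSVC(nu=NU, kernel="rbf", probability=True,
                       random_state=random_state),
    }
```

Any of these models can be fit on the original or on the fairness-transformed features without changing its configuration.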


  • [Angwin et al.2016] Angwin, J.; Larson, J.; Mattu, S.; and Kirchner, L. 2016. Machine Bias.
  • [Berk et al.2017] Berk, R.; Heidari, H.; Jabbari, S.; Joseph, M.; Kearns, M.; Morgenstern, J.; Neel, S.; and Roth, A. 2017. A Convex Framework for Fair Regression. 1–15.
  • [Breiman2001] Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.
  • [Chang and Lin2011] Chang, C.-C., and Lin, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3):27.
  • [Chen, Johansson, and Sontag2018] Chen, I.; Johansson, F. D.; and Sontag, D. 2018. Why Is My Classifier Discriminatory?
  • [Chouldechova and Roth2018] Chouldechova, A., and Roth, A. 2018. The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810.
  • [Chouldechova2017] Chouldechova, A. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5(2):153–163.
  • [Ciregan, Meier, and Schmidhuber2012] Ciregan, D.; Meier, U.; and Schmidhuber, J. 2012. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 3642–3649.
  • [Danziger, Levav, and Avnaim-Pesso2011] Danziger, S.; Levav, J.; and Avnaim-Pesso, L. 2011. Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences 108(17):6889–6892.
  • [Dressel and Farid2018] Dressel, J., and Farid, H. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4(1):eaao5580.
  • [Dwork et al.2012] Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; and Zemel, R. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226. ACM.
  • [Freund and Schapire1997] Freund, Y., and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1):119–139.
  • [Hutchinson and Mitchell2019] Hutchinson, B., and Mitchell, M. 2019. 50 years of test (un)fairness: Lessons for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 49–58. ACM.
  • [Jaiswal et al.2018] Jaiswal, A.; Wu, Y.; AbdAlmageed, W.; and Natarajan, P. 2018. Unsupervised Adversarial Invariance. In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; and Garnett, R., eds., Advances in Neural Information Processing Systems 31. Curran Associates, Inc. 5092–5102.
  • [Jaiswal et al.2019] Jaiswal, A.; Wu, Y.; AbdAlmageed, W.; and Natarajan, P. 2019. Unified Adversarial Invariance. 1–16.
  • [Johndrow and Lum2017] Johndrow, J. E., and Lum, K. 2017. An algorithm for removing sensitive information: application to race-independent recidivism prediction. 1–25.
  • [Kamiran, Calders, and Pechenizkiy2010] Kamiran, F.; Calders, T.; and Pechenizkiy, M. 2010. Discrimination Aware Decision Tree Learning. In 2010 IEEE International Conference on Data Mining, 869–874.
  • [Kleinberg, Mullainathan, and Raghavan2016] Kleinberg, J.; Mullainathan, S.; and Raghavan, M. 2016. Inherent Trade-Offs in the Fair Determination of Risk Scores. 1–23.
  • [Li, Swersky, and Zemel2014] Li, Y.; Swersky, K.; and Zemel, R. 2014. Learning unbiased features. 1(2):1–8.
  • [Liu et al.2017] Liu, B.; Wei, Y.; Zhang, Y.; and Yang, Q. 2017. Deep neural networks for high dimension, low sample size data. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 2287–2293.
  • [Locatello et al.2019] Locatello, F.; Abbati, G.; Rainforth, T.; Bauer, S.; Schölkopf, B.; and Bachem, O. 2019. On the fairness of disentangled representations. arXiv preprint arXiv:1905.13662.
  • [Louizos et al.2015] Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; and Zemel, R. 2015. The Variational Fair Autoencoder.
  • [Mani et al.2013] Mani, A.; Mullainathan, S.; Shafir, E.; and Zhao, J. 2013. Poverty impedes cognitive function. Science 341(6149):976–980.
  • [Moyer et al.2018] Moyer, D.; Gao, S.; Brekelmans, R.; Ver Steeg, G.; and Galstyan, A. 2018. Invariant Representations without Adversarial Training. In Advances in Neural Information Processing Systems 31.
  • [Olfat and Aswani2018] Olfat, M., and Aswani, A. 2018. Convex Formulations for Fair Principal Component Analysis.
  • [Olson, Wyner, and Berk2018] Olson, M.; Wyner, A. J.; and Berk, R. 2018. Modern neural networks generalize on small data sets. In NIPS’18 Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3623–3632.
  • [O’neil2016] O’neil, C. 2016. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
  • [Pierson et al.2017] Pierson, E.; Simoiu, C.; Overgoor, J.; Corbett-Davies, S.; Ramachandran, V.; Phillips, C.; and Goel, S. 2017. A large-scale analysis of racial disparities in police stops across the United States. preprint arXiv:1706.05678.
  • [Rosenblatt1961] Rosenblatt, F. 1961. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan Books.
  • [Samadi et al.2018] Samadi, S.; Tantipongpipat, U.; Morgenstern, J.; Singh, M.; and Vempala, S. 2018. The Price of Fair PCA: One Extra Dimension. In Advances in Neural Information Processing Systems 31.
  • [Shah, Mullainathan, and Shafir2012] Shah, A. K.; Mullainathan, S.; and Shafir, E. 2012. Some consequences of having too little. Science 338(6107):682–685.
  • [Verma and Rubin2018] Verma, S., and Rubin, J. 2018. Fairness definitions explained. In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), 1–7. IEEE.
  • [Xie et al.2017] Xie, Q.; Dai, Z.; Du, Y.; Hovy, E.; and Neubig, G. 2017. Controllable invariance through adversarial feature learning. In Advances in Neural Information Processing Systems 30, 586–597.
  • [Zafar et al.2015] Zafar, M. B.; Valera, I.; Rodriguez, M. G.; and Gummadi, K. P. 2015. Fairness Constraints: Mechanisms for Fair Classification. 54.
  • [Zafar et al.2016] Zafar, M. B.; Valera, I.; Rodriguez, M. G.; and Gummadi, K. P. 2016. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment.
  • [Zafar et al.2017] Zafar, M. B.; Valera, I.; Rodriguez, M. G.; Gummadi, K. P.; and Weller, A. 2017. From Parity to Preference-based Notions of Fairness in Classification. In NIPS.
  • [Zemel et al.2013] Zemel, R.; Wu, Y.; Swersky, K.; Pitassi, T.; and Dwork, C. 2013. Learning Fair Representations. In Dasgupta, S., and McAllester, D., eds., Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, 325–333. Atlanta, Georgia, USA: PMLR.