Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data

08/24/2019 ∙ by Dylan Slack, et al. ∙ University of California, Irvine ∙ Haverford College

In this paper, we advocate for the study of fairness techniques in low data situations. We propose two algorithms: Fairness Warnings and Fair-MAML. The first is a model-agnostic algorithm that provides interpretable boundary conditions for when a fairly trained model may not behave fairly on similar but slightly different tasks within a given domain. The second is a fair meta-learning approach for models trained through gradient descent, with the objective of "learning how to learn fairly". This method encodes more general notions of fairness and accuracy into the model so that it can learn new tasks within a domain both quickly and fairly from only a few training points. We demonstrate experimentally the individual utility of each model using relevant baselines for comparison and provide the first experiment to our knowledge of K-shot fairness, i.e. training a fair model on a new task with only K data points. Then, we illustrate the usefulness of both algorithms as a combined method for training models from a few data points on new tasks while using Fairness Warnings as interpretable boundary conditions under which the newly trained model may not be fair.




1. Introduction

As machine learning tools become more responsible for decision making in sensitive domains such as credit, employment, and criminal justice, developing methods that are both fair and accurate becomes critical to the success of such tools. Correspondingly, there has been an increasing amount of academic interest in the field of fair machine learning (for surveys, see Chouldechova and Roth, 2018; Romei and Ruggieri, 2014; Zliobaite, 2015; Barocas et al., 2018). Research on fairness is often concerned with identifying a notion of fairness, developing an approach that mitigates violations of that notion, and applying the approach to a variety of data sets in a supervised learning setting (see, e.g., Feldman et al., 2015; Hardt et al., 2016; Zafar et al., 2017c, a).

However, we ask where this leaves fairness-concerned practitioners who are interested in using fair tools for their particular applications but have access to minimal or no training data. In particular, we introduce the following questions:

  • When can a practitioner rule out the use of a fair tool trained in a similar but slightly different context (e.g. whether a policy maker in Houston can rule out the use of a fair recidivism tool trained in Philadelphia)?

  • How can a practitioner who has access to only a few labeled training points for a particular task still train a fair model?

The work most closely related to the proposed questions is fairness applied to transfer learning and the covariate shift problem in machine learning. Covariate shift deals with situations where the distribution of data in application differs from the distribution of data in training. Covariate shift is a well studied field, and there are numerous methods that attempt to train supervised learning classifiers that are robust to test distribution shifts with respect to accuracy (Bickel et al., 2009; Subbaswamy et al., 2018; Lipton et al., 2018). Related methods have been developed to address fairness in the covariate shift setting. Kallus and Zhou address the problem of systematic bias in data collection and use covariate shift methods to better compute fairness metrics under such conditions (Kallus and Zhou, 2018). Coston et al. consider the situation where sensitive labels are available in only the source or target domain and propose covariate shift methods to solve such problems (Coston et al., 2019).

Additional work focuses on transferring fair machine learning models across domains. Madras et al. propose a solution called LAFTR that uses an adversarial approach to create an encoder that can be used to generate fair representations of data sets and demonstrate the utility of the encoder for fair transfer learning (Madras et al., 2018). Similarly, Schumann et al. provide theoretical guarantees surrounding transfer fairness related to equalized odds and opportunity and suggest another adversarial approach aimed at transferring into new domains with different sensitive attributes (Schumann et al., 2019). Lan and Huan observe that the predictive accuracy of transfer learning across domains can be improved at the cost of fairness (Lan and Huan, 2017). Related to fair transfer learning, Dwork et al. use a decoupled classifier technique to train a selection of classifiers fairly for each sensitive group in a data set (Dwork et al., 2018).

We argue that our proposed questions are different than the existing work in the following ways. While methods exist that address fairness and covariate shift, such methods do not address the problem of communicating to practitioners and policy makers what domain specific factors might cause a fairly trained model to fail to be fair in practice. Because data sets containing sensitive information can be difficult to obtain, it could be challenging to train a fair machine learning tool using data from only one’s specific context. Practitioners might have to rely on data from other geographic locations. Because of demographic or political differences location-to-location, the distribution of data in terms of feature values, sensitive attributes, and labels could be different from one context to another. Thus, determining what changes to the distribution of data might cause a fairly-trained machine learning model to behave unfairly could be useful to practitioners interested in transferring fair models to their particular applications.

Additionally, we note that the problem of training fair machine learning models with very little task-specific training data is relatively unstudied. Practitioners might have access to minimal training data in one task and sufficient data from other related tasks. This data might be minimal or skewed in terms of which sensitive attribute or label the data belongs to (e.g. only examples of African-Americans who have been denied a loan) because of data collection issues associated with sensitive data sets like those discussed in Kallus and Zhou (Kallus and Zhou, 2018). It could be useful to devise models that are able to achieve satisfactory levels of fairness and accuracy on new tasks with minimal data while being robust enough to handle unfavorable distributions of training data across both labels and sensitive attributes.

In this paper, we propose two different methods to address the proposed problems. First, we discuss the situation where a practitioner has no training data and must decide whether to use a fair machine learning tool trained in another similar but slightly different context. We introduce Fairness Warnings, a model-agnostic approach that provides interpretable boundary conditions on fairness for when not to apply a fair model in a different but related context because the model may behave unfairly. Fairness Warnings provide an interpretable model that indicates what distribution shifts to a data set's feature values, labels, and sensitive attributes may cause a fairly trained classifier to act unfairly in terms of a user-specified notion of group fairness. While the covariate shift problem setting allows for arbitrary changes to the testing distribution, we only consider mean shifts in this paper. We discuss the limitations imposed by this problem restriction in section 3.1.2. To provide intuition, if Fairness Warnings were trained on a recidivism classifier with respect to the rule of demographic parity (Feldman et al., 2015; Barocas and Selbst, 2016), the model would provide conditions such as what mean shifts to the features age and priors count would cause the model to score demographic parity below the rule's threshold. A practitioner could use this information in combination with their own knowledge about their application to inform their decision on whether or not to use a fair machine learning tool.

Second, we consider the related situation where a practitioner has access to training data across related tasks in a domain but minimal training data for their desired task. We introduce a meta-learning approach, Fair-MAML, to address the problem. We empirically demonstrate the ability of Fair-MAML to quickly train models that are both accurate and fair with respect to different notions of fairness. Fair-MAML is based on a meta-learning algorithm called Model-Agnostic Meta-Learning or MAML (Finn et al., 2017) that has shown success in reinforcement learning and image recognition. Fair-MAML is model agnostic in the sense that it is compatible with any model trained through gradient descent. It encourages the learning of more general notions of fairness and accuracy that allow it to achieve strong results on new tasks with only minimal data available. Finally, we connect Fairness Warnings and Fair-MAML by applying Fairness Warnings as boundary conditions on the fine-tuned fair meta-model.

2. Background

2.1. Fairness

We consider a binary fair classification setting with features $X$, labels $Y \in \{0, 1\}$, and sensitive attributes $A \in \{0, 1\}$. Our goal is to train a model $f$ that outputs predictions $\hat{Y}$ such that the predictions are both accurate with respect to $Y$ and fair with respect to the groups defined by $A$. We consider $Y = 1$ the "positive" outcome (receiving a loan) and $Y = 0$ the "negative" outcome (being denied a loan). Within the sensitive attribute, one label is protected and the other unprotected. The protected group might be a historically disadvantaged group such as women or African-Americans. We will use $A = 0$ to denote the protected group and $A = 1$ to indicate the unprotected group.

There are three often used ways to define group fairness in this setting. The first, demographic parity (or statistical parity (Dwork et al., 2012)), can be formalized as:

$$\frac{P(\hat{Y} = 1 \mid A = 0)}{P(\hat{Y} = 1 \mid A = 1)} \tag{1}$$

This is also known as a lack of disparate impact (Feldman et al., 2015; Barocas and Selbst, 2016) or discrimination (Calders and Verwer, 2010). A value closer to $1$ indicates fairness.

The second group fairness definition, equalized odds (Hardt et al., 2016), requires that $\hat{Y}$ have equal true positive rates and false positive rates between groups, where values closer to $0$ indicate fairness:

$$P(\hat{Y} = 1 \mid A = 0, Y = y) - P(\hat{Y} = 1 \mid A = 1, Y = y), \quad y \in \{0, 1\} \tag{2}$$

This is also known as error rate balance (Chouldechova, 2017) or disparate mistreatment (Zafar et al., 2017a). The third, equal opportunity (or equal true positive rates), relaxes the constraints in equation 2 and requires the equivalence to hold only on the positive outcome $Y = 1$. As compared to equalized odds, equal opportunity often allows for increased accuracy (Hardt et al., 2016).
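The three group fairness quantities described above can be computed directly from predictions. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes hard 0/1 predictions with the protected group coded as `0` in the sensitive attribute.

```python
def rate(preds, mask):
    """Fraction of positive predictions among the rows selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("empty group")
    return sum(selected) / len(selected)


def demographic_parity_ratio(preds, groups, protected=0):
    """P(pred=1 | protected) / P(pred=1 | unprotected); 1.0 means parity."""
    prot = rate(preds, [g == protected for g in groups])
    unprot = rate(preds, [g != protected for g in groups])
    return prot / unprot  # undefined if the unprotected rate is zero


def equalized_odds_gaps(preds, labels, groups, protected=0):
    """Return (TPR gap, FPR gap), protected minus unprotected; 0.0 means parity."""
    gaps = []
    for outcome in (1, 0):
        prot = rate(preds, [g == protected and y == outcome
                            for g, y in zip(groups, labels)])
        unprot = rate(preds, [g != protected and y == outcome
                              for g, y in zip(groups, labels)])
        gaps.append(prot - unprot)
    return tuple(gaps)


preds = [1, 0, 1, 1, 0, 1, 1, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_ratio(preds, groups))
print(equalized_odds_gaps(preds, labels, groups))
```

The first element of the tuple returned by `equalized_odds_gaps` is the true positive rate gap, which is the only quantity equal opportunity constrains.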

2.2. Meta-Learning

Meta-learning is concerned with training models such that they can be trained on new tasks using only minimal data and few training iterations within a domain. Meta-learning can be phrased as “learning how to learn” because such methods are trained on a range of tasks with the goal of being able to adapt to new tasks more quickly (Vanschoren, 2019). Metaphorically, this can be likened to finding a base camp (meta-model) from which you can quickly ascend to multiple nearby peaks (optimized per-task models).

In the supervised learning setting, each task $\mathcal{T}_i = \{D_i, \mathcal{L}_i\}$ where $D_i$ is a data set containing $(x, y)$ pairs and $\mathcal{L}_i$ is a loss function. We consider a distribution over tasks $p(\mathcal{T})$ which we train the meta-model to adapt to. Supposing the meta-model is a parameterized function $f_\theta$ with parameters $\theta$, its optimal parameters are:

$$\theta^* = \arg\min_\theta \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \left[ \mathcal{L}_i(D_i, \theta) \right] \tag{3}$$

This states that the optimal parameters of the model are those that minimize the loss with respect to both $D_i$ and $\mathcal{L}_i$, in expectation over tasks drawn from $p(\mathcal{T})$. Intuitively, the model parameters should be such that they are nearly optimal for a range of tasks. Ideally, this will mean that optimizing for any new task is quick and requires minimal data.

In the meta-learning scenario used in this paper, we train $f_\theta$ to learn a new task $\mathcal{T}_i$ using $K$ examples drawn from $D_i$. Additionally, we assume $f_\theta$ can be optimized through gradient descent. During the meta-training procedure, $K$ examples are drawn from $D_i$. The model is trained with respect to these examples and $\mathcal{L}_i$, and the test performance is evaluated with new examples. The use of only $K$ training examples for learning a new task is often referred to as $K$-shot learning, and such methods have generally been applied to image recognition and reinforcement learning (Vinyals et al., 2016). Based on the test performance, $f_\theta$ is improved. The meta-model is evaluated at the end of meta-training through a set of tasks that are not included in the meta-training procedure.

3. Methods

3.1. Fairness Warnings

3.1.1. Framework

Similar to the formalization of LIME in Ribeiro et al. (Ribeiro et al., 2016), we define fairness warnings as an interpretable model $g \in G$ where $G$ is a class of interpretable models such as decision trees or logistic regression (Slack et al., 2019). Further, $g$ is a function $g : S \to \{0, 1\}$ where $S$ is a set of distribution shifts applied to the features, labels, and sensitive values of some test data set $D$ under which a fair model $f$ is evaluated. We assume $f$ is fair with respect to some notion of group fairness such as equation 1, and the output of $g$ represents whether the potential shift may result in fair classifications according to that notion of group fairness. Additionally, we assume that group fairness can be evaluated as fair or unfair according to some binary notion of fairness success such as the rule of demographic parity (EEOC, 1979; Feldman et al., 2015; Barocas and Selbst, 2016). We assume access to a function $z$ that maps between a data set and whether $f$ acts fairly on that data set according to the binary notion of group fairness.

3.1.2. Problem Restrictions

In typical covariate shift settings, the testing distribution can be changed in any number of ways — including being drawn from an entirely different distribution altogether. In this application, we only consider shifts to the mean of the distribution of data that is available for training. Under this assumption there could be more complex changes to the distribution that affect the mean but are not captured by this summary statistic and that may affect fairness. Because we only consider a subset of the possible changes to the testing distribution, Fairness Warnings only indicate what mean shifts may lead a classifier to not be fair and do not strongly indicate fairness if no warning is issued. Additionally, it could be the case that Fairness Warnings predict unfairness for certain mean shifts but due to other changes to the testing distribution the classifier actually behaves fairly. Because of these challenges, Fairness Warnings are just that—warnings that there is some evidence that suggests the model may behave unfairly with respect to a notion of group fairness.

3.1.3. SLIM

In practice, we use Supersparse Linear Integer Models or SLIM as the interpretable model (Ustun and Rudin, 2015). SLIM creates a linear perceptron that reduces the magnitudes of the coefficients, removes unnecessary coefficients, and forces the coefficients to be integers. SLIM is a highly interpretable method that is well suited to trading off between model complexity in presentation and accuracy. It has hyperparameters $C_0$ and $\epsilon$. $C_0$ controls the marginal accuracy a coefficient must add to stay in the model while $\epsilon$ does the same except for the magnitude of the coefficients.
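For orientation, the way the two SLIM hyperparameters act can be read off the optimization problem as stated by Ustun and Rudin. The notation below follows that paper rather than this one and should be treated as an assumption here: $\lambda$ is the coefficient vector, $\mathcal{L}$ is a finite set of allowed integer coefficient values, and the $N$ training labels are coded $y_i \in \{-1, +1\}$.

```latex
\min_{\lambda \in \mathcal{L}} \;
  \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[ y_i \lambda^\top x_i \le 0 \right]
  + C_0 \lVert \lambda \rVert_0
  + \epsilon \lVert \lambda \rVert_1
```

The $\ell_0$ term charges $C_0$ for every nonzero coefficient, so a feature stays in the model only if it buys at least that much training accuracy. The $\ell_1$ term, weighted by the smaller $\epsilon$, favors small integer coefficients among otherwise equivalent models.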

3.1.4. Fairness Warnings Algorithm

In order to train $g$, we generate some user-specified number $k$ of perturbed versions of $D$ using mean shifts. We generate shifts for numerical features by randomly sampling from a Gaussian distribution with mean zero and the standard deviation of the feature. The number sampled is the mean shift across the feature. To perform the shift, we simply add the number to all the values in the feature. We assume categorical features are one-hot encoded, so $D$ contains only binary categorical features. We shift each categorical feature by assuming it is drawn from a binomial distribution and use the percentage of values labeled $1$ as its parameter $p$. We shift the feature vector by drawing a new $p'$ from a Gaussian distribution and randomly sampling a new vector according to $p'$. If $p'$ is less than $0$ or greater than $1$, we adjust $p'$ to $0$ or $1$ respectively. Doing this a user-specified number of times, we create a set of shifted variations of the original $D$.

For each shifted data set, we generate a fairness label using the binary notion of group fairness $z$. We create a data set $W$ of mean shifts $s$ and the resulting group fairness behavior of $f$ on the shifted data. Finally, we train $g$ on $W$ using the shifts as the features and the fairness labels as the labels. Intuitively, we train $g$ so that it learns to predict what mean shifts may result in unfairness. Assuming $\mathrm{shift}$ is some function that computes the mean shifting scheme above, the algorithm for generating fairness warnings is given as Algorithm 1.

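Require: $D$: data set
Require: $z$: fairness notion
Require: $g$: interpretable model
Require: $k$: number of shifts to perform
  $W \leftarrow \emptyset$
  for $i = 1, \dots, k$ do
     $s_i, D_i \leftarrow \mathrm{shift}(D)$
     $W \leftarrow W \cup \{(s_i, z(D_i))\}$
  end for
  Train $g$ with $W$ using $s_i$ as features and $z(D_i)$ as labels
  return $g$
Algorithm 1 Fairness Warnings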
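The data-generation half of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the data set is a plain dict of columns, the fairness check is passed in as a callable, and the standard deviation used when perturbing a binary column's rate (`spread`) is an assumed value, since the text does not recover it. Training the interpretable model on the returned pairs is left out.

```python
import random
import statistics


def mean_shift_numeric(column, rng):
    """Draw one shift from N(0, std(column)) and add it to every value."""
    delta = rng.gauss(0.0, statistics.pstdev(column))
    return delta, [v + delta for v in column]


def shift_binary(column, rng, spread=0.1):
    """Resample a 0/1 column with a perturbed rate; `spread` is an assumed std."""
    p = sum(column) / len(column)
    new_p = min(1.0, max(0.0, rng.gauss(p, spread)))  # clip to [0, 1]
    return new_p - p, [1 if rng.random() < new_p else 0 for _ in column]


def build_warning_dataset(data, is_fair, k, seed=0):
    """Return k (shift vector, fairness label) pairs for training the warning model.

    data: dict mapping column name to a list of values (numeric or 0/1).
    is_fair: callable taking a shifted data dict and returning True/False.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(k):
        shifts, shifted = {}, {}
        for name, column in data.items():
            binary = set(column) <= {0, 1}
            fn = shift_binary if binary else mean_shift_numeric
            shifts[name], shifted[name] = fn(column, rng)
        rows.append((shifts, int(is_fair(shifted))))  # 1 = fair, 0 = unfair
    return rows


# Hypothetical stand-in for "evaluate the fair model on the shifted data".
toy_data = {"age": [25, 30, 35, 40, 45], "flag": [0, 1, 1, 0, 1]}
pairs = build_warning_dataset(
    toy_data, lambda d: statistics.mean(d["age"]) > 33, k=20)
print(pairs[0])
```

Each pair holds the per-feature mean shifts and a binary label, which is the shape of training set an interpretable classifier such as SLIM would then be fit on.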

3.2. Fair Meta-Learning

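Require: $p(\mathcal{T})$: distribution over tasks
Require: $\alpha$, $\beta$: step size hyperparameters
  randomly initialize $\theta$
  while not done do
     Sample batch of tasks $\mathcal{T}_i \sim p(\mathcal{T})$
     for all $\mathcal{T}_i$ do
        Sample $K$ datapoints $D = \{(x^{(j)}, y^{(j)}, a^{(j)})\}$ from $\mathcal{T}_i$
        Evaluate $\nabla_\theta \left[ \mathcal{L}_i(D, \theta) + \gamma \mathcal{R}_i(D, \theta) \right]$ using $D$ and $\mathcal{L}_i$, $\mathcal{R}_i$
        Compute updated parameters: $\theta_i' = \theta - \alpha \nabla_\theta \left[ \mathcal{L}_i(D, \theta) + \gamma \mathcal{R}_i(D, \theta) \right]$
        Sample new datapoints $D_i'$ from $\mathcal{T}_i$ to be used in the meta-update
     end for
     Update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i} \left[ \mathcal{L}_i(D_i', \theta_i') + \gamma \mathcal{R}_i(D_i', \theta_i') \right]$ using each $D_i'$
  end while
Algorithm 2 Fair-MAML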

3.2.1. $K$-shot Fairness

In order to address the problem of learning fairly from minimal data on a new task, we introduce the notion of $K$-shot fairness. Given $K$ training examples, $K$-shot fairness aims to (1) quickly train a model that is both fair and accurate on a given task. Additionally, because the relationship between fairness and accuracy is often understood as a trade-off (Friedler et al., 2019), an additional aim is to (2) allow tuning of such a model so that it achieves different balances between accuracy and fairness using just $K$ training points.

The language used in this paper surrounding $K$-shot learning differs slightly from the language used in typical $K$-shot learning scenarios such as image recognition. In $N$-way $K$-shot image recognition, the goal is to learn how to distinguish between $N$ different image labels using only $K$ training examples of each type. The training set size is then $N \cdot K$ examples. Because we assume all the tasks to be binary labeled, all of our tasks are $2$-way. In referencing $K$-shot fairness, we will mean that we are using $K$ training examples total, irrespective of class label, with the assumption that all tasks are $2$-way.

3.2.2. Fair-MAML Framework

We expand the meta-learning framework from section 2.2 such that each task includes a fairness regularization term $\mathcal{R}_i$ and fairness hyperparameter $\gamma$. Additionally, we require that $D_i$ have a protected feature, so that each example is a triple $(x, y, a)$. The goal of $\mathcal{R}_i$ is to minimize violations of some notion of group fairness, and $\gamma$ dictates the trade-off between $\mathcal{L}_i$ and $\mathcal{R}_i$. A task is defined as $\mathcal{T}_i = \{D_i, \mathcal{L}_i, \mathcal{R}_i, \gamma\}$. We adjust equation 3 such that the optimal parameters are now:

$$\theta^* = \arg\min_\theta \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \left[ \mathcal{L}_i(D_i, \theta) + \gamma \mathcal{R}_i(D_i, \theta) \right] \tag{4}$$

In order to train a fair meta-learning model, we adapt Model-Agnostic Meta-Learning or MAML (Finn et al., 2017) to our fair meta-learning framework and introduce Fair-MAML. MAML is trained by optimizing the performance of $f_\theta$ across a variety of tasks after one gradient step. MAML is particularly well suited to easy fairness adaptation because it works with any model that can be trained with gradient descent. The core assumption of MAML is that some internal representations are better suited to transfer learning. The loss function used by MAML is effectively the sum of the task losses across a batch of tasks. Thus, the MAML learning configuration encourages learning representations that encode more general features than a traditional learning approach. The MAML algorithm works by first sampling a batch of tasks, computing the updated parameters $\theta'$ after one gradient step of training on $K$ data points sampled from each task, and finally updating $\theta$ based on the performance of $f_{\theta'}$ on a new sample of $K$ points.

We modify MAML to Fair-MAML by including a fairness regularization term in the task losses. The algorithm for Fair-MAML is given in algorithm 2. By including a regularization term, we hope to encourage MAML to learn generalizable internal representations that strike a desirable balance between accuracy and fairness.
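To make the inner and outer updates concrete, the sketch below runs them for a logistic model with analytic gradients. It is not the authors' implementation and departs from the description in two labeled ways: it uses a first-order approximation of the meta-gradient (the gradient is taken at the adapted parameters and the second-derivative term is dropped), and it uses a plain gradient step for the meta-update. All function names, the batch layout, and the use of a boolean `protected` flag are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(theta, batch, gamma):
    """Cross-entropy plus gamma * (1 - mean positive probability of protected rows).

    batch: list of (x, y, protected) with x a list of floats, y in {0, 1},
    protected a bool. Returns (loss, gradient) for a logistic model.
    """
    n = len(batch)
    n_prot = sum(1 for _, _, prot in batch if prot)
    loss, prot_mass = 0.0, 0.0
    grad = [0.0] * len(theta)
    for x, y, prot in batch:
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep the logs finite
        loss -= (y * math.log(p) + (1 - y) * math.log(1 - p)) / n
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / n  # cross-entropy term
            if prot:
                grad[j] -= gamma * p * (1 - p) * xi / n_prot  # regularizer term
        if prot:
            prot_mass += p
    if n_prot:
        loss += gamma * (1.0 - prot_mass / n_prot)
    return loss, grad


def inner_step(theta, batch, alpha, gamma):
    """One task-level gradient step: theta' = theta - alpha * grad."""
    _, grad = loss_and_grad(theta, batch, gamma)
    return [t - alpha * g for t, g in zip(theta, grad)]


def meta_step(theta, tasks, alpha, beta, gamma):
    """First-order meta-update over tasks given as (support, query) pairs."""
    meta_grad = [0.0] * len(theta)
    for support, query in tasks:
        adapted = inner_step(theta, support, alpha, gamma)
        _, grad = loss_and_grad(adapted, query, gamma)  # evaluated at theta'
        meta_grad = [m + g for m, g in zip(meta_grad, grad)]
    return [t - beta * m for t, m in zip(theta, meta_grad)]
```

Setting `gamma` to zero removes the regularizer, so the same loop reduces to first-order MAML. A full implementation would instead differentiate through `inner_step`, which is where the second derivatives enter.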

3.3. Fairness Regularizers

A variety of fairness regularizers have been proposed to handle various definitions of group fairness (Kamishima et al., 2012; Berk et al., 2017; Huang and Vishnoi, 2019). Because MAML has shown success with the use of deep neural networks (Finn et al., 2017), we require regularization terms compatible with neural networks. Methods that require the model to be linear are clearly not applicable. In addition, Fair-MAML requires that second derivatives be computed through a Hessian-vector product in order to calculate the meta-loss function, which can be computationally intensive and time-consuming. Thus, it is critical that our fairness regularization term be quick to compute in order to allow for reasonable Fair-MAML training times.

We propose two simple regularization terms aimed at achieving demographic parity and equal opportunity that are easy to implement and extremely quick to compute. Let $D_0$ denote the protected instances in $X$ and $Y$. The demographic parity regularizer is:

$$\mathcal{R}_{dp}(f_\theta, D) = 1 - \frac{1}{|D_0|} \sum_{x \in D_0} P(f_\theta(x) = 1) \tag{5}$$

This regularizer incurs a penalty if the probability that the protected group receives positive outcomes is low. Our value assumption here is that we want to adjust the likelihood of the protected class receiving a positive outcome upwards. Namely, we do not reduce the rate at which the unprotected class receives positive outcomes and adjust upwards the rate at which the protected class receives positive outcomes.

Additionally, we consider a regularizer aimed at improving equal opportunity. Let $D_0^1$ denote the instances within $D$ that are both protected and have the positive outcome in $Y$.

$$\mathcal{R}_{eo}(f_\theta, D) = 1 - \frac{1}{|D_0^1|} \sum_{x \in D_0^1} P(f_\theta(x) = 1) \tag{6}$$

We have a similar value assumption using this regularizer as the one for demographic parity. Namely, we adjust the true positive rate of the protected class upwards and do not decrease the true positive rate of the unprotected class.

An advantage of these regularizers is that they are easy to implement using common deep learning packages such as PyTorch or TensorFlow (Paszke et al., 2017; Abadi et al., 2015). Supposing that $\hat{\mathbf{y}}$ is a vector of probabilities that $f_\theta$ outputs a positive value on $X$, $\mathbf{y}$ is a vector containing the labels, and $\mathbf{a}$ is a vector containing the sensitive attribute, the demographic parity regularizer and equal opportunity regularizer can be computed as below, where $\odot$ denotes the Hadamard product:

$$\mathcal{R}_{dp} = 1 - \frac{\hat{\mathbf{y}}^\top (\mathbf{1} - \mathbf{a})}{\mathbf{1}^\top (\mathbf{1} - \mathbf{a})}, \qquad \mathcal{R}_{eo} = 1 - \frac{\hat{\mathbf{y}}^\top \left( (\mathbf{1} - \mathbf{a}) \odot \mathbf{y} \right)}{\mathbf{1}^\top \left( (\mathbf{1} - \mathbf{a}) \odot \mathbf{y} \right)} \tag{7}$$

This formulation is possible because of the proposed binary labels of $A$ and $Y$ from subsection 2.1 where $\mathbf{a}$ is a binary vector with $0$ indicating a protected value and $\mathbf{y}$ a binary vector with $1$ denoting a positive outcome.
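Both regularizers reduce to a masked mean of predicted probabilities. The sketch below shows this with plain Python lists; it is illustrative only, the function names are hypothetical, and it assumes the protected group is coded `0` in the sensitive attribute and the positive outcome is coded `1` in the labels. The elementwise mask plays the role of the Hadamard product.

```python
def dp_regularizer(probs, sensitive, protected=0):
    """1 - mean predicted positive probability over protected rows."""
    mask = [1.0 if a == protected else 0.0 for a in sensitive]
    return 1.0 - sum(p * m for p, m in zip(probs, mask)) / sum(mask)


def eo_regularizer(probs, labels, sensitive, protected=0):
    """Same quantity, restricted to protected rows with a positive label."""
    mask = [1.0 if a == protected and y == 1 else 0.0
            for a, y in zip(sensitive, labels)]
    return 1.0 - sum(p * m for p, m in zip(probs, mask)) / sum(mask)


probs = [0.9, 0.2, 0.6, 0.4]   # model's probability of the positive outcome
labels = [1, 0, 1, 1]
sensitive = [0, 0, 1, 0]
print(dp_regularizer(probs, sensitive))
print(eo_regularizer(probs, labels, sensitive))
```

On this toy batch the demographic parity term is 0.5 and the equal opportunity term is 0.35, up to floating point error. Both functions divide by the number of selected rows, so a batch with no protected rows (or no protected positives) needs separate handling.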

4. Experiments

We first demonstrate the individual utility of both fairness warnings and Fair-MAML experimentally. We then show their usefulness as a combined method.

4.1. Fairness Warnings

4.1.1. COMPAS Recidivism Experiment Setup

We initially consider applying Fairness Warnings to the COMPAS recidivism data set. The COMPAS recidivism data set consists of data on criminal defendants from Broward County, Florida. It includes attributes such as the sex, age, race, and priors for the defendants in addition to a categorical variable indicating perceived recidivism risk. We pre-process the data set as described in Angwin et al. (Angwin et al., 2016). We create a binary sensitive column for whether the defendant is African-American. We predict the ProPublica collected label of whether the defendant was rearrested within two years.

We trained a neural network as the model, , to perform Fairness Warnings. We trained two models—one regularized for demographic parity and the other equal opportunity using the regularization terms from equations 5 and 6 respectively. The demographic parity regularized model scored accuracy and demographic parity on a test set. The equal opportunity regularized model scored accuracy and equal opportunity using the same test set. For the demographic parity fairness warnings, we set the fairness warnings demographic parity threshold at . Meaning, if the classifier scored demographic parity above , it was deemed fair. In the equal opportunity setting, we set the threshold to . We generated perturbed data sets, of which were classified unfairly according to demographic parity. We set to and to . We found that was able to classify whether the shifts applied to the perturbed data sets would result in unfair group fairness behavior with accuracy on a test set. Using the same perturbed set, the equal opportunity regularized network was found to be unfair in of the perturbed examples. Using the same hyperparameters as before, was able to classify whether the shifts would result in unfairness with respect to equal opportunity with accuracy. The Fairness Warnings for the COMPAS data set is given in figure 1.

4.1.2. COMPAS Recidivism Experiment Analysis

The COMPAS Fairness Warnings both rely on priors_count and age to determine what mean shifts to the data set may result in unfairness. In the demographic parity warning for instance, if the mean group age applied to were to increase by years and mean priors were to remain unchanged, the fairness warning would predict unfairness because the score total would be . However, in the equal opportunity case, the same shift would not yield unfairness because . A case that would result in unfairness in the equal opportunity setting would be a decrease in mean priors count by one charge and for age to remain level, i.e. .

Overall, the SLIM implementation of fairness warnings showed good ability to classify whether certain mean shifts applied to the feature values of the COMPAS data set would result in unfairness. Because SLIM is tunable with respect to the importance threshold of features shown in the presentation of the model, the classifier outputs only two of the possible features in both warnings. The presentation is simple. A practitioner would only have to perform a few arithmetic operations in order to compute the fairness warning outcome.

Additionally, we were able to train random forest classifiers using estimators from the Scikit-learn implementation which scored and accuracy on the demographic parity and equal opportunity fairness warnings tasks respectively. This suggests that more robust models could serve as much more accurate fairness warnings than SLIM. Presenting a random forest of such size in a digestible way to a user would be difficult. However, the success of the random forest at this task indicates that improved interpretable methods, with interpretability equal to SLIM but higher accuracy on the fairness warnings task, could serve as more desirable fairness warnings.

| Feature | Original Mean | Score (+/- per unit increase/decrease) | Total |
| --- | --- | --- | --- |
| priors_count | 3.2 priors | 20 points / prior | + ........ |
| age | 34.5 years | -2 points / year | + ........ |

(Warning accuracy: 88%)

| Feature | Original Mean | Score (+/- per unit increase/decrease) | Total |
| --- | --- | --- | --- |
| priors_count | 3.2 priors | 24 points / prior | + ........ |
| age | 34.5 years | -2 points / year | + ........ |

(Warning accuracy: 86%)
Figure 1. The Fairness Warnings for the COMPAS Recidivism data set for both demographic parity and equal opportunity. The original model is a neural network regularized for the respective notion of fairness. This fairness warning is meant to be read as the expected mean shift away from the original mean of the features presented in a practitioner’s application. For instance, if age were to decrease year and age were to decrease years, the score would be points. points point, so the warning would predict unfairness. Critically, the fairness warning only makes a claim surrounding unfairness. If the model predicts a score , the model does not certify fair behavior.
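Reading a warning of this form takes only a weighted sum of the expected mean shifts. The sketch below uses the point values printed in the two tables of Figure 1; everything else is an assumption made for illustration. In particular, the decision threshold is not recoverable from the text, so it is left as a parameter, and the "warn when the score falls below the threshold" direction is inferred from the analysis above, where a drop in mean priors count leads to a predicted unfairness.

```python
def warning_score(mean_shifts, points_per_unit):
    """Sum of (points per unit) * (shift in the feature mean)."""
    return sum(points_per_unit[name] * delta
               for name, delta in mean_shifts.items())


def warns(mean_shifts, points_per_unit, threshold):
    """True when the score falls below `threshold` (direction is an assumption)."""
    return warning_score(mean_shifts, points_per_unit) < threshold


# Point values read off the first and second table of Figure 1.
TABLE_1 = {"priors_count": 20, "age": -2}
TABLE_2 = {"priors_count": 24, "age": -2}

# Mean priors count drops by one charge, mean age unchanged.
print(warning_score({"priors_count": -1, "age": 0}, TABLE_2))
# Mean age rises by three years, mean priors count unchanged.
print(warning_score({"priors_count": 0, "age": 3}, TABLE_1))
```

The two calls give scores of -24 and -6 respectively. Whether either triggers a warning depends on the threshold printed with the original figure.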

4.2. Fair-MAML

4.2.1. Synthetic Experiment Setup

We illustrate the usefulness of Fair-MAML as opposed to a regularized pre-trained model in fair few-shot classification through a synthetic example based on Zafar et al. (Zafar et al., 2017b). We generate two Gaussian distributions using the means and covariances from Zafar et al. The first distribution (1) is set to and the second (2) is set to . During training, we simulate a variety of tasks by dividing the class labels along a line with y-intercept of and a slope randomly selected on the range . All points above the line in terms of their vertical coordinate receive a positive outcome while those below are negative. Using the formulation from Zafar et al., we create a sensitive feature by drawing from a Bernoulli distribution where the probability of the example being in the protected class is:

where . Here, controls the correlation between the sensitive attribute and class labels. The lower , the more correlation and unfairness. We randomly select from the range to simulate a variety in fairness between tasks.

In order to assess the fine-tuning capacity of Fair-MAML and the pre-trained neural network, we introduced a more difficult fine-tuning task. During training, the two classes were separated clearly by a line. For fine-tuning, we set each of the binary class labels to a distribution. The positive class was set to distribution (1) and the negative class was set to distribution (2). In this scenario, a straight line cannot clearly divide the two classes. We assigned sensitive attributes using the same strategy as above and used a of . Additionally, we only gave positive-outcome examples from the protected class. We hoped to simulate a situation where a fair classifier is needed on a new task, but there are only a few protected examples in the positive outcome to learn from—simulating the situation where the distribution of fine-tuning task data is biased. An example of such a scenario could be if a practitioner needed to train a new loan tool and had access to only a few examples of African-Americans who received loans.

We randomly generated synthetic tasks that we cached before training. We sampled examples from each task during meta-training, used a meta-batch size of

for Fair-MAML, and performed a single epoch of optimization within the internal MAML loop. We trained Fair-MAML for

meta-iterations. For the pre-trained neural network, we performed a single epoch of optimization for each task. We trained over batches of tasks per batch to match the training set size used by Fair-MAML.

The loss used is the cross-entropy loss between the prediction and the true value using the demographic parity regularizer from equation 5. We use a neural network with two hidden layers consisting of nodes and the ReLU activation function. We used the softmax activation function on the last layer. When training with Fair-MAML, we used examples and performed one gradient step update. We set the step size to , used the Adam optimizer to update the meta-loss with learning rate set to . We pre-trained a baseline neural network on the same architecture as Fair-MAML. To one-shot update the pre-trained neural network we experimented with step sizes of and ultimately found that yielded the best trade-offs between accuracy and fairness. Additionally, we tested $\gamma$ values during training and fine-tuning of . We present an example task in figure 2 using fine-tuning points from the positive outcome and protected class. When $\gamma = 0$, Fair-MAML does not incur any fairness regularization, so the model is just MAML. We give comprehensive results over a variety of tasks in the appendix in figure 5.

4.2.2. Synthetic Experiment Analysis

In the new task, there is an unseen configuration of positively labeled points. It was not possible for positively labeled points to fall below during training. Fair-MAML is able to perform well with respect to both fairness and accuracy on the fine-tuning task when only biased fine-tuning data is available. The pre-trained neural network fails at performing the new task. This example illustrates that Fair-MAML has learned a more useful internal representation for both fairness and accuracy than the pre-trained neural network. Examining the extended results over a variety of randomly selected fine-tuning points in figure 5, Fair-MAML is able to consistently yield both fair and accurate results while the pre-trained neural network is somewhat unstable.

Figure 2. An example decision boundary from the pre-trained neural network, MAML, and Fair-MAML on the synthetic example (note: Fair-MAML is MAML with ). Points that are colored the same as the side of the boundary are correct. Only points in the positive outcome and protected class are given for the fine-tuning task. Fair-MAML is able to handle such an imbalance of training points on a previously unseen task while the pre-trained neural network fails—illustrating that Fair-MAML has learned a more useful internal representation for both fairness and accuracy.

4.2.3. Communities and Crime Experiment

Next we consider an example using the Communities and Crime data set (Lichman, 2013). The Communities and Crime data set includes information relevant to crime (e.g., police per population, income) as well as demographic information (such as race and sex) in different communities across the United States. The goal is to predict the violent crime rate in the community. We convert this data set to a few-shot fairness setting by using each state as a different task. We believe this problem setting is justified because state-by-state differences ranging from firearm control to weather patterns could affect the generalization ability of a model trained on a selection of states (Hamill et al., 2019; Tiihonen et al., 2017). Because the violent crime rate is a continuous value, we convert it into a binary label based on whether the community is in the top in terms of violent crime rate within a state. Additionally, we add a binary sensitive column that receives a protected label if African-Americans are the highest or second-highest population in a community in terms of percentage racial makeup.
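The label and sensitive-attribute construction described above might be sketched as follows. The cutoff fraction is not recoverable from this section, so it appears as the hypothetical parameter `top_fraction`; the field names, the tie handling, and the rule that at least one community per state is labeled positive are likewise assumptions made for illustration.

```python
def binarize_by_state(rows, top_fraction):
    """rows: list of dicts with 'state' and 'violent_crime_rate'.

    Adds a binary 'label' that is 1 for communities in the top `top_fraction`
    of violent crime rate within their own state (hypothetical cutoff).
    """
    by_state = {}
    for row in rows:
        by_state.setdefault(row["state"], []).append(row)
    for communities in by_state.values():
        ranked = sorted(communities, key=lambda r: r["violent_crime_rate"], reverse=True)
        n_top = max(1, round(top_fraction * len(ranked)))  # assumption: at least one positive
        for i, row in enumerate(ranked):
            row["label"] = 1 if i < n_top else 0
    return rows

def is_protected(race_percentages, group="african_american"):
    """Return 1 if `group` has the highest or second-highest share in the community."""
    ranked = sorted(race_percentages, key=race_percentages.get, reverse=True)
    return 1 if group in ranked[:2] else 0
```

Ranking within each state, rather than across the whole data set, keeps the positive rate comparable between tasks even when states differ widely in overall crime levels.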

The Communities and Crime data set has data from states ranging in number of communities from to communities per state. We only used states with or more communities leaving states. We held out randomly selected states for testing and trained using states. We set and cached meta-batches of size states for training. For testing, we randomly selected communities from the hold out task that we used for fine-tuning and evaluated on whatever number of communities were left over. The number of evaluation communities is guaranteed to be at least because we only included states with or more communities.
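A minimal sketch of the task construction described above: filter out small states, hold out some states for testing, then split each held-out state into fine-tuning and evaluation communities. The function names and all numeric arguments are hypothetical, since the specific counts are not recoverable from this section.

```python
import random

def build_tasks(communities_by_state, min_communities, n_test_states, seed=0):
    """Keep states with enough communities, then hold out some states for testing."""
    rng = random.Random(seed)
    eligible = sorted(
        s for s, c in communities_by_state.items() if len(c) >= min_communities
    )
    test_states = set(rng.sample(eligible, n_test_states))
    train_states = [s for s in eligible if s not in test_states]
    return train_states, sorted(test_states)

def split_fine_tune(communities, k, seed=0):
    """Pick k communities for fine-tuning; evaluate on whatever is left over."""
    rng = random.Random(seed)
    shuffled = list(communities)
    rng.shuffle(shuffled)
    return shuffled[:k], shuffled[k:]
```

Because every eligible state has at least `min_communities` communities, the evaluation split always contains at least `min_communities - k` of them, which is the guarantee the paragraph above relies on.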

We trained two Fair-MAML models—one with the demographic parity regularizer from equation 1 and another with the equal opportunity regularizer from equation 6. For both models, we used a neural network with two hidden layers of nodes. We trained the model with one gradient step using a step size of and a meta-learning rate of using the Adam optimizer. We trained the model for meta-iterations.

In order to assess Fair-MAML, we trained a neural network regularized for fairness using the same architecture and training data. We fine-tuned the neural network for each of the assessment tasks. We used a learning rate of for training and assessed learning rates of for fine-tuning. We found that the fine-tuning rate of yielded the best trade-offs between accuracy and fairness and present results using this learning rate. We varied over incremented by for the demographic parity regularizer. We found higher ’s to work better for the equal opportunity regularizer and varied from incremented by .

Additionally, we trained two LAFTR models on the transfer tasks as comparisons for demographic parity and equal opportunity. LAFTR is not intended to be compatible with our proposed K-shot fairness experiments because training on fine-tuning tasks with a minimal number of epochs and training points is not expected. However, we find that it is the most relevant fair transfer learning method to use as a baseline. We used the same transfer methodology and hyperparameters as described by Madras et al. (2018) and used a neural network with a hidden layer of nodes as the encoder. We used another neural network with a hidden layer of nodes as the MLP to be trained on the fairly encoded representation. We used the demographic parity and equal opportunity adversarial objectives for the first and second LAFTR model respectively. We trained each encoder for epochs and swept over a range of : . We trained with all the data not held out as one of the testing tasks. When training an MLP from the encoder on each of the transfer tasks, we found that LAFTR struggled to produce useful results with only training points from the new task over any number of training epochs. We found that we were able to get reasonable results from LAFTR using fine-tuning points and epochs of optimization; using a minimal number of epochs was unsuccessful. This makes sense because the MLP trained on the fairly encoded data is trained from scratch. The results are presented in figure 3. Though we do not include the results here, we were able to generate similar results with LAFTR to Fair-MAML using training points from the new task after epochs of optimization.

We observe that Fair-MAML achieves the best trade-off between fairness and accuracy both in terms of demographic parity and equal opportunity. In our proposed problem setting, LAFTR was not successful at learning with minimal data and a small number of fine-tuning epochs for the new task. The pre-trained neural network shows some ability to learn the new task using little data and fine-tuning epochs. At low ’s, Fair-MAML is able to achieve higher accuracy than the pre-trained neural network and LAFTR. Crucially, Fair-MAML is able to learn more accurate representations that are also fairer for a range of ’s than both of the baselines. To generalize to new states, only communities are needed to achieve strong predictive accuracy and fairness using Fair-MAML.

Figure 3. The accuracy/fairness trade-off for the Communities and Crime example sweeping over a range of ’s. The data presented is the mean across three runs on each using randomly selected hold out tasks. The fairness numbers presented are the ratio between the protected and unprotected groups. Higher accuracy and fairness values closer to indicate more successful outcomes. The pre-trained neural network and Fair-MAML received fine-tuning points and were optimized for epoch. We did not find useful results using LAFTR with only fine-tuning points or with a minimal number of fine-tuning epochs, so the LAFTR example given here is with fine-tuning points and epochs of optimization. Fair-MAML is able to achieve better levels of accuracy and fairness than both the pre-trained network and LAFTR on the transfer tasks using minimal fine-tuning data.
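The fairness numbers in figure 3 are described as ratios between the protected and unprotected groups. One plausible reading of those ratios, sketched in plain Python, is shown here; the exact definitions used to produce the figure are not given in this section, so the function names and the choice of positive-prediction rates as the compared quantity are assumptions.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the examples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else float("nan")

def demographic_parity_ratio(preds, protected):
    """P(pred=1 | protected) / P(pred=1 | unprotected).

    Raises ZeroDivisionError if no unprotected example is predicted positive.
    """
    prot = positive_rate(preds, [a == 1 for a in protected])
    unprot = positive_rate(preds, [a == 0 for a in protected])
    return prot / unprot

def equal_opportunity_ratio(preds, labels, protected):
    """Same ratio, restricted to examples whose true label is positive."""
    prot = positive_rate(preds, [a == 1 and y == 1 for a, y in zip(protected, labels)])
    unprot = positive_rate(preds, [a == 0 and y == 1 for a, y in zip(protected, labels)])
    return prot / unprot
```

Under this reading, a ratio of 1 means both groups receive positive predictions at the same rate, and values below 1 mean the protected group receives them less often.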

4.3. Fair-MAML with Fairness Warnings

4.3.1. Motivation

We next consider Fairness Warnings applied to Fair-MAML. We argue that Fairness Warnings can serve as a complementary tool to Fair-MAML. Because we expect Fair-MAML to be used in situations with minimal data available, it is possible that testing data given to a fine-tuned Fair-MAML model is unrepresentative of the true distribution of data for a particular task. Although we empirically demonstrate in section 4.2.1 that Fair-MAML can still achieve good results when training data is available from only one value of a sensitive attribute or label, it may still be useful for practitioners to have an indication of the situations in which their model may fail to be fair in testing.

4.3.2. Communities and Crime Fairness Warning/Fair-MAML Experiment

We apply Fairness Warnings to Fair-MAML on the Communities and Crime experimental setup from section 4.2.3 using demographic parity as our notion of fairness. We randomly chose an evaluation state to apply Fairness Warnings and left the rest for meta-training. We trained two Fair-MAML models as in fairness warnings using the demographic parity regularizer for the first model and the equal opportunity regularizer for the second model. We used for the demographic parity Fair-MAML model and for the equal opportunity Fair-MAML model. We trained for meta-iterations in a -step optimization setting, with the update learning rate set to and the meta learning rate set to . The demographic parity Fair-MAML model scored demographic parity on the test set of the fine-tuning task and accuracy of . The equal opportunity Fair-MAML model scored accuracy and equal opportunity of .

To train Fairness Warnings on the fine-tuning task, we created shifted data sets of the fine-tuning test data. We trained a Fairness Warning for both demographic parity and equal opportunity. We used the rule of demographic parity in the demographic parity warning and an equal opportunity threshold in the equal opportunity warning. We found that or close to of the shifted data sets were classified fairly according to with respect to demographic parity and that of the shifted data sets were classified fairly according to equal opportunity. We trained SLIM using of and of for the demographic parity fairness warning. We adjusted to for the equal opportunity fairness warning.
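The procedure above, building mean-shifted copies of the test data and recording whether the model stays fair on each, can be sketched as follows. How the shifts were sampled is not stated in this section, so uniform sampling in `[-max_shift, max_shift]` is an assumption, as are the function names, the `threshold` parameter, and the handling of an undefined ratio.

```python
import random

def shift_dataset(rows, shifts):
    """Return a copy of rows with feature j moved by shifts[j] (a mean shift)."""
    return [[x + s for x, s in zip(row, shifts)] for row in rows]

def dp_ratio(preds, protected):
    """Positive-prediction rate of the protected group over the unprotected group.

    Assumes both groups are non-empty.
    """
    prot = [p for p, a in zip(preds, protected) if a == 1]
    unprot = [p for p, a in zip(preds, protected) if a == 0]
    prot_rate = sum(prot) / len(prot)
    unprot_rate = sum(unprot) / len(unprot)
    # Convention for this sketch only: an undefined ratio counts as 0 (unfair).
    return prot_rate / unprot_rate if unprot_rate > 0 else 0.0

def make_warning_training_set(rows, protected, predict, n_shifts,
                              max_shift, threshold, seed=0):
    """Pair each sampled shift vector with 1 (fair) or 0 (unfair)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    examples = []
    for _ in range(n_shifts):
        # ASSUMPTION: shifts are drawn uniformly; the source does not say how.
        shifts = [rng.uniform(-max_shift, max_shift) for _ in range(n_features)]
        preds = [predict(r) for r in shift_dataset(rows, shifts)]
        fair = 1 if dp_ratio(preds, protected) >= threshold else 0
        examples.append((shifts, fair))
    return examples
```

The resulting `(shifts, fair)` pairs are the kind of input an interpretable model such as SLIM would then be trained on, with the shift magnitudes as features and the fairness outcome as the label.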

SLIM was able to predict whether the mean shifts across the features in the communities and crime data set would result in demographic parity unfairness with accuracy on a test set. A random forest with estimators was able to predict the same task with 88% accuracy. In the equal opportunity setting, SLIM predicted the task with accuracy. A random forest with estimators was able to perform the same task with accuracy. The fairness warnings are presented in figure 4.

4.3.3. Communities and Crime Fairness Warning/Fair-MAML Analysis

The Fairness Warning trained on the fine-tuned Fair-MAML model achieves reasonable prediction accuracy and generates informative results. In particular, the demographic parity fine-tuned model behaves unfairly when the testing data set changes according to features such as the number of people living under the poverty line, the number of people living in urban areas, and the number of police officers. A similar result is found in the equal opportunity setting with police operating budget. In both the demographic parity and equal opportunity cases, the Fairness Warnings demonstrate that seemingly small and perhaps innocuous differences between states where Fair-MAML is trained and applied could result in unfair behavior. For instance, the addition of a couple dozen police officers across communities in a state in the demographic parity case could lead to the classifier behaving unfairly. The same is true for equal opportunity and a slight increase to the mean police operating budget. As we see in this example, reasonable real world changes to the testing distribution can result in negative changes to the group fairness of the fine-tuned Fair-MAML model. Providing Fairness Warnings to accompany the fine-tuned meta-model could lend additional guidance to practitioners and help them better understand when their model may not behave fairly in application.

Feature | Original Mean | Score (+/- per unit increase/decrease) | Total
mean people per family | 3.1 people | 2,000,000 points / person | +……..
number of people living in urban areas | 47,700 people | -1 point / person | +……..
number of people living under the poverty line | 7,590 people | -5 point / person | +……..
number of sworn full time police officers | 77.4 officers | -130,000 points / officer | +……..
(Warning accuracy: 71%)

Feature | Original Mean | Score (+/- per unit increase/decrease) | Total
police operating budget | $3M | -2 points / $1M | +……..
(Warning accuracy: 68%)
Figure 4. The Fairness Warnings for Fair-MAML applied to the communities and crime data set on the fine-tuning task. We consider Fair-MAML trained for both demographic parity and equal opportunity. Unlike in the COMPAS example, the features that the Fairness Warnings use are different though they both relate to aspects of policing.
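To make the scoring tables concrete, the sketch below totals the points for a hypothetical change in feature means, using the per-unit scores from the first table in figure 4. It assumes that each score applies to the change relative to the original mean; the dictionary keys and function name are hypothetical, and because the decision threshold is not shown in this section, only the total is computed.

```python
# Points per unit change, copied from the first scoring table in figure 4.
DP_WARNING_POINTS = {
    "mean_people_per_family": 2_000_000,
    "people_in_urban_areas": -1,
    "people_under_poverty_line": -5,
    "sworn_full_time_police_officers": -130_000,
}

def total_score(points, shifts):
    """Sum of points-per-unit times the change in each feature mean.

    ASSUMPTION: scores apply to the difference from the original mean.
    """
    return sum(points[name] * delta for name, delta in shifts.items())

# Hypothetical shift: two dozen additional sworn officers, everything else unchanged.
example_total = total_score(DP_WARNING_POINTS, {"sworn_full_time_police_officers": 24})
```

In this illustration `example_total` is -3,120,000 points, which shows how a change of only a couple dozen officers can dominate the total.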

5. Limitations and Conclusions

In this paper, we introduced Fairness Warnings and Fair-MAML. Fairness Warnings provides an interpretable model that predicts which changes to the testing distribution will cause a model to behave unfairly. Fair-MAML is a method that “learns to learn” fairly and can be used to train a fair model quickly from minimal data. We demonstrate empirically the usefulness of both methods through multiple examples on both synthetic and real data sets.

In this work, we explore Fairness Warnings applied to mean shifts in the testing distribution. It is a relatively straightforward extension to apply Fairness Warnings to other distribution shifts such as changes to the standard deviation. Though we are able to generate Fairness Warnings that show useful results, they ultimately are only applied to summary statistics, meaning that changes to the distribution that are not captured by such statistics could affect fairness in unpredictable ways. Thus, we only propose Fairness Warnings as boundary conditions under which the model may not be fair. In this regard, receiving a non-unfair score from a Fairness Warning does not guarantee that the model will behave fairly in the new domain. We emphasize the importance of this directionality to any lawmakers or practitioners who would be interested in using Fairness Warnings and advise that they be used only to decide against the use of certain models rather than to verify that models will behave fairly. A final limitation to our work is that we assess Fair-MAML when there are many related training tasks to learn from. In reality, there may only be a few related training tasks available. We leave assessing how useful Fair-MAML is on domains with only a few related training tasks to future work.

References

  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
  • J. Angwin, J. Larson, S. Mattu, and L. Kirchner (2016) Machine bias. ProPublica.
  • S. Barocas, M. Hardt, and A. Narayanan (2018) Fairness and machine learning. fairmlbook.org.
  • S. Barocas and A. D. Selbst (2016) Big data’s disparate impact. Calif. L. Rev. 104, pp. 671.
  • R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth (2017) A convex framework for fair regression. ArXiv abs/1706.02409.
  • S. Bickel, M. Brückner, and T. Scheffer (2009) Discriminative learning under covariate shift. J. Mach. Learn. Res. 10, pp. 2137–2155.
  • T. Calders and S. Verwer (2010) Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21 (2), pp. 277–292.
  • A. Chouldechova and A. Roth (2018) The frontiers of fairness in machine learning. ArXiv abs/1810.08810.
  • A. Chouldechova (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data 5 (2), pp. 153–163.
  • A. Coston, K. N. Ramamurthy, D. Wei, K. R. Varshney, S. Speakman, Z. Mustahsan, and S. Chakraborty (2019) Fair transfer learning with missing protected attributes. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, Honolulu, HI, USA.
  • C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel (2012) Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, New York, NY, USA, pp. 214–226. ISBN 978-1-4503-1115-1.
  • C. Dwork, N. Immorlica, A. T. Kalai, and M. Leiserson (2018) Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, pp. 119–133.
  • U.S. EEOC (1979) Uniform guidelines on employee selection procedures.
  • M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268.
  • C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 1126–1135.
  • Friedler, Scheidegger, Venkatasubramanian, Choudhary, Hamilton, and Roth (2019) A comparative study of fairness-enhancing interventions in machine learning. In ACM Conference on Fairness, Accountability and Transparency (FAT*).
  • M. E. Hamill, M. C. Hernandez, K. R. Bailey, M. D. Zielinski, M. A. Matos, and H. J. Schiller (2019) State level firearm concealed-carry legislation and rates of homicide and other violent crime. pp. 1–8.
  • M. Hardt, E. Price, and N. Srebro (2016) Equality of opportunity in supervised learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, USA, pp. 3323–3331. ISBN 978-1-5108-3881-9.
  • L. Huang and N. Vishnoi (2019) Stable and fair classification. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 2879–2890.
  • N. Kallus and A. Zhou (2018) Residual unfairness in fair machine learning from prejudiced data. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 2439–2448.
  • T. Kamishima, S. Akaho, and J. Sakuma (2012) Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 35–50.
  • C. Lan and J. Huan (2017) Discriminatory transfer. Workshop on Fairness, Accountability, and Transparency in Machine Learning.
  • M. Lichman (2013) UCI machine learning repository.
  • Z. C. Lipton, Y. Wang, and A. J. Smola (2018) Detecting and correcting for label shift with black box predictors. ICML.
  • D. Madras, E. Creager, T. Pitassi, and R. Zemel (2018) Learning adversarially fair and transferable representations. International Conference on Machine Learning.
  • A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) “Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1135–1144.
  • A. Romei and S. Ruggieri (2014) A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review 29 (5), pp. 582–638.
  • C. Schumann, X. Wang, A. Beutel, J. Chen, H. Qian, and E. H. Chi (2019) Transfer of machine learning fairness across domains. CoRR abs/1906.09688.
  • D. Slack, S. A. Friedler, C. D. Roy, and C. Scheidegger (2019) Assessing the local interpretability of machine learning models. CoRR abs/1902.03501.
  • A. Subbaswamy, P. G. Schulam, and S. Saria (2018) Preventing failures due to dataset shift: learning predictive models that transport. In AISTATS.
  • J. Tiihonen, P. Halonen, L. Tiihonen, H. Kautiainen, M. Storvik, and J. Callaway (2017) The association of ambient temperature and violent crime. Scientific Reports 7 (1), pp. 6543. ISSN 2045-2322.
  • B. Ustun and C. Rudin (2015) Supersparse linear integer models for optimized medical scoring systems. Machine Learning 102, pp. 349–391.
  • J. Vanschoren (2019) Meta-learning. In Automated Machine Learning: Methods, Systems, Challenges, F. Hutter, L. Kotthoff, and J. Vanschoren (Eds.), pp. 35–61. ISBN 978-3-030-05318-5.
  • O. Vinyals, C. Blundell, T. P. Lillicrap, K. Kavukcuoglu, and D. Wierstra (2016) Matching networks for one shot learning. In NIPS.
  • M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi (2017a) Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pp. 1171–1180.
  • M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi (2017b) Fairness constraints: mechanisms for fair classification. AISTATS.
  • M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi (2017c) Fairness constraints: mechanisms for fair classification. In Artificial Intelligence and Statistics, pp. 962–970.
  • I. Zliobaite (2015) A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148.