Auditing and Achieving Intersectional Fairness in Classification Problems

11/04/2019 ∙ by Giulio Morina, et al.

Machine learning algorithms are extensively used to make increasingly consequential decisions, so that achieving optimal predictive performance can no longer be the only focus. This paper explores intersectional fairness, that is, fairness when intersections of multiple sensitive attributes – such as race, age, nationality, etc. – are considered. Previous research has mainly focused on fairness with respect to a single sensitive attribute, with intersectional fairness being comparatively less studied despite its critical importance for modern machine learning applications. First, we introduce intersectional fairness metrics by extending prior work, and provide different methodologies to audit discrimination in a given dataset or model outputs. Secondly, we develop novel post-processing techniques to mitigate any detected bias in a classification model. Our proposed methodology does not rely on any assumptions regarding the underlying model and aims at guaranteeing fairness while preserving good predictive performance. Finally, we give guidance on a practical implementation, showing how the proposed methods perform on a real-world dataset.




1. Introduction

Fairness is a growing topic in the field of machine learning as models are being built to determine life-changing events such as loan approvals and parole decisions. Thus, it is critical that these models do not discriminate against individuals on the basis of their race, gender or any other sensitive attribute, by learning to replicate and reinforce the biases inherent in society or indeed introduce new biases. Whilst much of the algorithmic fairness literature thus far has focused on fairness with respect to an individual sensitive attribute, in this work we consider fairness for an intersection of sensitive attributes. That is, we aim to ensure fairness against groups defined by multiple sensitive attributes, for example, “black women” instead of just “black people” or “women”.

Ensuring intersectional fairness is critical for safe deployment of modern machine learning systems. A stark example of intersectional bias in deployed systems was discovered by Buolamwini and Gebru (2018), who showed that several commercially available systems for gender classification from facial image data had substantial intersectional accuracy disparities when considering gender and race (represented via Fitzpatrick skin type), with darker-skinned women being the most misclassified group – having an accuracy drop of over 30% compared to lighter-skinned men. Buolamwini and Gebru (2018) emphasize the need for investigating the intersectional error rates, noting that gender and skin type alone do not paint the full picture regarding the distribution of misclassifications.

One cause for bias is the data itself. A known issue in many consequential application domains is that the recorded data often does not appropriately reflect the full diversity of the population. This disproportionate representation is further exacerbated for intersectional subgroups. Indeed, Buolamwini and Gebru (2018) note that common facial analysis benchmarks are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience). In their study on the increased risk of maternal death among ethnic minority women in the UK, Ameh and Van Den Broek (2008) noted that there was limited data specifically for black and ethnic minority women born in the UK, and emphasized the need for reliable statistics to understand the scale of the problem.

Although intersectional fairness is of crucial importance, it is only recently that the algorithmic fairness literature has placed greater focus on it. Commonly used fairness metrics were developed with a single sensitive attribute in mind, and cannot be directly applied in this setting.

Our contribution

We present a comprehensive framework for assessing and achieving intersectional fairness, based on:

  • Extending well-established fairness metrics to the case of intersectionalities, allowing for a thorough analysis of discrimination in both datasets and model outputs.

  • Proposing novel ways to robustly estimate such metrics, even in the case where subgroups of the population are under-represented.

  • Developing post-processing methodologies that can improve the intersectional fairness of any already available classification model.

Our work builds upon the concept of $\epsilon$-differential fairness introduced by Foulds et al. (2018) and extends it to several of the most widely used fairness metrics for auditing bias in datasets and model outputs. More specifically, we extend their definition of differential fairness for statistical parity to: 1) elift and impact ratio metrics for data, and 2) equal opportunity and equalized odds metrics for model outputs. Our work aims to give practitioners the ability to assess intersectional fairness through multiple, not mutually exclusive, lenses.

We are also interested in studying real-world scenarios, where certain intersectional subgroups are often disadvantaged due to societal or data collection bias, and therefore under-represented. This presents a challenge when trying to audit for intersectional fairness from a given finite dataset, where such biases are often found. We propose to move further from the smoothed empirical estimator proposed originally by Foulds et al. (2018) and provide robust estimation procedures via bootstrap and fully Bayesian techniques. Importantly, we provide theoretical guarantees and demonstrate the performance of the estimators qualitatively and experimentally on a synthetic dataset.

Furthermore, we want to provide viable solutions to mitigate any detected discriminatory bias: we develop post-processing methods for binary classification models that threshold risk scores and randomize predictions separately for each intersection of sensitive attributes, combining and extending the work of Hardt et al. (2016) and Corbett-Davies et al. (2017). Our methods maximize predictive performance whilst guaranteeing intersectional fairness. A key advantage of our formulation is that it allows the practitioner to focus on multiple fairness metrics at the same time, and thus to control for multiple facets of model bias simultaneously. We provide implementation details and demonstrate the utility of our methods experimentally on the Adult Income Prediction problem (Dua and Graff, 2017), achieving intersectional fairness when 2 and 3 sensitive attributes, respectively, are considered.

Paper structure

We discuss related work in Section 2. We extend common fairness metrics to accommodate intersectionalities in Section 3, proving some of their theoretical properties in Section 3.1 and presenting methods for robustly estimating them in Section 3.2. In Section 4, we phrase post-processing as an optimization problem which aims to preserve good predictive performance while ensuring intersectional fairness; we introduce the formulations in Sections 4.2 and 4.3 for binary and score predictors, respectively. Practical implementation of post-processing methods is discussed in Section 5. We demonstrate the utility of our methods experimentally on a synthetic experiment and on the Adult dataset (Dua and Graff, 2017) in Section 6. In Section 7, we conclude and suggest future work. Proofs of all the results stated in the paper are presented in the supplementary material.

2. Related work

There is no universally accepted “best” fairness definition, nor is there one that is considered suitable for all use cases and application domains. There exist more than 20 different fairness metrics (Zliobaite, 2015; Narayanan, 2018), and some have been shown to be mutually incompatible (Kleinberg, 2018; Pleiss et al., 2017). Therefore, the appropriate fairness definition and a corresponding metric need to be selected depending on the application, context, and any regulatory or other requirements.

One can broadly divide the abundant fairness definitions into group and individual fairness. Group fairness splits the population into groups according to the sensitive attributes and aims to ensure similar treatment with respect to a fixed statistical measure; individual fairness seeks for individuals with similar features to be treated similarly regardless of their sensitive attributes. Our work focuses on group fairness metrics.

Assessing group fairness of a dataset or model output becomes much more challenging when considering potentially dozens of sensitive attributes (J. Kotkin, 2008). The number of generated subgroups grows exponentially with the number of attributes considered, making it difficult to inspect every subgroup for fairness due to both computational and data sparsity issues. A first challenge is, therefore, to come up with fairness metrics (either by modifying the widely used metrics or developing new ones) that can accommodate a large number of intersectional subgroups (Hebert-Johnson et al., 2018; Kearns et al., 2018; Creager et al., 2019). Our work builds most directly upon the $\epsilon$-differential fairness metric introduced by Foulds et al. (2018), which we suggest interpreting as a generalization of statistical parity – a notion of fairness also found in certain legal requirements (Feldman et al., 2015). Such a metric satisfies important desiderata, overlooked by other multi-attribute metrics (Kearns et al., 2018; Hebert-Johnson et al., 2018). It 1) considers multiple sensitive attributes, 2) protects subgroups of the population as defined by their intersectionalities (e.g., “black women”) as well as by individual sensitive attributes (e.g., “women”), 3) safeguards minority groups, and 4) aims at rectifying systematic differences between groups. $\epsilon$-differential fairness also satisfies other important properties, such as providing privacy, economic, and generalization guarantees. While in Foulds et al. (2018) the focus was mainly to enable a more subtle understanding of unfairness than with a single sensitive attribute, our work presents multiple metrics that allow more nuanced analysis of intersectional discrimination.

Nevertheless, all the introduced metrics pose algorithmic challenges when auditing for intersectional bias is of interest. While Foulds et al. (2018) propose a pointwise estimate, we have found that it can be unstable in the case of sparse data; we illustrate this in Example 6.1. Several other methods have been proposed in the literature for handling intersectional fairness that either make use of ad-hoc algorithms or are based on visual analytic tools (Cabrera et al., 2019; Kim et al., 2019). For intersectional bias detection, Chung et al. (2019) suggest a top-down method to find underperforming subgroups: the dataset is divided into more granular groups by considering more features until a subgroup with statistically significant loss is found. In contrast, Lakkaraju et al. (2017) use approximate rule-based explanations to describe subgroup outcomes.

As well as detecting discriminatory bias, another line of research has been focusing on achieving “fairer” models. There are three possible points of intervention to mitigate unwanted bias in the machine learning pipeline: the training data, the learning algorithm, and the predicted outputs, which are associated with three classes of bias mitigation algorithms: pre-processing, in-processing, and post-processing.

Pre-processing methods transform the data a priori to remove bias or extract representations that do not contain information related to sensitive attributes (Madras et al., 2018; Dwork et al., 2012; Calmon et al., 2017; Pedreshi et al., 2008; Kamiran and Calders, 2012; Zemel et al., 2013); in-processing methods modify the model construction mechanism to take fairness into account (Woodworth et al., 2017; Zhang et al., 2018; Celis et al., 2019; Kamishima et al., 2012; Raff et al., 2018); while post-processing methods transform the output of a black-box model in order to decrease discriminatory bias (Hardt et al., 2016; Jagielski et al., 2019; Corbett-Davies et al., 2017). Kearns et al. (2018, 2019) propose and demonstrate the performance of an in-processing training algorithm which mitigates intersectional bias by imposing fairness constraints on the protected subgroups. Their work is a generalisation of the “oracle efficient” algorithm by Agarwal et al. (2018) to the case of infinitely many protected subgroups. Foulds et al. (2018) also propose an in-processing learning algorithm based on the construction of a “fair” neural network.

In contrast to this approach, we develop a novel post-processing methodology. Post-processing procedures have received great attention in applications as they do not interfere with the training process and therefore are suitable for run-time environments. In addition, post-processing techniques are model agnostic and privacy preserving as they do not require access to the model or features other than sensitive attributes (Kamiran and Calders, 2012). The work of Hardt et al. (2016) aims to ensure equal opportunity for two subgroups of the population, as defined by a single binary sensitive attribute. They achieve this by randomly flipping a carefully chosen fraction of the predictions in order to mitigate discriminatory bias. Another approach is explored by Corbett-Davies et al. (2017), where fairness is guaranteed by treating model predictions differently according to the subgroup individuals belong to. We combine and expand their approach to the case of intersectional fairness.

3. Metrics for intersectional fairness

In this section, we introduce fairness metrics that can handle intersections of multiple sensitive attributes. Such metrics can be applied to assess fairness in either the data or in model outputs. Robustly estimating them is non-trivial, especially when more sensitive attributes are considered, as some subgroups may be under-represented in the dataset. Indeed, minorities in the population may be even more severely under-represented in a dataset compared to their true representation in the general population, one cause of which is bias in the data collection practices. After defining the metrics in Section 3.1, in Section 3.2 we present three different approaches for robustly estimating intersectional fairness for impact ratio, but the same notions can be applied to other intersectional metrics.


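Let $p$ be the number of different sensitive attributes. We denote by $A_1, \dots, A_p$ disjoint sets of discrete-valued sensitive attributes; e.g., $A_1$ can represent gender, $A_2$ race, $A_3$ nationality and so forth. The space of the intersections is denoted as $\mathcal{A} = A_1 \times A_2 \times \dots \times A_p$. Therefore, a specific element $s \in \mathcal{A}$ is a particular combination of attributes; e.g., $s = (\text{woman}, \text{black}, \dots)$.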
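As an illustration of how quickly the space of intersections grows, the following minimal Python sketch enumerates it as a Cartesian product. The attribute names and values are hypothetical and chosen only for the example; they are not taken from the paper.

```python
from itertools import product

# Hypothetical sensitive attributes and their values (illustration only).
attributes = {
    "gender": ["woman", "man"],
    "race": ["black", "white", "asian"],
    "nationality": ["uk", "non_uk"],
}

# The space of intersections is the Cartesian product of the attribute sets,
# so its size is the product of the sizes of the individual sets.
intersections = list(product(*attributes.values()))

print(len(intersections))  # 2 * 3 * 2 = 12 subgroups
print(intersections[0])    # ('woman', 'black', 'uk')
```

Adding one more binary attribute would double the number of subgroups, which is why subgroup counts become sparse so quickly.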

Suppose we have access to a finite dataset $\mathcal{D}$ with $n$ observations denoted as $(x_i, y_i)$, $i = 1, \dots, n$; $x_i$ represents the individual’s features – including their sensitive attributes – and $y_i \in \{0, 1\}$ a binary outcome. We interpret $y_i = 1$ as a “positive” outcome and “negative” otherwise, denoting by $Y$ the random variable describing the true population’s outcomes. Furthermore, we let $S$ be a discrete random variable with support on $\mathcal{A}$. For brevity, we denote its probability mass function by $P(s) = P(S = s)$; i.e., the probability that any individual has sensitive attributes $s$. Analogously, we denote by $P(Y = 1)$ the probability that a given individual has positive outcome. Finally, we will also denote the probability that an individual with sensitive attributes $s$ has positive outcome as $P(Y = 1 \mid s)$. We do not make explicit assumptions on the distribution of $S$ or $Y$ but we shall assume $P(s) > 0$ for all $s \in \mathcal{A}$.

When a classifier model is available, we denote by $\hat{y}_i$ the prediction for the $i$th individual and by $\hat{Y}$ the corresponding random variable. Importantly, we do not make any assumptions on how the model has been constructed and regard it as a black-box.

3.1. Definitions of Metrics

We now introduce intersectional fairness metrics for data and model outputs. We build on the definition of differential fairness introduced by Foulds et al. (2018). The definitions we introduce in this paper can be seen as a relaxation of the widely-used ones, to account for the fact that the number of intersections of sensitive attributes grows exponentially. In Table 1 we define fairness metrics to assess bias in the data, while Table 2 defines metrics to assess bias in model outputs. With the exception of $\epsilon$-differential fairness for statistical parity, intersectional fairness definitions for the other metrics (cf. Tables 1 and 2) are, to the best of our knowledge, novel contributions of this paper. We prove some of their theoretical properties later in Theorem 3.1. Although we restrict our analysis to fairness metrics for binary outcomes, they can be easily extended to the categorical case by simply requiring them to hold for all possible outcomes.

We refer the reader to Foulds et al. (2018) for an interpretation of $\epsilon$-differential fairness in terms of differential privacy. We note that $\epsilon = 0$ corresponds to achieving perfect fairness with respect to a given metric. Moreover, $\epsilon$-differential fairness allows us to compare bias between two different models. In particular, if we assume that two models achieve $\epsilon$-differential fairness for $\epsilon_1$ and $\epsilon_2$ respectively, then the quantity $e^{\epsilon_2 - \epsilon_1}$ can be interpreted as a multiplicative increase/decrease of one model’s bias with respect to the other, a phenomenon known as bias amplification (Zhao et al., 2017).

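Fairness metric | Intersectional definition
elift | $e^{-\epsilon} \le \frac{P(Y = 1 \mid s)}{P(Y = 1)} \le e^{\epsilon}$ for all $s \in \mathcal{A}$
impact ratio (slift) | $e^{-\epsilon} \le \frac{P(Y = 1 \mid s_i)}{P(Y = 1 \mid s_j)} \le e^{\epsilon}$ for all $s_i, s_j \in \mathcal{A}$
Table 1. $\epsilon$-differential fairness metrics on the data

Fairness metric | Intersectional definition
statistical parity (demographic parity) | $e^{-\epsilon} \le \frac{P(\hat{Y} = 1 \mid s_i)}{P(\hat{Y} = 1 \mid s_j)} \le e^{\epsilon}$ for all $s_i, s_j \in \mathcal{A}$
TPR parity (equal opportunity) | $e^{-\epsilon} \le \frac{P(\hat{Y} = 1 \mid Y = 1, s_i)}{P(\hat{Y} = 1 \mid Y = 1, s_j)} \le e^{\epsilon}$ for all $s_i, s_j \in \mathcal{A}$
FPR parity | $e^{-\epsilon} \le \frac{P(\hat{Y} = 1 \mid Y = 0, s_i)}{P(\hat{Y} = 1 \mid Y = 0, s_j)} \le e^{\epsilon}$ for all $s_i, s_j \in \mathcal{A}$
equalized odds | If $\epsilon$-differential fairness is satisfied for both TPR and FPR parity
Table 2. $\epsilon$-differential fairness metrics on the model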
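To make the ratio-based definitions concrete, here is a minimal Python sketch that computes the smallest $\epsilon$ for which a set of subgroup rates satisfies a pairwise metric (impact ratio, statistical parity, TPR or FPR parity) or the elift metric. It assumes each metric bounds the relevant ratio between $e^{-\epsilon}$ and $e^{\epsilon}$; the function names, subgroup rates and overall rate are hypothetical values made up for the example.

```python
import math


def epsilon_pairwise(rates):
    """Smallest eps such that e^-eps <= r_i / r_j <= e^eps for all pairs."""
    values = list(rates.values())
    return math.log(max(values) / min(values))


def epsilon_elift(rates, overall_rate):
    """Smallest eps such that e^-eps <= r_s / overall_rate <= e^eps for all s."""
    return max(abs(math.log(r / overall_rate)) for r in rates.values())


# Hypothetical positive-outcome rates per intersectional subgroup.
positive_rate = {
    ("woman", "black"): 0.20,
    ("woman", "white"): 0.30,
    ("man", "black"): 0.25,
    ("man", "white"): 0.40,
}

eps_impact_ratio = epsilon_pairwise(positive_rate)          # log(0.40 / 0.20)
eps_elift = epsilon_elift(positive_rate, overall_rate=0.30)  # hypothetical overall rate
print(eps_impact_ratio, eps_elift)
```

The same `epsilon_pairwise` helper applies to model outputs by passing per-subgroup true positive rates or false positive rates instead of outcome rates; under this reading, equalized odds would take the larger of the two resulting values.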

A key question is whether these intersectional fairness definitions guarantee fairness with respect to individual sensitive attributes or any arbitrary subset of them. In other words, we would like to prove that if $\epsilon$-differential fairness is satisfied for $\mathcal{A} = A_1 \times \dots \times A_p$, then it is also satisfied when only $A_1$ is considered, and for any other possible combination. Theorem 3.1 proves that this is indeed the case and $\epsilon$-differential fairness is guaranteed to hold. Notice that this is a lower bound and in practice fairness for the subgroups may be satisfied for smaller values than $\epsilon$. This is certainly the case when the elift metric is considered, as we prove that $\epsilon$-differential fairness holds for the same value of $\epsilon$ in all subgroups.

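Theorem 3.1.

Let $\mathcal{B} = A_{i_1} \times \dots \times A_{i_k}$, where $\{i_1, \dots, i_k\} \subseteq \{1, \dots, p\}$. If $\epsilon$-differential fairness is satisfied for any of the metrics in Tables 1 and 2 on the space of intersections $\mathcal{A}$, then $\epsilon$-differential fairness is also satisfied on the space $\mathcal{B}$ for the same metric.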

3.2. Robust Estimation of Intersectional Fairness

We now tackle the problem of auditing discriminatory bias having only access to a finite dataset $\mathcal{D}$. In particular, we are interested in the case where some combinations of sensitive attributes may be under-represented in the data. This is often the case in real-world datasets, usually due to historical reasons or inherent bias. We first make clear what we mean by auditing for intersectional fairness. We then explore three different methodologies to achieve this: 1) smoothed empirical estimation, where fairness metrics are directly computed from the data, 2) bootstrap estimation, to measure uncertainty in the empirical estimates, and 3) Bayesian modelling, to provide credible intervals.

We measure discriminatory bias in the data by computing the minimum value of $\epsilon$ such that one or more of the differential fairness definitions proposed in Section 3.1 holds. For the sake of exposition, we shall focus on estimating $\epsilon$ for the impact ratio metric, but the same reasoning can be readily extended to the other metrics. We then consider the problem of computing, as per Table 1:

$$\epsilon = \max_{s_i, s_j \in \mathcal{A}} \log \frac{P(Y = 1 \mid s_i)}{P(Y = 1 \mid s_j)}. \tag{1}$$

In general, the practitioner will not only be interested in computing $\epsilon$, but also in checking which attributes determine large values of the ratios $P(Y = 1 \mid s_i) / P(Y = 1 \mid s_j)$.

Computing $\epsilon$ may appear straightforward: we could just calculate $P(Y = 1 \mid s)$ for all $s \in \mathcal{A}$ and let $\epsilon = \log \frac{\max_{s} P(Y = 1 \mid s)}{\min_{s} P(Y = 1 \mid s)}$. However, the values of $P(Y = 1 \mid s)$ are usually unknown and estimating them from the data for all the values of $s$ can be challenging, as few instances of a particular combination of attributes may be available in the dataset $\mathcal{D}$. Moreover, as previously mentioned, minority subgroups may be even more severely under-represented in the dataset compared to their true representation in the general population, making the problem even harder. Therefore, we now introduce three different methods to estimate $\epsilon$: empirically from the data, via a bootstrap procedure, and with a Bayesian approach.

3.2.1. Smoothed Empirical Estimation

A simple approach is to directly estimate $P(Y = 1 \mid s)$ from the data, as proposed by Foulds et al. (2018). In particular, we can set

$$\hat{P}(Y = 1 \mid s) = \frac{N_{1,s} + \alpha}{N_s + \alpha + \beta}, \tag{2}$$

where $N_{1,s}$ is the empirical count of occurrences of individuals with attributes $s$ and positive outcome in the dataset $\mathcal{D}$, while $N_s$ is the total number of individuals with attributes $s$. We introduce smoothing parameters $\alpha, \beta$ as $N_{1,s}$ or $N_s$ may be small, due to data sparsity. Note that Equation 2 represents the expected posterior value of a Beta-Binomial model with prior parameters $(\alpha, \beta)$. The final estimate of $\epsilon$ can be obtained as:

$$\hat{\epsilon} = \log \frac{\max_{s \in \mathcal{A}} \hat{P}(Y = 1 \mid s)}{\min_{s \in \mathcal{A}} \hat{P}(Y = 1 \mid s)}.$$

This estimation procedure requires computing $\hat{P}(Y = 1 \mid s)$ for all possible combinations of attributes $s \in \mathcal{A}$, leading to $O(|\mathcal{A}|)$ computational complexity. In general, it can be hard to tune the parameters $\alpha$ and $\beta$ properly. In particular, big values of either $\alpha$ or $\beta$ will introduce additional bias, while small values of $\alpha, \beta$ will not solve the data sparsity problem. Therefore, this procedure is not robust; $\hat{\epsilon}$ will generally be biased and no uncertainty quantification can be provided. Nevertheless we now prove in Proposition 3.2 the appealing property that, as the dataset size grows, the smoothed empirical estimator converges to the true value regardless of the chosen smoothing parameters. Although the result holds for $\alpha, \beta \in \mathbb{R}$, in practice one would choose them to be non-negative, and set them both to zero when no smoothing is desired.

Proposition 3.2.

The smoothed empirical estimate of $\epsilon$ for any $\epsilon$-differential fairness metric is consistent for all $\alpha, \beta \in \mathbb{R}$.
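The smoothed empirical estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the smoothed rate is taken to be the Beta-Binomial posterior mean `(positives + alpha) / (total + alpha + beta)`, the estimate of $\epsilon$ is the log-ratio of the largest to the smallest smoothed rate (impact ratio), and the function name and toy records are hypothetical.

```python
import math
from collections import Counter


def smoothed_epsilon(records, alpha=1.0, beta=1.0):
    """Impact-ratio epsilon from smoothed positive rates.

    records is a list of (s, y) pairs, with s a hashable subgroup label
    and y in {0, 1}. Only subgroups observed in records are considered.
    """
    total = Counter(s for s, _ in records)
    positive = Counter(s for s, y in records if y == 1)
    rates = {
        s: (positive[s] + alpha) / (total[s] + alpha + beta) for s in total
    }
    eps = math.log(max(rates.values()) / min(rates.values()))
    return eps, rates


# Hypothetical toy data: subgroup "a" has 8/10 positives, "b" has 1/2.
records = [("a", 1)] * 8 + [("a", 0)] * 2 + [("b", 1), ("b", 0)]
eps_hat, rates = smoothed_epsilon(records)
print(rates)    # {'a': 0.75, 'b': 0.5}
print(eps_hat)  # log(0.75 / 0.5)
```

With `alpha = beta = 0` the same function returns the unsmoothed estimate, which is undefined (division by zero inside the ratio) whenever some subgroup has no positive outcomes.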

3.2.2. Bootstrap Estimation

We propose to resort to bootstrap estimation to provide confidence intervals for the estimate $\hat{\epsilon}$. We generate $B$ different datasets by sampling with replacement $m$ observations from the original dataset $\mathcal{D}$. For each bootstrap sample, we obtain an estimate of $\epsilon$ as in Equation 2. The final estimate is obtained by averaging over the sample and empirical confidence intervals can be easily constructed. The computational complexity is $O(B\,|\mathcal{A}|)$, but in practice we also observe a computational overhead due to the construction of the $B$ datasets. Notice that some of the generated datasets may not contain instances of specific attributes $s$, producing undefined values if the smoothing parameters are set to zero. This observation motivates why bootstrap estimates are not consistent (cf. Proposition 3.2) unless also $m \to \infty$, as some of the combinations of sensitive attributes may not be represented in any of the bootstrapped datasets. This is usually not problematic in practice, provided that the fixed size of the bootstrapped datasets is big enough.
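The bootstrap procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the impact ratio metric, resamples datasets of the same size as the original (the text only requires a fixed size), uses a percentile interval, and relies on a hypothetical helper `smoothed_epsilon` and made-up toy data.

```python
import math
import random
from collections import Counter


def smoothed_epsilon(records, groups, alpha, beta):
    """Impact-ratio epsilon from smoothed positive rates of every group."""
    total = Counter(s for s, _ in records)
    positive = Counter(s for s, y in records if y == 1)
    # A group missing from a resample gets the prior mean alpha / (alpha + beta).
    rates = [(positive[s] + alpha) / (total[s] + alpha + beta) for s in groups]
    return math.log(max(rates) / min(rates))


def bootstrap_epsilon(records, n_boot=500, alpha=1.0, beta=1.0, level=0.95, seed=0):
    """Bootstrap mean and percentile interval for epsilon."""
    rng = random.Random(seed)
    groups = sorted({s for s, _ in records})
    estimates = sorted(
        smoothed_epsilon(rng.choices(records, k=len(records)), groups, alpha, beta)
        for _ in range(n_boot)
    )
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return sum(estimates) / n_boot, (lower, upper)


# Hypothetical toy data: group "a" is well represented, group "b" is sparse.
records = [("a", 1)] * 8 + [("a", 0)] * 2 + [("b", 1), ("b", 0)]
mean_eps, (lower, upper) = bootstrap_epsilon(records)
print(mean_eps, lower, upper)
```

Keeping the smoothing parameters strictly positive avoids the undefined values mentioned above when a resample contains no instance of the sparse group.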

3.2.3. Bayesian Estimation

Motivated by the form of Equation 2, we propose a Bayesian approach by considering the likelihood

$$N_{1,s} \mid P(Y = 1 \mid s) \sim \mathrm{Binomial}\big(N_s, P(Y = 1 \mid s)\big)$$

and setting its conjugate prior

$$P(Y = 1 \mid s) \sim \mathrm{Beta}(\alpha, \beta).$$

The posterior is therefore tractable and given by

$$P(Y = 1 \mid s) \mid N_{1,s}, N_s \sim \mathrm{Beta}(N_{1,s} + \alpha,\; N_s - N_{1,s} + \beta).$$

We resort to Monte Carlo simulation techniques to get an estimate of $\epsilon$. In particular, we simulate $M$ values of $P(Y = 1 \mid s)$ from the posterior and use them to compute the estimate of $\epsilon$ as in Equation 1, with a computational complexity of $O(M\,|\mathcal{A}|)$. By considering the average of the so-constructed sample we can obtain the final estimate of $\epsilon$. Moreover, this procedure promptly provides credible intervals. Finally, we note that the simulated values of $P(Y = 1 \mid s)$ will always be greater than zero, so that we do not need to resort to any further smoothing. The prior parameters $\alpha, \beta$ can incorporate a practitioner’s domain knowledge of the problem or can be set close to zero to suggest no prior information. It follows from Proposition 3.3 that this estimator is also consistent.

Proposition 3.3.

The Bayesian estimate of $\epsilon$ for any $\epsilon$-differential fairness metric is consistent for all $\alpha, \beta > 0$.
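A minimal Monte Carlo sketch of the Bayesian estimator is given below. It assumes a Beta prior on each subgroup's positive rate with a Binomial likelihood, so that the posterior is Beta(positives + alpha, negatives + beta), and it targets the impact ratio metric. The function name, the default prior parameters and the toy counts are hypothetical.

```python
import math
import random


def bayesian_epsilon(counts, alpha=1.0, beta=1.0, n_draws=2000, level=0.95, seed=0):
    """Posterior mean and credible interval for the impact-ratio epsilon.

    counts maps each subgroup to (number of positives, group size).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # One posterior draw of the positive rate for every subgroup.
        rates = [
            rng.betavariate(pos + alpha, tot - pos + beta)
            for pos, tot in counts.values()
        ]
        draws.append(math.log(max(rates) / min(rates)))
    draws.sort()
    lower = draws[int((1 - level) / 2 * n_draws)]
    upper = draws[min(n_draws - 1, int((1 + level) / 2 * n_draws))]
    return sum(draws) / n_draws, (lower, upper)


# Hypothetical counts: subgroup "a" has 8/10 positives, "b" has 1/2.
counts = {"a": (8, 10), "b": (1, 2)}
post_mean, (lower, upper) = bayesian_epsilon(counts)
print(post_mean, lower, upper)
```

The width of the interval is driven mostly by the sparse subgroup, which is the kind of uncertainty a single point estimate cannot convey.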

4. Post-Processing of Classifier Model

Often, we have access to outputs of a classification model that has already been trained and calibrated, but we may not have any knowledge on how such predictions were made, either because the model is hard to interpret or because we do not have access to the model itself. Therefore, we will always assume that we only have access to a black-box predictor. We will refer to it as a “binary predictor” if its outputs are either 0 or 1 (or any other binary labels) and as a “score predictor” if its outputs are in $[0, 1]$.

We showed in Section 3 how to assess intersectional fairness of model outputs via different metrics. A natural next question is how to mitigate any detected bias. We argue that, when possible, the best way to ensure fairness is to collect more representative data and retrain the model. Nevertheless, it is commonly the case that only historical data (where conscious or unconscious bias is often present) is available, so that it is impossible to gather more information. Moreover, training a new classifier may be impractical due to cost and time constraints. This motivates the need to develop post-processing techniques that are model agnostic. Indeed, we shall make no assumptions on the model training mechanism, and only require access to its outputs and to the sensitive attributes.

We follow the approach taken by Hardt et al. (2016) and aim to construct a derived predictor that achieves better fairness with respect to one or more chosen metrics. In particular, we propose a class of derived predictors that can handle classifiers returning either binary predictions or scores. In the following, we will denote by either $\tilde{Y}$ or $\bar{Y}$ the post-processed predictions (the distinction between the two will be made clear in Section 4.3.1), while as usual $\hat{Y}$ will denote the given predictor outcomes. Section 4.1 discusses a general framework for the construction of derived predictors. We explore how to compute them for a binary and score predictor in Sections 4.2 and 4.3 respectively.

The main characteristic of a derived predictor is that its value depends only on the given prediction $\hat{Y}$ and on the individual’s combination of sensitive attributes $s$. We formally define it as follows:

Definition 4.1 (Derived Predictor, (Hardt et al., 2016)).

A derived predictor $\tilde{Y}$ is a random variable whose distribution depends solely on a classifier’s predictions $\hat{Y}$ and a combination of sensitive attributes $s$.

Our aim is to construct a derived predictor that, by transforming predictions of a given classifier, achieves better fairness in terms of one or more $\epsilon$-differential fairness metric(s). If the model only returns binary predictions $\hat{Y} \in \{0, 1\}$, we can resort to randomization, that is, randomly flipping some of the predictions. On the other hand, when the model returns scores, constructing a derived predictor becomes more challenging. We focus on a specific class of derived predictors:

Definition 4.2 (Randomized Thresholding Derived Predictor).

Given a classifier returning predictions $\hat{Y} \in [0, 1]$, the Randomized Thresholding Derived Predictor (RTDP) $\tilde{Y}$ is a Bernoulli random variable such that

$$P(\tilde{Y} = 1 \mid \hat{Y} = \hat{y}, S = s) = p_s \, \mathbb{1}(\hat{y} \le t_s) + q_s \, \mathbb{1}(\hat{y} > t_s), \tag{3}$$

where $\mathbb{1}(\cdot)$ is the indicator function and $p_s, q_s, t_s \in [0, 1]$ are unknown parameters.

We interpret Equation 3 as follows: given an individual with predicted score $\hat{y}$ and attributes $s$, we first construct a binary prediction by considering a threshold $t_s$ and then, with a specific probability, we accommodate the possibility to reverse it or keep it. In particular we can equivalently write:

$$P(\tilde{Y} = 1 \mid \hat{Y} \le t_s, S = s) = p_s, \qquad P(\tilde{Y} = 1 \mid \hat{Y} > t_s, S = s) = q_s,$$

so that $p_s$ is the probability of flipping what would have been a negative prediction, while $q_s$ is the probability of keeping a positive prediction.

Note that Definition 4.2 also covers the case where the model returns only binary predictions, and we explore this case in more detail in Section 4.2. In consequential applications, randomization may not be desired or cannot be employed, due to legal or other requirements. Definition 4.2 allows us to construct a deterministic derived predictor by setting $p_s = 0$ and $q_s = 1$.
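The randomized thresholding step can be sketched in Python as follows. This is an illustrative reading of Definition 4.2 under stated assumptions: each subgroup has a hypothetical parameter triple `(p, q, t)`, scores strictly above the threshold `t` count as provisional positives (the strictness of the inequality is an assumption), `p` is the probability of turning a provisional negative into a positive, and `q` is the probability of keeping a provisional positive.

```python
import random


def rtdp_predict(score, s, params, rng=random):
    """Draw one post-processed prediction for a score and a subgroup s.

    params[s] = (p, q, t): p flips a provisional negative to positive,
    q keeps a provisional positive, t is the subgroup threshold.
    """
    p, q, t = params[s]
    prob_positive = q if score > t else p
    # rng.random() lies in [0, 1), so p = 0 never flips and q = 1 always keeps.
    return 1 if rng.random() < prob_positive else 0


# Hypothetical parameters: "g1" is deterministic, "g2" is randomized.
params = {"g1": (0.0, 1.0, 0.5), "g2": (0.2, 0.9, 0.4)}

print(rtdp_predict(0.7, "g1", params))  # 1 (score above threshold, always kept)
print(rtdp_predict(0.3, "g1", params))  # 0 (score below threshold, never flipped)
rng = random.Random(0)
print(sum(rtdp_predict(0.3, "g2", params, rng) for _ in range(10_000)) / 10_000)
```

The last line estimates the probability of a positive prediction for a below-threshold score in the randomized subgroup, which should be close to its `p` value of 0.2.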

4.1. Formulation as an optimization problem

We construct the RTDP by solving an optimization problem. In order to assess the performance of the post-processed model, we introduce a loss function $\ell(y, \tilde{y})$ that, given the true and the predicted outcomes, returns the cost of making such a prediction, following the approach of Hardt et al. (2016). Without loss of generality, we will assume $\ell(0, 0) = \ell(1, 1) = 0$, so that making correct predictions does not contribute to the loss. Indeed, if either a bonus or a penalty is desired for correct predictions, it can be incorporated by changing the values of $\ell(0, 1)$ and $\ell(1, 0)$. Therefore, by minimizing the expected loss function we preserve good predictive performance.

To control the discriminatory bias of the post-processed model, the user has to select a value of $\epsilon$ that they wish to achieve for one or more of the intersectional metrics (cf. Table 2). We consider two possible approaches to find the unknown parameters $(p_s, q_s, t_s)$: 1) minimizing the expected loss subject to the selected fairness metric(s) being satisfied for the chosen $\epsilon$, or 2) adding a penalty term to the expected loss for values of the parameters that do not satisfy the required fairness constraint.

For instance, one established fairness guideline is the 80% rule for statistical parity (Feldman et al., 2015), which corresponds to requiring $\epsilon$-differential fairness for statistical parity to hold for $\epsilon = -\log(0.8)$ (cf. Theorem 3.1). Therefore, we require

$$\frac{P(\tilde{Y} = 1 \mid s_i)}{P(\tilde{Y} = 1 \mid s_j)} \ge 0.8 \quad \text{for all } s_i, s_j \in \mathcal{A}.$$

We can either consider this as a constraint in the parameter space of the optimization problem or consider minimizing

$$\mathbb{E}[\ell(Y, \tilde{Y})] + \lambda \, \mathbb{1}\left( \min_{s_i, s_j \in \mathcal{A}} \frac{P(\tilde{Y} = 1 \mid s_i)}{P(\tilde{Y} = 1 \mid s_j)} < 0.8 \right)$$

for $\lambda$ appropriately large. The two approaches are in principle equivalent, but their practical implementation may differ as different numerical optimization routines need to be used. Note that statistical parity is not the only fairness constraint that can be considered; for instance in Section 6 we will aim to achieve better equalized odds intersectional fairness.
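As a small worked example of the 80% rule, the Python sketch below converts the rule into a value of $\epsilon$, checks the statistical parity constraint on a set of subgroup positive-prediction rates, and evaluates a penalized objective. The indicator-style penalty, the function names, the weight `lam` and all the numbers are assumptions made for illustration only.

```python
import math

# The 80% rule corresponds to exp(-eps) = 0.8, i.e. eps = -log(0.8).
epsilon = -math.log(0.8)
print(round(epsilon, 4))  # 0.2231


def satisfies_statistical_parity(rates, eps):
    """True if min rate / max rate >= exp(-eps) over all subgroups."""
    values = list(rates.values())
    return min(values) / max(values) >= math.exp(-eps)


def penalized_loss(expected_loss, rates, eps, lam=1e3):
    """Expected loss plus a large constant when the constraint is violated."""
    penalty = 0.0 if satisfies_statistical_parity(rates, eps) else lam
    return expected_loss + penalty


# Hypothetical positive-prediction rates of a post-processed model.
fair_rates = {"g1": 0.50, "g2": 0.45}    # ratio 0.9, passes the 80% rule
unfair_rates = {"g1": 0.50, "g2": 0.30}  # ratio 0.6, fails the 80% rule

print(penalized_loss(0.12, fair_rates, epsilon))    # 0.12
print(penalized_loss(0.12, unfair_rates, epsilon))  # 1000.12
```

A hard penalty like this is not differentiable, so in practice it would suit derivative-free optimizers better than gradient-based ones.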

We now show in Proposition 4.3 how to rewrite the expected loss as a weighted sum of the False Positive Rate and of the False Negative Rate of the post-processed model, where the weights depend on $\ell$.

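Proposition 4.3.

Minimizing $\mathbb{E}[\ell(Y, \tilde{Y})]$ is equivalent to minimizing

$$\ell(0, 1)\, P(Y = 0)\, \mathrm{FPR} + \ell(1, 0)\, P(Y = 1)\, \mathrm{FNR}, \tag{4}$$

where $\mathrm{FPR} = P(\tilde{Y} = 1 \mid Y = 0)$ and $\mathrm{FNR} = P(\tilde{Y} = 0 \mid Y = 1)$ are the False Positive Rate and the False Negative Rate of the post-processed model, and the first argument of $\ell$ is the true outcome.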
Another approach to construct the RTDP is by maximizing a utility function. For instance, Corbett-Davies et al. (2017) consider the immediate utility function, defined as $\mathbb{E}[Y \tilde{Y} - c\, \tilde{Y}]$. This approach may be preferable, as it only requires tuning a constant $c$ that can be interpreted as the cost of making a positive prediction. We prove in Proposition 4.4 that our optimization framework accommodates this approach.

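Proposition 4.4.

Let the immediate utility function be

$$u(\tilde{Y}, c) = \mathbb{E}[Y \tilde{Y} - c\, \tilde{Y}].$$

Then, maximizing this function is equivalent to minimizing Equation 4 when setting $\ell(0, 1) = c$ and $\ell(1, 0) = 1 - c$, i.e., a cost $c$ for a false positive and $1 - c$ for a false negative.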
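The two equivalences above can be checked numerically on a toy joint distribution. The sketch below is only an illustration under stated conventions: losses are indexed as `(true outcome, prediction)`, correct predictions cost nothing, the immediate utility is taken to be the expectation of `y * pred - c * pred`, and the joint probabilities and the constant `c` are hypothetical.

```python
# Hypothetical joint distribution P(Y = y, prediction = yt), keyed by (y, yt).
joint = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.25}


def expected_loss(joint, loss):
    """Expected loss with loss keyed by (true outcome, prediction)."""
    return sum(prob * loss[key] for key, prob in joint.items())


def weighted_error_rates(joint, l01, l10):
    """l01 * P(Y=0) * FPR + l10 * P(Y=1) * FNR."""
    p_y0 = joint[(0, 0)] + joint[(0, 1)]
    p_y1 = joint[(1, 0)] + joint[(1, 1)]
    fpr = joint[(0, 1)] / p_y0
    fnr = joint[(1, 0)] / p_y1
    return l01 * p_y0 * fpr + l10 * p_y1 * fnr


def immediate_utility(joint, c):
    """Expectation of y * yt - c * yt under the joint distribution."""
    return sum(prob * (y * yt - c * yt) for (y, yt), prob in joint.items())


c = 0.3
loss = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): c, (1, 0): 1 - c}
p_y1 = joint[(1, 0)] + joint[(1, 1)]

lhs = expected_loss(joint, loss)
rhs = weighted_error_rates(joint, c, 1 - c)
utility = immediate_utility(joint, c)

print(lhs, rhs)                     # both approximately 0.135
print(utility, (1 - c) * p_y1 - rhs)  # both approximately 0.145
```

The second printed pair shows that the utility equals a constant, `(1 - c) * P(Y = 1)`, minus the weighted error rates, so maximizing one is the same as minimizing the other.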

4.2. Post-processing of a Binary Predictor

In this section we consider having access to a classifier that returns binary predictions $\hat{Y} \in \{0, 1\}$. In this case, as only binary predictions are available, we set $t_s = 0.5$ and tune the probabilities $p_s$ and $q_s$ to construct the derived predictor. To find the unknown parameters we minimize the expected loss subject to the required fairness constraint. Proposition 4.5 shows that this optimization problem can be efficiently solved via linear programming.

Proposition 4.5.

Assume setting $t_s = 0.5$ in Definition 4.2 and optimizing the variables $p_s, q_s$ such that the expected loss is minimized and any of the model output metrics (cf. Table 2) is below a user-defined threshold. Then, the optimization problem is a linear programming problem.

We conclude that in the case of a binary predictor, a RTDP can be computed in polynomial time (Karmarkar, 1984).
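The reason the binary case reduces to a linear program can be illustrated with a short Python sketch. Because the derived predictor depends on the true outcome only through the original prediction and the subgroup, its true and false positive rates in each subgroup are linear in the two flipping probabilities, and a ratio constraint between two subgroups can be rewritten as a linear inequality. The function names and all numerical values below are hypothetical.

```python
import math


def derived_rates(tpr_hat, fpr_hat, p, q):
    """TPR and FPR of the derived predictor within one subgroup.

    p = P(new prediction = 1 | original prediction = 0),
    q = P(new prediction = 1 | original prediction = 1).
    Both outputs are linear functions of (p, q).
    """
    tpr = q * tpr_hat + p * (1 - tpr_hat)
    fpr = q * fpr_hat + p * (1 - fpr_hat)
    return tpr, fpr


def tpr_parity_slack(tpr_i, tpr_j, eps):
    """Slack of the linearized constraint tpr_i <= exp(eps) * tpr_j."""
    return math.exp(eps) * tpr_j - tpr_i


# Hypothetical original rates for two subgroups.
tpr_a, fpr_a = derived_rates(0.80, 0.20, p=0.0, q=1.0)  # left unchanged
tpr_b, fpr_b = derived_rates(0.50, 0.10, p=0.3, q=1.0)  # some negatives flipped

eps = -math.log(0.8)
print(tpr_a, tpr_b)                           # 0.8 and 0.65
print(tpr_parity_slack(tpr_a, tpr_b, eps) >= 0)  # True: 0.8 <= 1.25 * 0.65
```

Since the objective and every such constraint are linear in the flipping probabilities, any standard LP solver could be applied once the original rates have been estimated per subgroup.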

4.3. Post-processing of a Score Predictor

In this section we assume that we have access to model outputs in the form of scores , where high scores indicate high probability of a positive outcome. We emphasize that we do not need to know any further information on how these scores were computed, and can treat the underlying model as a black-box. To construct the RTDP we can optimize both the probabilities , and the thresholds for all , corresponding to a total of parameters to optimize. Although we do not observe over-fitting in the experiments we run in Section 6, in other applications it may be necessary to use cross-validation or to add regularization terms to reduce the degrees of freedom (e.g., imposing for some ). In consequential applications it is often undesirable to have random predictions; therefore, we explore in detail the “deterministic” scenario in Section 4.3.1. The case where both the thresholds and the probabilities are optimized is discussed in Section 4.3.2.

4.3.1. Deterministic post-processing

If no randomization is desired, we construct an RTDP fixing and . This case is of particular interest as randomization may be undesirable in real-world applications, for instance when assessing judicial decisions (Angwin et al., 2016). Moreover, we carefully tune the thresholds , as they will drive the predictive performance of the post-processed model.

To explicitly distinguish this case, we denote the post-processed prediction as and define post-processed model performance metrics as follows:

Definition 4.6 ().

Define the post-processed model performance metrics of the RTDP when no randomization is used as

Note that although not explicitly stated, the metrics introduced in Definition 4.6 are functions of the thresholds .
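As a minimal sketch of how such threshold-dependent metrics might be evaluated on a validation set, the snippet below computes the FPR and FNR of each intersection for a given set of group-specific thresholds. The function name, the data, and the convention of predicting a positive outcome when the score is strictly above the threshold are assumptions made for illustration.

```python
from collections import defaultdict

def group_rates(scores, labels, groups, thresholds):
    """Return {group: (FPR, FNR)} when predicting 1 iff score > thresholds[group].

    Every group is assumed to contain at least one positive and one negative
    instance; otherwise the corresponding rate is undefined.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for r, y, s in zip(scores, labels, groups):
        pred = int(r > thresholds[s])
        c = counts[s]
        if y == 1:
            c["pos"] += 1
            c["fn"] += pred == 0
        else:
            c["neg"] += 1
            c["fp"] += pred == 1
    return {s: (c["fp"] / c["neg"], c["fn"] / c["pos"]) for s, c in counts.items()}

# Hypothetical validation data for two intersections "a" and "b".
scores = [0.9, 0.8, 0.4, 0.2, 0.7, 0.6, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(scores, labels, groups, {"a": 0.5, "b": 0.65})
```

Changing a single threshold only affects the rates of its own intersection, which is what makes group-specific thresholds a natural lever for trading predictive performance against fairness.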

As one would intuitively expect, applying randomization on top of a well-performing model will in general worsen its performance. This is formally proved in Proposition 4.7, where we show that, under reasonable conditions, it is indeed counter-productive to apply randomization when achieving better fairness is not an objective.

Proposition 4.7 ().

Let us consider a predictor returning scores and solving

where is the RTDP of Definition 4.1. This is equivalent to setting and solving

if and only if


Notice that the assumption of Equation 5 requires that the given model performs sufficiently well across all the intersections of sensitive attributes . As an example, if and , Equation 5 translates as requiring greater and than and respectively for all the subgroups.

4.3.2. Post-processing using randomization

We now focus on constructing an RTDP by finding both the optimal thresholds and probabilities . We first consider a simple approach that we name “sequential post-processing”. We first find optimal thresholds when no fairness constraints are imposed. By applying such thresholds, we convert the scores to binary predictions, so that we can find optimal probabilities that achieve the desired fairness constraints. This procedure appears appealing as we can loosely interpret the thresholding as a way to maximize predictive performance, and the randomization as a method to achieve better fairness. However, although the final result may be acceptable for the case at hand, there is no theoretical guarantee that this procedure will return the actual optimum.

A different solution, which we will refer to as “overall post-processing”, is to solve the following optimization problem:


where is the optimal cost function value found by solving the optimization problem only in the variables , for a fixed (cf. Section 4.2). Although this may seem to add an extra layer of complexity, we note that values of can be efficiently computed via linear programming. In general, will not be a differentiable function, so optimizing it remains challenging.

5. Implementation

We first discuss practical implementation of post-processing techniques for a binary predictor. In this case, we showed in Proposition 4.5 that the RTDP can be obtained by solving a linear programming problem. In practice, we need to compute the unknown model metrics . We propose to use the same techniques introduced in Section 3 to estimate them; i.e., directly from the data, via bootstrap estimation, or Bayesian modelling.

Practical implementation of computing an RTDP from a score predictor is more challenging. We first provide a way to evaluate the expected loss function for any arbitrary value of and of . The expected loss function is given in Equation 4 and, by applying the law of total probability, can be rewritten as:
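A minimal sketch of this kind of decomposition, assuming the loss sums, over the intersections, the cost-weighted probabilities of a false positive and of a false negative. The function name, the dictionary keys, and all numbers are hypothetical.

```python
def expected_loss(groups, c_fp=1.0, c_fn=1.0):
    """Expected loss from group-level quantities.

    groups: iterable of dicts with keys 'weight' = P(S=s),
    'base_rate' = P(Y=1 | S=s), 'fpr' and 'fnr' of the post-processed model.
    """
    total = 0.0
    for g in groups:
        total += g["weight"] * (
            c_fp * (1 - g["base_rate"]) * g["fpr"]
            + c_fn * g["base_rate"] * g["fnr"]
        )
    return total

# Two hypothetical intersections.
example = [
    {"weight": 0.6, "base_rate": 0.30, "fpr": 0.05, "fnr": 0.40},
    {"weight": 0.4, "base_rate": 0.10, "fpr": 0.02, "fnr": 0.50},
]
loss = expected_loss(example)
```

Written this way, the loss depends on the thresholds and probabilities only through the group-level FPR and FNR, so these are the only quantities that need to be re-estimated during optimization.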

The unknown constant base rates and can be estimated from the data via any of the techniques introduced in Section 3. To compute the post-processed metrics and (cf. Definition 4.6), we first apply the thresholds to the scores available on a validation dataset. We then estimate the metrics by using either bootstrap or Bayesian techniques as in Section 3. The values of and can then be readily computed as:

Note that since and are estimated directly from the data, they will be piecewise constant functions of the threshold . Therefore, gradient-based optimization routines are unlikely to succeed when this approach is considered as the gradient of the objective function – if defined – will be zero at all points. Moreover, the optimum will not be unique.
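The piecewise-constant behaviour is easy to reproduce. The sketch below uses five hypothetical scores of negative-outcome instances and shows that the empirical FPR does not change between two consecutive observed scores, and jumps when the threshold crosses one.

```python
def empirical_fpr(neg_scores, t):
    """Fraction of negative-outcome instances with score above threshold t."""
    return sum(r > t for r in neg_scores) / len(neg_scores)

neg_scores = [0.05, 0.10, 0.20, 0.35, 0.60]   # hypothetical scores with Y = 0

# Any threshold strictly between two consecutive scores gives the same estimate,
# so a finite-difference gradient computed there is exactly zero.
flat = [empirical_fpr(neg_scores, t) for t in (0.21, 0.25, 0.30, 0.34)]

# Crossing the observed score 0.35 produces a jump.
jump = (empirical_fpr(neg_scores, 0.34), empirical_fpr(neg_scores, 0.36))
```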

To address this issue, we propose to smooth the objective function by smoothing out the model performance metrics. In particular, we propose a modelling approach by constructing the random variables and . We model them both as Beta random variables and estimate their parameters by maximum likelihood estimation. We finally let for any arbitrary threshold , and . This leads to a smooth function that preserves monotonicity as depicted in Figure 1.
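One way to realize this smoothing is sketched below. It assumes that the two random variables are the scores of negative-outcome and positive-outcome instances within one intersection, so that the smoothed FPR is the survival function of the first fitted Beta distribution and the smoothed FNR is the distribution function of the second. The data are simulated and the function names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical scores in (0, 1) for one intersection, split by true outcome.
neg_scores = rng.beta(2.0, 6.0, size=200)   # instances with Y = 0
pos_scores = rng.beta(5.0, 2.0, size=80)    # instances with Y = 1

# Maximum likelihood fit of the two shape parameters; location and scale are
# fixed so that the support is the unit interval.
a0, b0, _, _ = stats.beta.fit(neg_scores, floc=0, fscale=1)
a1, b1, _, _ = stats.beta.fit(pos_scores, floc=0, fscale=1)

def smooth_fpr(t):
    """P(score > t | Y = 0) under the fitted Beta model."""
    return stats.beta.sf(t, a0, b0)

def smooth_fnr(t):
    """P(score <= t | Y = 1) under the fitted Beta model."""
    return stats.beta.cdf(t, a1, b1)

grid = np.linspace(0.01, 0.99, 50)
fpr_curve = smooth_fpr(grid)
fnr_curve = smooth_fnr(grid)
```

Both curves are differentiable in the threshold, so gradient-based routines can be applied to the resulting objective; the price is a modelling assumption on the score distribution that should be checked against the empirical estimates.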

As a finite number of instances is available in the dataset, estimates of the FPR are the same for different values of the threshold . If no smoothing is applied, the FPR is a step-wise function of the threshold. By using the proposed technique, the FPR appears as a smooth monotonic curve.

Figure 1. Illustrative example of FPR for individuals with combination of attributes as a function of the threshold . Red points represent estimates of the FPR, the blue interpolating line depicts the non-differentiable step function, and the orange line is the smoothed curve obtained via a Beta modelling approach.

6. Experiments

We first propose an ad-hoc experiment using a generated dataset to compare -differential fairness estimation techniques of Section 3.2. We then apply them, together with the proposed post-processing methods, to the Adult income prediction problem in Section 6.2.

6.1. Synthetic Experiment

We design this experiment to compare the estimation techniques of -differential fairness proposed in Section 3.1. We first discuss how we generated the synthetic datasets so that we have access to the true value of , thus allowing for a full comparison. We then discuss results and compare the different methodologies.

Dataset generation

We consider a set , consisting of a binary sensitive attribute, and , consisting of a different sensitive attribute with 3 possible values. Therefore, the space encompasses 6 different intersections. We fix true base rates as follows:


As a result of this choice, the intersection of attributes is not going to be well represented in the dataset and, moreover, there will be few positive outcomes for individuals with such characteristics. Indeed, we purposely fixed base rates as above to mimic real-world scenarios where a particular subgroup can be under-represented, either in the general population or in a particular dataset. The true value of can be exactly computed as .
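A minimal sketch of this kind of data generation follows. The population shares and base rates are hypothetical and are not the values used in the experiment; the true value of the fairness parameter is taken here to be the largest absolute log-ratio of outcome probabilities between two intersections, over both outcomes, which is an assumption about the metric.

```python
import math
import random

# Hypothetical population shares P(S = s) and base rates P(Y = 1 | S = s) for
# the 6 intersections of a binary and a three-valued attribute. These are NOT
# the paper's values; the last intersection is rare and has few positives.
shares = {("a0", "b0"): 0.30, ("a0", "b1"): 0.25, ("a0", "b2"): 0.20,
          ("a1", "b0"): 0.15, ("a1", "b1"): 0.08, ("a1", "b2"): 0.02}
base_rates = {("a0", "b0"): 0.40, ("a0", "b1"): 0.35, ("a0", "b2"): 0.30,
              ("a1", "b0"): 0.30, ("a1", "b1"): 0.25, ("a1", "b2"): 0.10}

def true_epsilon(rates):
    """Largest absolute log-ratio of outcome probabilities between two groups,
    taken over both outcomes Y = 1 and Y = 0."""
    eps = 0.0
    for outcome in (1, 0):
        probs = [r if outcome == 1 else 1 - r for r in rates.values()]
        eps = max(eps, math.log(max(probs) / min(probs)))
    return eps

def generate(n, rng):
    """Draw n (intersection, outcome) pairs."""
    keys = list(shares)
    drawn = rng.choices(keys, weights=[shares[k] for k in keys], k=n)
    return [(s, int(rng.random() < base_rates[s])) for s in drawn]

rng = random.Random(0)
data = generate(5000, rng)
eps_true = true_epsilon(base_rates)
```

Because the base rates are fixed by construction, the true value is known exactly and any estimator can be compared against it.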


We first observe how the estimate behaves as the size of the dataset increases and we analyze the confidence intervals obtained via either bootstrap or Bayesian estimation. We fix as the number of bootstrapped datasets, each of size equal to the original one, and smoothing parameters to avoid divisions by zero. When considering the Bayesian approach, we generate Monte Carlo samples and consider a non-informative prior by setting , as proposed by Kerman (2011).
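The two interval-producing estimators can be sketched as follows. This is an illustration under several assumptions: the metric is reduced to the log-ratio of positive-outcome rates, the smoothing parameters are set to 0.5, the prior parameters default to 1/3, and far fewer bootstrap replicates are drawn than one would use in practice. All names and data are hypothetical.

```python
import math
import random

def smoothed_epsilon(counts, alpha=0.5, beta=0.5):
    """counts: {group: (n_positive, n_total)}. Smoothed empirical estimate of the
    largest log-ratio of positive-outcome rates between two groups."""
    rates = [(pos + alpha) / (n + alpha + beta) for pos, n in counts.values()]
    return math.log(max(rates) / min(rates))

def count(data, groups):
    """Tabulate (n_positive, n_total) per group from (group, outcome) pairs."""
    out = {g: [0, 0] for g in groups}
    for g, y in data:
        out[g][0] += y
        out[g][1] += 1
    return {g: tuple(v) for g, v in out.items()}

def percentile_interval(samples, level=0.95):
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

def bootstrap_eps(data, groups, n_boot, rng):
    """Resample the dataset with replacement and re-estimate epsilon each time."""
    return [smoothed_epsilon(count(rng.choices(data, k=len(data)), groups))
            for _ in range(n_boot)]

def bayesian_eps(counts, n_draws, rng, prior=1 / 3):
    """Draw each group's rate from its Beta posterior and compute epsilon per draw."""
    draws = []
    for _ in range(n_draws):
        rates = [rng.betavariate(prior + pos, prior + n - pos)
                 for pos, n in counts.values()]
        draws.append(math.log(max(rates) / min(rates)))
    return draws

rng = random.Random(1)
groups = ["g0", "g1", "g2"]
true_rates = {"g0": 0.4, "g1": 0.3, "g2": 0.2}   # hypothetical
data = [(g, int(rng.random() < true_rates[g])) for g in rng.choices(groups, k=3000)]
counts = count(data, groups)

point = smoothed_epsilon(counts)
boot_ci = percentile_interval(bootstrap_eps(data, groups, 200, rng))
bayes_ci = percentile_interval(bayesian_eps(counts, 1000, rng))
```

The bootstrap has to re-tabulate the data for every replicate, whereas the Bayesian variant computes the posterior parameters once from the counts and then only draws Beta samples.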

Results are presented in Figure 2. As Propositions 3.2 and 3.3 prove, we observe that all the methods converge to the true value as the dataset size grows. Clearly, no confidence intervals can be obtained with the smoothed empirical estimator. On the other hand, we notice that for small values of the dataset size, the confidence intervals provided by the bootstrap method are generally wider than the ones obtained via a Bayesian approach. This is not surprising, as the estimate of is particularly unstable if any instances with combination of attributes are not replicated in one of the bootstrapped datasets. Indeed, when we look at the distribution of for low values of in these experiments, we observe one peak near the true value and another peak determined by our choice of smoothing parameters.

To further assess properties of the proposed estimators, we approximate their Mean Squared Error (MSE). To do so, we generate 1,000 different datasets of increasing size with the same true base rates as in Equation 7. For each dataset, we obtain an estimate of using the proposed estimation techniques. Results are presented in Figure 3. We notice that the estimate obtained via a Bayesian approach performs better for all considered dataset sizes. On the other hand, bootstrap performs slightly worse than empirical estimation for small dataset sizes. As mentioned above, this is due to the fact that when one attribute intersection, such as in our experiment, is poorly represented in the bootstrapped dataset, we obtain biased estimates of .

We conclude that the smoothed empirical estimator requires considerably less computational effort than the other two proposed methods; however, it does not provide any insight into how reliable the fairness metric estimate is. When this is desired, we suggest using either bootstrap estimation or Bayesian modelling, or possibly both. We finally observe that the Bayesian procedure is in general faster than the bootstrap one, as the posterior parameters need to be computed only once and no overhead is observed.

The plot shows how the estimate of becomes increasingly accurate as the size of the synthetic dataset increases. When either bootstrap or Monte Carlo are used as estimation techniques, 95% confidence intervals can be plotted.

Figure 2. Comparison of different estimators of impact ratio fairness metric on synthetic datasets of increasing size. Vertical bars represent 95% confidence intervals for bootstrap and Bayesian estimation, where 1,000 bootstrapped datasets and Monte Carlo samples have been drawn respectively.

The plot shows how the MSE of the estimator decreases as the size of the synthetic dataset increases for the different methods. In particular, the Bayesian approach outperforms the other two for small dataset sizes. Bootstrap and empirical estimation perform similarly, with bootstrap being better for larger dataset sizes.

Figure 3. Comparison of Mean Squared Error of the estimator on synthetic datasets of increasing size. MSE of the estimators has been estimated by generating 1,000 different datasets with same base rates.

Estimation of for -differential fairness for both data and model outputs metric on the Adult training set when gender and age are considered as sensitive attributes. The three proposed methods produce similar results, with bootstrap and Bayesian also providing confidence intervals. The model exhibits high for FPR parity, around .

Figure 4. Estimate of -differential fairness for both data and model outputs metrics on the Adult training set when gender and age are considered as sensitive attributes. Vertical bars represent 95% confidence intervals.
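| Metric | No fairness constraints: binary predictor | No fairness constraints: score model | With fairness constraint: binary predictor | With fairness constraint: score model, deterministic | With fairness constraint: score model, sequential | With fairness constraint: score model, overall |
|---|---|---|---|---|---|---|
| TPR | 0.5450 | 0.5481 | 0.5131 | 0.5789 | 0.5145 | 0.5699 |
| FPR | 0.0422 | 0.0427 | 0.0497 | 0.0591 | 0.0470 | 0.0551 |
| Expected loss function | 0.1416 | 0.1413 | 0.1550 | 0.1463 | 0.1526 | 0.1454 |

Table 3. Predictive performance of given binary predictor and post-processed models on the Adult training set with gender and age as sensitive attributes.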

6.2. Adult Income Prediction

6.2.1. Dataset and Model

We consider a practical application of auditing and mitigating bias using the 1994 U.S. census Adult dataset from the UCI repository (Dua and Graff, 2017). The aim is to predict whether an individual’s income is greater than $50,000, using socio-demographic attributes. The data has already been split by the provider into a training set, consisting of 32,561 observations, and a test set, with 16,281 data points.

In the following we will focus on three sensitive attributes: age, gender, and race. We represent age as a binary categorical variable indicating which individuals are over 50. Gender is considered as a binary attribute in the Adult dataset. Race is encoded in the dataset into 5 different categories. For the purpose of this experiment, since the dataset contains few instances of categories “Eskimos and American Indians” and “Other”, we encode them together under the label “Other”.

We build a classifier returning scores in and we apply a fixed threshold to obtain binary predictions, as is commonly done in practice. To allow for a simpler exposition, we first look into applying our proposed methodologies where only two binary sensitive attributes are considered. We then explore a more complex scenario with more than 2 sensitive attributes. Additional tables, figures and implementation details are reported in the supplementary material.

6.2.2. Two sensitive attributes

We treat age and gender as sensitive attributes. First, we look into auditing intersectional fairness on the dataset and on the model outputs. We then compare performances of all the different post-processing techniques.

Auditing intersectional fairness

Figure 4 shows the minimum values of such that -differential fairness is satisfied for different intersectional metrics, both on the data and on the outputs of the binary classifier. We compare the methods introduced in Section 3, considering smoothing parameters to avoid division by zero and prior parameters for the Beta distribution both equal to . We note that the three methodologies produce similar answers and that the model may be deemed unfair as the different intersections are subject to varying performance in correctly predicting negative outcomes (i.e., income less than $50,000). It is of interest to check which subgroup of the population (as defined by the intersection of sensitive attributes) drives the value of for FPR parity. Further inspection reveals that the model is times better at correctly predicting negative outcomes for men of age than women of age .

Post-processing for intersectional fairness

Estimate of -differential fairness for equalized odds across the different models and using different estimation techniques. Results are based on the Adult training set when gender and age are considered as sensitive attributes. All the post-processing techniques achieve the desired fairness constraint .

Figure 5. Estimate of -differential fairness for equalized odds across the original and the post-processed models. Results are based on the Adult training set when gender and age are considered as sensitive attributes. The constraint is set at .

We now aim to mitigate this detected discriminatory bias. We first consider the scenario where we only have access to binary predictions. As we assume no further knowledge of the underlying model, the only possible choice is to use randomization as a post-processing technique (cf. Section 4.2). We set a loss function that gives equal weights to false positive and false negative predictions; i.e., . For instance, we decide to improve fairness by a multiplicative factor of 4; i.e., reaching an -differential fairness for FPR parity equal to . As we do not want to deteriorate the performance in terms of TPR parity, we impose as a constraint to have -differential fairness for equalized odds to be less than . The calculated optimal probabilities of changing the predictions are provided in the supplementary material, but in particular we notice that we should flip positive predictions for men of age approximately 25% of the time.
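Applying such flipping probabilities at prediction time is straightforward. The sketch below is illustrative only: the function name and the group label are hypothetical, and the 25% flip rate simply echoes the order of magnitude mentioned above.

```python
import random

def apply_flips(preds, groups, flip_pos, flip_neg, rng):
    """Flip a positive prediction with probability flip_pos[group] and a negative
    one with probability flip_neg[group]; groups absent from a dict are untouched."""
    out = []
    for yhat, s in zip(preds, groups):
        prob = (flip_pos if yhat == 1 else flip_neg).get(s, 0.0)
        out.append(1 - yhat if rng.random() < prob else yhat)
    return out

rng = random.Random(0)
preds = [1] * 10000 + [0] * 10000
groups = ["g"] * 20000                      # a single hypothetical intersection
post = apply_flips(preds, groups, {"g": 0.25}, {}, rng)
kept = sum(post[:10000]) / 10000            # share of positive predictions left unchanged
```

Fixing the random seed makes the post-processed predictions reproducible, which may matter when individual decisions have to be audited afterwards.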

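| Metric | No fairness constraints: binary predictor | No fairness constraints: score model | With fairness constraint: binary predictor | With fairness constraint: score model, deterministic | With fairness constraint: score model, sequential | With fairness constraint: score model, overall |
|---|---|---|---|---|---|---|
| TPR | 0.5450 | 0.5481 | 0.5434 | 0.5995 | 0.5465 | 0.5376 |
| FPR | 0.0422 | 0.0427 | 0.0426 | 0.0759 | 0.0425 | 0.0400 |
| Expected loss function | 0.1416 | 0.1412 | 0.1423 | 0.1540 | 0.1415 | 0.1417 |

Table 4. Predictive performance of given binary predictor and post-processed models on the Adult training set with gender, age, and race as sensitive attributes.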

We now focus on constructing a post-processed model when scores are available. The RTDP that achieves the best predictive performance can be obtained when no fairness constraints are imposed (cf. Section 4.3.1). This model represents a baseline for comparison with the other post-processed models, as it allows us to check whether imposing fairness constraints deteriorates predictive performance excessively. We refer to it as the “optimal score model”.

We now aim to achieve the same value of -differential fairness for equalized odds as before, that is , having access to the scores as well. We construct the following three post-processed models:

  • “Deterministic post-processing”, where we optimize the thresholds only,

  • “Sequential post-processing”, where we consider the optimal score model and apply randomization on top of it,

  • “Overall post-processing”, where we optimize both the thresholds and probabilities simultaneously.

Figure 5 shows the value of -differential fairness for equalized odds achieved by the different post-processing techniques. We notice that, as expected, all of them reach the desired fairness constraint. Their predictive performances are compared in Table 3. When only randomization is used, the post-processed model performs significantly worse than the given binary predictor in terms of expected loss value. On the other hand, the “deterministic” and “overall” post-processed models perform almost as well as the optimal model. When comparing the “sequential” and “overall” post-processed models, we observe that the former performs significantly worse than the latter.
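As a quick consistency check on Table 3, the expected loss with unit costs for false positives and false negatives reduces to the misclassification rate, which can be recomputed from the reported TPR and FPR. The sketch assumes that the share of positive outcomes in the Adult training set is 7,841 out of 32,561; this figure is not stated in the text and is an assumption of the example. The six tuples follow the printed column order of the table.

```python
# Assumption: share of positive outcomes in the Adult training set
# (7,841 of 32,561 rows); this figure is not stated in the text above.
P_POS = 7841 / 32561

def error_rate(tpr, fpr, p_pos=P_POS):
    """P(Y=0) * FPR + P(Y=1) * (1 - TPR): the expected loss with unit costs."""
    return (1 - p_pos) * fpr + p_pos * (1 - tpr)

# (TPR, FPR, reported expected loss) for the six columns of Table 3.
table3 = [(0.5450, 0.0422, 0.1416), (0.5481, 0.0427, 0.1413),
          (0.5131, 0.0497, 0.1550), (0.5789, 0.0591, 0.1463),
          (0.5145, 0.0470, 0.1526), (0.5699, 0.0551, 0.1454)]
gaps = [abs(error_rate(tpr, fpr) - reported) for tpr, fpr, reported in table3]
```

Under these assumptions the recomputed values agree with the reported expected losses up to rounding, which suggests that the reported loss coincides with the overall error rate.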

Whilst we chose to improve fairness by a factor of 4, in general the constraint may be set by the user according to their needs or any regulatory or other requirements.

6.2.3. More than two sensitive attributes

Estimation of for -differential fairness for both data and model outputs metric on the Adult training set when gender, age, and race are considered as sensitive attributes. The three proposed methods produce similar results, with bootstrap and Bayesian also providing confidence intervals. The model exhibits high for FPR parity, around .

Figure 6. Estimate of -differential fairness for both data and model outputs metrics on the Adult training set when gender, age, and race are considered as sensitive attributes. Vertical bars represent 95% confidence intervals.

We repeat the same experiment by considering race as an additional sensitive attribute and using the same classifier.

Auditing intersectional fairness

Figure 6 reports the value of -differential fairness for the different metrics. Results are drastically different from the example where only age and gender were considered as sensitive attributes. We observe that the model is now even more unfair across all the different metrics, with -differential fairness for FPR parity being the worst ().

Estimate of -differential fairness for equalized odds across the different models and using different estimation techniques. Results are based on the Adult training set when gender, age, and race are considered as sensitive attributes. All the post-processing techniques achieve the desired fairness constraint .

Figure 7. Estimate of -differential fairness for equalized odds across the original and the post-processed models. Results are based on the Adult training set when gender, age, and race are considered as sensitive attributes. The constraint is set at .

Post-processing for intersectional fairness

We focus on improving the equalized odds intersectional fairness metric. We proceed as above, first constructing the “optimal score model” and then building 4 different post-processing models. The first one is built on top of the binary predictor using randomization only. The other three rely on having access to the scores. We use the same loss function as before, now choosing as constraint . This constraint can be interpreted as reducing bias amplification by a multiplicative factor of 400.

The achieved fairness metrics for the different models are reported in Figure 7. We note that all the post-processed models achieve the desired fairness constraint according to the smoothed empirical estimator. The required value is also contained in the 95% confidence intervals produced by either the bootstrap or the Bayesian estimation procedure.

Table 4 reports the models’ predictive performances. Note that there is almost no loss in predictive performance when only randomization is used on top of the given binary predictor. However, further inspection of the post-processed probabilities of flipping the predictions reveals that one should always change positive predictions for the intersection into negative ones. Indeed, this is due to the fact that the model produces wrong predictions for this intersection more often than correct ones. This leads to a more general observation: the post-processed model is also a valuable tool for assessing the quality of the given predictor.

Clearly, the “optimal score model” performs better in terms of predictive performance, but does not reach the desired fairness constraint. On the other hand, the deterministic post-processing model reaches the fairness constraint but the expected loss is significantly greater than for the other models. Finally, we observe that “sequential” and “overall” post-processing models perform very similarly and close to the “optimal score model”.

We leave as future work a more theoretically grounded analysis of the difference in credible intervals produced by the bootstrap and Bayesian procedures.

7. Conclusion and Future Work

We presented novel methods to assess and achieve intersectional fairness, where multiple sensitive attributes are considered jointly. We proposed different metrics to assess intersectional fairness of both the data and the model outputs. We outlined three different methods to robustly estimate these metrics: smoothed empirical, bootstrap, and Bayesian estimation. The last two methods allow us to assess confidence in the estimates, including rapidly evaluating which subgroups are misrepresented in the data or particularly discriminated against by the model. Furthermore, we established post-processing techniques to transform the output of any given binary classifier so as to achieve better fairness with respect to the chosen intersectional fairness metric. Our methodology is particularly appealing in that it allows a practitioner to choose whether random flipping of a model prediction is desirable or not. We implemented the proposed auditing and post-processing methods on the Adult dataset.

Intersectional fairness is crucial for safe deployment of modern machine learning systems, yet most of the algorithmic fairness literature has thus far focused on fairness with respect to an individual sensitive attribute only. Our framework addresses several challenges related to auditing and achieving intersectional fairness. There are many remaining open problems that we hope future work will address, including defining other intersectional fairness metrics (e.g., for calibration) and further refining estimation procedures of fairness metrics, for instance by weighting the bootstrap sample or by differently tuning the prior parameters of the Bayesian procedure. Although we focused on post-processing, pre- and in-processing techniques that achieve intersectional fairness also deserve investigation. Another avenue for future work would be to extend our proposed post-processing methodology to regression and categorical classification problems.


We thank Imran Ahmed, Anil Choudhary, and Stavros Tsadelis for helpful comments and discussions. We would also like to thank anonymous referees for their valuable feedback, which helped us to improve the paper.


  • A. Agarwal, A. Beygelzimer, M. Dudik, J. Langford, and H. Wallach (2018) A reductions approach to fair classification. In FATML’17, Cited by: §2.
  • C. A. Ameh and N. Van Den Broek (2008) Increased risk of maternal death among ethnic minority women in the UK. The Obstetrician & Gynaecologist 10 (3), pp. 177–182. Cited by: §1.
  • J. Angwin, J. Larson, S. Mattu, and L. Kirchner (2016) Machine bias: there’s software used across the country to predict future criminals. and it’s biased against blacks. Cited by: §4.3.1.
  • J. Buolamwini and T. Gebru (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, S. A. Friedler and C. Wilson (Eds.), Proceedings of Machine Learning Research, Vol. 81, New York, NY, USA, pp. 77–91. External Links: Link Cited by: §1, §1.
  • R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu (1995) A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16 (5), pp. 1190–1208. External Links: ISSN 1064-8275, Link, Document Cited by: 3rd item.
  • Á. A. Cabrera, W. Epperson, F. Hohman, M. Kahng, J. Morgenstern, and D. H. Chau (2019) FairVis: visual analytics for discovering intersectional bias in machine learning. arXiv preprint arXiv:1904.05419. Cited by: §2.
  • F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and K. R. Varshney (2017) Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 3992–4001. External Links: Link Cited by: §2.
  • L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi (2019) Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, New York, NY, USA, pp. 319–328. External Links: ISBN 978-1-4503-6125-5, Link, Document Cited by: §2.
  • Y. Chung, T. Kraska, N. Polyzotis, K. Tae, and S. Euijong Whang (2019) Automated data slicing for model validation: a big data - ai integration approach. IEEE Transactions on Knowledge and Data Engineering PP, pp. 1–1. External Links: Document Cited by: §2.
  • S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq (2017) Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, pp. 797–806. External Links: ISBN 978-1-4503-4887-4, Link, Document Cited by: §1, §2, §2, §4.1.
  • E. Creager, D. Madras, J. Jacobsen, M. Weis, K. Swersky, T. Pitassi, and R. Zemel (2019) Flexibly fair representation learning by disentanglement. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 1436–1445. External Links: Link Cited by: §2.
  • D. Dua and C. Graff (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. External Links: Link Cited by: §1, §1, §6.2.1.
  • C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel (2012) Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, New York, NY, USA, pp. 214–226. External Links: ISBN 978-1-4503-1115-1, Link, Document Cited by: §2.
  • M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, New York, NY, USA, pp. 259–268. External Links: ISBN 978-1-4503-3664-2, Link, Document Cited by: §2, §4.1.
  • J. Forrest, T. Ralphs, S. Vigerske, LouHafer, B. Kristjansson, jpfasano, EdwinStraver, M. Lubin, H. G. Santos, rlougee, and M. Saltzman (2018) Coin-or/cbc: version 2.9.9. External Links: Document, Link Cited by: 1st item.
  • J. Foulds, R. Islam, K. N. Keya, and S. Pan (2018) An intersectional definition of fairness. External Links: 1807.08362 Cited by: Appendix A, §1, §1, §2, §2, §2, §3.1, §3.1, §3.2.1.
  • M. Hardt, E. Price, and N. Srebro (2016) Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29, pp. 3315–3323. External Links: Link Cited by: §1, §2, §2, §4.1, Definition 4.1, §4.
  • T. Head, MechCoder, G. Louppe, Iaroslav Shcherbatyi, Fcharras, Zé Vinícius, Cmmalone, C. Schröder, Nel215, N. Campos, T. Young, S. Cereda, T. Fan, Rene-Rex, Kejia (KJ) Shi, J. Schwabedal, Carlosdanielcsantos, Hvass-Labs, M. Pak, SoManyUsernamesTaken, F. Callaway, L. Estève, L. Besson, M. Cherti, Karlson Pfannschmidt, F. Linzberger, C. Cauet, A. Gut, A. Mueller, and A. Fabisch (2018) Scikit-optimize/scikit-optimize: v0.5.2. Zenodo. External Links: Document, Link Cited by: 3rd item.
  • U. Hebert-Johnson, M. Kim, O. Reingold, and G. Rothblum (2018) Multicalibration: calibration for the (Computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 1939–1948. External Links: Link Cited by: §2.
  • M. J. Kotkin (2008) Diversity and discrimination: a look at complex bias. William and Mary Law Rev. 50. Cited by: §2.
  • M. Jagielski, M. Kearns, J. Mao, A. Oprea, A. Roth, S. S. -Malvajerdi, and J. Ullman (2019) Differentially private fair learning. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 3000–3008. External Links: Link Cited by: §2.
  • F. Kamiran and T. Calders (2012) Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33 (1), pp. 1–33. External Links: ISSN 0219-1377, Link, Document Cited by: §2, §2.
  • T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma (2012) Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases, P. A. Flach, T. De Bie, and N. Cristianini (Eds.), Berlin, Heidelberg, pp. 35–50. External Links: ISBN 978-3-642-33486-3 Cited by: §2.
  • N. Karmarkar (1984) A new polynomial-time algorithm for linear programming. In Proceedings of the sixteenth annual ACM symposium on Theory of computing, pp. 302–311. Cited by: §4.2.
  • M. Kearns, S. Neel, A. Roth, and Z. S. Wu (2018) Preventing fairness gerrymandering: auditing and learning for subgroup fairness. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 2564–2572. External Links: Link Cited by: §2, §2.
  • M. Kearns, S. Neel, A. Roth, and Z. S. Wu (2019) An empirical study of rich subgroup fairness for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, New York, NY, USA, pp. 100–109. External Links: ISBN 978-1-4503-6125-5, Link, Document Cited by: §2.
  • J. Kerman (2011) Neutral noninformative and informative conjugate beta and gamma prior distributions. Electron. J. Statist. 5, pp. 1450–1470. External Links: Document, Link Cited by: §6.1.
  • M. P. Kim, A. Ghorbani, and J. Zou (2019) Multiaccuracy: black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’19, New York, NY, USA, pp. 247–254. External Links: ISBN 978-1-4503-6324-2, Link, Document Cited by: §2.
  • J. Kleinberg (2018) Inherent trade-offs in algorithmic fairness. In Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’18, New York, NY, USA, pp. 40–40. External Links: ISBN 978-1-4503-5846-0, Link, Document Cited by: §2.
  • D. Kraft (1988) A software package for sequential quadratic programming. External Links: Link Cited by: 2nd item.
  • H. Lakkaraju, E. Kamar, R. Caruana, and E. Horvitz (2017) Identifying unknown unknowns in the open world: representations and policies for guided exploration. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, pp. 2124–2132. External Links: Link Cited by: §2.
  • D. Madras, E. Creager, T. Pitassi, and R. Zemel (2018) Learning adversarially fair and transferable representations. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 3384–3393. External Links: Link Cited by: §2.
  • E. B. Manoukian (1986) Modern concepts and theorems of mathematical statistics. Springer New York. External Links: Document, Link Cited by: Appendix A.
  • A. Narayanan (2018) Translation tutorial: 21 fairness definitions and their politics. In Conference on Fairness, Accountability, and Transparency, Cited by: §2.
  • D. Pedreshi, S. Ruggieri, and F. Turini (2008) Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, New York, NY, USA, pp. 560–568. External Links: ISBN 978-1-60558-193-4, Link, Document Cited by: §2.
  • G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger (2017) On fairness and calibration. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 5684–5693. External Links: ISBN 978-1-5108-6096-4, Link Cited by: §2.
  • E. Raff, J. Sylvester, and S. Mills (2018) Fair forests: regularized tree induction to minimize model bias. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’18, New York, NY, USA, pp. 243–250. External Links: ISBN 978-1-4503-6012-8, Link, Document Cited by: §2.
  • B. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro (2017) Learning non-discriminatory predictors. In Proceedings of the 2017 Conference on Learning Theory, S. Kale and O. Shamir (Eds.), Proceedings of Machine Learning Research, Vol. 65, Amsterdam, Netherlands, pp. 1920–1953. External Links: Link Cited by: §2.
  • R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning - Volume 28, ICML’13, pp. III–325–III–333. External Links: Link Cited by: §2.
  • B. H. Zhang, B. Lemoine, and M. Mitchell (2018) Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’18, New York, NY, USA, pp. 335–340. External Links: ISBN 978-1-4503-6012-8, Link, Document Cited by: §2.
  • J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K. Chang (2017) Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. arXiv e-prints. External Links: 1707.09457 Cited by: §3.1.
  • I. Zliobaite (2015) A survey on measuring indirect discrimination in machine learning. arXiv e-prints. External Links: 1511.00148 Cited by: §2.

Appendix A Proofs of Section 3

Proof of Theorem 3.1.

Theorem VIII.1 of Foulds et al. (2018) proves the result in the case of ε-differential fairness for statistical parity. The proof is based on the following reformulation of the original definition (Lemma VIII.1 of Foulds et al., 2018):

and on proving


An analogous reformulation holds for the definitions of ε-differential fairness for impact ratio, TPR parity and FPR parity. Therefore, the desired result holds for these metrics by reproducing the proof of Theorem VIII.1 of Foulds et al. (2018).

The definition of ε-differential fairness for the elift metric can be reformulated as:

and so from Equation 8 it follows

as desired. ∎
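The convex-combination argument behind this kind of result can be checked numerically. The sketch below assumes that Theorem 3.1 is the subgroup property of Foulds et al. (2018), namely that fairness on the full intersection of attributes carries over to any subset of them, and that the metric compares positive-outcome rates on a log scale. The joint distribution `weights`, the rates `pos_rate` and the helper names are hypothetical and chosen only for illustration.

```python
import math

def eps(rates):
    """Smallest eps such that every pair of rates differs by at most a factor exp(eps)."""
    return math.log(max(rates)) - math.log(min(rates))

# Hypothetical population over two binary sensitive attributes (a, b):
# weights[s] is the probability of intersectional group s,
# pos_rate[s] is the probability of a positive outcome within group s.
weights = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pos_rate = {(0, 0): 0.50, (0, 1): 0.20, (1, 0): 0.35, (1, 1): 0.60}

def marginal_rates(axis):
    """Positive rates when only one attribute (index `axis`) is observed."""
    rates = []
    for value in (0, 1):
        cells = [s for s in weights if s[axis] == value]
        mass = sum(weights[s] for s in cells)
        # A marginal rate is a weighted average of intersectional rates,
        # so it lies between their minimum and maximum.
        rates.append(sum(weights[s] * pos_rate[s] for s in cells) / mass)
    return rates

eps_joint = eps(list(pos_rate.values()))
eps_a = eps(marginal_rates(0))
eps_b = eps(marginal_rates(1))
print(f"intersection: {eps_joint:.3f}, attribute a: {eps_a:.3f}, attribute b: {eps_b:.3f}")
```

Because each marginal rate is an average of intersectional rates, the value computed on a single attribute cannot exceed the value computed on the intersection, whatever weights are chosen.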

Proof of Proposition 3.2.

We prove the result for impact ratio, but similar reasoning can be applied to prove consistency for all the ε-differential fairness metrics introduced in Tables 1 and 2. Assume we have access to a dataset containing observations; we make the dependency on explicit by using superscript . We shall prove that converges in probability to , as defined in Equation 1.

Recall that we defined as the number of occurrences in the dataset of individuals with attributes and positive outcome, while is the number of individuals with attribute . Define the following estimators of and :

respectively. The two estimators are consistent by the Strong Law of Large Numbers. We can now apply Slutsky’s theorem (Manoukian, 1986, p. 76) and show:

assuming . Moreover, by applying Slutsky’s theorem again, it follows:

Finally, by the Continuous Mapping Theorem, we conclude that is a consistent estimator of . ∎
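The consistency claim can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: it assumes that the impact-ratio metric bounds the ratio of positive-outcome rates between any two groups by exp(ε), so that the plug-in estimate is the log of the largest empirical rate minus the log of the smallest. The rates, group probabilities and function names below are hypothetical.

```python
import math
import random

def epsilon_from_rates(rates):
    """Smallest eps with exp(-eps) <= p_i / p_j <= exp(eps) for all pairs of groups."""
    return math.log(max(rates)) - math.log(min(rates))

def estimate_epsilon(true_rates, group_probs, n, rng):
    """Plug-in estimate from n simulated observations, using p_hat_s = N_{1,s} / N_s."""
    groups = list(range(len(true_rates)))
    n_s = [0] * len(groups)      # observations per group
    n_pos = [0] * len(groups)    # positive outcomes per group
    for s in rng.choices(groups, weights=group_probs, k=n):
        n_s[s] += 1
        if rng.random() < true_rates[s]:
            n_pos[s] += 1
    if min(n_pos) == 0:
        # The estimate is undefined when a group has no positive outcomes.
        raise ValueError("a group has no positive outcomes; increase n")
    return epsilon_from_rates([n_pos[s] / n_s[s] for s in groups])

# Hypothetical population: three groups with different positive-outcome rates.
TRUE_RATES = [0.30, 0.45, 0.60]
GROUP_PROBS = [0.5, 0.3, 0.2]

rng = random.Random(0)
eps_true = epsilon_from_rates(TRUE_RATES)
for n in (100, 10_000, 200_000):
    error = abs(estimate_epsilon(TRUE_RATES, GROUP_PROBS, n, rng) - eps_true)
    print(f"n = {n:>7}: |estimate - true value| = {error:.4f}")
```

The guard against empty counts mirrors the positivity assumption in the proof: the logarithm is continuous only where the rates are strictly positive, which is what allows the Continuous Mapping Theorem to be applied.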

Proof of Proposition 3.3.

Notice that the expected value of the posterior distribution is given by Equation 2, while the variance is . Therefore, as the posterior distribution converges to a Dirac delta concentrated in . In the proof of Proposition 3.2 we showed that converges in probability to . It then follows by the Central Limit Theorem that the Monte Carlo procedure yields consistent estimates. ∎
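The concentration of the posterior can also be seen numerically. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it assumes a binary outcome, an independent symmetric Beta(alpha, alpha) prior on each group's positive-outcome rate, and the same log-ratio form of ε used for impact ratio. The counts, the value of `alpha` and the function name are hypothetical.

```python
import math
import random
import statistics

def posterior_epsilon_draws(n_pos, n_tot, alpha, draws, rng):
    """Monte Carlo draws of eps from independent Beta posteriors, one per group.

    With a Beta(alpha, alpha) prior and k positives out of n observations,
    the posterior of the group's positive rate is Beta(alpha + k, alpha + n - k).
    """
    samples = []
    for _ in range(draws):
        rates = [rng.betavariate(alpha + k, alpha + n - k)
                 for k, n in zip(n_pos, n_tot)]
        samples.append(math.log(max(rates)) - math.log(min(rates)))
    return samples

rng = random.Random(0)
ALPHA = 1 / 3   # hypothetical prior parameter
DRAWS = 2000

# Same empirical rates (0.3 and 0.6) observed on datasets of increasing size.
for scale in (100, 1_000, 10_000):
    n_tot = [scale, scale]
    n_pos = [int(0.3 * scale), int(0.6 * scale)]
    samples = posterior_epsilon_draws(n_pos, n_tot, ALPHA, DRAWS, rng)
    print(f"n per group = {scale:>6}: "
          f"posterior mean = {statistics.mean(samples):.3f}, "
          f"posterior sd = {statistics.pstdev(samples):.3f}")
```

As the counts grow with the empirical rates held fixed, the spread of the draws shrinks and their mean settles near the plug-in value, which is the behaviour the proof describes.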

Appendix B Proofs of Section 4

Proof of Proposition 4.3.

Recall we assumed w.l.o.g. that . Then:


as desired. ∎

Proof of Proposition 4.4.


Therefore by Proposition 4.3:

where and