Agnostic data debiasing through a local sanitizer learnt from an adversarial network approach

06/19/2019 ∙ by Ulrich Aïvodji, et al. ∙ UQAM Ecole Polytechnique

The widespread use of automated decision processes in many areas of our society raises serious ethical issues concerning the fairness of the process and the possible resulting discriminations. In this work, we propose a novel approach called GANSan whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our sanitization algorithm is partially inspired by the powerful framework of generative adversarial networks (in particular CycleGANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by modifying the other attributes as little as possible, thus preserving the interpretability of the sanitized data. As a consequence, once the sanitizer is trained, it can be applied to new data, for instance locally by an individual on his profile before releasing it. Finally, experiments on a real dataset demonstrate the effectiveness of the proposed approach as well as the achievable trade-off between fairness and utility.




1. Introduction

In recent years, the availability and the diversity of large-scale datasets of personal information, the algorithmic advances in machine learning and the increase in computational power have led to the development of personalized services and prediction systems to such an extent that their use is now ubiquitous in our society. To give a few examples, machine learning-based systems are now used in banking for assessing the risk associated with a loan application (Mahmoud et al., 2008), in job applications to automatically evaluate the profile of a candidate (Faliagka et al., 2012) and in predictive justice to quantify the risk of recidivism of an inmate (Center, 2016).

Despite their usefulness, the predictions performed by these algorithms are not exempt from biases and numerous cases of discriminatory decisions have been reported over the last years. For instance, returning to the case of predictive justice, a study conducted by ProPublica showed that the recidivism prediction tool COMPAS, which is currently used in Broward County (Florida), is strongly biased against black defendants, displaying a false positive rate twice as high for black persons as for white persons (Julia Angwin and Kirchner, 2016). If the dataset exhibits strong detectable biases towards a particular sensitive group (e.g., an ethnic or minority group), the straightforward solution of removing the attribute that identifies the sensitive group would only prevent direct discrimination. Indeed, indirect discrimination can still occur due to correlations between the sensitive attribute and other attributes.

In this article, we propose a novel approach called GANSan (for Generative Adversarial Network Sanitizer) to address the problem of discrimination due to the biased underlying data.

In a nutshell, our approach aims to learn a sanitizer (in our case a neural network) transforming the input data in a way that maximizes the following two metrics: (1) fidelity, in the sense that the transformation should modify the data as little as possible, and (2) non-discrimination, which means that the sensitive attribute should be difficult to predict from the sanitized data.

A possible use case would be the recruitment process of referees for an amateur sport organization. In this situation, the selection process should be based primarily on the merit of the applicants, but at the same time the institution might be aware that the data used to train a model to automate the recruitment process might be highly biased with respect to race. In practice, approaches such as the Rooney Rule have been proposed and implemented to foster diversity in the recruitment of coaches in the National Football League as well as in other industries. To address this issue, the institution could use our approach to sanitize the data before applying a merit-based algorithm on the sanitized data to select the referees.

Another typical use case might be one in which a company, during its recruitment phase, offers candidates a tool to remove racial correlations from their personal profile before submitting the sanitized profile to the job application platform. If the tool is built correctly, the company's recruitment system is free from racial discrimination as it never has access to the original profiles, which are known only to the applicants.

Overall, our contributions can be summarized as follows:

  • We propose a novel approach in which a sanitizer is learned from the original data. The sanitizer can then be applied to a profile in such a way that the sensitive attribute is removed, as well as existing correlations with other attributes, while ensuring that the sanitized profile is modified as little as possible, thus preventing both direct and indirect discrimination. Our sanitizer is strongly inspired by Generative Adversarial Networks (GANs) (Goodfellow et al., 2014a), which have been highly successful in terms of applications.

  • Rather than building a fair classifier, our objective is more generic in the sense that we aim at debiasing the data with respect to the sensitive attribute. Thus, one of the main benefits of our approach is that the sanitization can be performed without any knowledge of the tasks that are going to be conducted in the future on the sanitized data. In addition, as the sensitive attribute can refer to any characteristic of the user, we believe GANSan to be applicable to the broader context of data anonymization.

  • Another strength of our approach is that once it has been learned, it can be used directly by an individual to generate a modified version of his profile that still lives in the same representation space but from which it is very difficult to infer the sensitive attribute. In this sense, our method can be considered to fall under the category of randomized response techniques (Warner, 1965) as it can be used locally by a user to sanitize his data and thus does not require his true profile to be sent to a trusted third party. Of all of the approaches that currently exist in the literature to reach algorithmic fairness (Friedler et al., 2018), we are not aware of any other work that has considered the case of local sanitization with the exception of (Romanelli et al., 2019), which focuses on the protection of privacy but could also be applied to enhance fairness.

  • To demonstrate its usefulness, we have evaluated our approach on a real dataset by analyzing the achievable trade-off between fairness and utility measured both in terms of perturbations introduced by the sanitization framework but also with respect to the accuracy of a classifier learned on the sanitized data.

The outline of the paper is as follows. First, in Section 2, we introduce the system model before reviewing the background notions on fairness and GANs. Afterwards, in Section 3, we review the related work on fairness-enhancing methods that, like ours, follow the preprocessing approach, before describing our method GANSan in Section 4. Finally, we experimentally evaluate our approach in Section 5 before concluding in Section 6.

2. Preliminaries

In this section, we first present the system model used in this paper before reviewing the background notions on fairness metrics and generative adversarial networks.

2.1. System model

In this paper, we consider the generic setting of a dataset composed of records. Each record typically corresponds to the profile of an individual and is made of attributes, which can be categorical, discrete or continuous. Amongst those, two attributes are considered special. First, the sensitive attribute S (e.g., gender, ethnic origin, religious belief, …) should remain hidden, for instance to prevent discrimination. Second, the decision attribute Y is typically used for a classification task (e.g., accept or reject an individual for a loan). The other attributes of the profile, which are neither S nor Y, will be referred to hereafter as A.

For simplicity, in this work we restrict ourselves to situations in which these two attributes are binary (i.e., $S \in \{0, 1\}$ and $Y \in \{0, 1\}$). However, our approach could also be generalized easily to multivalued attributes, although quantifying fairness in the case of non-binary attributes is much more challenging than for binary ones (Kearns et al., 2017). Our main objective is to prevent the possibility of inferring the sensitive attribute from the sanitized data.

2.2. Fairness metrics

First, we would like to point out that there are many different definitions of fairness in the literature (Arvind, 2018; Friedler et al., 2018; Verma and Rubin, 2018; Corbett-Davies et al., 2017; Dwork et al., 2012; Joseph et al., 2016) and that the choice of the appropriate definition is highly dependent on the context considered.

For instance, one natural approach for defining fairness is the concept of individual fairness (Dwork et al., 2012), which states that individuals that are similar except for the sensitive attribute should be treated similarly and thus should receive similar decisions. This notion relates to the legal concept of disparate treatment (Barocas and Selbst, 2016), which occurs if the decision process was based on sensitive attributes. This definition is relevant when discrimination is due to a prejudice caused by the decision process and therefore cannot be used in the situation in which the objective is to directly redress biases in the data.

In contrast to individual fairness, group fairness relies on statistics of the outcomes of the subgroups indexed by S and can be quantified in several ways, such as demographic parity (Berk et al., [n. d.]) and equalized odds (Hardt et al., 2016). More precisely, writing $\hat{Y}$ for the predicted decision, demographic parity corresponds to the absolute difference between the rates of positive outcomes in the sensitive and default groups (for which respectively $S = 0$ and $S = 1$):

$$\mathrm{DemoParity} = \left| P(\hat{Y} = 1 \mid S = 0) - P(\hat{Y} = 1 \mid S = 1) \right| \qquad (1)$$

while equalized odds is the absolute difference of odds in each subgroup:

$$\mathrm{EqOddGap}_y = \left| P(\hat{Y} = 1 \mid S = 0, Y = y) - P(\hat{Y} = 1 \mid S = 1, Y = y) \right|, \quad y \in \{0, 1\} \qquad (2)$$

Equalized odds (Hardt et al., 2016) requires the equality of true and false positive rates, $P(\hat{Y} = 1 \mid S = 0, Y = y) = P(\hat{Y} = 1 \mid S = 1, Y = y)$, and is more suitable than demographic parity in the situation in which the base rates of the two groups differ ($P(Y = 1 \mid S = 0) \neq P(Y = 1 \mid S = 1)$). Note that these definitions are agnostic to the cause of the discrimination and are based solely on the assumption that statistics of outcomes should be similar for each subgroup.
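As an illustration, the two group-fairness quantities described above can be computed from predictions in a few lines. This is a minimal sketch, not the authors' implementation: the function names and the encoding of the two groups as 0 and 1 are assumptions made for the example, and the toy data is invented.

```python
def positive_rate(preds):
    """Fraction of positive (== 1) predictions in a non-empty list."""
    return sum(preds) / len(preds)


def demographic_parity(y_pred, s):
    """Absolute gap in positive-prediction rates between groups s == 0 and s == 1."""
    group0 = [p for p, g in zip(y_pred, s) if g == 0]
    group1 = [p for p, g in zip(y_pred, s) if g == 1]
    return abs(positive_rate(group0) - positive_rate(group1))


def equalized_odds_gap(y_true, y_pred, s, y):
    """Same gap, restricted to the records whose true decision equals y."""
    group0 = [p for p, t, g in zip(y_pred, y_true, s) if g == 0 and t == y]
    group1 = [p for p, t, g in zip(y_pred, y_true, s) if g == 1 and t == y]
    return abs(positive_rate(group0) - positive_rate(group1))


# Invented toy data: 4 records per group (hypothetical values).
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

dp = demographic_parity(y_pred, s)                   # gap in positive rates
tpr_gap = equalized_odds_gap(y_true, y_pred, s, 1)   # gap among true positives
fpr_gap = equalized_odds_gap(y_true, y_pred, s, 0)   # gap among true negatives
```

Both helpers assume that every subgroup they filter on is non-empty; a production version would need to handle empty groups explicitly.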

In our work, we follow a different line of research by defining fairness in terms of the inability to infer S from other attributes (Feldman et al., 2015; Xu et al., 2018). This approach stems from the fact that it is impossible to discriminate based on the sensitive attribute if the latter cannot be predicted. Thus, our approach aims at altering the data in such a way that no classifier should be able to infer the sensitive attribute from the sanitized data. The inability to infer the attribute S is measured mainly using the accuracy of a predictor Adv trained to recover the hidden S (sAcc), as well as the balanced error rate (BER) introduced in (Feldman et al., 2015):

$$\mathrm{BER}(\mathrm{Adv}(A, Y), S) = \frac{1}{2} \sum_{s \in \{0, 1\}} P(\mathrm{Adv}(A, Y) \neq s \mid S = s)$$

The BER captures the predictability of both classes and a value of $\frac{1}{2}$ can be considered optimal for protecting against inference, in the sense that it means that the prediction of the predictor is no better than a random guess. In addition, the BER is more relevant than directly using the accuracy of a classifier at predicting the sensitive attribute for datasets in which the sensitive and default groups are unbalanced. To summarize, the objective of a successful sanitization is to cause the sensitive accuracy to drop significantly while raising the BER close to its optimal value of $\frac{1}{2}$.
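The difference between the two measures on unbalanced groups can be seen on a small example. The sketch below is illustrative only; the function names and the toy labels are assumptions, and the BER is computed as the average of the two per-group error rates.

```python
def sensitive_accuracy(s_true, s_pred):
    """Fraction of records whose sensitive attribute is predicted correctly."""
    return sum(t == p for t, p in zip(s_true, s_pred)) / len(s_true)


def balanced_error_rate(s_true, s_pred):
    """Average of the error rates measured separately on group 0 and group 1."""
    per_group_error = []
    for group in (0, 1):
        preds = [p for t, p in zip(s_true, s_pred) if t == group]
        per_group_error.append(sum(p != group for p in preds) / len(preds))
    return 0.5 * (per_group_error[0] + per_group_error[1])


# Unbalanced toy set: 2 records in group 0, 8 in group 1.
# The adversary ignores its input and always predicts the majority group.
s_true = [0, 0] + [1] * 8
s_pred = [1] * 10

sacc = sensitive_accuracy(s_true, s_pred)   # looks good despite learning nothing
ber = balanced_error_rate(s_true, s_pred)   # exposes the chance-level behaviour
```

Here the trivial majority-class adversary reaches an accuracy of 0.8 while its BER is exactly 0.5, which is why the accuracy alone can overstate how much of S is recoverable.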

2.3. Generative adversarial network

A Generative Adversarial Network (GAN) is a relatively novel approach from machine learning introduced to solve the difficult problem of modelling and learning high-dimensional distributions (e.g., pictures). Typically, in a GAN two neural networks compete against each other in a zero-sum game framework (Goodfellow et al., 2014b). The first neural network is called the generator and its aim is to learn to produce, from noise, data close enough to a given target distribution, the generated data being the final objective of the GAN. The second network is called the discriminator and its presence is motivated by the difficulty of objectively assessing the quality of the generator. Its task is to discriminate whether a given sample originates from the generator or from the training data.
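For reference, the zero-sum game between the two networks is commonly written as the minimax objective of the original GAN formulation (Goodfellow et al.). The symbols $G$, $D$, $p_{data}$ and $p_z$ below are the standard notation of that formulation and are not taken from the present text.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator $D$ is rewarded for assigning high probability to real samples $x$ and low probability to generated samples $G(z)$, while the generator $G$ is rewarded for the opposite on its own samples.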

Despite its intuitive aspect and the fact that GANs are powerful tools for modelling distributions, the training of a GAN can be difficult and often requires significant engineering effort to ensure its success (Zhu et al., 2017). For instance, during the training phase, if the discriminator or the generator quickly becomes more accurate than its counterpart, the other one will not be able to catch up and improve its performance.

Zhu and co-authors (Zhu et al., 2017) propose an extension of GAN, called CycleGAN, in which the goal is to learn to translate between two similar distributions. Our approach, GANSan, is inspired by the framework of GANs and CycleGANs in the sense that our objective is to learn to remove the dependency between the protected attribute and the other attributes without having an explicit description of these dependencies, relying solely on the ability of an adversary to distinguish (or not) between the sensitive and default groups.

3. Related work

In recent years, many approaches have been developed to enhance the fairness of machine learning algorithms. Most of these techniques can be classified into three families of approaches, namely (1) the preprocessing approach (Edwards and Storkey, 2015; Feldman et al., 2015; Louizos et al., 2015; Zemel et al., 2013), in which fairness is achieved by changing the characteristics of the input data (e.g., by suppressing undesired correlations with the sensitive attribute), (2) the algorithmic modification approach (also sometimes called constrained optimization), in which the learning algorithm is adapted to ensure that it is fair by design (Zafar et al., 2017; Kamishima et al., 2012) and (3) the postprocessing approach, which modifies the output of the learning algorithm to increase the level of fairness (Kamiran et al., 2010; Hardt et al., 2016). We refer the interested reader to (Friedler et al., 2018) for a recent survey comparing the different fairness-enhancing methods. Due to the limited space, and as our approach falls within the preprocessing approach, we review hereafter only the main methods of this category.

Among the seminal works in fairness enhancement, Feldman and co-authors (Feldman et al., 2015) have developed a framework that consists in translating the conditional distributions of each of the dataset's attributes by shifting them towards a median distribution. While this approach is straightforward, it does not take into account unordered categorical attributes or correlations that might arise due to a combination of attributes, which we address in this work. Zemel and co-authors (Zemel et al., 2013) have proposed to learn a fair representation of data based on a set of prototypes, which preserves the outcome prediction accuracy and allows an accurate reconstruction of the original profiles. Each prototype can equally identify groups based on the sensitive attribute values. This technique has been one of the pioneering works in mitigating unfairness by changing the representation space of the data. However, for this approach to work, the definition of the set of prototypes is highly critical. In the same direction, the authors of (Calmon et al., 2017) have learned an optimal randomized mapping for removing group-based discrimination while limiting the distortion introduced at the profile and the distribution levels to preserve utility. Similarly, Louizos and co-authors (Louizos et al., 2015) used a variational auto-encoder (Kingma and Welling, 2013) to enhance fairness by choosing a prior distribution independently of the group membership and removing differences across groups with the maximum mean discrepancy (Gretton et al., 2007).

In addition, several approaches have been explored to enhance fairness based on adversarial learning. For instance, Edwards and Storkey (Edwards and Storkey, 2015) have trained an encoder to output a representation from which an adversary is unable to accurately predict the group membership, but from which a decoder is able to reconstruct the data and on which a decision predictor still performs well. Madras, Creager, Pitassi and Zemel (Madras et al., 2018) extended this framework to satisfy the equality of opportunity (Hardt et al., 2016) constraint and explored the theoretical guarantees for fairness provided by the learned representation as well as the ability of the representation to be used for different classification tasks. Beutel, Chen, Zhao and Chi (Beutel et al., 2017) have studied how the choice of data affects fairness in the context of adversarial learning. One of the interesting results of their study is the relationship between statistical parity and the removal of the sensitive attribute, which demonstrates that learning a representation independent of the sensitive attribute with a balanced dataset ensures statistical parity. Zhang, Lemoine and Mitchell (Zhang et al., 2018) have designed a decision predictor satisfying group fairness by ensuring that an adversary is unable to infer the sensitive attribute from the predicted outcome. Afterwards, Wadsworth, Vera and Piech (Wadsworth et al., 2018) have applied the latter framework in the context of recidivism prediction, demonstrating that it is possible to significantly reduce the discrimination while maintaining nearly the same accuracy as on the original data. Finally, Sattigeri and co-authors (Sattigeri et al., 2018) have developed a method to cancel out bias in high-dimensional data, such as multimedia data, using adversarial learning.

While these approaches are effective at addressing fairness, one of their common drawbacks is that they do not preserve the interpretability of the data. One notable exception is the method proposed by Xu, Yuan, Zhang and Wu (Xu et al., 2018) called FairGAN, which is the closest to ours, even though their objective is to learn a fair classifier on a dataset that has been generated such that it is discrimination-free and whose distribution on attributes is close to the original one. Our approach further diverges from this work in that theirs is a direct application of the original GAN framework coupled with a second adversary (whose task is to reconstruct the sensitive attribute from samples that successfully fooled the first discriminator), while ours can be rightfully compared to an auto-encoder coupled with the same adversary. Inspired by (Tripathy et al., 2017), Romanelli, Palamidessi and Chatzikokolakis (Romanelli et al., 2019) have developed a method for learning an optimal privacy protection mechanism, also inspired by GANs, which they have applied to location privacy. Here, the objective is to minimize the amount of information (measured by the mutual information) preserved between the sensitive attribute and the prediction of this attribute made by a classifier, while respecting a bound on the utility of the dataset. This approach differs from ours on several points. In particular, their focus is on protecting location privacy while ours is on enhancing fairness. In addition, they put a bound on the utility while we impose a bound on fairness.

Drawing from existing works in the privacy field, Ruggieri (Ruggieri, 2014) showed that the t-closeness anonymization technique (Ninghui et al., 2007) can be used as a preprocessing approach to control discrimination, as there is a close relationship between t-closeness and group fairness. In addition, local sanitization approaches (also called randomized response techniques) have been investigated for the protection of privacy. More precisely, one of the benefits of local sanitization is that there is no need to centralize the data before sanitizing it, thus limiting the trust assumptions that an individual has to make on external entities when sharing his data. For instance, Wang, Hu and Wu (Wang et al., 2016) have applied randomized response techniques achieving differential privacy during the data collection phase to avoid having an untrusted party collect sensitive information. Similarly to our approach, the protection of information takes place at the individual level, as the user can randomize his data before publishing it. The main objective is to produce a sanitized dataset in which global statistical properties are preserved but from which it is not possible to infer the sensitive information of a specific user. In the same direction, Du and Zhan (Du and Zhan, 2003) have proposed a method for learning a decision tree classifier on this sanitized data. However, none of these previous works have taken into account the fairness aspect. Thus, while our method also falls within the local sanitization approaches, in the sense that the sanitizer can be applied locally by a user, our initial objective is quite different as we aim at preventing the risk of discrimination. Nonetheless, at the same time our method also protects against attribute inference with respect to the protected attribute.
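To make the notion of local sanitization by randomized response concrete, the sketch below implements the classical binary scheme in the spirit of Warner: each user flips his own bit locally with some probability, and the collector can still estimate the global proportion. It is a minimal illustration of the general technique, not of GANSan or of the cited methods; the function names, the truth probability of 0.75 and the toy population are assumptions.

```python
import random


def randomize(bit, p_truth, rng):
    """Run locally by each user: report the true bit with probability p_truth,
    otherwise report its flip."""
    return bit if rng.random() < p_truth else 1 - bit


def estimate_true_rate(reports, p_truth):
    """Collector side: unbiased estimate of the true proportion of 1s.

    Since P(report = 1) = pi * p_truth + (1 - pi) * (1 - p_truth),
    solving for pi gives the expression below.
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes the reports carry no information")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


rng = random.Random(42)
true_bits = [1] * 3000 + [0] * 7000            # hypothetical population, 30% ones
reports = [randomize(b, 0.75, rng) for b in true_bits]
estimate = estimate_true_rate(reports, 0.75)   # close to 0.3, up to sampling noise
```

The global statistic is approximately preserved even though no individual report can be trusted, which is the property that local approaches rely on.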

4. Local sanitization for data debiasing

As explained previously, simply removing the sensitive attribute from the data is rarely sufficient to guarantee fairness as correlations are likely to exist between other attributes and the sensitive one. Those correlations could be straightforward like attributes including direct information on the sensitive one but can very well be more complex such as a non-linear combination of several attributes. In general, detecting complex correlations between attributes as well as suppressing them is a difficult task.

To address this issue, our approach GANSan relies on the modelling power of GANs to build a sanitizer that can cancel out correlations with S without having an explicit model of those correlations. In particular, it exploits the capacity of the discriminator to distinguish the subgroups indexed by the sensitive attribute. Once the sanitizer has been trained, any individual can locally apply it on his profile before disclosing it to ensure that the sensitive information is hidden. The sanitized data can then be safely used for any subsequent task.

4.1. Generative adversarial network sanitization

High level overview.

Formally, given a dataset, the objective of GANSan is to learn a function, called the sanitizer, that perturbs the individual profiles of the dataset such that a distance measure called the fidelity (in our case, a norm) between the original and the sanitized datasets is minimal, while ensuring that S cannot be recovered from the sanitized dataset. Our approach differs from the classical conditional GAN (Mirza and Osindero, 2014) in that the objective of our discriminator is to reconstruct the hidden sensitive attribute from the generator output, whereas the discriminator in a classical conditional GAN has to discriminate between the generator output and samples from the true distribution.

The high-level overview of the training of GANSan is as follows:

  • The first step corresponds to the training of the sanitizer (Algorithm 1, lines 8 to 22). Basically, the sanitizer can be seen as the generator of a standard GAN but with a different purpose. In a nutshell, it learns the empirical distribution of the sensitive attribute and generates a new distribution that concurrently respects two objectives: (1) finding a perturbation that will fool the discriminator in predicting S while (2) minimizing the damage introduced by the sanitization. More precisely, the sanitizer takes as input the original dataset (including S and Y) plus some noise. The noise is used to avoid the over-specialization of the sanitizer on the training set while making the reverse mapping of sanitized profiles to their original versions more difficult.

  • The second step consists in training the discriminator to predict the sensitive attribute from the data produced by the sanitizer (Algorithm 1, lines 23 to 29). The rationale of our approach is that the better the discriminator is at predicting the sensitive attribute S, the worse the sanitizer is at hiding it and thus the higher the potential risk of discrimination.

These two steps are applied iteratively until convergence of the training. Figure 1 presents the high-level overview of the training procedure while Algorithm 1 describes it in detail.

[Figure 1 panels: original data (starting point, with the original S) → sanitizer (generator) → sanitized data → predicted S]

Figure 1. Sanitization framework. The objective of the discriminator is to predict S from the output of the sanitizer. The two objective functions that the framework aims at minimizing are respectively the discriminator and sanitizer losses.
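Algorithm 1 GANSan Training Procedure
 1: Inputs: training set, maximum number of epochs, number of discriminator iterations, batch size, trade-off hyperparameter
 2: Output: sanitizer, discriminator
 3: ▷ Initialization
 6: for each epoch e do
 7:     for each iteration i do
 8:         Sample batch B of the given batch size from the training set
 9:         S_B: extract S column from B
            ▷ Compute the reconstruction loss vector
14:         ▷ Compute the sensitive loss
16:         ▷ Concatenate the previously computed losses
18:         for each loss in the concatenated loss vector do
19:             ▷ Back-propagation using the current loss
21:             Update sanitizer weights
22:         end for
23:         for each discriminator iteration l do
24:             Sample batch B of the given batch size from the training set
25:             S_B: extract S column from B
26:             Loss = BER(discriminator prediction on the sanitized B, S_B)
27:             Backpropagate Loss
28:             Update discriminator weights
29:         end for
30:     end for
31:     Save sanitizer and discriminator states
32: end for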
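The control flow of the training procedure (one sanitizer update followed by several discriminator updates, repeated over batches and epochs, with a checkpoint at the end of each epoch) can be sketched independently of any neural-network library. In the sketch below, `san_step`, `disc_step` and `on_epoch_end` are hypothetical callbacks standing in for the actual gradient updates and checkpointing, and the number of iterations per epoch is assumed to be the dataset size divided by the batch size.

```python
import random


def train_schedule(records, max_epochs, d_iter, batch_size,
                   san_step, disc_step, on_epoch_end=None, seed=0):
    """Alternate one sanitizer update with d_iter discriminator updates."""
    rng = random.Random(seed)
    iterations = len(records) // batch_size  # assumed number of batches per epoch
    for epoch in range(max_epochs):
        for _ in range(iterations):
            # Sanitizer step on a freshly sampled batch.
            san_step(rng.sample(records, batch_size))
            # Discriminator steps, each on its own batch.
            for _ in range(d_iter):
                disc_step(rng.sample(records, batch_size))
        if on_epoch_end is not None:
            on_epoch_end(epoch)  # e.g. save both network states


# Usage with counting stubs instead of real networks.
calls = {"san": 0, "disc": 0, "saved": 0}


def count(key):
    def step(_batch_or_epoch):
        calls[key] += 1
    return step


train_schedule(list(range(20)), max_epochs=2, d_iter=3, batch_size=5,
               san_step=count("san"), disc_step=count("disc"),
               on_epoch_end=count("saved"))
```

Keeping the schedule separate from the update rules makes it easy to change the ratio between discriminator and sanitizer updates, which is one of the usual levers when one network outpaces the other.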

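Training GANSan.

Let $\hat{S}$ be the prediction of S by the discriminator ($\hat{S} = \mathrm{Disc}(\mathrm{San}(D))$, with $D$ the original dataset). Its goal is to accurately predict S, thus it aims at minimizing the loss $J_{Disc} = d_{disc}(S, \hat{S})$. In practice, in our work we instantiate $d_{disc}$ as the Mean Squared Error (MSE).

Given a hyperparameter $\alpha$ representing the desired trade-off between fairness and fidelity, the sanitizer minimizes a loss combining two objectives:

$$J_{San} = \alpha \, e_S + (1 - \alpha) \, e_{rec}$$

in which $e_S = \frac{1}{2} - \mathrm{BER}(\hat{S}, S)$ is the loss on the sensitive attribute and $e_{rec}$ is the reconstruction loss. The term $\frac{1}{2}$ is due to the objective of maximizing the error of the discriminator (i.e., the optimal value of the BER is $\frac{1}{2}$).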
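A minimal sketch of how such a combined objective could be evaluated is given below. The exact weighting used by the authors is not recoverable from the text here, so the convex combination, the name `alpha` for the trade-off hyperparameter and the form of the sensitive loss (the gap between 1/2 and the discriminator's BER) are assumptions made for illustration.

```python
def sensitive_loss(ber):
    """Assumed form: gap between the optimal BER of 1/2 and the observed BER.
    It is zero when the discriminator is at chance level."""
    return 0.5 - ber


def sanitizer_loss(alpha, ber, reconstruction_loss):
    """Assumed convex combination: alpha weights fairness, (1 - alpha) fidelity."""
    return alpha * sensitive_loss(ber) + (1 - alpha) * reconstruction_loss


# Discriminator at chance level: only the reconstruction term remains.
loss_chance = sanitizer_loss(alpha=0.5, ber=0.5, reconstruction_loss=0.2)

# alpha = 1 ignores fidelity entirely and only penalises a low BER.
loss_fair_only = sanitizer_loss(alpha=1.0, ber=0.25, reconstruction_loss=3.0)
```

With this form, increasing `alpha` pushes the sanitizer towards hiding S at the expense of larger perturbations, which matches the trade-off role described for the hyperparameter.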

With respect to the reconstruction loss, we first tried the classical Mean Absolute Error (MAE) and MSE losses. However, our initial experiments showed that these losses produce datasets that are highly problematic in the sense that the sanitizer always outputs the same profile whatever the input profile, thus making it unusable. Therefore, we had to design a slightly more complex loss function. More precisely, we chose not to merge the respective losses of the attributes, yielding a vector of attribute losses whose components are iteratively used in the gradient descent. Hence, each node of the output layer of the generator is optimized to reconstruct a single attribute from the representation obtained from the intermediate layers. In the vector formulation, the loss has one component per reconstructed attribute, to which the loss on the sensitive attribute is concatenated, and the objective is to minimize all its components. We are planning to conduct a deeper analysis of the vector formulation as well as its interactions with the different optimization techniques in future work. The parameters used for the training are detailed in Appendices A and B.
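The difference between a merged loss and a per-attribute loss vector can be illustrated on a tiny example. The sketch below uses the mean absolute error per attribute, which is an assumption (the per-attribute loss actually used is not specified in the text here), and the function names and data are hypothetical.

```python
def attribute_loss_vector(original, sanitized):
    """One mean-absolute-error entry per attribute (column),
    instead of a single scalar merged over all attributes."""
    n_records = len(original)
    n_attributes = len(original[0])
    return [
        sum(abs(o[j] - s[j]) for o, s in zip(original, sanitized)) / n_records
        for j in range(n_attributes)
    ]


def merged_loss(original, sanitized):
    """Scalar alternative: average of the per-attribute losses."""
    vector = attribute_loss_vector(original, sanitized)
    return sum(vector) / len(vector)


original  = [[0.0, 1.0], [1.0, 1.0]]
sanitized = [[0.0, 0.0], [1.0, 0.0]]   # first attribute intact, second destroyed

per_attribute = attribute_loss_vector(original, sanitized)
scalar = merged_loss(original, sanitized)
```

The scalar value hides the fact that one attribute is perfectly reconstructed while the other is lost, whereas the vector exposes each attribute's error separately, so each component can drive its own update.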

4.2. Performance metrics

The performance of GANSan will be evaluated by taking into account the fairness enhancement and the fidelity to the original data. With respect to fairness, we will quantify it primarily through the inability of a predictor Adv, hereafter referred to as the adversary, to infer the sensitive attribute, using its Balanced Error Rate (BER) (Feldman et al., 2015) and its accuracy (sAcc) (cf. Section 2.2). We will also assess fairness using the group metrics of Section 2.2, namely demographic parity (Equation 1) and equalized odds (Equation 2).

To measure the fidelity between the original and the sanitized data, we have to rely on a notion of distance. More precisely, our approach does not require any specific assumption on the distance used, although it is conceivable it would work better with some than others. For the rest of this work, we will instantiate the fidelity with a norm, as it does not differentiate between attributes.

A high fidelity is not a sufficient condition to imply a good reconstruction of the dataset, as early experiments showed that the sanitizer might find a “median” profile to which it will map all input profiles. Thus, to quantify the ability of the sanitizer to preserve the diversity of the dataset, we additionally introduce the diversity measure, which is defined in the following way:


While the fidelity quantifies how different the original and the sanitized datasets are, the diversity measures how diverse the profiles are within each dataset. We will also quantitatively discuss the amount of damage for a given fidelity and fairness to give a better understanding of the qualitative meaning of the fidelity.

Finally, we also evaluate the loss of utility induced by the sanitization by relying on the accuracy of prediction on a classification task. More precisely, the difference in accuracy between a classifier trained on the original data and one trained on the sanitized data can be used as a measure of the loss of utility introduced by the sanitization with respect to the classification task.
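The utility measure described above reduces to a difference of two accuracies on the same test labels. The sketch below shows the computation on invented predictions; the function names are hypothetical and the two prediction lists stand in for the outputs of a classifier trained on the original data and of one trained on the sanitized data.

```python
def accuracy(y_true, y_pred):
    """Fraction of decisions predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def utility_loss(y_true, pred_from_original, pred_from_sanitized):
    """Accuracy lost when the classifier is trained on sanitized data
    instead of the original data (negative if sanitization helps)."""
    return accuracy(y_true, pred_from_original) - accuracy(y_true, pred_from_sanitized)


# Invented test labels and predictions.
y_true    = [1, 0, 1, 1, 0, 0, 1, 0]
pred_orig = [1, 0, 1, 1, 0, 0, 0, 0]   # 7 of 8 correct
pred_san  = [1, 0, 0, 1, 0, 1, 0, 0]   # 5 of 8 correct

loss = utility_loss(y_true, pred_orig, pred_san)
```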

5. Experimental evaluation

In this section, we describe the experimental setting used to evaluate GANSan  as well as the results obtained.

5.1. Experimental setting

Datasets descriptions.

We have evaluated our approach on Adult Census Income, available at the UCI repository. Adult Census reports the financial situation of individuals, with 45222 records after the removal of rows with empty values. Each record is characterized by 15 attributes, among which we chose gender (i.e., male or female) as the sensitive attribute and the income level (i.e., over or below 50K$) as the decision attribute.

Dataset: Adult Census
Group: Sensitive (Female) | Default (Male)
Table 1. Distribution of the different groups with respect to the sensitive attribute and the decision one in Adult Census Income.

We will evaluate GANSan using metrics such as the fidelity, the BER and the demographic parity. For this, we have conducted a 10-fold cross validation. More precisely, the dataset is divided into 10 blocks such that during each fold 8 blocks are used for training, while one of the blocks is retained as the validation set and the last one as the test set. In the following, we report on the results obtained for 7 folds. We computed the BER and sAcc using the discriminator of GANSan and three external classifiers (Support Vector Machines (SVM) (Cortes and Vapnik, 1995), Multilayer Perceptron (MLP) (Popescu et al., 2009) and Gradient Boosting (GB) (Friedman, 2002)), independent of the GANSan framework. For all these external classifiers and all epochs, we report the space of achievable points.
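The 8/1/1 split described above can be sketched as a rotation over 10 blocks. The text does not say which block serves as validation set in each fold, so the rule used below (the block following the test block) is an assumption, as are the function name and the seed.

```python
import random


def rotating_folds(n_records, n_blocks=10, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold.

    Each fold uses one block for testing, the next block (assumed choice)
    for validation and the remaining n_blocks - 2 blocks for training.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    blocks = [indices[b::n_blocks] for b in range(n_blocks)]
    for k in range(n_blocks):
        val_block = (k + 1) % n_blocks
        train = [i for b in range(n_blocks) if b not in (k, val_block)
                 for i in blocks[b]]
        yield train, blocks[val_block], blocks[k]


folds = list(rotating_folds(100))
```

Across the 10 folds, every record appears exactly once in a test set, so the per-fold test metrics can be averaged without double counting.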

For each fold and for each value of , we train the sanitizer for epochs. At the end of each epoch, we save the sanitizer state and generate a sanitized dataset on which we compute the , and . With this, selects the sanitized dataset that is closest to the optimal point ().

More precisely, is defined as follows: with and referring to the minimum value of obtained with the external classifiers. selects among the sanitizers saved at the end of each epoch, the ones achieving the highest fairness in terms of for the lowest damage, for each value of the hyper-parameter .
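The selection step can be illustrated with a short sketch. The exact definition of the selection heuristic is not reproduced here, so the code below makes several assumptions that should be read as such: each saved sanitizer is summarized by a (BER, fidelity) pair, the optimal point is taken to be (0.5, 1.0), and closeness is measured with the Euclidean distance. All names and values are hypothetical.

```python
import math


def select_sanitizer(checkpoints, optimal=(0.5, 1.0)):
    """Return the checkpoint whose (BER, fidelity) is closest to `optimal`.

    Assumptions: each checkpoint is a dict with keys "epoch", "ber" and
    "fid"; "ber" is the minimum BER over the external classifiers; the
    distance is Euclidean.
    """
    def distance(cp):
        return math.hypot(optimal[0] - cp["ber"], optimal[1] - cp["fid"])
    return min(checkpoints, key=distance)


# Toy checkpoints, invented for the example (not experimental results).
checkpoints = [
    {"epoch": 1, "ber": 0.20, "fid": 0.95},
    {"epoch": 2, "ber": 0.40, "fid": 0.90},
    {"epoch": 3, "ber": 0.48, "fid": 0.60},
]
best = select_sanitizer(checkpoints)
```

With these toy values the second checkpoint is selected, since it offers the best compromise between a BER close to 0.5 and a fidelity close to 1.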

We will use the same families of external classifiers for computing the metrics , and . We used the same chosen test set to conduct the detailed analysis of its reconstruction’s quality ( and quantitative damage on attributes).

5.2. Evaluation scenarios

Recall that GANSan takes as input the whole original dataset (including the sensitive and the decision attributes) and outputs a sanitized dataset (without the sensitive attribute) in the same space as the original one, but from which it is impossible to infer the sensitive attribute. In this context, the overall performance of GANSan can be evaluated by analyzing the reachable space of points characterising the trade-off between the fidelity to the original dataset and the fairness enhancement. More precisely, during our experimental evaluation we will measure the fidelity between the original and the sanitized data, as well as the , both in relation with the and , computed on this dataset.

However, in practice the sanitized dataset can be used in several situations. In the following, we detail four scenarios that we believe represent most of the possible use cases of GANSan. To ease the understanding, we will use the following notation: the subscript (respectively ) will denote the data in the training set (respectively test set). For instance, in which can either be , , or , represents respectively the attributes of the original training set (not including the sensitive and the decision attributes), the decision in the original training set, the attributes of the sanitized training set and the decision attribute in the sanitized training set. Table 2 summarizes the notation used, while Table 3 describes the composition of the training and test sets for these four scenarios.

Original Sanitized
original decision sanitized decision.
original decision in the training set sanitized decision in the training set.
original decision in the test set sanitized decision in the test set.
original attributes (not including the sensitive and the decision attributes). sanitized attributes (not including the sensitive and the decision attributes).
original attributes in the training set. sanitized attributes in the training set.
original attributes in the test set. sanitized attributes in the test set.
Table 2. Notations used to differentiate evaluation scenarios

In detail, the scenarios that we considered for our evaluation are the following.

Scenario 1 : complete data debiasing.

This procedure corresponds to a typical use of the sanitized dataset, which is the prediction of a decision attribute through a classifier. The decision attribute is also sanitized, as we assume that the original decision holds information about the sensitive attribute. Here, we quantify the accuracy of prediction of the sanitized decision as well as the discrimination represented by the demographic parity gap (Equation 1) and the equalized odds gap (Equation 2) defined in Section 2.
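For readers who want to reproduce the two group metrics, a minimal sketch is given below. Equations 1 and 2 are not restated in this section, so the code uses the standard definitions as an assumption: the demographic parity gap is the absolute difference in positive prediction rates between the two groups, and the equalized odds gap is reported as a pair (true positive rate gap, false positive rate gap). The helper names and toy data are hypothetical.

```python
def rate(predictions, mask):
    """Fraction of positive predictions among the records selected by mask."""
    selected = [p for p, m in zip(predictions, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def demographic_parity_gap(y_pred, s):
    """|P(y_pred = 1 | s = 0) - P(y_pred = 1 | s = 1)|."""
    return abs(rate(y_pred, [v == 0 for v in s]) -
               rate(y_pred, [v == 1 for v in s]))


def equalized_odds_gaps(y_true, y_pred, s):
    """Return the (TPR gap, FPR gap) between the two groups."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        g0 = rate(y_pred, [a == 0 and t == label for a, t in zip(s, y_true)])
        g1 = rate(y_pred, [a == 1 and t == label for a, t in zip(s, y_true)])
        gaps.append(abs(g0 - g1))
    return tuple(gaps)


# Toy data, invented for the example: s is the sensitive attribute.
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
dp_gap = demographic_parity_gap(y_pred, s)
tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, s)
```

In Scenario 1, `y_true` would hold the sanitized decision of the test set and `y_pred` the predictions of the classifier trained on the sanitized training set.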

Scenario 2 : partial data debiasing.

In this scenario, just like the previous one, the training and the test sets are sanitized, with the exception that the sanitized decision in both these datasets is replaced with the original one. This scenario is generally the one considered in the majority of papers on fairness enhancement (Zemel et al., 2013; Edwards and Storkey, 2015; Madras et al., 2018). The accuracy loss in the prediction of the original decision between this classifier and another trained on the original dataset without modifications is a straightforward way to quantify the utility loss due to the sanitization.

Scenario 3 : building a fair classifier.

This scenario was considered in (Xu et al., 2018) and is motivated by the fact that the sanitized dataset might introduce some undesired perturbation (e.g. changing the education level from Bachelor to PhD). Thus, a third party might build a fair classifier but still apply it directly on the unperturbed data to avoid the data sanitization process and the associated risks. More precisely, in this scenario a fair classifier is obtained by training it on the sanitized dataset to predict the sanitized decision. Afterwards, this classifier is tested on the original data () and its fairness is measured through the demographic parity (Equation 1, Section 2). We also compute the accuracy of the fair classifier with respect to the original decision of the test set.

Scenario 4 : local sanitization.

The local sanitization scenario corresponds to a private use of the sanitizer. For instance, the sanitizer could be used as part of a mobile phone application providing individuals with a tool to remove some sensitive attributes from their profile before disclosing it to an external entity. In this scenario, we assume the existence of a biased classifier, trained to predict the original decision on the original dataset. The user has no control over this classifier but he is nonetheless allowed to perform the sanitization locally on his profile before submitting it to the existing classifier. This classifier is applied on the sanitized test set; we measure its accuracy with respect to the original decision, as well as the fairness enhancement quantified by the DemoParity.

Scenario     Train set composition      Test set composition
             Attributes   Decision      Attributes   Decision
Baseline     Original     Original      Original     Original
Scenario 1   Sanitized    Sanitized     Sanitized    Sanitized
Scenario 2   Sanitized    Original      Sanitized    Original
Scenario 3   Sanitized    Sanitized     Original     Original
Scenario 4   Original     Original      Sanitized    Original
Table 3. Scenarios envisioned for the evaluation of GANSan . Each set is composed of either the original attributes (not taking into account the sensitive or decision attributes) or their sanitized versions, coupled with either the original decision or its sanitized counterpart.
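The composition of the sets in Table 3 can also be expressed as a small lookup structure, which may help when scripting the evaluation. The layout of the `data` dictionary and the function name are hypothetical choices made for this sketch; only the scenario-to-composition mapping comes from Table 3.

```python
SCENARIOS = {
    # name: ((train attributes, train decision), (test attributes, test decision))
    "Baseline":   (("original", "original"),   ("original", "original")),
    "Scenario 1": (("sanitized", "sanitized"), ("sanitized", "sanitized")),
    "Scenario 2": (("sanitized", "original"),  ("sanitized", "original")),
    "Scenario 3": (("sanitized", "sanitized"), ("original", "original")),
    "Scenario 4": (("original", "original"),   ("sanitized", "original")),
}


def build_sets(scenario, data):
    """Pick the train and test (attributes, decision) pairs for a scenario.

    `data` is a hypothetical nested dict:
    data[split][version] = {"X": attributes, "Y": decision},
    with split in {"train", "test"} and version in {"original", "sanitized"}.
    """
    (tr_x, tr_y), (ts_x, ts_y) = SCENARIOS[scenario]
    train = (data["train"][tr_x]["X"], data["train"][tr_y]["Y"])
    test = (data["test"][ts_x]["X"], data["test"][ts_y]["Y"])
    return train, test


# Placeholder strings stand in for the actual attribute and decision arrays.
data = {
    "train": {"original": {"X": "X_tr_orig", "Y": "Y_tr_orig"},
              "sanitized": {"X": "X_tr_san", "Y": "Y_tr_san"}},
    "test": {"original": {"X": "X_ts_orig", "Y": "Y_ts_orig"},
             "sanitized": {"X": "X_ts_san", "Y": "Y_ts_san"}},
}
train_s3, test_s3 = build_sets("Scenario 3", data)
```

For Scenario 3 this returns the sanitized training pair and the original test pair, which matches the "fair classifier applied to unperturbed data" setting.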

All scenarios require the use of a sanitized version of the dataset (either or or both) for either training the model or computing results (the decision accuracy , and ). We use to select the version of the dataset to use. In fact for each value of , we generate a new version of the sanitized dataset at the end of each epoch.

General results.

Figure 3 describes the achievable trade-off between fairness and fidelity obtained using the sanitizer. Note that even when (i.e., all the weight is put on utility), we cannot reach a perfect fidelity to the original data as we get at most (cf. Figure 3). However, it can be seen that the fairness improves with the increase of the value of α, as expected. A low value of α such as 0.2 provides a fidelity close to the highest possible (), but leads to a BER that is not higher than . Note that this still improves the fairness compared to the original data (, ).

In the other direction, setting the value of the coefficient to a high value such as allows the sanitizer to completely remove the unwarranted correlations () with a cost on fidelity (). At the extreme setting in which , the data is sanitized without putting any emphasis on the fidelity. In this case, the is optimal as expected and the fidelity of , lower than the .

Figure 2. Fidelity/fairness trade-off on Adult census income. Each point represents the minimum possible of all the external classifiers. The fairness improves with the increase of α, a small value providing a low fairness guarantee while a high one introduces greater damage to the sanitized data. Note that even with , a small damage is to be expected. Points whose (lower right) represent the on the original (i.e., unperturbed) dataset. The black triangle on the upper right corresponds to the optimal value that one can hope to achieve.

Concerning the sAcc, we observe the same behaviour as for the BER. More precisely, the accuracy drops significantly when the value of α increases. Here, the optimal value is the proportion of the majority class, and GANSan brings the accuracy of predicting S from the sanitized set closer to it. However, even with , it is impossible to reach this optimal value. Setting will lead to a significant sanitization of the dataset while preserving a fidelity close to the maximum achievable.
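To make the two reference points concrete, the sketch below computes the balanced error rate of a sensitive-attribute predictor and the accuracy of a trivial majority-class predictor. It assumes the usual definition of the BER (the average of the per-group error rates for a binary attribute) and that both groups are present in the data; the function names and toy data are hypothetical.

```python
def balanced_error_rate(s_true, s_pred):
    """Average of the per-group error rates for a binary attribute."""
    errors = []
    for group in (0, 1):
        preds = [p for t, p in zip(s_true, s_pred) if t == group]
        errors.append(sum(p != group for p in preds) / len(preds))
    return sum(errors) / 2


def majority_proportion(s_true):
    """Accuracy of always predicting the most frequent group."""
    ones = sum(s_true) / len(s_true)
    return max(ones, 1 - ones)


# Toy data, invented for the example: 3 records in group 0, 7 in group 1.
s_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
always_majority = [1] * len(s_true)
ber = balanced_error_rate(s_true, always_majority)
baseline_acc = majority_proportion(s_true)
```

A predictor that has learnt nothing about the sensitive attribute and always outputs the majority group reaches a BER of 0.5 and an accuracy equal to the majority proportion, which is why these two values serve as the targets for the sanitizer.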

Figure 3. Fidelity-Fairness trade-off on Adult census income. Each point represents the minimum possible of all the external classifiers. The sAcc decreases with the increase of α, a small value providing a low fairness guarantee while a larger one usually introduces higher damage.

The quantitative analysis with respect to the impact on diversity is shown in Figure 4. More precisely, the smallest drop of diversity obtained is , which is achieved when we set . Among all values of α, the biggest drop observed is . The application of GANSan therefore introduces an irreversible perturbation, as observed with the fidelity. This loss of diversity implies that the sanitization reinforces the similarity between sanitized profiles as α increases, rendering them almost identical or forcing the input profiles to be mapped to a small number of stereotypes.

When is in the range , of categorical attributes have a proportion of modified records between and (cf. Figure 3(a)). For most of the numerical attributes at least of records in the dataset have a relative change lower than , if , . For that same amount of relative change (), we observe that at least of records in the sanitized dataset are covered for lower values of . Selecting leads to of records being modified with a relative change of less than . In particular, the most damaged profiles are presented in Table 4. From this table as well as Table 5, we can observe that the sanitization process transforms all profiles, but to different degrees. In Table 6, the same profile (which was the most damaged one in fold 1) is tracked across different folds to show how the sanitization process affects it. We can see that the modifications applied to the profile across different folds are not deterministic.

(a) Diversity and damage to categorical attributes.
(b) Relative change to numerical attributes.
Figure 4. (a) Boxplots of the quantitative analysis of sanitized datasets selected using . These metrics are computed on the whole sanitized dataset. Modified records correspond to the proportion of records with categorical attributes affected by the sanitization. For numerical attributes (b), we computed the cumulative distribution of the relative change (x-axis) versus the proportion of records affected in the dataset (y-axis). Further details about these results are available in Appendix D.
Attrs Original Fold 1
age 42 49.58
workclass State Federal
fnlwgt 218948 192102.77
education Doctorate Bachelors
education-num 16 9.393
marital-status Divorced Married-civ-spouse
occupation Prof-specialty Adm-Clerical
relationship Unmarried Husband
race Black White
hours-per-week 36 47.04
native-country Jamaïca Peru
damage value
Attrs Original Fold 4
age 29 49.01
workclass Self-emp-not-inc Without-pay
fnlwgt 341672 357523.5
education HS-grad Doctorate
education-num 9 7.674
marital-status Married-spouse-absent Married-civ-spouse
occupation Transport-moving Protective-serv
relationship Other-relative Husband
race Asian-Pac-Islander Black
hours-per-week 50 40.37
native-country India Thailand
damage value
Attrs Original Fold 3
age 38 31.65
workclass Federal-gov Self-emp-not-inc
fnlwgt 37683 245776.230
education Prof-school Doctorate
education-num 15 13
marital-status Never-married Married-civ-spouse
occupation Prof-specialty Handlers-cleaners
relationship Not-in-family Husband
race Asian-Pac-Islander White
capital-gain 11.513 0
hours-per-week 57 43.5
native-country Canada Portugal
damage Value
Table 4. Most damaged profiles for on the first three folds. Only the perturbed attributes are shown.
Attrs Original
age 49 49.4
workclass Federal-gov Federal-gov
fnlwgt 157569 193388
education HS-grad HS-grad
education-num 9 9.102
marital-status Married-civ-spouse Married-civ-spouse
occupation Adm-Clerical Adm-Clerical
relationship Husband Husband
race White White
capital-gain 0 0
capital-loss 0 0
hours-per-week 46 44.67
native-country United-States United-States
income 0 0
Attrs Original
age 35 29.768
workclass Private Private
fnlwgt 241998 179164
education HS-grad HS-grad
education-num 9 8.2765
marital-status Never-married Never-married
occupation Sales Farming-fishing
relationship Not-in-Family Not-in-Family
race White White
capital-gain 8.474 0
capital-loss 0 0
hours-per-week 40 42.434
native-country United-States United-States
income 1 0
Attrs Original
age 42 49.58
workclass State Federal
fnlwgt 218948 192102.77
education Doctorate Bachelors
education-num 16 9.393
marital-status Divorced Married-civ-spouse
occupation Prof-specialty Adm-Clerical
relationship Unmarried Husband
race Black White
capital-gain 8.474 0
capital-loss 0 0
hours-per-week 36 47.04
native-country Jamaïca Peru
income 0 0
Table 5. Minimally damaged profile, profile with damage at of the max and most damaged profile for for the first fold.
Attrs Original Fold 1 Fold 4 Fold 3
age 42 49.58 50.5 32.17
workclass State Federal Self-emp-not-inc Self-emp-not-inc
fnlwgt 218948 192102.77 214678 250047
education Doctorate Bachelors HS-grad Doctorate
education-num 16 9.393 10.3191 10.89
marital-status Divorced Married-civ-spouse Married-civ-spouse Married-civ-spouse
occupation Prof-specialty Adm-Clerical Adm-clerical Transport-moving
relationship Unmarried Husband Husband Husband
race Black White White White
Capital Gain 0 0 0 0
Capital Loss 0 0 0 0
hours-per-week 36 47.04 38.7 40.50
native-country Jamaïca Peru United-States United-States
Income 0 0 0 0
Damage Value
Table 6. Most damaged profile for for the first fold and the same profile obtained at the end of the sanitization for the fold 4 and fold 3.
Figure 5. Accuracy (blue), demographic parity gap (orange) and equalized odds gap (true positive rate in green and false positive rate in red) computed for scenarios 1, 2, 3 and 4 (top to bottom), with the classifiers GB, MLP and SVM (left to right).

Scenario 1 : complete data debiasing.

In this scenario, we observe that GANSan preserves the accuracy of the dataset. More precisely, it increases the accuracy of the decision prediction on the sanitized dataset for all classifiers (cf. Figure 5, Scenario S1), compared to the original one which is , and respectively for GB, MLP and SVM. This increase can be explained by the fact that GANSan modifies profiles to make them more coherent with the associated decision, by removing correlations between the sensitive attribute and the decision one. As a consequence, this assigns the same decision to similar profiles in both the protected and the default groups. As a matter of fact, nearly the same distributions of the decision attribute are observed before and after the sanitization, but some records with either positive or negative decisions are shuffled (around of records remain unchanged, at ). We also believe that the increase of accuracy is correlated with the drop in diversity. More precisely, if profiles are closer to each other, the decision boundary might be easier to find. We present in Table 7 the shift of decision proportion across the different folds, at . We observe that in some cases, the sanitizer transforms the binary decision column almost into a single-valued one. We leave the study of how GANSan affects the decision boundary as future work.

Original Max Min Mean Std
Table 7. Proportion of the positive decision attribute across the different folds, at .

The discrimination is reduced as observed through , and , which all exhibit a negative slope as we expected. When the correlations with the sensitive attribute are significantly removed (), those metrics also significantly decrease. For instance, at , , , , for GB; whereas the original demographic parity gap and equalised odds gap are respectively , . See Tables 10 and 12 for more detailed results.

In this setup, FairGan (Xu et al., 2018) achieves a BER of , an accuracy of and a demographic parity of .

Scenario 2 : partial data debiasing.

Unexpectedly, we observe an increase in accuracy for most values of alpha. The demographic parity gap also decreases while the equalized odds remain nearly constant (, green line on Figure 5). Table 8 compares the results obtained to existing work from the state of the art. We include the classifier with the highest accuracy (MLP) and the one with the lowest (SVM). From these results, we can observe that our method outperforms the others in terms of accuracy, but the demographic parity is best achieved with the work done in (Zhang et al., 2018) (), which is not surprising as this method is precisely tailored to reduce this metric. Even though our method is not specifically constrained to mitigate the demographic parity, we can observe that it significantly enhances it.

Authors yAcc DemoParity
LFR  (Zemel et al., 2013)
ALFR  (Edwards and Storkey, 2015)
MUBAL (Zhang et al., 2018) 0.01
LATR (Madras et al., 2018) 0.84
GANSan  (S2) - MLP, 0.91 0.01
GANSan  (S2) - SVM, 0.84 0.04
Table 8. Comparison with other works on the basis of accuracy and demographic parity on Adult Census.

Scenario 3 : building a fair classifier.

The sanitizer helps to reduce discrimination based on the sensitive attribute, even when using the original data on a classifier trained on the sanitized one. As presented on the third row of Figure 5, as we force the system to completely remove the unwarranted correlations, the discrimination observed when classifying the original unperturbed data is reduced. On the other hand, the accuracy here exhibits the steepest negative slope among all the scenarios investigated. This decrease of accuracy is explained by the difference of correlations between and and between and . As the fair classifiers are trained on the sanitized set ( and ), the decision boundary obtained is not relevant for and . Even with this reduction of accuracy (drop of for the best classifier in terms of accuracy), critical applications such as recidivism prediction can leverage GANSan by only using the sanitized set for the training of a classifier. However, further investigation should be done with respect to this phenomenon.

FairGan (Xu et al., 2018), which also investigated this scenario, achieves and whereas our best classifier in accuracy (GB) achieves and for .

Scenario 4 : local sanitization.

Just as in the other scenarios, the more the correlations with the sensitive attribute are removed, the higher the drop of discrimination as quantified by the , and , and the lower the accuracy on the original decision attribute. For instance, on the best classifier in terms of accuracy (GB), we obtain , at (the original values were and ). This shows that GANSan can be used locally (for instance by deploying it on a smartphone), allowing users to contribute to large datasets by sanitizing their information themselves before sharing it, with the guarantee that the sensitive attribute GANSan has been trained for is removed. The drop in accuracy is significant, but for applications with a time-consuming training phase, using GANSan to sanitize profiles without retraining the classifier seems to constitute a good compromise.

6. Conclusion

In this work, we have introduced GANSan, a novel preprocessing method inspired by GANs that achieves fairness by removing the correlations between the protected attribute and the other attributes of the profile. Our experiments demonstrate that GANSan is able to prevent the inference of the protected attribute while limiting the loss of utility, as measured in terms of the accuracy of a classifier learned on the sanitized data as well as by the damage on the numerical and categorical attributes. In addition, one of the strengths of our approach is that it offers the possibility of local sanitization, by modifying the attributes as little as possible while preserving the space of the original data (thus preserving interpretability). As a consequence, GANSan is agnostic to subsequent use of data in the sense that the sanitized data is not tied to a particular data analysis task.

While we have relied on three different types of external classifiers for capturing the difficulty of inferring the protected attribute from the sanitized data, it is still possible that a more powerful classifier exists that could infer the protected attribute with a higher accuracy. Note that this is an inherent limitation of all the preprocessing techniques and not only our approach. Nonetheless, as future work we would like to investigate other families of learning algorithms in order to complete the range of external classifiers. Much work still needs to be done to assess the relationship between the different notions of fairness, namely the impossibility of inference and the individual and group fairness. A possible improvement would be to first find the sanitizer structure that provides the highest possible fidelity, and then set the alpha coefficient to a value greater than 0 to start removing correlations.



Appendix A Preprocessing of the dataset

Because our approach relies on neural networks, we need to apply standard preprocessing methods on the data to ensure that the training of GANSan will converge.

This preprocessing consists first in transforming categorical and numerical attributes with fewer than 5 values into binary ones, which is called one-hot encoding in machine learning. For instance, the categorical attribute

becomes and with the corresponding binary value being respectively and . Each other attribute is also normalized between the possible minimum and maximum values. Afterwards, a scaling between and is applied. In addition, on the Adult dataset, we first need to apply a logarithm on two columns, namely capital-gain and capital-loss. This step is required by the fact that those attributes exhibit a distribution close to a Dirac delta (Dirac, 1981), with the maximal values being respectively and , and a median of for both (respectively and of records have a value of ). Since most values are equal to , the sanitizer will always nullify both attributes and the approach will not converge.

When applying GANSan a postprocessing step also needs to be performed on the output of the sanitizer (i.e., neural network) that mostly consists in undoing the preprocessing steps, plus remapping the generated data to the original space. This remapping ensures that the values generated by the sanitizer will fit in the original range of the attribute.
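The numerical part of this pipeline, and its inversion at postprocessing time, can be sketched as follows. Several details are assumptions made for the illustration and are flagged in the comments: the target range [0, 1], the use of `log1p` as the logarithm, and clipping as the remapping to the original range. The function names and the bounds in the usage lines are hypothetical.

```python
import math


def preprocess(value, lo, hi, use_log=False):
    """Map a raw numerical value to [0, 1] (assumed target range)."""
    if use_log:
        # log1p keeps zero at zero; the exact logarithm used is an assumption.
        value, lo, hi = math.log1p(value), math.log1p(lo), math.log1p(hi)
    return (value - lo) / (hi - lo)


def postprocess(scaled, lo, hi, use_log=False):
    """Undo preprocess and clip the result to the original range [lo, hi]."""
    if use_log:
        log_lo, log_hi = math.log1p(lo), math.log1p(hi)
        raw = math.expm1(scaled * (log_hi - log_lo) + log_lo)
    else:
        raw = scaled * (hi - lo) + lo
    # Remapping step (assumed to be a clip): keep the value in the original range.
    return min(max(raw, lo), hi)


# Illustrative bounds only.
hours = preprocess(40, lo=1, hi=99)
restored = postprocess(hours, lo=1, hi=99)
clipped = postprocess(1.3, lo=1, hi=99)   # out-of-range network output
gain = postprocess(preprocess(5000, 0, 99999, use_log=True), 0, 99999, use_log=True)
```

The clip in `postprocess` plays the role of the remapping mentioned above: a sanitizer output that falls slightly outside [0, 1] is brought back to a value that is valid for the attribute.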

Appendix B Hyper-parameters tuning

Table 9 details the parameters of the classifiers that have yielded the best results respectively on the Adult and German credit datasets. The training ratio represents the number of iterations on which each instance has been trained during the sanitization of a single batch. More precisely, for a given iteration , the discriminator is trained with records while the sanitizer is trained with records. The number of iterations is determined by the ratio of the dataset size with respect to the batch size. In simple terms, the number of iterations is the number of batches needed to complete one epoch (). Our experiments were run for a total of 40 epochs, where each epoch represents a complete presentation of the dataset to be learned (the entire dataset is passed forward and backward through the classifier only once). We varied the α value using a geometric progression:

                     Sanitizer     Discriminator
Layers               3 x Linear    5 x Linear
Learning Rate (LR)
Hidden Activation    ReLU          ReLU
Output Activation    LeakyReLU     LeakyReLU
Losses               VectorLoss    MSE
Training ratio       1             50
Batch size
Optimizers           Adam          Adam
Table 9. Hyper parameters tuning for Adult and German datasets.
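The relationship between batch size, iterations, epochs and training ratios can be illustrated with a skeleton of the training schedule. This is only a sketch: the actual update steps are replaced by counters, the order of the updates within an iteration (discriminator first) is an assumption, and the batch size used in the example is hypothetical since its value is not available here.

```python
import math


def iterations_per_epoch(n_records, batch_size):
    """Number of batches needed to present the whole dataset once."""
    return math.ceil(n_records / batch_size)


def train(n_records, batch_size, epochs,
          ratio_sanitizer=1, ratio_discriminator=50):
    """Count the update steps; the real updates are left as placeholders."""
    steps = {"sanitizer": 0, "discriminator": 0}
    for _ in range(epochs):
        for _ in range(iterations_per_epoch(n_records, batch_size)):
            for _ in range(ratio_discriminator):
                steps["discriminator"] += 1  # placeholder: one discriminator update
            for _ in range(ratio_sanitizer):
                steps["sanitizer"] += 1      # placeholder: one sanitizer update
    return steps


# Hypothetical sizes chosen so that the counts are easy to check by hand.
steps = train(n_records=1000, batch_size=100, epochs=2)
```

With the training ratios of Table 9 (1 for the sanitizer, 50 for the discriminator), the discriminator receives fifty updates for every sanitizer update.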

Appendix C GANSan  numerical attributes relative change

Numerical attributes differ from the categorical ones in the sense that the damage is not total, thus we cannot compute the proportion of records whose values are changed by the sanitization. For those numerical attributes, we compute the relative change (RC) normalized by the mean of the original and sanitized values:


We normalize the RC using the mean (since all values are positive) as it allows us to handle situations in which the original values are equal to 0. If both the sanitized and the original values are equal to 0, we simply set the RC to 0. This would not have been possible using only the deviation (percentage of change).
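A minimal sketch of the relative change is given below. The formula is reconstructed from the textual description (absolute difference divided by the mean of the original and sanitized values, with the zero/zero case set to zero) and should be read as an assumption about the exact expression; the function name is hypothetical.

```python
def relative_change(original, sanitized):
    """Absolute change divided by the mean of the two (non-negative) values."""
    if original == 0 and sanitized == 0:
        return 0.0  # both values are zero: no change
    return abs(original - sanitized) / ((original + sanitized) / 2)


# The age of the profile in Table 6 goes from 42 to 49.58 in fold 1.
rc_age = relative_change(42, 49.58)
# An original value of zero is handled without a division by zero.
rc_from_zero = relative_change(0, 10)
```

With this normalization the RC is bounded by 2 for non-negative values, a bound reached when exactly one of the two values is zero.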

Appendix D Evaluation of group based discrimination

We present our results of group-based discrimination in Table 10. We computed both the demographic parity and the equalized odds metrics as presented in the system model and fairness definitions section. In Table 12, we present the protected attribute level (scenario S1) for all classifiers.

All these results are computed with

Dataset Classifier
Baseline S1 S2 S3 S4
Adult GB
Dataset Classifier
Baseline S1 S2 S3 S4
Adult GB
Dataset Classifier DemoParity
Baseline S1 S2 S3 S4
Adult GB
Table 10. Equalized odds and demographic parity.

Appendix E GANSan  utilities

Dataset Classifier yAcc
Baseline S1 S2 S3 S4
Adult Census GB
fid diversity
Dataset Baseline S1 Baseline S1
Adult Census
Table 11. Evaluation of GANSan ’s utility.
Dataset Classifier BER sAcc
Baseline Sanitized Baseline Sanitized
Adult GB
Table 12. Evaluation of GANSan ’s sensitive attribute protection.