In recent years, the availability and diversity of large-scale datasets of personal information, the algorithmic advances in machine learning and the increase in computational power have led to the development of personalized services and prediction systems to such an extent that their use is now ubiquitous in our society. To give a few examples, machine learning-based systems are now used in banking to assess the risk associated with a loan application (Mahmoud et al., 2008), in job applications to automatically evaluate the profile of a candidate (Faliagka et al., 2012) and in predictive justice to quantify the risk of recidivism of an inmate (Center, 2016).
Despite their usefulness, the predictions performed by these algorithms are not exempt from biases, and numerous cases of discriminatory decisions have been reported over the last years. For instance, returning to the case of predictive justice, a study conducted by ProPublica showed that the recidivism prediction tool COMPAS, which is currently used in Broward County (Florida), is strongly biased against black defendants, displaying a false positive rate twice as high for black persons as for white persons (Julia Angwin and Kirchner, 2016). If the dataset exhibits strong detectable biases towards a particular sensitive group (e.g., an ethnic or minority group), the straightforward solution of removing the attribute that identifies the sensitive group would only prevent direct discrimination. Indeed, indirect discrimination can still occur due to correlations between the sensitive attribute and other attributes.
In this article, we propose a novel approach called GANSan (for Generative Adversarial Network Sanitizer) to address the problem of discrimination caused by biased underlying data.
In a nutshell, our approach aims to learn a sanitizer (in our case a neural network) that transforms the input data in a way that maximizes the following two metrics: (1) fidelity, in the sense that the transformation should modify the data as little as possible, and (2) non-discrimination, which means that the sensitive attribute should be difficult to predict from the sanitized data.
A possible use case would be the recruitment process of referees for an amateur sports organization. In this situation, the recruitment process should be based primarily on the merit of the applicants, but at the same time the organization might be aware that the data used to train a model automating the recruitment process might be highly biased with respect to race. In practice, approaches such as the Rooney Rule have been proposed and implemented to foster diversity in the recruitment of coaches in the National Football League as well as in other industries. To address this issue, the organization could use our approach to sanitize the data before applying a merit-based algorithm on the sanitized data to select the referees.
Another typical use case might be one in which a company, during its recruitment phase, offers candidates a tool to remove racial correlations from their personal profile before submitting their sanitized profile to the job application platform. If the tool is built correctly, the company's recruitment system is free from racial discrimination, as it never had access to the original profile, which is only known by the applicants.
Overall, our contributions can be summarized as follows:
We propose a novel approach in which a sanitizer is learned from the original data. The sanitizer can then be applied on a profile in such a way that the sensitive attribute, as well as its existing correlations with other attributes, is removed while ensuring that the sanitized profile is modified as little as possible, thus preventing both direct and indirect discrimination. Our sanitizer is strongly inspired by Generative Adversarial Networks (GANs) (Goodfellow et al., 2014a), which have been highly successful in many applications.
Rather than building a fair classifier, our objective is more generic, in the sense that we aim at debiasing the data with respect to the sensitive attribute. Thus, one of the main benefits of our approach is that the sanitization can be performed without any knowledge of the tasks that are going to be conducted in the future on the sanitized data. In addition, as the sensitive attribute can refer to any characteristic of the user, we believe GANSan to be applicable to the broader context of data anonymization.
Another strength of our approach is that, once it has been learned, it can be used directly by an individual to generate a modified version of his profile that still lives in the same representation space but from which it is very difficult to infer the sensitive attribute. In this sense, our method can be considered to fall under the category of randomized response techniques (Warner, 1965), as it can be used locally by a user to sanitize his data and thus does not require his true profile to be sent to a trusted third party. Of all the approaches that currently exist in the literature to reach algorithmic fairness (Friedler et al., 2018), we are not aware of any other work that has considered the case of local sanitization, with the exception of (Romanelli et al., 2019), which focuses on the protection of privacy but could also be applied to enhance fairness.
To demonstrate its usefulness, we have evaluated our approach on a real dataset by analyzing the achievable trade-off between fairness and utility, measured both in terms of the perturbations introduced by the sanitization framework and with respect to the accuracy of a classifier learned on the sanitized data.
The outline of the paper is as follows. First, in Section 2, we introduce the system model before reviewing the background notions on fairness and GANs. Afterwards, in Section 3, we review the related work on fairness-enhancing methods that, like ours, belong to the preprocessing approach, before describing our method GANSan in Section 4. Finally, we experimentally evaluate our approach in Section 5 before concluding in Section 6.
In this section, we first present the system model used in this paper before reviewing the background notions on fairness metrics and generative adversarial networks.
2.1. System model
In this paper, we consider the generic setting of a dataset composed of records. Each record typically corresponds to the profile of an individual and is made of attributes, which can be categorical, discrete or continuous. Amongst those, two attributes are considered special. First, the sensitive attribute S (e.g., gender, ethnic origin, religious belief, …) should remain hidden, for instance to prevent discrimination. Second, the decision attribute Y is typically used for a classification task (e.g., accepting or rejecting an individual for a loan). The other attributes of the profile, which are neither S nor Y, are hereafter referred to as A.
For simplicity, in this work we restrict ourselves to situations in which these two attributes are binary (i.e., $S \in \{0, 1\}$ and $Y \in \{0, 1\}$). However, our approach could easily be generalized to multivalued attributes, although quantifying fairness in the case of non-binary attributes is much more challenging than for binary ones (Kearns et al., 2017). Our main objective is to prevent the possibility of inferring the sensitive attribute from the sanitized data.
2.2. Fairness metrics
First, we would like to point out that there are many different definitions of fairness in the literature (Arvind, 2018; Friedler et al., 2018; Verma and Rubin, 2018; Corbett-Davies et al., 2017; Dwork et al., 2012; Joseph et al., 2016) and that the choice of the appropriate definition is highly dependent on the context considered.
For instance, one natural approach for defining fairness is the concept of individual fairness (Dwork et al., 2012), which states that individuals who are similar except for the sensitive attribute should be treated similarly and thus should receive similar decisions. This notion relates to the legal concept of disparate treatment (Barocas and Selbst, 2016), which occurs if the decision process was based on sensitive attributes. This definition is relevant when discrimination is due to a prejudice caused by the decision process, and therefore cannot be used in situations in which the objective is to directly redress biases in the data.
In contrast to individual fairness, group fairness relies on statistics of the outcomes of the subgroups indexed by S and can be quantified in several ways, such as demographic parity (Berk et al., [n. d.]) and equalized odds (Hardt et al., 2016). More precisely, the demographic parity gap corresponds to the absolute difference between the rates of positive outcomes in the sensitive and default groups (for which respectively $S = 0$ and $S = 1$):

$DemoParity = \left| P(\hat{Y} = 1 \mid S = 0) - P(\hat{Y} = 1 \mid S = 1) \right| \qquad (1)$

while the equalized odds gap is the absolute difference of odds in each subgroup:

$EqOddGap_y = \left| P(\hat{Y} = 1 \mid S = 0, Y = y) - P(\hat{Y} = 1 \mid S = 1, Y = y) \right|, \quad y \in \{0, 1\} \qquad (2)$

Equalized odds (Hardt et al., 2016) requires the equality of true and false positive rates, i.e., $P(\hat{Y} = 1 \mid S = 0, Y = y) = P(\hat{Y} = 1 \mid S = 1, Y = y)$ for $y \in \{0, 1\}$, and is more suitable than demographic parity when the base rates of the two groups differ ($P(Y = 1 \mid S = 0) \neq P(Y = 1 \mid S = 1)$). Note that these definitions are agnostic to the cause of the discrimination and are based solely on the assumption that the statistics of outcomes should be similar for each subgroup.
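Both group-fairness gaps can be estimated empirically from binary predictions. The following NumPy sketch assumes that $S = 0$ denotes the sensitive group and $S = 1$ the default group; the function names are hypothetical and only illustrate Equations 1 and 2 on toy data.

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """Empirical |P(Y_hat=1 | S=0) - P(Y_hat=1 | S=1)| (Equation 1)."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rate_sensitive = y_pred[s == 0].mean()  # positive rate in the sensitive group
    rate_default = y_pred[s == 1].mean()    # positive rate in the default group
    return float(abs(rate_sensitive - rate_default))

def equalized_odds_gap(y_pred, y_true, s, y):
    """Empirical |P(Y_hat=1 | S=0, Y=y) - P(Y_hat=1 | S=1, Y=y)| (Equation 2).

    y=1 compares true positive rates, y=0 compares false positive rates.
    """
    y_pred, y_true, s = np.asarray(y_pred), np.asarray(y_true), np.asarray(s)
    mask = y_true == y  # keep only the records whose true decision equals y
    return demographic_parity_gap(y_pred[mask], s[mask])

# Toy example with four records per group.
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
print(demographic_parity_gap(y_pred, s))         # 0.25
print(equalized_odds_gap(y_pred, y_true, s, 1))  # 0.5 (true positive rate gap)
print(equalized_odds_gap(y_pred, y_true, s, 0))  # 0.0 (false positive rate gap)
```

If one group is empty after masking, the mean is undefined (NumPy returns `nan` with a warning), so in practice both groups must be represented for each value of y.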
In our work, we follow a different line of research by defining fairness in terms of the inability to infer S from the other attributes (Feldman et al., 2015; Xu et al., 2018). This approach stems from the fact that it is impossible to discriminate based on the sensitive attribute if the latter cannot be predicted. Thus, our approach aims at altering the data in such a way that no classifier should be able to infer the sensitive attribute from the sanitized data. The inability to infer S is measured mainly using the accuracy (sAcc) of a predictor Adv trained to recover the hidden S, as well as the balanced error rate (BER) introduced in (Feldman et al., 2015):

$BER(Adv(A), S) = \frac{P(Adv(A) = 0 \mid S = 1) + P(Adv(A) = 1 \mid S = 0)}{2}$

The BER captures the predictability of both classes, and a value of $\frac{1}{2}$ can be considered optimal for protecting against inference, in the sense that the predictions of the predictor are then no better than a random guess. In addition, the BER is more relevant than the raw accuracy of a classifier at predicting the sensitive attribute for datasets in which the sensitive and default groups are unbalanced. To summarize, the objective of a successful sanitization is to make the sensitive accuracy drop significantly while raising the BER close to its optimal value of $\frac{1}{2}$.
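The imbalance argument can be checked numerically. In the following minimal NumPy sketch (function names are hypothetical), a trivial adversary that always predicts the majority group reaches a high sAcc on unbalanced toy data, while its BER stays at the random-guess value of 1/2.

```python
import numpy as np

def balanced_error_rate(s_pred, s_true):
    """BER = (P(Adv=0 | S=1) + P(Adv=1 | S=0)) / 2; 0.5 means no better than random."""
    s_pred, s_true = np.asarray(s_pred), np.asarray(s_true)
    err_default = np.mean(s_pred[s_true == 1] == 0)    # default group misclassified
    err_sensitive = np.mean(s_pred[s_true == 0] == 1)  # sensitive group misclassified
    return float((err_default + err_sensitive) / 2)

def sensitive_accuracy(s_pred, s_true):
    """sAcc: plain accuracy of the adversary at recovering S."""
    return float(np.mean(np.asarray(s_pred) == np.asarray(s_true)))

# Unbalanced toy data: 9 records in the default group, 1 in the sensitive group.
s_true = np.array([1] * 9 + [0])
s_pred = np.ones(10, dtype=int)  # trivial adversary that always predicts S=1
print(sensitive_accuracy(s_pred, s_true))   # 0.9
print(balanced_error_rate(s_pred, s_true))  # 0.5
```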
2.3. Generative adversarial network
The Generative Adversarial Network (GAN) is a relatively novel machine learning approach introduced to address the difficult problem of modelling and learning high-dimensional distributions (e.g., pictures). Typically, in a GAN, two neural networks compete against each other in a zero-sum game framework (Goodfellow et al., 2014b). The first neural network, called the generator, learns to produce, from noise, data close enough to a given target distribution; producing such data is the final objective of the GAN. The second network is called the discriminator, and its presence is motivated by the difficulty of objectively assessing the quality of the generator. Its task is to determine whether a given sample originates from the generator or from the training data.
Despite its intuitive aspect and the fact that GANs are powerful tools for modelling distributions, training a GAN can be difficult and often requires considerable engineering effort to ensure its success (Zhu et al., 2017). For instance, during the training phase, if either the discriminator or the generator quickly becomes much more accurate than its counterpart, the other one will not be able to catch up and improve its performance.
Mirza and Osindero (Mirza and Osindero, 2014) proposed an extension of GAN, called the conditional GAN, in which both networks are conditioned on auxiliary information, while CycleGAN (Zhu et al., 2017) learns to translate between two similar distributions. Our approach, GANSan, is inspired by the framework of GANs and CycleGANs in the sense that our objective is to learn to remove the dependency between the protected attribute and the other attributes without having an explicit description of these dependencies, solely relying on the ability of an adversary to distinguish (or not) between the sensitive and default groups.
3. Related work
In recent years, many approaches have been developed to enhance the fairness of machine learning algorithms. Most of these techniques can be classified into three families: namely (1) the preprocessing approach (Edwards and Storkey, 2015; Feldman et al., 2015; Louizos et al., 2015; Zemel et al., 2013), in which fairness is achieved by changing the characteristics of the input data (e.g., by suppressing undesired correlations with the sensitive attribute), (2) the algorithmic modification approach (also sometimes called constrained optimization), in which the learning algorithm is adapted to ensure that it is fair by design (Zafar et al., 2017; Kamishima et al., 2012), and (3) the postprocessing approach, which modifies the output of the learning algorithm to increase the level of fairness (Kamiran et al., 2010; Hardt et al., 2016). We refer the interested reader to (Friedler et al., 2018) for a recent survey comparing the different fairness-enhancing methods. Due to limited space, and as our approach falls within the preprocessing approach, we only review the main methods of this category.
Among the seminal works in fairness enhancement, Feldman and co-authors (Feldman et al., 2015) have developed a framework that consists of translating the conditional distributions of each of the dataset's attributes by shifting them towards a median distribution. While this approach is straightforward, it takes into account neither unordered categorical attributes nor correlations that might arise due to a combination of attributes, which we address in this work. Zemel and co-authors (Zemel et al., 2013) have proposed to learn a fair representation of the data based on a set of prototypes, which preserves the outcome prediction accuracy and allows an accurate reconstruction of the original profiles. Each prototype can equally identify groups based on the sensitive attribute values. This technique has been one of the pioneering works in enhancing fairness by changing the representation space of the data. However, for this approach to work, the definition of the set of prototypes is highly critical. In the same direction, the authors of (Calmon et al., 2017) have learned an optimal randomized mapping for removing group-based discrimination while limiting the distortion introduced at the profile and distribution levels to preserve utility. Similarly, Louizos and co-authors (Louizos et al., 2015) used a variational auto-encoder (Kingma and Welling, 2013) to enhance fairness by choosing a prior distribution independently of the group membership and removing differences across groups with the maximum mean discrepancy (Gretton et al., 2007).
In addition, several approaches have been explored to enhance fairness based on adversarial learning. For instance, Edwards and Storkey (Edwards and Storkey, 2015) have trained an encoder to output a representation from which an adversary is unable to accurately predict the group membership, but from which a decoder is able to reconstruct the data and on which a decision predictor still performs well. Madras, Creager, Pitassi and Zemel (Madras et al., 2018) extended this framework to satisfy the equality of opportunity (Hardt et al., 2016) constraint and explored the theoretical guarantees for fairness provided by the learned representation, as well as the ability of the representation to be used for different classification tasks. Beutel, Chen, Zhao and Chi (Beutel et al., 2017) have studied how the choice of data affects fairness in the context of adversarial learning. One of the interesting results of their study is the relationship between statistical parity and the removal of the sensitive attribute, which demonstrates that learning a representation independent of the sensitive attribute with a balanced dataset ensures statistical parity. Zhang, Lemoine and Mitchell (Zhang et al., 2018) have designed a decision predictor satisfying group fairness by ensuring that an adversary is unable to infer the sensitive attribute from the predicted outcome. Afterwards, Wadsworth, Vera and Piech (Wadsworth et al., 2018) applied the latter framework in the context of recidivism prediction, demonstrating that it is possible to significantly reduce discrimination while maintaining nearly the same accuracy as on the original data. Finally, Sattigeri and co-authors (Sattigeri et al., 2018) have developed a method to cancel out bias in high-dimensional data, such as multimedia data, using adversarial learning.
While these approaches are effective at addressing fairness, one of their common drawbacks is that they do not preserve the interpretability of the data. One notable exception is the method called FairGan, proposed by Xu, Yuan, Zhang and Wu (Xu et al., 2018), which is the closest to ours, even though their objective is to learn a fair classifier on a dataset that has been generated such that it is discrimination-free and whose distribution on attributes is close to the original one. Our approach further diverges from this work in that theirs is a direct application of the original GAN framework coupled with a second adversary (whose task is to reconstruct the sensitive attribute from samples that successfully fooled the first discriminator), while ours can rightfully be compared to an auto-encoder coupled with the same adversary. Inspired by (Tripathy et al., 2017), Romanelli, Palamidessi and Chatzikokolakis (Romanelli et al., 2019) have developed a method for learning an optimal privacy protection mechanism, also inspired by GANs, which they have applied to location privacy. Here, the objective is to minimize the amount of information (measured by the mutual information) preserved between the sensitive attribute and the prediction of this attribute made by a classifier, while respecting a bound on the utility of the dataset. This approach differs from ours in several respects. In particular, their focus is on protecting location privacy while ours is on enhancing fairness. In addition, they put a bound on utility while we impose a bound on fairness.
Drawing from existing works in the privacy field, Ruggieri (Ruggieri, 2014) showed that the t-closeness anonymization technique (Ninghui et al., 2007) can be used as a preprocessing approach to control discrimination, as there is a close relationship between t-closeness and group fairness. In addition, local sanitization approaches (also called randomized response techniques) have been investigated for the protection of privacy. More precisely, one of the benefits of local sanitization is that there is no need to centralize the data before sanitizing it, thus limiting the trust assumptions that an individual has to make on external entities when sharing his data. For instance, Wang, Hu and Wu (Wang et al., 2016) have applied randomized response techniques achieving differential privacy during the data collection phase, to avoid the need for an untrusted party collecting sensitive information. Similarly to our approach, the protection of information takes place at the individual level, as the user can randomize his data before publishing it. The main objective is to produce a sanitized dataset in which global statistical properties are preserved but from which it is not possible to infer the sensitive information of a specific user. In the same direction, Du and Zhan (Du and Zhan, 2003) have proposed a method for learning a decision tree classifier on this sanitized data. However, none of these previous works has taken into account the fairness aspect. Thus, while our method also falls within the local sanitization approaches, in the sense that the sanitizer can be applied locally by a user, our initial objective is quite different, as we aim at preventing the risk of discrimination. Nonetheless, our method also protects against attribute inference with respect to the protected attribute.
4. Local sanitization for data debiasing
As explained previously, simply removing the sensitive attribute from the data is rarely sufficient to guarantee fairness, as correlations are likely to exist between other attributes and the sensitive one. Those correlations can be straightforward, such as an attribute that directly carries information about the sensitive one, but they can also be more complex, such as a non-linear combination of several attributes. In general, detecting complex correlations between attributes, as well as suppressing them, is a difficult task.
To address this issue, our approach GANSan relies on the modelling power of GANs to build a sanitizer that can cancel out correlations with S without having an explicit model of those correlations. In particular, it exploits the capacity of the discriminator to distinguish the subgroups indexed by the sensitive attribute. Once the sanitizer has been trained, any individual can locally apply it on his profile before disclosing it to ensure that the sensitive information is hidden. The sanitized data can then be safely used for any subsequent task.
4.1. Generative adversarial network sanitization
High-level overview.
Formally, given a dataset $D = \{A, S, Y\}$, the objective of GANSan is to learn a function $San$, called the sanitizer, that perturbs individual profiles of the dataset $D$ such that a distance measure called the fidelity (in our case, a norm-based distance) between the original and the sanitized datasets ($\bar{D} = San(D) = \{\bar{A}, \bar{Y}\}$) is minimal, while ensuring that S cannot be recovered from $\bar{D}$. Our approach differs from the classical conditional GAN (Mirza and Osindero, 2014) in that the objective of our discriminator is to reconstruct the hidden sensitive attribute from the generator output, whereas the discriminator in a classical conditional GAN has to discriminate between the generator output and samples from the true distribution.
The high-level overview of the training of GANSan is as follows:
The first step corresponds to the training of the sanitizer (Algorithm 1). Basically, the sanitizer can be seen as the generator of a standard GAN, but with a different purpose. In a nutshell, it learns the empirical distribution of the sensitive attribute and generates a new distribution that concurrently respects two objectives: (1) finding a perturbation that prevents the discriminator from predicting S, while (2) minimizing the damage introduced by the sanitization. More precisely, the sanitizer takes as input the original dataset (including S and Y) plus some noise. The noise is used to avoid the over-specialization of the sanitizer on the training set while making the reverse mapping of sanitized profiles to their original versions more difficult.
The second step consists in training the discriminator to predict the sensitive attribute from the data produced by the sanitizer (Algorithm 1). The rationale of our approach is that the better the discriminator is at predicting the sensitive attribute S, the worse the sanitizer is at hiding it, and thus the higher the potential risk of discrimination.
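The data flow of these two steps (without any training) can be illustrated with a small NumPy sketch: a hypothetical one-hidden-layer sanitizer receives a profile (A, S, Y) concatenated with Gaussian noise and outputs sanitized attributes and decision without S, and a hypothetical discriminator tries to predict S from that output alone. The layer sizes, noise dimension, ReLU/sigmoid activations and the assumption that attributes are scaled to [0, 1] are illustrative choices, not the configuration used in the paper.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer perceptron (illustrative sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def mlp_forward(params, x):
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    logits = hidden @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))                       # sigmoid, outputs in (0, 1)

def sanitize(san_params, a, s, y, noise_dim, rng):
    """Step 1 data flow: (A, S, Y) + noise -> (A_bar, Y_bar); S is not output."""
    noise = rng.normal(size=(a.shape[0], noise_dim))
    x = np.hstack([a, s[:, None], y[:, None], noise])
    out = mlp_forward(san_params, x)
    return out[:, :-1], out[:, -1]  # sanitized attributes, sanitized decision

def discriminate(disc_params, a_bar, y_bar):
    """Step 2 data flow: predict S from the sanitizer output only."""
    return mlp_forward(disc_params, np.hstack([a_bar, y_bar[:, None]]))[:, 0]

rng = np.random.default_rng(0)
n, n_attr, noise_dim = 6, 4, 3
a = rng.random((n, n_attr))  # attributes assumed already scaled to [0, 1]
s = rng.integers(0, 2, n).astype(float)
y = rng.integers(0, 2, n).astype(float)
san = init_mlp(n_attr + 2 + noise_dim, 16, n_attr + 1, rng)
disc = init_mlp(n_attr + 1, 16, 1, rng)
a_bar, y_bar = sanitize(san, a, s, y, noise_dim, rng)
s_hat = discriminate(disc, a_bar, y_bar)
print(a_bar.shape, y_bar.shape, s_hat.shape)  # (6, 4) (6,) (6,)
```

Because fresh noise is drawn at every call, the same profile is mapped to slightly different sanitized outputs, which is the property the text relies on to make the reverse mapping harder.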
Training GANSan.
Let $\hat{S} = Disc(\bar{D})$ be the prediction of S made by the discriminator from the sanitized data $\bar{D}$. Its goal is to accurately predict S; thus it aims at minimizing the loss $J_{Disc}(S, \hat{S})$. In practice, we instantiate $J_{Disc}$ as the Mean Squared Error (MSE).
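Given a hyperparameter $\alpha$ representing the desired trade-off between fairness and fidelity, the sanitizer minimizes a loss combining two objectives:

$J_{San}(D, \bar{D}, S, \hat{S}) = \alpha \, d_r(D, \bar{D}) + (1 - \alpha)\left(\frac{1}{2} - BER(\hat{S}, S)\right)$

in which $d_r$ is the reconstruction loss between the original and the sanitized data, and $BER(\hat{S}, S)$ is the balanced error rate of the discriminator on the sensitive attribute. The term $\frac{1}{2}$ is due to the objective of maximizing the error of the discriminator (i.e., the optimal value of the BER is $\frac{1}{2}$).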
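Evaluated on plain numbers, the two losses can be sketched as follows. This assumes that the sanitizer loss takes the form $\alpha \, d_r + (1 - \alpha)(\frac{1}{2} - BER)$ with $\alpha \in [0, 1]$, and treats $d_r$ and the BER as precomputed scalars; in gradient-based training these terms would instead be computed from the networks' soft outputs, which this sketch does not cover.

```python
import numpy as np

def discriminator_loss(s_hat, s):
    """J_Disc instantiated as the mean squared error between predicted and true S."""
    s_hat, s = np.asarray(s_hat, dtype=float), np.asarray(s, dtype=float)
    return float(np.mean((s_hat - s) ** 2))

def sanitizer_loss(d_r, ber, alpha):
    """Assumed form: alpha * d_r + (1 - alpha) * (1/2 - BER).

    The fairness term vanishes when the discriminator is reduced to random guessing (BER = 1/2).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * d_r + (1.0 - alpha) * (0.5 - ber)

print(discriminator_loss([0.75, 0.5], [1, 0]))       # 0.15625
print(sanitizer_loss(d_r=0.2, ber=0.5, alpha=0.5))   # 0.1
print(sanitizer_loss(d_r=0.0, ber=0.25, alpha=0.0))  # 0.25
```

A larger $\alpha$ puts more weight on reconstruction, a smaller one on hiding S; sweeping $\alpha$ is what produces the fidelity/fairness trade-off explored in the experiments.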
With respect to the reconstruction loss $d_r$, we first tried the classical Mean Absolute Error (MAE) and MSE losses. However, our initial experiments showed that these losses produce highly problematic datasets, in the sense that the sanitizer always outputs the same profile whatever the input profile, which makes it unusable. Therefore, we designed a slightly more complex loss function. More precisely, we chose not to merge the respective losses of the individual attributes, yielding a vector of attribute losses whose components are iteratively used in the gradient descent. Hence, each node of the output layer of the generator is optimized to reconstruct a single attribute from the representation obtained from the intermediate layers. The vector formulation of the loss is $\vec{d_r}(D, \bar{D}) = \left(d_r^{(1)}, \ldots, d_r^{(k)}\right)$, in which $d_r^{(i)}$ is the reconstruction loss of the $i$-th attribute, and the objective is to minimize all its components. We plan to conduct a deeper analysis of the vector formulation, as well as of its interactions with different optimization techniques, in future work. The parameters used for the training are detailed in Appendices A and B.
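The difference between a merged loss and the vector formulation can be illustrated with NumPy. The per-attribute mean absolute error used below is an assumption made for illustration (the text does not specify the loss applied to each component); the point is only that the vector keeps one component per output attribute instead of averaging the damage away.

```python
import numpy as np

def merged_mae(x, x_bar):
    """Single scalar: the damage of all attributes is averaged together."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_bar, dtype=float)
    return float(np.mean(np.abs(diff)))

def per_attribute_mae(x, x_bar):
    """Vector formulation: one reconstruction loss per attribute (column)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_bar, dtype=float)
    return np.mean(np.abs(diff), axis=0)

# Two profiles: a binary first attribute and a constant second attribute.
x = [[0.0, 1.0],
     [1.0, 1.0]]
# A "median" output that maps every profile to the same point.
x_bar = [[0.5, 1.0],
         [0.5, 1.0]]
print(merged_mae(x, x_bar))         # 0.25
print(per_attribute_mae(x, x_bar))  # [0.5 0. ]
```

With automatic differentiation, each component could be back-propagated in turn, which is one possible reading of "iteratively used in the gradient descent"; the exact schedule used for GANSan is not specified here.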
4.2. Performance metrics
The performance of GANSan will be evaluated by taking into account both the fairness enhancement and the fidelity to the original data. With respect to fairness, we quantify it primarily through the inability of a predictor Adv, hereafter referred to as the adversary, to infer the sensitive attribute (cf. Section 2), using its Balanced Error Rate (BER) (Feldman et al., 2015) and its accuracy (sAcc) (cf. Section 2.2). We will also assess fairness using metrics such as the demographic parity (Equation 1) and the equalized odds (Equation 2).
To measure the fidelity between the original and the sanitized data, we have to rely on a notion of distance. Our approach does not require any specific assumption on the distance used, although it is conceivable that it would work better with some distances than with others. For the rest of this work, we instantiate the fidelity with a norm-based distance, as it does not differentiate between attributes.
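A minimal sketch of such a distance, assuming that attributes are already encoded numerically and scaled to comparable ranges (otherwise a norm would in fact weight attributes differently). The Euclidean default, the function name and the averaging over records are illustrative choices.

```python
import numpy as np

def mean_profile_distance(x, x_bar, p=2):
    """Average p-norm of the per-record difference between original and sanitized profiles.

    p=2 (Euclidean) is an illustrative default; any vector norm order accepted by
    numpy.linalg.norm (e.g., 1 or np.inf) can be substituted.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(x_bar, dtype=float)
    return float(np.mean(np.linalg.norm(diff, ord=p, axis=1)))

x     = [[0.0, 0.0], [1.0, 1.0]]
x_bar = [[3.0, 4.0], [1.0, 1.0]]  # first profile perturbed, second unchanged
print(mean_profile_distance(x, x_bar))       # 2.5
print(mean_profile_distance(x, x_bar, p=1))  # 3.5
```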
A high fidelity is not a sufficient condition for a good reconstruction of the dataset, as early experiments showed that the sanitizer might find a "median" profile to which it maps all input profiles. Thus, to quantify the ability of the sanitizer to preserve the diversity of the dataset, we introduce the diversity measure, which is defined as follows:
While the fidelity quantifies how different the original and the sanitized datasets are, the diversity measures how diverse the profiles are within each dataset. We will also quantitatively discuss the amount of damage for a given fidelity and fairness, to give a better understanding of the qualitative meaning of the fidelity.
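As an illustration only, the sketch below uses an assumed stand-in for a diversity measure, the mean pairwise Euclidean distance between the profiles of a dataset (this is not claimed to be the formula used for GANSan). It shows why fidelity alone misses the "median profile" failure mode: a collapsed sanitized dataset has zero diversity, whatever its distance to the original data.

```python
import numpy as np

def mean_pairwise_distance(x):
    """Assumed diversity proxy: average Euclidean distance over all ordered pairs i != j."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if n < 2:
        return 0.0
    diffs = x[:, None, :] - x[None, :, :]       # (n, n, d) pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n, n) distance matrix, zero diagonal
    return float(dists.sum() / (n * (n - 1)))

original  = [[0.0, 0.0], [3.0, 4.0]]
collapsed = [[1.5, 2.0], [1.5, 2.0]]  # every profile mapped to the same point
print(mean_pairwise_distance(original))   # 5.0
print(mean_pairwise_distance(collapsed))  # 0.0
```

The pairwise computation needs O(n²) memory, so on a dataset of tens of thousands of records it would have to be applied to a random subsample or computed in chunks.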
Finally, we also evaluate the loss of utility induced by the sanitization by relying on the accuracy of prediction on a classification task. More precisely, the difference in accuracy between a classifier trained on the original data and one trained on the sanitized data can be used as a measure of the utility loss introduced by the sanitization with respect to the classification task.
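The utility loss is a simple accuracy difference. In the sketch below, the two prediction lists are hypothetical stand-ins for the outputs, on the same test records, of a classifier trained on the original data and of one trained on the sanitized data; which labels they are compared against depends on the evaluation scenario.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def utility_loss(y_true, pred_original_model, pred_sanitized_model):
    """Accuracy drop of the classifier trained on sanitized data vs. the one trained on original data."""
    return accuracy(y_true, pred_original_model) - accuracy(y_true, pred_sanitized_model)

y_true     = [1, 0, 1, 1]
pred_orig  = [1, 0, 1, 0]  # 3 of 4 correct
pred_sanit = [1, 1, 1, 0]  # 2 of 4 correct
print(utility_loss(y_true, pred_orig, pred_sanit))  # 0.25
```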
5. Experimental evaluation
In this section, we describe the experimental setting used to evaluate GANSan as well as the results obtained.
5.1. Experimental setting
We have evaluated our approach on the Adult Census Income dataset, available at the UCI repository (https://archive.ics.uci.edu/ml/index.php). Adult Census reports the financial situation of individuals, with 45222 records after the removal of rows with empty values. Each record is characterized by 15 attributes, among which we chose the gender (i.e., male or female) as the sensitive attribute and the income level (i.e., over or below 50K$) as the decision.
|Group|Sensitive ($S = 0$, Female)|Default ($S = 1$, Male)|
We evaluate GANSan using metrics such as the fidelity, the BER and the demographic parity. For this, we have conducted a 10-fold cross-validation. More precisely, the dataset is divided into 10 blocks such that, during each fold, 8 blocks are used for training, one block is retained as the validation set and the last one as the test set. In the following, we report the results obtained for 7 folds. We computed the BER and sAcc using the discriminator of GANSan and three external classifiers independent of the GANSan framework: Support Vector Machines (SVM) (Cortes and Vapnik, 1995), Multilayer Perceptron (MLP) (Popescu et al., 2009) and Gradient Boosting (GB) (Friedman, 2002). For all these external classifiers and all epochs, we report the space of achievable points.
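The block rotation of this cross-validation can be sketched as follows. The rule that fold k uses block k for validation and block k + 1 for testing, the random shuffling and the seed are assumptions made for illustration; the text only states the 8/1/1 split per fold.

```python
import numpy as np

def fold_splits(n_records, n_blocks=10, seed=0):
    """Yield (train, validation, test) index arrays: 8 blocks train, 1 validates, 1 tests."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n_records), n_blocks)
    for k in range(n_blocks):
        val_id, test_id = k, (k + 1) % n_blocks  # assumed rotation rule
        train_idx = np.concatenate(
            [b for i, b in enumerate(blocks) if i not in (val_id, test_id)]
        )
        yield train_idx, blocks[val_id], blocks[test_id]

# Size of the first fold for the 45222 Adult records.
train_idx, val_idx, test_idx = next(fold_splits(45222))
print(len(train_idx), len(val_idx), len(test_idx))  # 36176 4523 4523
```

With this rotation, every record appears exactly once in a test block across the 10 folds.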
For each fold and for each value of the hyperparameter α, we train the sanitizer for a fixed number of epochs. At the end of each epoch, we save the sanitizer state and generate a sanitized dataset on which we compute the fidelity and fairness metrics. From these, a selection criterion picks the sanitized dataset that is closest to the optimal point (a BER of 1/2 with no damage to the data).
More precisely, this criterion relies on the fidelity and on the minimum value of the BER obtained with the external classifiers. It selects, among the sanitizers saved at the end of each epoch, the ones achieving the highest fairness in terms of BER for the lowest damage, for each value of the hyperparameter α.
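The exact formula of the selection criterion is not reproduced here; the following is an assumed stand-in that follows its description. For each saved epoch, it takes the lowest BER among the external classifiers (the strongest adversary) and picks the epoch whose (BER, damage) point is closest, in Euclidean distance, to the ideal point (1/2, 0). The record layout and the toy values are hypothetical, not experimental results.

```python
import math

def select_epoch(history):
    """history: list of dicts with 'epoch', 'damage' and per-classifier 'ber' values."""
    def distance_to_ideal(record):
        strongest_ber = min(record["ber"].values())  # lowest BER = most successful adversary
        return math.hypot(0.5 - strongest_ber, record["damage"])
    return min(history, key=distance_to_ideal)["epoch"]

# Toy values for three saved epochs of one (fold, alpha) run.
history = [
    {"epoch": 1, "damage": 0.02, "ber": {"SVM": 0.30, "MLP": 0.28, "GB": 0.25}},
    {"epoch": 2, "damage": 0.05, "ber": {"SVM": 0.45, "MLP": 0.44, "GB": 0.42}},
    {"epoch": 3, "damage": 0.30, "ber": {"SVM": 0.49, "MLP": 0.49, "GB": 0.48}},
]
print(select_epoch(history))  # 2
```

Taking the minimum BER over classifiers makes the selection conservative: an epoch is only considered fair if no external adversary can recover S well.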
We use the same families of external classifiers for computing the utility and fairness metrics. We used the same chosen test set to conduct a detailed analysis of the reconstruction quality, including the quantitative damage on attributes.
5.2. Evaluation scenarios
Recall that GANSan takes as input the whole original dataset (including the sensitive and the decision attributes) and outputs a sanitized dataset (without the sensitive attribute) in the same space as the original one, but from which it is very difficult to infer the sensitive attribute. In this context, the overall performance of GANSan can be evaluated by analyzing the reachable space of points characterizing the trade-off between the fidelity to the original dataset and the fairness enhancement. More precisely, during our experimental evaluation, we measure the fidelity between the original and the sanitized data and relate it to the BER and sAcc computed on the sanitized dataset.
However, in practice the sanitized dataset can be used in several situations. In the following, we detail four scenarios that we believe represent most of the possible use cases of GANSan. To ease understanding, we use the following notation: the subscript $tr$ (respectively $ts$) denotes the data in the training set (respectively test set). For instance, $X_{tr}$, in which $X$ can be $A$, $Y$, $\bar{A}$ or $\bar{Y}$, represents respectively the attributes of the original training set (not including the sensitive and the decision attributes), the decision in the original training set, the attributes of the sanitized training set and the decision attribute in the sanitized training set. Table 2 summarizes the notation used, while Table 3 describes the composition of the training and test sets for these four scenarios.
|Original data|Sanitized data|
|---|---|
|$Y$: original decision|$\bar{Y}$: sanitized decision|
|$Y_{tr}$: original decision in the training set|$\bar{Y}_{tr}$: sanitized decision in the training set|
|$Y_{ts}$: original decision in the test set|$\bar{Y}_{ts}$: sanitized decision in the test set|
|$A$: original attributes (not including the sensitive and the decision attributes)|$\bar{A}$: sanitized attributes (not including the sensitive and the decision attributes)|
|$A_{tr}$: original attributes in the training set|$\bar{A}_{tr}$: sanitized attributes in the training set|
|$A_{ts}$: original attributes in the test set|$\bar{A}_{ts}$: sanitized attributes in the test set|
In detail, the scenarios that we considered for our evaluation are the following.
Scenario 1 : complete data debiasing.
This procedure corresponds to a typical use of the sanitized dataset, which is the prediction of a decision attribute through a classifier. The decision attribute is also sanitized, as we assume that the original decision holds information about the sensitive attribute. Here, we quantify the accuracy of prediction of the sanitized decision $\bar{Y}_{ts}$, as well as the discrimination represented by the demographic parity gap (Equation 1) and the equalized odds gap (Equation 2) defined in Section 2.
Scenario 2 : partial data debiasing.
In this scenario, just like the previous one, the training and the test sets are sanitized, except that the sanitized decision in both sets is replaced by the original one. This scenario is the one considered in most papers on fairness enhancement (Zemel et al., 2013; Edwards and Storkey, 2015; Madras et al., 2018). The loss of accuracy in predicting the original decision, between this classifier and one trained on the unmodified original dataset, provides a straightforward way to quantify the utility loss due to the sanitization.
Scenario 3 : building a fair classifier.
This scenario was considered in (Xu et al., 2018) and is motivated by the fact that the sanitized dataset might introduce undesired perturbations (e.g., changing the education level from Bachelor to PhD). Thus, a third party might build a fair classifier but still apply it directly to the unperturbed data, to avoid the data sanitization process and its associated risks. More precisely, in this scenario a fair classifier is obtained by training it on the sanitized dataset to predict the sanitized decision. Afterwards, this classifier is tested on the original test data, and its fairness is measured through the demographic parity gap (Equation 1, Section 2). We also compute the accuracy of the fair classifier with respect to the original decision of the test set.
Scenario 4 : local sanitization.
The local sanitization scenario corresponds to a private use of the sanitizer. For instance, the sanitizer could be part of a mobile phone application providing individuals with a tool to remove some sensitive attributes from their profile before disclosing it to an external entity. In this scenario, we assume the existence of a biased classifier, trained on the original training set to predict the original decision. The user has no control over this classifier, but is nonetheless allowed to sanitize their profile locally before submitting it to the existing classifier. This classifier is applied to the sanitized test set, and we measure its accuracy with respect to the original decision as well as the fairness enhancement, quantified by the demographic parity gap.
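|Scenario||Train set composition||Test set composition|
|S1: complete data debiasing||sanitized attributes, sanitized decision||sanitized attributes, sanitized decision|
|S2: partial data debiasing||sanitized attributes, original decision||sanitized attributes, original decision|
|S3: building a fair classifier||sanitized attributes, sanitized decision||original attributes, original decision|
|S4: local sanitization||original attributes, original decision||sanitized attributes, original decision|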
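The composition of the training and test sets in the four scenarios can be encoded as a small lookup, so that every scenario is evaluated with the same code. This is a hypothetical helper derived from the scenario descriptions above; `SCENARIOS`, `compose` and the layout of `data` are our own assumptions, not part of GANSan.

```python
# Version ("orig" or "san") of the attributes and of the decision used in the
# training and test sets of each scenario, as (attributes, decision).
SCENARIOS = {
    "S1": {"train": ("san", "san"), "test": ("san", "san")},
    "S2": {"train": ("san", "orig"), "test": ("san", "orig")},
    "S3": {"train": ("san", "san"), "test": ("orig", "orig")},
    "S4": {"train": ("orig", "orig"), "test": ("san", "orig")},
}


def compose(scenario, data):
    """Assemble (X_train, y_train, X_test, y_test) for one scenario.

    `data[(version, split)]` is assumed to hold a pair (attributes, decision),
    with version in {"orig", "san"} and split in {"train", "test"}; the
    attributes exclude the sensitive attribute and the decision.
    """
    parts = []
    for split in ("train", "test"):
        attr_version, dec_version = SCENARIOS[scenario][split]
        parts.append(data[(attr_version, split)][0])  # attributes
        parts.append(data[(dec_version, split)][1])   # decision
    return tuple(parts)


# Placeholder strings stand in for real datasets.
data = {(v, sp): (f"A_{sp}_{v}", f"Y_{sp}_{v}")
        for v in ("orig", "san") for sp in ("train", "test")}
print(compose("S3", data))
# ('A_train_san', 'Y_train_san', 'A_test_orig', 'Y_test_orig')
```

Keeping the attribute and decision versions independent is what distinguishes S1 from S2: only the decision source changes.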
All scenarios require a sanitized version of the dataset (the training set, the test set, or both), either to train the model or to compute the results (the decision accuracy and the discrimination metrics). Since, for each value of α, a new version of the sanitized dataset is generated at the end of each epoch, one of these versions has to be selected for the evaluation.
Figure 3 describes the achievable trade-off between fairness and fidelity obtained using the sanitizer. Note that even when α = 0 (i.e., all the weight is put on utility), we cannot reach perfect fidelity to the original data (cf. Figure 3). As expected, fairness improves as α increases. A low value of α, such as 0.2, provides a fidelity close to the highest achievable one, but the BER it leads to remains limited. Note that this still improves fairness compared to the original data.
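Each value of α yields one (fidelity, BER) point, and the trade-off can be summarized by the settings that no other setting beats on both axes at once. The sketch below extracts such a Pareto front from a list of runs; it is an analysis aid we assume for illustration, not part of GANSan, and it treats both axes as "higher is better" (a BER close to 0.5 meaning that the sensitive attribute is hard to infer).

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (alpha, fidelity, BER)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated on (fidelity, BER), sorted by fidelity.

    A point is dominated if another point is at least as good on both axes
    and strictly better on at least one of them.
    """
    front = []
    for alpha, fid, ber in points:
        dominated = any(
            f2 >= fid and b2 >= ber and (f2 > fid or b2 > ber)
            for _, f2, b2 in points
        )
        if not dominated:
            front.append((alpha, fid, ber))
    return sorted(front, key=lambda p: p[1], reverse=True)
```

The quadratic scan is sufficient for the handful of α values explored in an experiment; the points passed in would be the measured (α, fidelity, BER) triples.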
Conversely, setting α to a high value allows the sanitizer to completely remove the unwarranted correlations, at a cost in fidelity. At the extreme setting α = 1, the data is sanitized without putting any emphasis on fidelity. In this case, the BER is optimal, as expected, while the fidelity is further reduced.
The accuracy of inferring the sensitive attribute from the sanitized data behaves in the same way as the BER. More precisely, this accuracy drops significantly when α increases. Here, the optimal value is the proportion of the majority class of the sensitive attribute, and GANSan brings the accuracy of predicting S from the sanitized set closer to it. However, even with the highest values of α, this optimal value cannot be reached. Choosing α appropriately leads to a significant sanitization of the dataset while preserving a fidelity close to the maximum achievable.
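Both protection metrics can be computed from the predictions of any external classifier that tries to infer S from the sanitized data. A minimal sketch, assuming a binary sensitive attribute encoded as 0/1 (function names are hypothetical); the example shows why the majority-class proportion is the optimal accuracy from the sanitizer's point of view: a predictor that ignores its input already reaches it, with a BER of 0.5.

```python
from typing import Sequence


def balanced_error_rate(s_pred: Sequence[int], s_true: Sequence[int]) -> float:
    """BER = 1/2 * (P(pred = 0 | S = 1) + P(pred = 1 | S = 0)) for a binary S."""
    errors = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for p, t in zip(s_pred, s_true):
        counts[t] += 1
        errors[t] += int(p != t)
    # Raises ZeroDivisionError if one value of S is absent from s_true.
    return 0.5 * (errors[0] / counts[0] + errors[1] / counts[1])


def accuracy(s_pred: Sequence[int], s_true: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(s_pred, s_true)) / len(s_true)


def majority_baseline(s_true: Sequence[int]) -> float:
    """Accuracy of always predicting the most frequent value of S."""
    ones = sum(s_true)
    return max(ones, len(s_true) - ones) / len(s_true)


# Always predicting the majority value (S = 0) learns nothing about S.
s_true = [0, 0, 0, 1]
s_pred = [0, 0, 0, 0]
print(balanced_error_rate(s_pred, s_true))                  # 0.5
print(accuracy(s_pred, s_true), majority_baseline(s_true))  # 0.75 0.75
```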
The quantitative analysis of the impact on diversity is shown in Figure 4: diversity drops for every value of α considered. The application of GANSan therefore introduces an irreversible perturbation, as already observed with the fidelity. This loss of diversity implies that the sanitization reinforces the similarity between sanitized profiles as α increases, rendering them almost identical or forcing the input profiles to be mapped to a small number of stereotypes.
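To make the link between stereotypes and diversity concrete, the sketch below uses the average pairwise Euclidean distance between (preprocessed) records as a stand-in diversity measure. This is an assumption for illustration and may differ from the diversity measure reported in Figure 4; it only shows that mapping distinct profiles onto a few shared values lowers the measure.

```python
import math
from itertools import combinations
from typing import Sequence

Record = Sequence[float]


def mean_pairwise_distance(records: Sequence[Record]) -> float:
    """Average Euclidean distance over all unordered pairs of records."""
    pairs = list(combinations(records, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def diversity_drop(original: Sequence[Record], sanitized: Sequence[Record]) -> float:
    """Relative loss of diversity after sanitization (0 means no loss)."""
    return 1 - mean_pairwise_distance(sanitized) / mean_pairwise_distance(original)


# Four distinct toy profiles mapped onto two "stereotypes" lose diversity.
original = [(0, 0), (1, 0), (0, 1), (1, 1)]
sanitized = [(0, 0), (0, 0), (1, 1), (1, 1)]
print(round(diversity_drop(original, sanitized), 3))  # 0.172
```

`math.dist` requires Python 3.8 or later.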
Figure 3(a) reports, for the categorical attributes, the proportion of modified records as a function of α. For most numerical attributes, a substantial fraction of the records has a relative change below a small threshold even for high values of α, and this fraction is larger for lower values of α. The most damaged profiles are presented in Table 4. From this table, as well as from Table 5, we can observe that the sanitization process transforms all profiles, but to different degrees. In Table 6, we track the same profile (the most damaged one in fold 1) across different folds to show how the sanitization process affects it. We can see that the modifications applied to this profile are not deterministic: they differ from one fold to another.
|Attrs||Original||Fold 1||Fold 4||Fold 3|
Scenario 1 : complete data debiasing.
In this scenario, we observe that GANSan preserves the accuracy of the dataset. More precisely, it increases the accuracy of the decision prediction on the sanitized dataset for all classifiers (cf. Figure 5, Scenario S1), compared to the accuracy obtained on the original dataset by GB, MLP and SVM. This increase can be explained by the fact that GANSan modifies profiles to make them more coherent with the associated decision, by removing the correlations between the sensitive attribute and the decision. As a consequence, similar profiles in the protected and the default groups receive the same decision. In fact, nearly the same distribution of the decision attribute is observed before and after sanitization, but some positive and negative decisions are shuffled among records. We also believe that the increase in accuracy is correlated with the drop in diversity: if profiles are closer to each other, the decision boundary might be easier to find. Table 7 presents the shift in decision proportions across the different folds. We observe that in some cases, the sanitizer turns the binary decision column almost into a single-valued one. We leave the study of how GANSan affects the decision boundary for future work.
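The observation that the decision distribution is nearly preserved while individual decisions are shuffled can be checked with two simple statistics: the proportion of positive decisions before and after sanitization, and the fraction of records whose decision is unchanged. A minimal sketch, assuming a binary decision encoded as 0/1; the helper name is hypothetical and the example values are invented.

```python
from typing import Dict, Sequence


def decision_shift(y_orig: Sequence[int], y_san: Sequence[int]) -> Dict[str, float]:
    """Summarize how a binary decision column changes after sanitization."""
    n = len(y_orig)
    return {
        "positive_rate_orig": sum(y_orig) / n,
        "positive_rate_san": sum(y_san) / n,
        # An identical marginal distribution does not imply identical
        # per-record decisions.
        "unchanged": sum(a == b for a, b in zip(y_orig, y_san)) / n,
    }


print(decision_shift([1, 0, 0, 1], [0, 1, 0, 1]))
# {'positive_rate_orig': 0.5, 'positive_rate_san': 0.5, 'unchanged': 0.5}
```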
Discrimination is reduced, as observed through the group-based discrimination metrics, which all exhibit a negative slope, as expected. When the correlations with the sensitive attribute are largely removed, these metrics also decrease significantly; for GB, for instance, they fall below the demographic parity gap and the equalized odds gap measured on the original data. See Tables 10 and 12 for more detailed results.
FairGAN (Xu et al., 2018) was also evaluated in this setup, in terms of BER, accuracy and demographic parity.
Scenario 2 : partial data debiasing.
Unexpectedly, we observe an increase in accuracy for most values of α. The demographic parity gap also decreases, while the equalized odds gap remains nearly constant (green line in Figure 5). Table 8 compares the results obtained with other existing work from the state of the art. We include the classifier with the highest accuracy (MLP) and the one with the lowest (SVM). From these results, we can observe that our method outperforms the others in terms of accuracy, but the best demographic parity is achieved by the work of Zhang et al. (2018), which is not surprising as this method is precisely tailored to reduce this metric. Even though our method is not specifically constrained to mitigate the demographic parity, we can observe that it significantly improves it.
|LFR (Zemel et al., 2013)|
|ALFR (Edwards and Storkey, 2015)|
|MUBAL (Zhang et al., 2018)||0.01|
|LATR (Madras et al., 2018)||0.84|
|GANSan (S2) - MLP,||0.91 0.01|
|GANSan (S2) - SVM,||0.84 0.04|
Scenario 3 : building a fair classifier.
The sanitizer helps to reduce discrimination based on the sensitive attribute, even when a classifier trained on the sanitized data is applied to the original data. As shown in the third row of Figure 5, as we force the system to completely remove the unwarranted correlations, the discrimination observed when classifying the original unperturbed data is reduced. On the other hand, the accuracy exhibits here the steepest negative slope of all the scenarios investigated. This decrease in accuracy is explained by the difference between the correlations linking the sanitized attributes to the sanitized decision and those linking the original attributes to the original decision. As the fair classifiers are trained on the sanitized set (sanitized attributes and decision), the decision boundary they learn is not relevant for the original attributes and decision. Despite this reduction in accuracy, critical applications such as recidivism prediction can leverage GANSan by using only the sanitized set to train a classifier. However, this phenomenon requires further investigation.
FairGAN (Xu et al., 2018), which also investigated this scenario, can be compared on the same metrics with our best classifier in terms of accuracy (GB).
Scenario 4 : local sanitization.
As in the other scenarios, the more the correlations with the sensitive attribute are removed, the larger the drop in discrimination, as quantified by the group-based discrimination metrics, and the lower the accuracy on the original decision attribute. This holds in particular for the best classifier in terms of accuracy (GB). This shows that GANSan can be used locally (for instance, deployed on a smartphone), allowing users to sanitize their own profiles before sharing them, for instance to contribute to large datasets, with the guarantee that the sensitive attribute targeted by GANSan is removed. The drop in accuracy is significant, but for applications with a time-consuming training phase, using GANSan to sanitize profiles without retraining the classifier seems to be a good compromise.
In this work, we have introduced GANSan, a novel preprocessing method inspired by GANs that achieves fairness by removing the correlations between the protected attribute and the other attributes of the profile. Our experiments demonstrate that GANSan is able to prevent the inference of the protected attribute while limiting the loss of utility, as measured by the accuracy of a classifier learned on the sanitized data as well as by the damage on the numerical and categorical attributes. In addition, one of the strengths of our approach is that it offers the possibility of local sanitization, by modifying the attributes as little as possible while preserving the space of the original data (thus preserving interpretability). As a consequence, GANSan is agnostic to the subsequent use of the data, in the sense that the sanitized data is not tied to a particular data analysis task.
While we have relied on three different types of external classifiers to capture the difficulty of inferring the protected attribute from the sanitized data, it is still possible that a more powerful classifier exists that could infer the protected attribute with a higher accuracy. Note that this is an inherent limitation of all preprocessing techniques, not only of our approach. Nonetheless, as future work we would like to investigate other families of learning algorithms in order to complete the range of external classifiers. Much work still needs to be done to assess the relationship between the different notions of fairness, namely the impossibility of inference and individual and group fairness. A possible improvement would be to first find the sanitizer architecture that provides the highest possible fidelity, and then set α to a value greater than 0 to start removing correlations.
- Arvind (2018) Arvind Narayanan. 2018. 21 Fairness Definitions and Their Politics. Tutorial presented at the Conference on Fairness, Accountability, and Transparency (2018).
- Barocas and Selbst (2016) Solon Barocas and Andrew D Selbst. 2016. Big data’s disparate impact. Cal. L. Rev. 104 (2016), 671.
- Berk et al. ([n. d.]) Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. [n. d.]. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research ([n. d.]), 0049124118782533.
- Beutel et al. (2017) Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. 2017. Data decisions and theoretical implications when adversarially learning fair representations. Fairness, Accountability, and Transparency in Machine Learning (2017).
- Calmon et al. (2017) Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems. 3992–4001.
- Center (2016) Electronic Privacy Information Center. 2016. EPIC - Algorithms in the Criminal Justice System. https://epic.org/algorithmic-transparency/crim-justice/
- Corbett-Davies et al. (2017) Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797–806.
- Cortes and Vapnik (1995) Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273–297.
- Dirac (1981) Paul Adrien Maurice Dirac. 1981. The principles of quantum mechanics. Number 27. Oxford university press.
- Du and Zhan (2003) Wenliang Du and Zhijun Zhan. 2003. Using randomized response techniques for privacy-preserving data mining. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 505–510.
- Dwork et al. (2012) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference. ACM, 214–226.
- Edwards and Storkey (2015) Harrison Edwards and Amos Storkey. 2015. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015).
- Faliagka et al. (2012) Evanthia Faliagka, Athanasios Tsakalidis, and Giannis Tzimas. 2012. An integrated e-recruitment system for automated personality mining and applicant ranking. Internet research 22, 5 (2012), 551–568.
- Feldman et al. (2015) Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 259–268.
- Friedler et al. (2018) S. A. Friedler, C. Scheidegger, S. Venkatasubramanian, S. Choudhary, E. P. Hamilton, and D. Roth. 2018. A comparative study of fairness-enhancing interventions in machine learning. ArXiv e-prints (Feb. 2018). arXiv:stat.ML/1802.04422
- Friedman (2002) Jerome H Friedman. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002), 367–378.
- Goodfellow et al. (2014a) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014a. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680.
- Goodfellow et al. (2014b) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014b. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
- Gretton et al. (2007) Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. 2007. A kernel method for the two-sample-problem. In Advances in neural information processing systems. 513–520.
- Hardt et al. (2016) Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Advances in neural information processing systems. 3315–3323.
- Joseph et al. (2016) Matthew Joseph, Michael Kearns, Jamie H Morgenstern, and Aaron Roth. 2016. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems. 325–333.
- Julia Angwin and Kirchner (2016) Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- Kamiran et al. (2010) Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. 2010. Discrimination Aware Decision Tree Learning. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM ’10). IEEE Computer Society, Washington, DC, USA, 869–874. https://doi.org/10.1109/ICDM.2010.50
- Kamishima et al. (2012) Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 35–50.
- Kearns et al. (2017) Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2017. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144 (2017).
- Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. Proceedings of the 2nd International Conference on Learning Representations (ICLR) (2013).
- Louizos et al. (2015) Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2015. The variational fair autoencoder. arXiv preprint arXiv:1511.00830 (2015).
- Madras et al. (2018) David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. 2018. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309 (2018).
- Mahmoud et al. (2008) Mostafa Mahmoud, N Algadi, and Ahmed Ali. 2008. Expert System for Banking Credit Decision. , 813 - 819 pages.
- Mirza and Osindero (2014) Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
- Ninghui et al. (2007) Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-Closeness: Privacy beyond k-anonymity and l-diversity. 106–115. https://doi.org/10.1109/ICDE.2007.367856
- Popescu et al. (2009) Marius-Constantin Popescu, Valentina E Balas, Liliana Perescu-Popescu, and Nikos Mastorakis. 2009. Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems 8, 7 (2009), 579–588.
- Romanelli et al. (2019) Marco Romanelli, Catuscia Palamidessi, and Konstantinos Chatzikokolakis. 2019. Generating Optimal Privacy-Protection Mechanisms via Machine Learning. arXiv preprint arXiv:1904.01059 (2019).
- Ruggieri (2014) Salvatore Ruggieri. 2014. Using t-closeness anonymity to control for non-discrimination. Trans. Data Privacy 7, 2 (2014), 99–129.
- Sattigeri et al. (2018) Prasanna Sattigeri, Samuel C Hoffman, Vijil Chenthamarakshan, and Kush R Varshney. 2018. Fairness GAN. arXiv preprint arXiv:1805.09910 (2018).
- Tripathy et al. (2017) Ardhendu Tripathy, Ye Wang, and Prakash Ishwar. 2017. Privacy-preserving adversarial networks. arXiv preprint arXiv:1712.07008 (2017).
- Verma and Rubin (2018) Sahil Verma and Julia Rubin. 2018. Fairness Definitions Explained. (2018).
- Wadsworth et al. (2018) Christina Wadsworth, Francesca Vera, and Chris Piech. 2018. Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction. arXiv preprint arXiv:1807.00199 (2018).
- Wang et al. (2016) Yue Wang, Xintao Wu, and Donghui Hu. 2016. Using Randomized Response for Differential Privacy Preserving Data Collection.. In EDBT/ICDT Workshops, Vol. 1558.
- Warner (1965) Stanley L Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63–69.
- Xu et al. (2018) Depeng Xu, Shuhan Yuan, Lu Zhang, and Xintao Wu. 2018. FairGAN: Fairness-aware Generative Adversarial Networks. arXiv preprint arXiv:1805.11202 (2018).
- Zafar et al. (2017) Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. Fairness Constraints: Mechanisms for Fair Classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Aarti Singh and Jerry Zhu (Eds.), Vol. 54. PMLR, Fort Lauderdale, FL, USA, 962–970. http://proceedings.mlr.press/v54/zafar17a.html
- Zemel et al. (2013) Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International Conference on Machine Learning. 325–333.
- Zhang et al. (2018) Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. 2018. Mitigating unwanted biases with adversarial learning. (2018).
- Zhu et al. (2017) Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint (2017).
Appendix A Preprocessing of the dataset
Because our approach relies on neural networks, we need to apply standard preprocessing methods to the data to ensure that the training of GANSan converges.
This preprocessing first transforms categorical and numerical attributes with fewer than 5 values into binary ones, which is called one-hot encoding in machine learning. For instance, a categorical attribute with two possible values is replaced by two binary attributes, one per value, set to 1 when the record takes that value and to 0 otherwise. Every other attribute is normalized between its possible minimum and maximum values, and then scaled to a fixed interval. In addition, on the Adult dataset, we first need to apply a logarithm to two columns. This step is required because these attributes exhibit a distribution close to a Dirac delta (Dirac, 1981): their median is 0 and most records have a value of 0. Since most values are equal to 0, the sanitizer would otherwise always nullify both attributes and the approach would not converge.
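A minimal sketch of these preprocessing steps for a single column, under stated assumptions: the logarithm is taken as log(1 + x) so that the many zero values remain defined, and the final scaling interval is assumed to be [0, 1]. The function names are hypothetical and do not come from the GANSan code.

```python
import math
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Replace a categorical column by one binary column per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def scale_column(values: Sequence[float],
                 use_log: bool = False) -> Tuple[List[float], Tuple[float, float]]:
    """Optionally apply log(1 + x), then min-max scale to [0, 1].

    Returns the scaled values and the (min, max) pair needed to undo the scaling.
    """
    xs = [math.log1p(v) for v in values] if use_log else [float(v) for v in values]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # constant column: avoid dividing by zero
    return [(x - lo) / span for x in xs], (lo, hi)


cats, encoded = one_hot(["a", "b", "a"])
print(cats, encoded)  # ['a', 'b'] [[1, 0], [0, 1], [1, 0]]

# A mostly-zero, heavy-tailed toy column: the log spreads the small values out.
print(scale_column([0, 0, 9, 99], use_log=True)[0])
```

Returning the (min, max) pair makes the transformation invertible, which the postprocessing step relies on.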
When applying GANSan, a postprocessing step must also be performed on the output of the sanitizer (i.e., the neural network); it mostly consists in undoing the preprocessing steps and remapping the generated data to the original space. This remapping ensures that the values generated by the sanitizer fit within the original range of each attribute.
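The postprocessing can be sketched as the inverse of a min-max scaling to [0, 1] with an optional log(1 + x) transform, followed by decoding each one-hot block to its highest-scoring category. Clipping is one simple way to remap generated values into the original range; it is an assumption here, not necessarily the exact remapping used by GANSan, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def unscale_column(scaled: Sequence[float], lo: float, hi: float,
                   use_log: bool = False) -> List[float]:
    """Undo min-max scaling (and the optional log(1 + x)), clipping to the original range."""
    out = []
    for z in scaled:
        z = min(max(z, 0.0), 1.0)  # clip generated values into [0, 1]
        x = lo + z * (hi - lo)     # back to the (possibly log-transformed) range
        out.append(math.expm1(x) if use_log else x)
    return out


def decode_one_hot(rows: Sequence[Sequence[float]], categories: Sequence[str]) -> List[str]:
    """Map each generated (soft) one-hot vector to the category with the largest score."""
    return [categories[max(range(len(r)), key=lambda i: r[i])] for r in rows]


print(unscale_column([-0.2, 0.5, 1.3], lo=0.0, hi=10.0))     # [0.0, 5.0, 10.0]
print(decode_one_hot([[0.2, 0.8], [0.6, 0.4]], ["a", "b"]))  # ['b', 'a']
```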
Appendix B Hyper-parameters tuning
Table 9 details the parameters of the classifiers that yielded the best results on the Adult and German credit datasets, respectively. The training ratio represents the number of iterations for which each network is trained during the sanitization of a single batch. More precisely, for a given iteration, the discriminator and the sanitizer are each trained on a number of records that depends on their training ratio. The number of iterations is given by the ratio of the dataset size to the batch size; in other words, it is the number of batches needed to complete one epoch. Our experiments were run for a total of 40 epochs, each epoch representing a complete presentation of the dataset to be learned (the entire dataset is passed forward and backward through the networks only once). We varied the value of α using a geometric progression.
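The bookkeeping described above can be sketched as follows. The parameter names `disc_ratio` and `san_ratio` are hypothetical, each network is assumed to be updated `ratio` times on a full batch per iteration, and counting the last partial batch as an iteration (hence the ceiling) is also an assumption; the dataset and batch sizes below are toy values.

```python
import math
from typing import Tuple


def iterations_per_epoch(n_records: int, batch_size: int) -> int:
    """Number of batches needed to present the whole dataset once."""
    return math.ceil(n_records / batch_size)  # the last batch may be partial


def records_per_iteration(batch_size: int, disc_ratio: int,
                          san_ratio: int) -> Tuple[int, int]:
    """Records seen by the discriminator and the sanitizer in one iteration."""
    return disc_ratio * batch_size, san_ratio * batch_size


n_iter = iterations_per_epoch(n_records=1000, batch_size=64)
print(n_iter, 40 * n_iter)  # 16 640 (per epoch, and over 40 epochs)
print(records_per_iteration(64, disc_ratio=2, san_ratio=1))  # (128, 64)
```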
|Layers||3 x Linear||5 x Linear|
|Learning Rate (LR)|
Appendix C GANSan numerical attributes relative change
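Numerical attributes differ from categorical ones in the sense that the damage is not all-or-nothing, so the proportion of records whose values are changed by the sanitization is not sufficient to describe it. For those numerical attributes, we compute the relative change (RC), normalized by the mean of the original and sanitized values: $RC = \frac{|x - \bar{x}|}{(x + \bar{x})/2}$, where $x$ and $\bar{x}$ denote respectively the original and the sanitized value of the attribute for a given record.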
We normalize the RC using the mean (since all values are non-negative) as it allows us to handle situations in which the original value is equal to 0. If both the sanitized and the original values are equal to 0, we simply set the RC to 0. This would not have been possible using only the deviation (percentage of change), which divides by the original value.
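Following the definition above (absolute difference divided by the mean of the original and sanitized values, both non-negative), the relative change and the share of records below a threshold can be computed as sketched here. The absolute value in the numerator is an assumption of this sketch; with it, RC lies in [0, 2], reaching 2 when exactly one of the two values is 0. The function names are hypothetical.

```python
from typing import Sequence


def relative_change(original: float, sanitized: float) -> float:
    """|x - x_bar| divided by the mean of x and x_bar (both assumed non-negative)."""
    if original == 0 and sanitized == 0:
        return 0.0  # an unchanged zero value
    return abs(original - sanitized) / ((original + sanitized) / 2)


def share_below(originals: Sequence[float], sanitized: Sequence[float],
                threshold: float) -> float:
    """Fraction of records whose relative change is strictly below `threshold`."""
    rcs = [relative_change(a, b) for a, b in zip(originals, sanitized)]
    return sum(rc < threshold for rc in rcs) / len(rcs)


print(relative_change(0, 5))  # 2.0, defined even though the original value is 0
print(share_below([0, 10, 4], [0, 12, 8], threshold=0.5))
```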
Appendix D Evaluation of group based discrimination
Table 10 presents our results on group-based discrimination. We computed both the demographic parity and the equalized odds metrics as presented in the system model and fairness definitions section. Table 12 presents the protection level of the protected attribute (scenario S1) for all classifiers.
All these results are computed with
Appendix E GANSan utilities