## Introduction

Over the past few years, machine learning algorithms have emerged in many different fields of application. However, there is a growing concern about their potential to reproduce discrimination against particular groups of people based on sensitive characteristics such as religion, race, or gender. In particular, algorithms trained on biased data are prone to learn, perpetuate or even reinforce these biases NIPS2016_6228. Numerous incidents of this nature have been reported in recent years angwin2016machine; Lambrecht. For this reason, interest in fair machine learning has risen dramatically in the academic community, and many bias mitigation strategies have been proposed during the last decade zhang2018mitigating; adel2019one; Hardt2016; grari2019fairness; chen2019fairness; zafar2015fairness; celis2019classification; wadsworth2018achieving. Currently, most state-of-the-art fair machine learning algorithms require knowledge of the sensitive information during training. However, in practice, it is unrealistic to assume that sensitive information is available, or even collected. In Europe, for example, a car insurance company cannot ask a potential client about his or her origin or religion, as this is strictly regulated. The EU introduced the General Data Protection Regulation (GDPR) in May 2018. This legislation represents one of the most important changes in the regulation of data privacy in more than 20 years. It strictly regulates the collection and use of sensitive personal data.
With the aim of obtaining non-discriminatory algorithms, Article 9(1) rules that, in general: "Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited." citeulike:14071352. Ignoring sensitive attributes as input of predictive models is known as "fairness through unawareness" Pedreshi2008. However, this is clearly not sufficient, since complex correlations in the data may provide unexpected links to sensitive information. In fact, presumably non-sensitive attributes can serve as substitutes or proxies for protected attributes. For instance, the color and the model of a car combined with the driver's occupation can lead to unwanted gender bias in the prediction of car insurance prices. To cope with this, a large number of machine learning methods have recently emerged to mitigate biases in output predictions, based on fairness criteria w.r.t. the sensitive attributes zhang2018mitigating; adel2019one; Hardt2016; grari2019fairness. However, all these methods use the sensitive attribute during training, which is not always possible.

Recently, some works have addressed the complex objective of obtaining a fair predictor without access to the sensitive attribute. Most of these works leverage external data or prior knowledge on correlations zhao2021you; madras2018learning; schumann2019transfer; gupta2018proxy. Other methods avoid the need for such additional data, but rely only on some local smoothness of the feature space, rather than explicitly focusing on targeted subgroups to be protected hashimoto2018fairness; lahoti2020fairness. The need for new approaches to algorithmic fairness that break away from the prevailing assumption of observing sensitive characteristics has also been recently highlighted in tomasev2021fairness.

To fill this gap, we propose a novel approach that relies on Bayesian variational autoencoders (VAEs) to infer the sensitive information given a causal graph provided by expert knowledge of the data, and then uses the inferred information as a proxy for mitigating biases in an adversarial fairness training setting. More specifically, the mitigation is performed during predictor training, by considering a fairness loss based on an estimate of the correlation between predictor outputs and the inferred sensitive proxies. In this paper, we aim to show the empirical benefits of this type of method.

## Problem Statement

Throughout this document, we consider a supervised machine learning algorithm for classification problems. The training data consists of $n$ examples $(x_i, s_i, y_i)_{i=1}^{n}$, where $x_i \in \mathbb{R}^p$ is the feature vector with $p$ predictors of the $i$-th example, $s_i \in \{0, 1\}$ is its binary sensitive attribute and $y_i \in \{0, 1\}$ its binary outcome true value.

#### Demographic Parity

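A classifier is considered fair if the prediction $\hat{Y} = h(X)$ from features $X$ is independent from the protected attribute $S$ Dwork2011. The underlying idea is that each demographic group has the same chance for a positive outcome.

###### Definition 1.

A classifier satisfies demographic parity if
$$P(\hat{Y} = 1 \mid S = 0) = P(\hat{Y} = 1 \mid S = 1).$$

There are multiple ways to assess this objective. The p-rule assessment ensures that the ratio between the positive rates of the two groups is no less than a fixed threshold $\frac{p}{100}$:
$$\min\left(\frac{P(\hat{Y} = 1 \mid S = 1)}{P(\hat{Y} = 1 \mid S = 0)}, \frac{P(\hat{Y} = 1 \mid S = 0)}{P(\hat{Y} = 1 \mid S = 1)}\right) \geq \frac{p}{100}.$$
The classifier is considered totally fair when this ratio satisfies a 100%-rule. Conversely, a 0%-rule indicates a completely unfair model.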
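The p-rule above reduces to dividing the smaller group positive rate by the larger one. A minimal NumPy sketch, assuming binary predictions and a binary sensitive attribute stored as arrays (the function name `p_rule` is hypothetical):

```python
import numpy as np

def p_rule(y_pred, s):
    """P-rule (in %) of binary predictions y_pred w.r.t. a binary sensitive attribute s."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rate_0 = y_pred[s == 0].mean()  # P(Y_hat = 1 | S = 0)
    rate_1 = y_pred[s == 1].mean()  # P(Y_hat = 1 | S = 1)
    low, high = sorted((rate_0, rate_1))
    if high == 0:
        return 100.0  # no positive predictions in either group: rates are trivially equal
    # min(r0 / r1, r1 / r0) equals the smaller rate divided by the larger one.
    return float(100.0 * low / high)
```

For example, positive rates of 0.5 and 0.25 give a 50%-rule, so such a classifier would fail an 80%-rule requirement.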

#### Equalized Odds

An algorithm is considered fair if, across both demographics $S = 0$ and $S = 1$, the predictor $\hat{Y}$ has equal false positive rates and equal false negative rates Hardt2016. This constraint enforces that accuracy is equally high in all demographics, since the rates of correct positive and negative classification are equal across the groups. The notion of fairness here is that the chances of being correctly or incorrectly classified as positive should be equal for every group.

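###### Definition 2.

A classifier satisfies equalized odds if, for $y \in \{0, 1\}$,
$$P(\hat{Y} = 1 \mid S = 0, Y = y) = P(\hat{Y} = 1 \mid S = 1, Y = y).$$

A metric to assess this objective is the disparate mistreatment (DM) zafar2015fairness. It computes the absolute differences of the false positive rate (FPR) and of the false negative rate (FNR) between the two demographics:
$$D_{FPR} = \left| P(\hat{Y} = 1 \mid Y = 0, S = 0) - P(\hat{Y} = 1 \mid Y = 0, S = 1) \right|,$$
$$D_{FNR} = \left| P(\hat{Y} = 0 \mid Y = 1, S = 0) - P(\hat{Y} = 0 \mid Y = 1, S = 1) \right|.$$

The closer the values of $D_{FPR}$ and $D_{FNR}$ are to 0, the lower the degree of disparate mistreatment of the classifier.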
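Both disparate mistreatment terms can be computed directly from labels, binary predictions and the sensitive attribute. A minimal NumPy sketch, assuming every (group, label) cell is non-empty (the name `disparate_mistreatment` is hypothetical):

```python
import numpy as np

def disparate_mistreatment(y_true, y_pred, s):
    """Return (D_FPR, D_FNR) between the groups S=0 and S=1."""
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))

    def fpr(group):
        negatives = (s == group) & (y_true == 0)
        return y_pred[negatives].mean()        # P(Y_hat=1 | Y=0, S=group)

    def fnr(group):
        positives = (s == group) & (y_true == 1)
        return 1.0 - y_pred[positives].mean()  # P(Y_hat=0 | Y=1, S=group)

    return float(abs(fpr(0) - fpr(1))), float(abs(fnr(0) - fnr(1)))
```

The DM score used in the experiments is then the sum of the two returned values.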

In this paper, we address the aforementioned fairness metrics for problems where the sensitive attribute is hidden at training time.

## Related work

From the state-of-the-art literature, one possible way to overcome the unavailability of sensitive attributes during training is to use transfer learning methods, from external sources of data where the sensitive group labels are known. For example, madras2018learning proposed to learn fair representations via adversarial learning on a specific downstream task and transfer them to the targeted one. schumann2019transfer and coston2019fair focus on domain adaptation. mohri2019agnostic consider agnostic federated learning, where a centralized model is optimized for any possible target distribution formed by a mixture of the clients' distributions. However, the actual bias mitigation is highly dependent on the distribution of the external source of data.

Another line of work requires prior knowledge on sensitive correlations. With prior assumptions, gupta2018proxy and zhao2021you mitigate the dependence of the predictions on the available features that are known to be likely correlated with the sensitive attribute. However, such strongly correlated features do not always exist in the data.

Finally, some approaches address this objective without any prior knowledge on the sensitive attribute. First, some approaches aim at improving the accuracy for the worst-case protected group, in order to achieve the Rawlsian Max-Min fairness objective. This involves techniques from distributionally robust optimization hashimoto2018fairness or adversarial learning lahoti2020fairness.
Other approaches, such as yan2020fair, act on the input data via a cluster-based balancing strategy. These methods have the advantage of requiring no additional sensitive data or knowledge, but are often ineffective for traditional group fairness definitions such as *demographic parity* and *equalized odds*. Their blind mitigation usually implies a strong degradation of the predictor accuracy, since it acts on non-sensitive information.

Our approach is inherently different from the aforementioned approaches. Based on light prior knowledge of some causal relationships in the data, we rely on the Bayesian inference of latent sensitive proxies, whose dependencies with model outputs can then be easily mitigated in a second training step.

## Methodology

We describe in this section our methodology to provide a fair machine learning model for training data without sensitive demographics. For this purpose, we first assume a specific causal graph underlying the training data. This causal graph allows us to infer, through Bayesian inference, a latent representation containing as much information as possible about the sensitive feature. Finally, this section presents our methodology to mitigate fairness biases while maintaining as much prediction accuracy as possible.

### Causal Structure of the Sensitive Retrieval Causal Variational Autoencoder: SRCVAE

Our work relies on the assumption of the underlying causal graphs given in Figure 1, which fit many real-world settings (slightly different graphs are studied in the appendix). In the leftmost graph, the parents of the output $Y$ are split into three components $S$, $X_c$ and $X_d$, where the complementary set $X_c$ contains only variables not caused by the sensitive attribute $S$. The variable subset $X_d$ (the descendants) is caused both by the sensitive information $S$ and by $X_c$.

For instance, in Figure 2, if we consider the assumed causal graph of the Adult UCI dataset with Gender as the sensitive attribute $S$ and Income as the expected output $Y$, $X_c$ is the set of variables *Race*, *Age* and *Native_Country*, which do not depend on the sensitive attribute, and $X_d$ corresponds to all remaining variables, which are generated from $X_c$ and $S$.

Assuming that all variables except $S$ are observed, our purpose in the following is to recover all the hidden information that is not caused by the set $X_c$ but is responsible for $X_d$ and $Y$. In a real-world scenario, it is noteworthy that the accuracy with which one can recover the real sensitive attribute depends on a correct representation of the complementary set $X_c$. If the set $X_c$ is under-represented, the reconstruction of $S$ (whose methodology is given in the next section) may contain additional information. For instance, assuming that the graph from Figure 2 is the exact causal graph underlying Adult UCI, in a setting where the variable Race is hidden, this variable is likely to leak into the sensitive reconstruction. We argue that reconstructing a binary sensitive feature mixed with other leaked information strongly degrades the inferred sensitive proxy. This motivated us to consider instead the rightmost graph of Figure 1, which introduces a multivariate continuous intermediate confounder $Z$ that causes both the sensitive attribute $S$ and the observed variables in $X_d$ and $Y$. As we observe in the experiments section, such a multivariate proxy also allows better generalization for mitigated prediction.
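To make the rightmost graph concrete, the following NumPy sketch draws synthetic data from one possible structural model with that shape: a multivariate latent $Z$ causes $S$, $X_d$ and $Y$, while $X_c$ has no parents and feeds $X_d$ and $Y$. Every functional form, coefficient and the name `sample_rightmost_graph` are illustrative assumptions, not taken from the paper; the sketch only shows that $S$ ends up correlated with $X_d$ but not with $X_c$.

```python
import numpy as np

def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample_rightmost_graph(n, latent_dim=3, seed=0):
    """Toy data with edges Z->S, Z->X_d, Z->Y, X_c->X_d, X_c->Y, X_d->Y.

    All functional forms and coefficients are illustrative assumptions.
    Requires latent_dim >= 3.
    """
    rng = np.random.default_rng(seed)
    x_c = rng.normal(size=(n, 2))                 # complementary set: no parent among Z, S
    z = rng.normal(size=(n, latent_dim))          # multivariate continuous confounder
    s = (rng.random(n) < _sigmoid(2.0 * z[:, 0])).astype(int)  # binary sensitive, caused by Z
    x_d = np.column_stack([
        x_c[:, 0] + z[:, 0] + 0.1 * rng.normal(size=n),
        np.tanh(x_c[:, 1] - z[:, 1]) + 0.1 * rng.normal(size=n),
    ])                                            # descendants of X_c and Z
    logits = x_c[:, 0] + x_d[:, 0] - x_d[:, 1] + 0.5 * z[:, 2]
    y = (rng.random(n) < _sigmoid(logits)).astype(int)
    return x_c, x_d, s, y, z
```

In such data, $S$ is never given to the inference model; only $X_c$, $X_d$ and $Y$ are, and the goal is to recover a $Z$ carrying the information that drives $S$.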

### Sensitive Reconstruction

We describe in this section the first step of our SRCVAE framework, which generates a latent representation $Z$ containing as much information as possible about the real sensitive feature $S$. As discussed above, our strategy is to use approximate Bayesian inference, according to the predefined causal graph in Figure 1.

A simple methodology for performing inference could be to assume, under strong hypotheses, a non-deterministic structural model with specific distributions for all causal links, and then to estimate the latent space with a probabilistic programming language such as Stan kusner2017counterfactual; team2016rstan. Leveraging recent developments in approximate inference with deep learning, many different works louizos2017causal; pfohl2019counterfactual; grari2020adversarial proposed to use Variational Autoencoder (VAE) methods kingma2013auto for modeling exogenous variables in causal graphs. These have shown successful empirical results, in particular in the counterfactual fairness sub-field. The counterfactual objective differs from ours, since there the inference is performed to generate exogenous variables independently from the sensitive attribute. It aims at building prediction models which ensure fairness at the most individual level, by requiring the $do$ operator pearl2009causal for interventions on the sensitive attribute ($do(S = 0)$ or $do(S = 1)$). Note that we do not require this notion, since we are only interested in capturing stochastic exogenous variables by generating a sensitive proxy distribution for each individual.

Following the rightmost causal graph from Figure 1, the decoder distribution can be factorized as:
$$p_\theta(x_d, y \mid x_c, z) = p_\theta(x_d \mid x_c, z)\, p_\theta(y \mid x_c, x_d, z).$$

Given an approximate posterior $q_\phi(z \mid x_c, x_d, y)$, we obtain the following variational lower bound:
$$\log p_\theta(x_d, y \mid x_c) \geq \mathbb{E}_{z \sim q_\phi(z \mid x_c, x_d, y)}\left[\log p_\theta(x_d, y \mid x_c, z)\right] - D_{KL}\left(q_\phi(z \mid x_c, x_d, y) \,\|\, p(z)\right) \tag{1}$$
where $D_{KL}$ denotes the Kullback-Leibler divergence of the posterior $q_\phi(z \mid x_c, x_d, y)$ from a prior $p(z)$, typically a standard Gaussian distribution $\mathcal{N}(0, I)$. The posterior $q_\phi(z \mid x_c, x_d, y)$ is represented by a deep neural network with parameters $\phi$, which typically outputs the mean $\mu_\phi$ and the variance $\sigma^2_\phi$ of a diagonal Gaussian distribution $\mathcal{N}(\mu_\phi, \sigma^2_\phi I)$. The likelihood term, which factorizes as $p_\theta(x_d \mid x_c, z)\, p_\theta(y \mid x_c, x_d, z)$, is defined by neural networks with parameters $\theta$. Since it is attracted by a standard prior, the posterior is expected to remove probability mass for any features of $z$ that are not involved in the reconstruction of $x_d$ and $y$. Since $x_c$ is given together with $z$ as input of the likelihoods, all the information from $x_c$ should be removed from the posterior distribution of $z$.

We employ in this paper a variant of the ELBO optimization, as done in pfohl2019counterfactual, where the $D_{KL}$ term is replaced by an MMD term $D_{MMD}(q_\phi(z) \,\|\, p(z))$ between the aggregated posterior $q_\phi(z)$ and the prior $p(z)$. This was shown in zhao2017infovae to be more powerful than the classical $D_{KL}$ term for ELBO optimization, as the latter can prove too restrictive (uninformative latent code problem) chen2016variational; bowman2015generating; sonderby2016ladder and can also tend to overfit the data (variance over-estimation in feature space).
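The two ingredients of this ELBO variant can be sketched in a few lines of NumPy: a reparameterized draw from the diagonal Gaussian posterior, and a kernel MMD between pooled posterior draws (approximating the aggregated posterior) and prior samples. The Gaussian kernel, the bandwidth of 1.0 and the biased (V-statistic) estimator are assumptions made for illustration; the function names are hypothetical and the encoder outputs `mu`, `log_var` are taken as given.

```python
import numpy as np

def sample_posterior(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps from N(mu, diag(exp(log_var)))."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared distances between rows of a (n, d) and b (m, d).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd(z_q, z_p, bandwidth=1.0):
    """Biased estimate of the squared MMD between posterior draws z_q and prior draws z_p."""
    k_qq = gaussian_kernel(z_q, z_q, bandwidth).mean()
    k_pp = gaussian_kernel(z_p, z_p, bandwidth).mean()
    k_qp = gaussian_kernel(z_q, z_p, bandwidth).mean()
    return float(k_qq + k_pp - 2.0 * k_qp)
```

Because the MMD compares the pooled draws of a whole batch with the prior, individual posteriors remain free to be informative, which is the motivation given above for replacing the per-example $D_{KL}$.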

This inference must however ensure that no dependence is created between $Z$ and $X_c$ (there is no arrow from $X_c$ to $Z$ in the rightmost graph of Figure 1), so that the generated sensitive proxy is not linked to the complementary set. However, when optimizing this ELBO, some dependence between $Z$ and $X_c$ can still be observed empirically, as we show in our experimental results: some information from $X_c$ leaks into the inferred $Z$. In order to ensure a minimum level of independence, we add a penalization term to this loss function. Leveraging recent research on mitigating the dependence between continuous variables, we extend the main idea of grari2020learning; grari2019fairness by adapting this penalization to the variational autoencoder case. Following grari2020learning, we consider the HGR coefficient, as defined in Definition 3, to enforce this independence.

###### Definition 3.

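For two jointly distributed random variables $U \in \mathcal{U}$ and $V \in \mathcal{V}$, the Hirschfeld-Gebelein-Rényi maximal correlation is defined as:
$$HGR(U, V) = \sup_{f: \mathcal{U} \to \mathbb{R},\; g: \mathcal{V} \to \mathbb{R}} \rho\left(f(U), g(V)\right) \tag{2}$$
$$= \sup_{\substack{f, g:\; \mathbb{E}[f(U)] = \mathbb{E}[g(V)] = 0 \\ \mathbb{E}[f^2(U)] = \mathbb{E}[g^2(V)] = 1}} \mathbb{E}\left[f(U)\, g(V)\right] \tag{3}$$
where $\rho$ is the Pearson linear correlation coefficient, with $f$ and $g$ measurable functions with positive and finite variance.

The HGR coefficient is equal to 0 if the two random variables are independent, and is equal to 1 if they are strictly dependent. The HGR estimation is performed via two interconnected neural networks that approximate the optimal transformation functions $f$ and $g$ from Equation (3), as in grari2019fairness; grari2020learning.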

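In the following, we denote by $HGR_{w_f, w_g}(U, V)$ the neural estimation of HGR, computed as:
$$HGR_{w_f, w_g}(U, V) = \mathbb{E}_{(u, v) \sim p_{U,V}}\left[\widehat{f}_{w_f}(u)\, \widehat{g}_{w_g}(v)\right]$$
where $p_{U,V}$ is the joint distribution of $U$ and $V$, and $\widehat{f}_{w_f}$, $\widehat{g}_{w_g}$ are the standardized outputs of the two networks. The neural network $f_{w_f}$ with parameters $w_f$ takes as input the variable $U$, and the neural network $g_{w_g}$ with parameters $w_g$ takes as input $V$. At each iteration, the estimation algorithm first standardizes the output scores of networks $f_{w_f}$ and $g_{w_g}$ to ensure zero mean and unit variance on the batch. Then it computes the objective function and updates $w_f$ and $w_g$ to maximize the estimated HGR score.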
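The estimation step above can be sketched in NumPy. This is a deliberately simplified stand-in, not the authors' implementation: instead of two trained neural networks, `f` and `g` are linear heads on a fixed cubic polynomial basis (an assumption made to keep the example small and dependency-free), updated by plain gradient ascent on the batch-standardized objective $\mathbb{E}[\widehat{f}(u)\,\widehat{g}(v)]$. The names `hgr_estimate` and `_basis` are hypothetical.

```python
import numpy as np

def _standardize(a):
    # Zero mean, unit variance over the batch, as in the estimation step above.
    return (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-12)

def _basis(x, degree=3):
    # Hypothetical fixed feature map standing in for the hidden layers of f and g.
    return _standardize(np.column_stack([x ** k for k in range(1, degree + 1)]))

def hgr_estimate(u, v, steps=2000, lr=0.1, seed=0):
    """Gradient-ascent estimate of HGR(U, V) for 1-D samples u and v."""
    rng = np.random.default_rng(seed)
    phi_u, phi_v = _basis(u), _basis(v)
    n = len(u)
    w_f = rng.normal(size=phi_u.shape[1])
    w_g = rng.normal(size=phi_v.shape[1])
    rho = 0.0
    for _ in range(steps):
        a, b = phi_u @ w_f, phi_v @ w_g          # raw outputs of f and g
        a_hat, b_hat = _standardize(a), _standardize(b)
        rho = float(np.mean(a_hat * b_hat))      # batch estimate of E[f_hat(U) g_hat(V)]
        # Exact gradient of the batch correlation: d rho / d a = (b_hat - rho * a_hat) / (n * std(a)).
        w_f = w_f + lr * phi_u.T @ (b_hat - rho * a_hat) / (n * a.std())
        w_g = w_g + lr * phi_v.T @ (a_hat - rho * b_hat) / (n * b.std())
        # rho is scale-invariant, so rescale each head to unit output variance for stable steps.
        w_f = w_f / (phi_u @ w_f).std()
        w_g = w_g / (phi_v @ w_g).std()
    return rho
```

With $V = U^2$ plus small noise and $U$ standard normal, the Pearson correlation is close to 0 while this estimate is close to 1, which is the kind of nonlinear dependence the HGR penalty is meant to capture.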

Finally, the inference of our SRCVAE approach is optimized by a min-max game as follows:
$$\min_{\theta, \phi} \max_{w_f, w_g} \; -\mathbb{E}_{(x_c, x_d, y)}\left[\mathbb{E}_{z \sim q_\phi(z \mid x_c, x_d, y)}\left[\log p_\theta(x_d \mid x_c, z) + \log p_\theta(y \mid x_c, x_d, z)\right]\right] + \lambda_{mmd}\, D_{MMD}\left(q_\phi(z) \,\|\, p(z)\right) + \lambda_{inf}\, HGR_{w_f, w_g}(X_c, Z)$$
where $\lambda_{mmd}$ and $\lambda_{inf}$ are scalar hyperparameters. The additional MMD objective can be interpreted as minimizing the distance between all moments of the aggregated latent code distribution and the prior distribution. Note that using $y$ as input of our generic inference scheme is allowed, since the inferred proxy $Z$ is only used during training, for learning a fair predictive model in the next section, and is not used at deployment time.

In Figure 5, we represent the min-max structure of SRCVAE. The left architecture represents the max phase, where the HGR between $X_c$ and $Z$ is estimated by gradient ascent with multiple iterations. The right graph represents the min phase, where the reconstruction of $X_d$ and $Y$ is performed by the decoder $p_\theta$ (red frame) from the latent space generated by the encoder $q_\phi$. The adversarial HGR component (blue frame) enforces the independence between the generated latent space $Z$ and $X_c$. The adversary $f_{w_f}$ takes as input the set $X_c$ and the adversary $g_{w_g}$ takes the continuous latent space $Z$. In that way, we capture at each gradient iteration the estimated HGR between the set $X_c$ and the generated latent proxy $Z$. At the end of each iteration, the algorithm updates the decoder parameters $\theta$ as well as the encoder parameters $\phi$ by one step of gradient descent. Concerning the HGR adversary, the parameters $w_f$ and $w_g$ are updated by multiple steps of gradient ascent. This yields a more accurate estimation of the HGR at each step, leading to a much more stable learning process. The hyperparameter $\lambda_{inf}$ controls the impact of the dependence loss in the optimization.

### Mitigate the unwanted biases

Once a sensitive proxy $Z$ is available from the inference method of the previous section, the goal is now to use it for training a fair predictive function $h_\omega$. Since $Z$ contains continuous multidimensional information, we adopt an HGR-based approach inspired by grari2019fairness; grari2020learning, which has shown superior performance in this context. We also confirm this empirically in our setting; results are shown in the appendix.

Depending on the fairness objectives, we propose to mitigate the unwanted bias via an adversarial penalization during the training phase.

#### Demographic Parity

We propose to find a mapping $h_\omega(x_c, x_d)$ which both minimizes the deviation from the expected target $y$ and does not imply too much dependency on the sensitive proxy $Z$. This proxy is generated, as mentioned above, from the posterior distribution $q_\phi(z \mid x_c, x_d, y)$. This makes it possible to generate a sensitive proxy distribution for each individual, whose dependence with the output prediction is assessed and mitigated throughout the training phase. In this paper, we extend the idea of grari2019fairness by proposing a novel neural HGR-based cost for fairness without demographics via inference generation. We propose the following optimization problem:
$$\min_{\omega} \max_{\psi_f, \psi_g} \; \mathbb{E}_{(x_c, x_d, y)}\left[\mathcal{L}\left(h_\omega(x_c, x_d), y\right)\right] + \frac{\lambda_{DP}}{K} \sum_{k=1}^{K} HGR_{\psi_f, \psi_g}\left(h_\omega(X_c, X_d), Z^{(k)}\right)$$
where $\mathcal{L}$ is the predictor loss function (the log-loss in our experiments) between the output $h_\omega(x_c, x_d)$ and the corresponding target $y$, with $h_\omega$ a neural network with parameters $\omega$ which takes as input the set $X_c$ and the descendant attributes $X_d$. The hyperparameter $\lambda_{DP}$ controls the impact of the dependence between the output prediction $h_\omega(x_c, x_d)$ and the sensitive proxy $Z$. For each observation $(x_{c_i}, x_{d_i}, y_i)$, we generate $K$ (200 in our experiments) different latent variables $z_i^{(k)} \sim q_\phi(z \mid x_{c_i}, x_{d_i}, y_i)$ (the $k$-th generation) from the causal model. As in the inference phase, the parameters $\psi_f$ and $\psi_g$ of the HGR adversary are updated by multiple steps of gradient ascent. This yields an accurate estimation of the HGR at each step, leading to a much more stable predictive learning process.
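A minimal NumPy sketch of how this penalized demographic parity objective can be evaluated on one batch. It assumes fixed adversaries `f` and `g` (in the method they are trained by gradient ascent) and proxies already sampled $K$ times per individual; `dp_objective` and its argument names are hypothetical, and in practice the gradients with respect to the predictor would come from an autodiff framework rather than from this forward computation.

```python
import numpy as np

def _standardize(a):
    return (a - a.mean()) / (a.std() + 1e-12)

def dp_objective(p_hat, y, z_samples, f, g, lam_dp):
    """Log-loss plus lam_dp times the average over K proxy draws of E[f_hat(Y_hat) g_hat(Z^(k))].

    p_hat: (n,) predicted probabilities; y: (n,) labels in {0, 1};
    z_samples: (K, n, d) proxies; f, g: stand-ins for the adversary networks.
    """
    eps = 1e-12
    log_loss = -np.mean(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))
    f_hat = _standardize(f(p_hat))
    # One batch-standardized HGR term per proxy draw, then averaged over the K draws.
    penalty = float(np.mean([np.mean(f_hat * _standardize(g(z_k))) for z_k in z_samples]))
    return log_loss + lam_dp * penalty, penalty
```

Averaging over the $K$ draws makes the penalty reflect the whole posterior of each individual's proxy rather than a single sample.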

#### Practice in the real world

As mentioned in the first subsection, the causal graph assumed in Figure 1 requires a correct representation of the complementary set $X_c$. If the set $X_c$ is under-represented, some specific hidden attributes can be mixed with the sensitive attribute in the inferred latent space $Z$. Theorem 1 below ensures that mitigating the HGR between $\hat{Y}$ and $Z$ implies an upper bound for the targeted objective, i.e., the HGR between $\hat{Y}$ and $S$.

###### Theorem 1.

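For two nonempty index sets $\mathcal{S}$ and $\mathcal{Z}$ such that $\mathcal{S} \subseteq \mathcal{Z}$, and $\hat{Y}$ the output prediction of a predictor model, we have:
$$HGR(\hat{Y}, U_{\mathcal{S}}) \leq HGR(\hat{Y}, U_{\mathcal{Z}}) \tag{4}$$
where $U_{\mathcal{S}}$ and $U_{\mathcal{Z}}$ denote the collections of (hidden) variables indexed by $\mathcal{S}$ and $\mathcal{Z}$.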

###### Proof.

See the appendix.
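To see why an inequality of this form holds, here is a short standard argument under our reading of Theorem 1, which compares the HGR between $\hat{Y}$ and a subset $U_{\mathcal{S}}$ of variables with the HGR between $\hat{Y}$ and a superset $U_{\mathcal{Z}}$, $\mathcal{S} \subseteq \mathcal{Z}$. It is a sketch, not the proof given in the appendix. Let $\pi$ be the coordinate projection from $U_{\mathcal{Z}}$ onto $U_{\mathcal{S}}$. Any zero-mean, unit-variance function $g$ of $U_{\mathcal{S}}$ yields the zero-mean, unit-variance function $g \circ \pi$ of $U_{\mathcal{Z}}$, so the supremum in the HGR definition is taken over a larger set on the right-hand side (all suprema are over zero-mean, unit-variance functions):

$$
HGR(\hat{Y}, U_{\mathcal{S}})
= \sup_{f,\, g} \mathbb{E}\big[f(\hat{Y})\, g(U_{\mathcal{S}})\big]
= \sup_{f,\, g} \mathbb{E}\big[f(\hat{Y})\, (g \circ \pi)(U_{\mathcal{Z}})\big]
\leq \sup_{f,\, g'} \mathbb{E}\big[f(\hat{Y})\, g'(U_{\mathcal{Z}})\big]
= HGR(\hat{Y}, U_{\mathcal{Z}}).
$$

In particular, if the inferred proxy contains the sensitive information together with leaked hidden attributes, driving its HGR with $\hat{Y}$ to zero also drives the HGR with the sensitive part to zero.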

#### Equalized odds

We extend the demographic parity optimization to the equalized odds task. The objective is to find a mapping $h_\omega(x_c, x_d)$ which both minimizes the deviation from the expected target $y$ and does not imply too much dependency on the sensitive proxy $Z$ conditioned on the actual outcome $Y$. Following the decomposition of disparate mistreatment, the mitigation is divided according to the two values of $Y$. By identifying and mitigating the specific nonlinear dependence for these two subgroups, it enforces the two objectives of having the same false positive rate and the same false negative rate for each demographic.

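The corresponding optimization problem is as follows:
$$\min_{\omega} \max_{\psi_f, \psi_g} \; \mathbb{E}\left[\mathcal{L}\left(h_\omega(x_c, x_d), y\right)\right] + \frac{\lambda_0}{K} \sum_{k=1}^{K} HGR_{\psi_f, \psi_g}\left(h_\omega(X_c, X_d), Z^{(k)}\right)\Big|_{\Omega_0} + \frac{\lambda_1}{K} \sum_{k=1}^{K} HGR_{\psi_f, \psi_g}\left(h_\omega(X_c, X_d), Z^{(k)}\right)\Big|_{\Omega_1}$$
where $\Omega_0$ corresponds to the set of observations where $Y = 0$ and $\Omega_1$ to the observations where $Y = 1$. The hyperparameters $\lambda_0$ and $\lambda_1$ control the impact of the dependence loss for the false positive and the false negative objectives, respectively.
The first penalization (controlled by $\lambda_0$) enforces the independence between the output prediction and the sensitive proxy only for the cases where $Y = 0$. It enforces the mitigation of the difference in false positive rates between demographics since, at the optimum with no trade-off (i.e., with infinite $\lambda_0$), $HGR(\hat{Y}, Z \mid Y = 0) = 0$, which, provided that $Z$ captures $S$, theoretically implies $P(\hat{Y} = 1 \mid Y = 0, S = 0) = P(\hat{Y} = 1 \mid Y = 0, S = 1)$, i.e., $D_{FPR} = 0$. Similarly, the second penalization enforces the mitigation of the difference in true positive rates, since the dependence loss is computed between the output prediction and the sensitive proxy only for the cases where $Y = 1$. Consequently, $D_{FNR}$ is mitigated (since FNR = 1 - TPR). Note that using $Y$ as input of our generic mitigation to create a fair *equalized odds* classifier is allowed, since this mitigation is only used during training and not at deployment time.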
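The split of the dependence loss by outcome can be sketched as follows, in NumPy, with fixed adversaries `f` and `g` standing in for the trained networks and a single proxy draw for brevity. The names `eo_penalties` and `eo_objective` are hypothetical; the two weights correspond to the false positive and false negative penalization weights (called $\lambda_0$ and $\lambda_1$ here).

```python
import numpy as np

def _batch_corr(a, b):
    # Standardize both outputs on the batch, then average their product.
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def eo_penalties(p_hat, y, z, f, g):
    """Dependence terms on the subsets Y=0 and Y=1, with fixed adversaries f and g."""
    neg, pos = (y == 0), (y == 1)
    pen_fpr = _batch_corr(f(p_hat[neg]), g(z[neg]))  # targets the FPR gap
    pen_fnr = _batch_corr(f(p_hat[pos]), g(z[pos]))  # targets the FNR gap
    return pen_fpr, pen_fnr

def eo_objective(log_loss, pen_fpr, pen_fnr, lam_0, lam_1):
    return log_loss + lam_0 * pen_fpr + lam_1 * pen_fnr
```

A predictor that uses the proxy only on negatives is penalized through the first term alone, which is what lets the two error-rate gaps be weighted separately.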

## Experimental results

For our experiments, we empirically evaluate the performance of our contribution on real-world datasets where the sensitive attribute is available. This allows us to assess the fairness of the output prediction, obtained without the use of the sensitive attribute, w.r.t. this ground truth. To do so, we use the popular Adult UCI and Default datasets (descriptions in the appendix), often used in fair classification.

To understand the benefit of mitigating the dependence between the latent space $Z$ and the complementary set $X_c$ during the inference phase, we plot in Figure 8 the t-SNE of $Z$ for two different inference models on the Adult UCI dataset. As a baseline, we consider a version of our model trained without the penalization term (i.e., with its weight set to 0). This is compared to a version trained with a positive penalization weight. As expected, training the inference model without the penalization term results in a poor reconstruction proxy $Z$, where the dependence on $X_c$ is observed: the separation between the data points of men (blue points) and women (red points) is not significant. We also observe that increasing this hyperparameter decreases the estimated HGR between $X_c$ and $Z$ and greatly increases the separation between male and female data points.

Figure 12 illustrates the dynamics of adversarial training for demographic parity on Adult UCI, for an unfair model (fairness weight set to 0) and a fair model with a positive fairness weight (results for other values are presented in the appendix). We represent the accuracy of the model (top), the P-rule metric between the prediction and the real sensitive attribute (middle), and the HGR between the prediction and the latent space $Z$ (bottom). We observe for the unfair model (leftmost graph) that the convergence is stable. As desired, increasing the fairness weight decreases the penalization loss (measured with the HGR between the prediction and $Z$), which increases the P-rule fairness metric at the cost of a slight drop in accuracy.

In Figure 9, we plot the distribution of outcomes depending on the sensitive value for three models trained for the demographic parity criterion with three different values of the fairness weight. For the leftmost graph, the model looks very unfair, since the distribution differs substantially between sensitive groups. As desired, we observe that with increasing weight values, the distributions become more aligned.

##### Comparison against the State-of-the-Art

For the two datasets, we evaluate different models, repeating five runs for each by randomly splitting the data into two subsets: 80% for the training set and 20% for the test set. Because different optimization objectives result in different algorithms, we run separate experiments for the two fairness objectives of interest. As an optimal baseline to be reached, we consider the approach from adel2019one, which uses observations of the sensitive attribute during training; we denote it as *True S*. We also compare against various approaches specifically designed to be trained in the absence of sensitive information: *FairRF* (zhao2021you),
*FairBalance* (yan2020fair),
*ProxyFairness* (gupta2018proxy) and *ARL* (lahoti2020fairness), where the latter is only compared for the equalized odds task (see the discussion in zhao2021you).
We plot the performance of these approaches by displaying the Accuracy against the P-rule for Demographic Parity (Figure 15), and against the Disparate Mistreatment (DM), i.e., the sum of the FPR and FNR differences $D_{FPR}$ and $D_{FNR}$, for Equalized Odds (Figure 18).
We clearly observe for all algorithms that the Accuracy, or predictive performance, decreases when fairness increases. As expected, the baseline *True S* achieves the best performance in all scenarios, with the highest accuracy and fairness. We note that, for all levels of fairness (controlled by the mitigation weight in every approach), our method outperforms state-of-the-art algorithms for both fairness tasks (except for some points at very low levels of fairness, at the left of the curves). We attribute this to the ability of SRCVAE to extract a useful sensitive proxy, while the approaches *FairRF* and *ProxyFairness* seem to greatly suffer from only considering features present in the data for mitigating fairness. The approach *FairBalance*, which pre-processes the data with clustering, seems inefficient and degrades the predictive performance too significantly.

##### Impact of proxy dimension

In Figure 21, we perform an additional experiment on the dimension of the sensitive proxy $Z$. We observe that using a single-dimensional $Z$ is less efficient than using a higher dimension, for both datasets. Increasing the dimension of $Z$ for Adult UCI and Default yields better accuracy at all levels of P-rule. We claim that, as already observed in another context in grari2020learning, mitigating biases in larger spaces allows better generalization at test time, which supports the choice of considering a multivariate sensitive proxy $Z$ rather than directly acting on a reconstruction of $S$ as a univariate variable.

## Conclusion and Future Work

We present a new way of mitigating undesired bias without the availability of the sensitive demographic information in training. By assuming a causal graph, we rely on a new variational autoencoder based framework named SRCVAE for generating a latent representation which is expected to contain as much sensitive information as possible. These inferred proxies serve, in a second phase, for bias mitigation in an adversarial fairness training of a prediction model. Compared with other state-of-the-art algorithms, our method proves to be more efficient in terms of accuracy for similar levels of fairness. For further investigation, we are interested in extending this work to settings where the actual sensitive attribute can be continuous (e.g., an age or weight attribute) and/or multivariate.
