## 1 Introduction

Recent studies have drawn attention to the problem of bias in machine learning algorithms. [20] showed that Google Image search results for occupations are more gender-biased than the ground truth and that this skews how women in these occupations are perceived. [14, 3] argued that classification algorithms used to predict criminal recidivism can be race-biased due to biased data and algorithms. Even in the context of online advertising, [9] demonstrated that women are less likely to be shown advertisements for high-paying jobs. Given these gaps in the design of such algorithms, it is important to study and provide models for fair classifiers across a wide range of datasets and settings.

We consider the paradigm of adversarial learning to design fair classifiers. The goal of adversarial learning is to pit the training algorithm against an adversary which tries to determine whether the trained model is “robust” enough. In Generative Adversarial Networks (GANs) [17], which popularized the paradigm and are used to generate fake samples from an unknown training distribution, the adversary’s job is to decide whether a generated sample is real or fake and give feedback to the generator, which then uses the feedback to improve the model. In applications such as fair classification, an adversary can instead be introduced to check whether the trained classifier is fair. If the model is deemed unfair in terms of the chosen metric, the training algorithm uses the feedback from the adversary to modify the model and the process is repeated.

### 1.1 Our Contributions

While adversarial fairness has been explored in other papers such as [34, 28], we analyze the currently suggested models both theoretically and experimentally, and propose better-performing algorithms and models. We employ a fairness-metric-specific model for the adversary (Section 3) and show that it performs better than [34] and other related work on real-world and adversarial datasets (Section 5).

The modified gradient update algorithm we use is similar to the work of [34], but we suggest certain variations to improve performance; for example, we employ Accelerated Gradient Descent for noisy gradient oracles [6], which results in a more efficient implementation (Section 4). We also discuss, theoretically and empirically, the difference between the normal and modified gradient updates and correspondingly motivate the use of the modified update (Section 6.1).

We present a general theoretical analysis for the convergence of the Normal Gradient Descent and the Accelerated Gradient Descent with modified gradient updates for multi-objective optimization and give a quantification of the price of fairness incurred when perfect fairness is to be ensured (Section 6.2). Finally, we design and implement a model to ensure false discovery parity and show that the adversarial model can be extended to other fairness metrics; this is presented in Section 7.

### 1.2 Notation

Let $S = \{(x_i, z_i, y_i)\}_{i=1}^{N}$ be the training set of samples, where $x_i \in \mathbb{R}^d$ is the feature vector of the $i$-th element, $z_i$ is the sensitive attribute of the $i$-th element and $y_i$ is its class label. Let $S_z$ denote the samples with sensitive attribute value $z$. We assume that the sensitive attribute and the class label are binary, i.e., $z_i, y_i \in \{0, 1\}$ for all $i$. The goal is to design a classifier, denoted by $f$. Let $\mathcal{L}_C$ denote the classification loss (exact expression to be specified later) and $\mathcal{L}_A$ denote the fairness adversary loss. We will often use logistic regression as the classifier with log-loss as the cost function. For a classifier $f$, it is defined as the following:

$$\mathcal{L}_C(f) = -\frac{1}{N} \sum_{i=1}^{N} \Big( y_i \log f(x_i) + (1 - y_i) \log\big(1 - f(x_i)\big) \Big).$$

We will also use $\sigma$ to denote the sigmoid function, i.e., $\sigma(t) = \frac{1}{1 + e^{-t}}$, and $\mathrm{proj}_v u$ will be used to denote the projection of vector $u$ on $v$, i.e.,

$$\mathrm{proj}_v u = \frac{\langle u, v \rangle}{\|v\|^2}\, v.$$

For theoretical analysis, we may need to assume certain properties of the loss function; in particular, the smoothness property. A function $f$ is $L$-Lipschitz smooth if for any $x$ and $y$,

$$\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|.$$

To compare the correlation between vectors, we will use the Pearson correlation coefficient [27] which, for two vectors $u, v \in \mathbb{R}^N$, is defined as

$$\rho(u, v) = \frac{\sum_{i=1}^{N} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{N} (u_i - \bar{u})^2} \sqrt{\sum_{i=1}^{N} (v_i - \bar{v})^2}},$$

where $\bar{u}, \bar{v}$ are the means of vectors $u, v$ respectively.
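As a sanity check, the coefficient can be computed directly from this definition (a minimal sketch using `numpy`; the function name is ours):

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du, dv = u - u.mean(), v - v.mean()  # center both vectors
    return float(du @ dv / (np.linalg.norm(du) * np.linalg.norm(dv)))
```

Vectors related by a positive linear map give $+1$, by a negative one $-1$, and vectors with no linear relationship give values near 0.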

### 1.3 Fairness Metrics

The fairness goal to be satisfied will depend on the fairness metric used. We will work with two fairness metrics in this document, statistical parity and false discovery rate, but with appropriate changes, the algorithm and the model can be used for other metrics as well.

###### Definition 1.1 (Statistical Parity).

Given a dataset $S$, a classifier $f$ satisfies statistical parity if

$$\Pr[f(x) = 1 \mid z = 0] = \Pr[f(x) = 1 \mid z = 1],$$

i.e., the probability of positive classification is the same for all values of the sensitive attribute.

Statistical parity has also been called demographic parity or disparate impact in many related works [32]. The above parity condition can also be relaxed to allow approximate fairness in certain cases. Correspondingly, we design our algorithm to take a desired statistical rate as input, defined as follows.

###### Definition 1.2 (Statistical Rate).

Given a dataset $S$ and $\tau \in [0, 1]$, a classifier $f$ has statistical rate $\tau$ if

$$\min\left( \frac{\Pr[f(x) = 1 \mid z = 0]}{\Pr[f(x) = 1 \mid z = 1]},\; \frac{\Pr[f(x) = 1 \mid z = 1]}{\Pr[f(x) = 1 \mid z = 0]} \right) \geq \tau.$$
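Empirically, the statistical rate of a set of binary predictions can be estimated as the smaller of the two ratios of group-wise positive-classification rates (an illustrative sketch; the names are ours, and we assume each group receives at least one positive prediction):

```python
import numpy as np

def statistical_rate(y_pred, z):
    """Empirical statistical rate: min ratio of the two groups'
    positive-classification rates (assumes both rates are nonzero)."""
    y_pred = np.asarray(y_pred)
    z = np.asarray(z)
    p0 = y_pred[z == 0].mean()  # Pr[f(x) = 1 | z = 0]
    p1 = y_pred[z == 1].mean()  # Pr[f(x) = 1 | z = 1]
    return min(p0 / p1, p1 / p0)
```

A rate of 1 corresponds to exact statistical parity.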

Similarly, false discovery parity is satisfied when the probability of error given positive classification is equal for all sensitive attribute values. Formally, false discovery rate is defined as the following.

###### Definition 1.3 (False Discovery Rate).

Given a dataset $S$ and $\tau \in [0, 1]$, a classifier $f$ has false discovery rate $\tau$ if

$$\min\left( \frac{\Pr[y = 0 \mid f(x) = 1, z = 0]}{\Pr[y = 0 \mid f(x) = 1, z = 1]},\; \frac{\Pr[y = 0 \mid f(x) = 1, z = 1]}{\Pr[y = 0 \mid f(x) = 1, z = 0]} \right) \geq \tau.$$
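The corresponding empirical quantity compares the two groups' false discovery rates $\Pr[y = 0 \mid f(x) = 1, z]$ (a sketch with names of our choosing; we assume each group has positively classified points and a nonzero error rate):

```python
import numpy as np

def fdr_parity(y_true, y_pred, z):
    """Min ratio of the two groups' false discovery rates,
    FDR_g = Pr[y = 0 | f(x) = 1, z = g] (assumes both are nonzero)."""
    y_true, y_pred, z = map(np.asarray, (y_true, y_pred, z))
    rates = []
    for g in (0, 1):
        pos = (y_pred == 1) & (z == g)           # positively classified, group g
        rates.append((y_true[pos] == 0).mean())  # fraction that are errors
    return min(rates) / max(rates)
```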

There are many other fairness metrics that have been considered before, for example, false positive rate, false negative rate, true positive rate, true negative rate, false omission rate, equalized odds, etc. The goal of this paper is not to provide a meta-classifier, but rather to understand how to ensure fairness using adversarial programs. However, using the framework described in Section 3, one can design adversarial programs to ensure different fairness parities, as we demonstrate with statistical rate and false discovery rate. The reason for choosing false discovery rate over other similar metrics like false negative rate is that the probability is conditioned on the classification label. Such metrics are useful in cases where a false prediction incurs additional costs, for example, when classifying whether a person has a medical condition. False discovery rate has been considered in relatively few earlier works [5] and we show that our framework can be used to ensure false discovery parity.

## 2 Related Work

The idea of adversarial machine learning was popularized by the introduction of Generative Adversarial Networks (GANs) [17]. Based on similar ideas, multiple learning algorithms have been suggested to generate fair classifiers using adversaries.

As mentioned earlier, [34] proposed a model to learn a fair classifier based on the idea of adversarial debiasing. Their algorithm also uses gradient descent with the modified update, but includes no theoretical guarantees on convergence; experimentally, they only present a model to ensure equalized odds on the Adult dataset. [28] use a similar adversarial model for the COMPAS dataset [2], but do not use the modified update for optimization.

A few other papers use adversarial settings to tackle the problem in a different manner. [23] learn a “fair” latent representation of the input data and then design a classifier using that representation. The adversary’s job here is to ensure that the generated representation is fair. On the other hand, [30] use the GAN framework to generate fair synthetic data from the original training data, which can then be used to train a classifier. To do so, they add another discriminator to the network which checks for fairness. The work of [30] is more comparable to the pre-processing algorithms given by [19, 13, 22, 4].

Many other models for fair classification have been proposed. They differ either in the formulation of the problem or the fairness metric considered. We try to summarize the major results below.

While [19, 13, 22, 4] gave pre-processing algorithms to ensure that the data is fair, most other fair classification algorithms suggest a new optimization problem or use post-processing techniques. [5, 24, 7] formulate the problem as a Bayesian classification problem with fairness constraints and suggest methods to reduce it to an unconstrained optimization problem. [32, 31] suggest a covariance-based constraint which they argue can be used to ensure statistical parity or equalized odds. To deal with the issue of multiple fairness metrics, [5, 1, 26] give unified in-processing frameworks to ensure fairness with respect to different metrics. Most of these algorithms can be seen as formulating a regularizer function to ensure fairness and, as discussed earlier, such algorithms may not be able to ensure fairness when the dataset is adversarial. Also, to the best of our knowledge, only [5, 26] provide classifiers that can ensure false discovery parity.

The work of [1] is perhaps the closest in terms of the techniques involved. They formulate their constrained optimization problem as an unconstrained one using a Lagrangian transformation. This leads to a min-max optimization problem, which they then solve using the saddle-point methods of [15, 21]. The key difference with respect to our work is that they do not aim to learn the sensitive attribute information from the classifier and instead just use the regularizer. Furthermore, their formulation does not support non-linear metrics like false discovery rate.

To ensure fairness by post-processing, [18] gave a simple algorithm to find the appropriate threshold for a trained classifier to ensure equalized odds. Similarly, [16, 25, 29] suggest different ways of fixing a decision boundary for different values of the sensitive attribute to ensure that the final classifier is fair.

## 3 Model

Let $\mathcal{L}_C(w)$ denote the classification loss, where $w$ are the parameters of the classifier, and let $\mathcal{L}_A(w, u)$ denote the adversary loss. The adversary, given the parameters $w$ of the classifier, uses extra parameters, say $u$, to deduce the sensitive attribute information from the classifier. The job is to find the classifier which minimizes $\mathcal{L}_C$, while the adversary tries to maximize $\mathcal{L}_A$. Formally, we want to find the parameters $(w^\star, u^\star)$ such that

$$w^\star \in \arg\min_{w} \mathcal{L}_C(w) \quad \text{and} \quad u^\star \in \arg\max_{u} \mathcal{L}_A(w^\star, u).$$

In practice, however, we cannot always hope that the best classifier always satisfies the fairness constraints. Correspondingly, we aim to find a model satisfying the following definition.

###### Definition 3.1 (Solution Characterization).

Given classification loss $\mathcal{L}_C$ and adversary loss $\mathcal{L}_A$, for $\epsilon_1, \epsilon_2 > 0$, a model with parameters $(w, u)$ is an $(\epsilon_1, \epsilon_2)$-solution if

$$\mathcal{L}_C(w) \leq \min_{w'} \mathcal{L}_C(w') + \epsilon_1 \quad \text{and} \quad \mathcal{L}_A(w, u) \geq \max_{u'} \mathcal{L}_A(w, u') - \epsilon_2.$$

Consider the example of logistic regression for both the classifier and the adversary, i.e., $\mathcal{L}_C$ is the log-loss for predicting the class labels $y$ from the features, and $\mathcal{L}_A$ is the log-loss for predicting the sensitive attributes $z$ from the classifier's predictions $\hat{y}$. The classifier here tries to correctly predict the class label, while the adversary tries to deduce the sensitive attribute information from the classifier output. This model is similar to the one considered in [34] for the Adult dataset.

For this example, the functions $\mathcal{L}_C$ and $\mathcal{L}_A$ do not have a unique optimizer unless the feature matrix is full-rank. Since in general we can expect that the number of samples is greater than the dimension of the vectors $x_i$, there will be multiple optimizers for the above loss functions and correspondingly there will be multiple $(\epsilon_1, \epsilon_2)$-solutions. The same argument holds for any model which uses thresholding or sigmoid-like functions.

### 3.1 Model used for Classifier and Fairness Adversary - Statistical Parity

In this section, we define the model we use for classification and adversary, with the goal of ensuring statistical parity. We also provide the model and results when the fairness metric is false discovery rate in a later section.

#### 3.1.1 Classifier

The model considered is the regularized logistic regression model. For a given weight vector $w$, the classifier is $f(x) = \sigma(\langle w, x \rangle)$, where $\sigma$ is the sigmoid function.

Similar to the structure of Generative Adversarial Networks, we either add some noise to the input of the classifier so as to make it partially randomized, or append an additional coordinate with value 1 to act as the bias. Correspondingly, each feature vector is extended by one coordinate whose value is either chosen uniformly at random or fixed to be 1.

###### Remark 3.1 (Reason for adding noise or 1s vector).

If the sensitive attribute and the class label are highly correlated, the only way for the classifier to satisfy statistical parity is to output a random class label for each data-point. In this scenario, the algorithm can make the weight on the added coordinate much larger than the other weights, ensuring that the output classifier is close to random. We investigate this empirically in Section 5.5.1. As expected, as the dataset becomes more adversarial, the weight given to the final element increases.

#### 3.1.2 Classification Loss

The corresponding classification loss function is

$$\mathcal{L}_C(w) = -\frac{1}{N} \sum_{i=1}^{N} \Big( y_i \log f(x_i) + (1 - y_i) \log\big(1 - f(x_i)\big) \Big) + \lambda \|w\|^2.$$

This is the standard loss function for regularized logistic regression.

#### 3.1.3 Fairness Adversary

Since we want to ensure statistical parity, we look at how well we can predict the sensitive attribute using the classifier output. Correspondingly, the fairness adversary will be a classifier that operates on a polynomial expansion of the classifier output up to a chosen degree; the degree is kept fixed in the rest of the document unless explicitly specified.

#### 3.1.4 Fairness Adversary Loss

The fairness adversary loss function is

where $p_z$ is the fraction of training samples with sensitive attribute value $z$. The first part of the loss function corresponds to learning the correlation between the sensitive attribute and the class label. The second part of the loss function is a regularizer that checks whether fairness is satisfied. Maximizing the adversary loss ensures both that the sensitive attribute can be predicted from the classifier output and that the statistical rate is low.

The reason for choosing such a regularizer is that, intuitively, statistical parity is ensured if the positive predictions are equally distributed across all groups in the dataset. However, since the two sensitive attribute values may not be equally represented in the training set, we normalize each group's term by $p_z$.

###### Remark 3.2 (Choice of adversary).

The adversary chosen here is different from the one suggested in [34]. There, they design the adversary to predict the sensitive attribute from the classifier output. However, this will not ensure fairness when the sensitive attribute and class label are highly correlated, or when the dataset has very few elements with a particular sensitive attribute value. The model we suggest tries to learn the correlation between the sensitive attribute and the class label and uses the fairness metric as a regularizer term.

## 4 Algorithms

To solve the min-max problem, we can use an alternating gradient ascent-descent algorithm, which simultaneously aims to minimize the classification loss $\mathcal{L}_C$ and maximize the adversary loss $\mathcal{L}_A$. We list below the algorithms we use for our experiments and analysis.

### 4.1 Gradient Descent/Ascent with Normal Update

Using a normal gradient descent/ascent algorithm would imply moving in the direction $-\nabla_w \mathcal{L}_C$ to minimize $\mathcal{L}_C$ and in the direction $\nabla_u \mathcal{L}_A$ to maximize $\mathcal{L}_A$. Combining the two directions with a controlling parameter $\alpha$, we get the algorithm with the following updates at each step:

$$w_{t+1} = w_t - \eta \big( \nabla_w \mathcal{L}_C(w_t) - \alpha \nabla_w \mathcal{L}_A(w_t, u_t) \big), \qquad u_{t+1} = u_t + \eta \, \nabla_u \mathcal{L}_A(w_t, u_t),$$

for some $\eta, \alpha > 0$.
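The alternating scheme above can be sketched as follows (an illustrative sketch; the gradients are supplied as callables and all names are ours):

```python
import numpy as np

def normal_step(w, u, grad_w_Lc, grad_w_La, grad_u_La, eta=0.01, alpha=1.0):
    """One alternating step: descend on the classification loss
    (offset by alpha times the adversary gradient in w) and ascend
    on the adversary loss in the adversary's own parameters."""
    w_new = w - eta * (grad_w_Lc(w) - alpha * grad_w_La(w, u))
    u_new = u + eta * grad_u_La(w, u)
    return w_new, u_new
```

For instance, with the toy losses $\mathcal{L}_C(w) = w^2$ and $\mathcal{L}_A(u) = -u^2$, repeated steps drive both parameters toward their respective optima at 0.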

### 4.2 Algorithm 1 - Normal GD with Modified Update

In certain cases where the gradient of the fairness loss and the gradient of the classification loss are highly correlated, Algorithm 4.1 will not be able to ensure both fairness and accuracy. We provide examples and theoretical analysis of such cases in Section 6.1.

However, with a simple modification we can ensure that even if the gradient of the fairness loss and the gradient of the classification loss are highly correlated, the output classifier is fair. To that end, we consider the modified update step, where we remove from the update the projection of $\nabla_w \mathcal{L}_C$ on $\nabla_w \mathcal{L}_A$. With an appropriate starting point, at each iteration we use the following update steps:

$$w_{t+1} = w_t - \eta \big( \nabla_w \mathcal{L}_C - \mathrm{proj}_{\nabla_w \mathcal{L}_A} \nabla_w \mathcal{L}_C - \alpha \nabla_w \mathcal{L}_A \big), \qquad u_{t+1} = u_t + \eta \, \nabla_u \mathcal{L}_A,$$

for some $\eta, \alpha > 0$. Though the modified update step we use is inspired by the work of [34], the models and analysis we use are quite different from theirs.

Thresholding: Since the problem is a multi-objective optimization problem, we cannot expect to converge to the optimal point (or close to it) after a fixed number of iterations. Therefore, instead of running the algorithm for a large number of iterations, we use a thresholding mechanism. Given a threshold $\tau$, during training we record the parameters for which the training statistical rate is at least $\tau$ and the training accuracy is maximum. The threshold is taken as an input and can be considered a way for the user to control the fairness of the output classifier.
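The modified step can be sketched as follows (names are ours; the gradients $g_c = \nabla_w \mathcal{L}_C$ and $g_a = \nabla_w \mathcal{L}_A$ are supplied as vectors):

```python
import numpy as np

def proj(u, v):
    """Projection of vector u onto vector v."""
    return (u @ v) / (v @ v) * v

def modified_step(w, g_c, g_a, eta=0.01, alpha=1.0):
    """Modified update: strip from the classification gradient its
    component along the adversary gradient, then also take an
    ascent step on the adversary loss in w."""
    return w - eta * (g_c - proj(g_c, g_a) - alpha * g_a)
```

The direction followed has inner product $\eta \alpha \|g_a\|^2 \geq 0$ with $g_a$, so to first order the step never works against the adversary, no matter how correlated $g_c$ and $g_a$ are; this is the property analyzed in Section 6.1.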

### 4.3 Algorithm 2 - AGD with Modified Update

We modify the earlier alternating gradient ascent-descent algorithm to get an accelerated algorithm. The accelerated algorithm we use is inspired by the work of [6], where they provide an improved Accelerated Gradient Descent method that works for noisy gradient oracles. Similar accelerated methods of smooth optimization for different kinds of inexact oracles were also given by [8, 10].

Assume that is -smooth and is -smooth. With an appropriate starting point, at each iteration we use the following update steps.

for some step sizes. The regularizer function used in the accelerated update is a strongly convex function; we fix one particular choice for our analysis and experiments, and the interpolation coefficients are chosen following [6].

Thresholding: Similar to Algorithm 4.2, we can use the thresholding mechanism here as well, given an input threshold $\tau$.

## 5 Empirical Results

We evaluate the performance of this method empirically, and report the classification accuracy and fairness on both a real-world dataset, and on adversarially constructed synthetic datasets. We compare different update methods, and also contrast against state-of-the-art algorithms.

### 5.1 Datasets

We conduct our experiments on the Adult income dataset [11]. This dataset contains the demographic information of approximately 45,000 individuals, and class labels that indicate whether each individual's income exceeds $50,000 USD. For the purposes of our simulations, we consider the sensitive attribute of gender, which is coded as binary in the dataset.

Additionally, we construct adversarial synthetic datasets from the Adult dataset in order to show the wide applicability of the model. The method of constructing the datasets is related to the choice of adversary, since we want to show that our algorithm performs well even on adversarial datasets. As noted before, the algorithm with normal gradient updates (Section 4.1) can perform poorly when the sensitive attribute and the class label are highly correlated (Section 6.1). Hence, for a given value of the Pearson correlation coefficient, we generate a synthetic dataset with the feature vectors of the Adult dataset, where the class labels are modified to ensure that the correlation coefficient between the class label and the sensitive attribute equals the given value. We generate multiple such datasets for varying correlation coefficients and test our algorithm and other state-of-the-art algorithms on them.
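One simple way to impose a target correlation, sketched here for intuition (this is illustrative, not necessarily the exact construction used): for a balanced binary attribute $z$, letting each label copy $z_i$ with probability $(1 + \rho)/2$ and flip it otherwise yields $\mathrm{corr}(y, z) \approx \rho$.

```python
import numpy as np

def correlated_labels(z, rho, rng=None):
    """Binary labels whose Pearson correlation with the balanced
    binary attribute z is approximately rho: copy z[i] with
    probability (1 + rho) / 2, flip it otherwise."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = np.asarray(z)
    copy = rng.random(len(z)) < (1 + rho) / 2
    return np.where(copy, z, 1 - z)
```

A short calculation confirms the claim: for balanced $z$, $\mathbb{E}[yz] = q/2$ with $q = (1+\rho)/2$, so the correlation is $2q - 1 = \rho$.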

### 5.2 Performance as a Function of Time

We first look at the training performance of Algorithm 4.3 and Algorithm 4.2 on the Adult and synthetic datasets.

#### 5.2.1 Adult Dataset:

For these plots, the learning rate and the number of iterations are fixed. The training accuracy and training statistical rate are presented in the first plot of the corresponding figure. As can be seen from the figure, for the Adult dataset the algorithm eventually converges to a point of high accuracy and high statistical rate using Algorithm 4.2. When using Algorithm 4.3, we get similarly high accuracy, and Algorithm 4.3 converges to the point of high accuracy and high fairness faster; the final statistical rate obtained using Algorithm 4.3 is also better.

#### 5.2.2 Synthetic Dataset:

The Pearson correlation between the class label and the sensitive attribute for the synthetic dataset used here is 0.5. Once again, the learning rate and the number of iterations are fixed as above. The training accuracy and training statistical rate are presented in the second plot of the corresponding figure. As can be seen from the figure, for the synthetic dataset both algorithms eventually converge to a point of high accuracy. Once again, Algorithm 4.3 converges faster than Algorithm 4.2.

### 5.3 Modified vs. Normal Update Steps

We compare the modified and normal gradient update steps. Figure 1 shows how the accuracy and statistical rate vary for both algorithms as the correlation between the class label and the sensitive attribute changes. We run the AGD Algorithm 4.3 in both settings, with and without the threshold parameter $\tau$. It is clear from the figures that the modified gradient update algorithms achieve much better fairness than the normal gradient update algorithms.

To further show the importance of removing the projection, we also plot the statistical rate across the training iterations; Figure 2 shows the plots for this setting. From the first two subplots of Figure 2, it is clear that for both the Adult and synthetic datasets, the algorithm with the modified update performs much better than the algorithm with the normal update. When using AGD, the statistical rate is never high for the normal gradient update algorithm. While for Normal Gradient Descent the statistical rate is high for the normal gradient update algorithm during the initial iterations, it drops very quickly and does not allow the accuracy to be high along with a high statistical rate. Finally, the last subplot of Figure 2 shows that for all datasets with varying correlation coefficients, the fairness at timestep 50 of training is never high when using the gradient descent algorithm with normal updates.

### 5.4 Comparison Against the State of the Art

We now compare the performance of our proposed methods against state-of-the-art fair classification techniques. We vary the correlation in the synthetic dataset and report the test accuracy and test statistical rate. The synthetic datasets are constructed as discussed in Section 5.1. The threshold is set to 0.9. We compare our algorithm with the fair classifiers of [32] (implementation: github.com/mbilalzafar/fair-classification), [34] (implementation: github.com/IBM/AIF360), and [5] (implementation: github.com/IBM/AIF360), and present the results in Figure 3. Note that NGD with modified update can be considered the [34] implementation for our model, and we already showed that AGD with modified update converges faster than it.

We observe that using Accelerated Gradient Descent with threshold 0.9, we get higher or comparable fairness (statistical rate) for all datasets, albeit with a small loss in accuracy. In particular, when the statistical rate obtained by our algorithm is smaller, the accuracy of the classifier is higher. Furthermore, by increasing the threshold parameter, our algorithm can always be forced to achieve higher fairness; indeed, from Figure 2, we know that our algorithm can achieve perfect statistical parity during the training process for all synthetic datasets.

We also construct an adversarial model to ensure a high false discovery rate for the output classifier. The model and the empirical comparison with other algorithms are presented in Section 7. The empirical results show that our model achieves a higher false discovery rate than other algorithms, while the accuracy is comparable for most datasets and slightly lower on the others.

### 5.5 Other Experiments

#### 5.5.1 Importance of noise

As mentioned earlier, we append a noise or 1s element to the feature vector of each datapoint. As the correlation between the class label and the sensitive attribute increases, the only way to ensure high statistical parity is to make the classifier either random or output all 1s. Correspondingly, we expect the adversary's feedback to push the classifier to place a larger weight on the noise element.

In this section, we quantify this observation. The threshold is set to 0.8. We measure the ratio between the weight given to the noise element and the maximum absolute weight given to any other element of the feature vector, i.e., $|w_b| / \max_j |w_j|$, where $w_b$ is the weight of the noise element and the maximum is taken over the weights of all other elements. The plot of this ratio against the correlation in the synthetic dataset is presented in Figure 4. As we can see from the figure, the weight of the noise element increases as the correlation increases, which is the expected behaviour.

#### 5.5.2 Changing the parameter $\alpha$

Recall that the modified gradient update step uses a parameter $\alpha$ to control the weight of the adversary gradient. In this section, we look at the effect of changing $\alpha$ on the accuracy and statistical parity for both the Adult dataset and the synthetic dataset. Ideally, $\alpha$ is chosen as a decaying function of the iteration number. The results are presented in Figure 5. As we can see, the test statistical rate and accuracy do not change significantly with $\alpha$ for either algorithm.

## 6 Theoretical Results

### 6.1 Update Step Without the Projection Term

We look at the gradient descent algorithm with normal gradient updates (Algorithm 4.1) and provide counter-examples for cases where it may not be able to ensure fairness. Algorithm 4.1 uses the updates given in Section 4.1 in each iteration.

Assuming that the adversarial loss function is concave, the update makes progress on fairness only if its direction has a positive inner product with $\nabla_w \mathcal{L}_A$; if this inner product is negative, the gradient update step leads away from the adversary's optimal point. A simple example of when this can happen is the following: suppose that $\mathcal{L}_C$ is a simple logistic regression log-loss function and $\mathcal{L}_A$ is a regularizer function controlling statistical parity. The adversary here is not a classification/learning problem and is instead a simple regularizer function. Suppose the dataset given is skewed in such a way that most of the datapoints with one sensitive attribute value also have a positive class label. In this case, to achieve high accuracy, statistical parity has to be low. Therefore, the update direction is dominated by $-\nabla_w \mathcal{L}_C$, which points against $\nabla_w \mathcal{L}_A$; $\mathcal{L}_A$ then decreases over the iterations and fairness is never ensured. However, this will not happen if we remove the projection of $\nabla_w \mathcal{L}_C$ on $\nabla_w \mathcal{L}_A$ from the update, since the resulting direction always has a non-negative inner product with $\nabla_w \mathcal{L}_A$.
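To see why removing the projection helps, let $d$ denote the direction actually followed by the modified update of Algorithm 4.2; a one-line first-order check gives:

```latex
\[
  d \;=\; -\big(\nabla_w \mathcal{L}_C
        - \mathrm{proj}_{\nabla_w \mathcal{L}_A} \nabla_w \mathcal{L}_C\big)
        \;+\; \alpha\, \nabla_w \mathcal{L}_A,
  \qquad
  \langle d,\, \nabla_w \mathcal{L}_A \rangle
  \;=\; \alpha\, \big\| \nabla_w \mathcal{L}_A \big\|^2 \;\geq\; 0,
\]
```

since the residual $\nabla_w \mathcal{L}_C - \mathrm{proj}_{\nabla_w \mathcal{L}_A} \nabla_w \mathcal{L}_C$ is orthogonal to $\nabla_w \mathcal{L}_A$. Hence, to first order, the modified step never decreases the adversary loss, regardless of how correlated the two gradients are.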

###### Remark 6.1.

Note that the above scenario can occur for any kind of classification model where the direction of the gradient of the loss function and the direction of the gradient of the regularizer are highly correlated for the given dataset.

### 6.2 The Modified Update Step

We analyze the modified gradient update step (with the projection term) in the context of multi-objective optimization.

#### 6.2.1 Analysis of Algorithm 4.2

We first look at the normal gradient descent/ascent method using the modified update step. The theorems in this section quantify the number of iterations required to ensure fairness and the classification loss achieved after that many iterations. We will assume that the gradients satisfy the following boundedness condition on their norms.

###### Definition 6.1 (Bounded Gradient).

A function $f$ is $G$-gradient bounded if there exists $G > 0$ such that for all $x$, $\|\nabla f(x)\| \leq G$.

###### Theorem 6.2 (Ensuring Fairness).

The above theorem gives us a convergence bound for the adversary loss.

###### Proof.

Since is convex and -smooth, at the -th timestep,

Using

we get,

Let . Then using the above inequality,

Also, by the concavity of , we get,

Therefore,

and

We can analyze the time-continuous version of the above equation. It leads to the following differential equation.

We use , where and let . Also Then we get,

Since we want this quantity to be small, we look at the number of iterations required to reduce it to a small target value.

Using the above number of iterations, we can quantify how far the resulting classification loss is from the minimum.

###### Theorem 6.3 (Price of Fairness).

Let $\mathcal{L}_C$ also be an $L$-smooth and $G$-gradient bounded function, and assume that $\mathcal{L}_C$ and $\mathcal{L}_A$ satisfy the bounded-gradient condition of Definition 6.1. Let the parameters and learning rates of Algorithm 4.2 be chosen as above, and let the number of iterations be the same as obtained in Theorem 6.2. Suppose that the normal gradient descent method, using the standard gradient updates, gets close to the minimizer of $\mathcal{L}_C$ after that many iterations. Then after the same number of iterations of Algorithm 4.2, we will have

The above theorem gives us an estimate of the price of fairness incurred by our algorithm, i.e., the least amount of classification loss we have to sacrifice, relative to the minimum classification loss, in order to achieve perfect fairness through the above model.

###### Proof.

We obtain the price of fairness bound by analyzing the convergence of the classification loss.

Let . Then using the above inequality,

Also, by the concavity of , we get,

Therefore,

and

Note that we want this quantity to remain small for all iterations; in fact, we want to find the number of iterations in which it reduces to a small value. Correspondingly, we can assume that

Therefore,

Since is -gradient bounded, we get

Once again, we formulate it as a differential equation and solve it. Note that

Substituting the value of , we get

Since

we get

Substituting the value of from the previous theorem, we get
