You Can Still Achieve Fairness Without Sensitive Attributes: Exploring Biases in Non-Sensitive Features

Though machine learning models are achieving great success, extensive studies have exposed their disadvantage of inheriting latent discrimination and societal bias from the training data, which hinders their adoption in high-stakes applications. Thus, many efforts have been made to develop fair machine learning models. Most of them require that sensitive attributes be available during training to learn fair models. However, in many real-world applications, it is usually infeasible to obtain the sensitive attributes due to privacy or legal issues, which challenges existing fair classifiers. Though the sensitive attribute of each data sample is unknown, we observe that there are usually some non-sensitive features in the training data that are highly correlated with sensitive attributes, which can be used to alleviate the bias. Therefore, in this paper, we study a novel problem of exploring features that are highly correlated with sensitive attributes for learning fair and accurate classifiers without sensitive attributes. We theoretically show that by minimizing the correlation between these related features and the model prediction, we can learn a fair classifier. Based on this motivation, we propose a novel framework which simultaneously uses these related features for accurate prediction and for regularizing the model to be fair. In addition, the model can dynamically adjust the importance weight of each related feature to balance its contribution to model classification and fairness. Experimental results on real-world datasets demonstrate the effectiveness of the proposed model for learning fair models with high classification accuracy.




1. Introduction

With great improvements in performance, modern machine learning models are becoming increasingly popular and are widely used in decision-making systems such as medical diagnosis (Bakator and Radosav, 2018) and credit scoring (Dastile et al., 2020). Despite their great success, extensive studies (Gianfrancesco et al., 2018; Mehrabi et al., 2019; Yapo and Weiss, 2018) have revealed that training data may include patterns of previous discrimination and societal bias. Machine learning models trained on such data can inherit the bias on sensitive attributes such as age, gender, skin color, and region (Beutel et al., 2017; Dwork et al., 2012; Hardt et al., 2016). For example, a study found strong unfairness in a criminal prediction system used to assess a criminal defendant's likelihood of becoming a recidivist (Julia Angwin and Kirchner, 2016). The system shows a strong bias towards people of color, tending to predict them as recidivists even when they are not. Thus, hidden biases in a machine learning model could cause severe fairness problems, which raises concerns about its real-world application, especially in high-stakes scenarios.

Therefore, various efforts (Feldman et al., 2015; Kamiran and Calders, 2009; Sattigeri et al., 2019; Zafar et al., 2015) have been made to address the fairness issues of current machine learning models. For example, previous research (Verma and Rubin, 2018; Saxena et al., 2019) examines models' performance on protected groups formed according to sensitive attributes, discusses different fairness criteria and presents their formalized notions. (Zhang et al., 2017; Kamiran and Calders, 2009) seek to remove biases in training data via pre-processing, and (Hardt et al., 2016; Pleiss et al., 2017) post-process trained models to remove unfairness. (Dwork et al., 2012; Zafar et al., 2015) propose new optimization goals and regularization terms to remove discrimination from the model. Despite their superior performance, all the aforementioned approaches require that sensitive attributes be available for removing bias. However, for many real-world applications, it is difficult to obtain the sensitive attributes of each data sample due to various reasons such as privacy and legal issues, or difficulties in data collection (Coston et al., 2019; Lahoti et al., 2020).

Tackling the fairness issue without sensitive attributes available is challenging, as the distribution of different protected groups is unknown. There are only very few initial efforts on learning fair classifiers without sensitive attributes (Lahoti et al., 2020; Yan et al., 2020; Coston et al., 2019). Yan et al. (Yan et al., 2020) use a clustering algorithm to form pseudo groups, and adopt them as substitutes for real protected groups. Lahoti et al. (Lahoti et al., 2020) propose to use an auxiliary module to find computationally-identifiable regions where the model under-performs, and optimize this worst-case performance. However, these works are often found to be ineffective in achieving fairness w.r.t. the actual demographic groups (Lahoti et al., 2020). In addition, the groups or regions found by these approaches may not be related to the sensitive attribute we want to be fair with. For example, we might want the model to be fair on gender, while the clustering algorithm gives groups based on race. Thus, more efforts need to be taken to address the important and challenging problem of learning fair models without sensitive attributes.

Though the sensitive attribute of each data sample is unknown, we observe that there are usually some non-sensitive features in the training data that are highly correlated with sensitive attributes, which can be used to alleviate the bias. Previous works (Julia Angwin and Kirchner, 2016; Coston et al., 2019) have shown that even when sensitive attributes are not used as input features, the trained model can still be unfair towards certain protected groups, because biases are embedded in some non-sensitive features used for training the model. These non-sensitive features are highly correlated with sensitive attributes, which leads the model to be biased. We call such features Related Features. Related features are highly correlated with the sensitive one for various reasons, such as biases in data collection, or the interplay of an underlying physiological difference with socially determined role perception (Celentano et al., 1990). For example, Vogel and Porter (Vogel and Porter, 2016) find striking differences in age distributions across racial/ethnic groups in US prisons. The Hispanic and black populations have a larger portion of individuals at younger ages; hence age is correlated with race in this domain. In practice, common sense, domain knowledge or experts can help to identify these related features given that we want a fair model on certain sensitive attributes. In addition, for different sensitive attributes such as race or gender, we can specify different sets of related features. With these related features identified, we might be able to alleviate the fairness issue. One straightforward way is to discard related features when training a fair model. However, this also discards important information for classification. Thus, though promising, it remains an open question how to effectively utilize related features to learn a fair model with high classification accuracy.

Therefore, in this paper, we study a novel problem of exploring related features for learning fair and accurate classifiers without sensitive attributes. In essence, we are faced with two challenges: (i) how to utilize these related features to achieve fairness; and (ii) as the related features affect both accuracy and fairness, how to achieve an optimal trade-off between these two goals. In an attempt to solve these two challenges, we propose a novel framework, Fairness with Related Features (FairRF). Instead of simply discarding related features, the basic idea of FairRF is to use the related features both as features for training the classifier and as pseudo sensitive attributes to regularize the model to give fair predictions, which helps to learn fair and accurate classifiers. We theoretically show that regularizing the model using related features can achieve fairness on the sensitive attribute. Furthermore, to balance the utilization of the related features for classification accuracy and model fairness, FairRF can automatically learn the importance weight of each related feature for regularization, which also enables the model to handle noisy related features. The main contributions of the paper are as follows:

  • We study a novel problem of exploring biases encoded in related features to learn fair classifiers without sensitive attributes;

  • We theoretically show that by adopting related features as a fairness regularizer on the model's prediction, we can learn a fairer model;

  • We propose a novel framework FairRF which simultaneously utilizes the related features to learn fair classifiers and adjusts the importance weight of each related feature; and

  • We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of the proposed method for fair classifiers with high classification accuracy.

2. Related Work

To address the concerns about fairness in machine learning models, a number of fairness notions have been proposed. These fairness notions can be generally split into three categories: (i) individual fairness (Dwork et al., 2012; Zemel et al., 2013; Kang et al., 2020; Lahoti et al., 2019), which requires the model to give similar predictions to similar individuals; (ii) group fairness (Dwork et al., 2012; Hardt et al., 2016; Zhang et al., 2017), which aims to treat groups with different protected sensitive attributes equally; and (iii) Max-Min fairness (Lahoti et al., 2020; Hashimoto et al., 2018; Zhang and Shah, 2014), which tries to improve the performance of the worst-off group. Our work focuses on achieving group fairness without the sensitive attributes.

Extensive works have been conducted to incorporate the aforementioned group fairness notions into machine learning (Zhang et al., 2017; Beutel et al., 2017; Locatello et al., 2019; Dwork et al., 2012; Hardt et al., 2016; Zemel et al., 2013; Lahoti et al., 2020). Based on the stage of the training process at which fairness is applied, the algorithms can be generally split into three categories: pre-processing approaches (Zhang et al., 2017; Kamiran and Calders, 2012; Xu et al., 2018), in-processing approaches (Zafar et al., 2015; Zhang et al., 2018), and post-processing approaches (Hardt et al., 2016; Pleiss et al., 2017). The pre-processing approaches modify the training data to reduce historical discrimination in the dataset. For instance, the bias could be eliminated by correcting labels (Zhang et al., 2017; Kamiran and Calders, 2009), revising attributes of the data (Kamiran and Calders, 2012; Feldman et al., 2015), generating non-discriminatory labeled data (Xu et al., 2018; Sattigeri et al., 2019), or obtaining fair data representations (Beutel et al., 2017; Locatello et al., 2019; Edwards and Storkey, 2015; Zemel et al., 2013; Louizos et al., 2015; Creager et al., 2019). The in-processing approaches revise the training of state-of-the-art models to achieve fairness. More specifically, they generally apply fairness constraints or design an objective function that considers the fairness of predictions (Dwork et al., 2012; Zafar et al., 2015; Zhang et al., 2018). Finally, the post-processing approaches directly change the predicted labels of trained models to obtain fair predictions (Hardt et al., 2016; Pleiss et al., 2017).

Despite their ability to alleviate bias issues, the aforementioned methods generally require the sensitive attribute of each data sample to be available; while for many real-world applications, it is difficult to collect sensitive attributes of subjects due to various reasons such as privacy issues, and legal and regulatory restrictions. The lack of sensitive attributes in training data challenges the aforementioned methods (Beutel et al., 2017). Though investigating fair models without sensitive attributes is important and challenging, the direction is still in its early stage, with only a few works so far (Lahoti et al., 2020; Hashimoto et al., 2018; Yan et al., 2020). One branch of approaches (Lahoti et al., 2020; Hashimoto et al., 2018) counters this problem by investigating fairness without demographics via solving a Max-Min problem. For instance, Lahoti et al. (Lahoti et al., 2020) propose adversarial reweighted learning that leverages the notion of computationally-identifiable errors to achieve Rawlsian Max-Min fairness without sensitive attributes. However, these methods are only effective for achieving Max-Min fairness, and are ineffective for group fairness. The other branch (Dai and Wang, 2021; Yan et al., 2020) addresses the missing-sensitive-attribute scenario by providing pseudo group splits. For instance, Yan et al. (Yan et al., 2020) pre-process the data via clustering and use the obtained groups as a proxy. However, the conformity between the groups obtained by these approaches and the real protected groups is highly dependent on the data distribution, which makes it difficult to justify their generalization ability.

The proposed framework FairRF is inherently different from the aforementioned approaches: (i) We study a novel problem of exploring features that are highly related to the unseen sensitive ones for learning fair and accurate classifiers. Obtaining these features requires just a little prior domain knowledge, and it avoids the difficulty and instability of previous approaches in automatically detecting protected groups (Lahoti et al., 2020; Yan et al., 2020); and (ii) We theoretically show that by regularizing the model prediction with the related features that are highly correlated with the sensitive attribute, we can learn a fair model with respect to that attribute. In addition, our experimental results show that the given related feature set can be incomplete or noisy.

3. Problem Definition

Throughout this paper, matrices are written as boldface capital letters and vectors are denoted as boldface lowercase letters. For an arbitrary matrix $\mathbf{M}$, $M_{ij}$ denotes the $(i,j)$-th entry of $\mathbf{M}$, while $\mathbf{M}_{i:}$ and $\mathbf{M}_{:j}$ denote the $i$-th row and $j$-th column of $\mathbf{M}$, respectively. Capital letters in calligraphic font such as $\mathcal{L}$ are used to denote sets or cost functions.

Let $\mathbf{X} \in \mathbb{R}^{n \times d}$ be the data matrix with each row $\mathbf{x}_i$ being a $d$-dimensional data instance. We use $\{f_1, \dots, f_d\}$ to denote the features and $\{\mathbf{x}^1, \dots, \mathbf{x}^d\}$ the corresponding feature vectors, where $\mathbf{x}^j$ is the $j$-th column of $\mathbf{X}$. Let $\mathbf{y}$ be the label vector, where the $i$-th element of $\mathbf{y}$, i.e., $y_i$, is the label of $\mathbf{x}_i$. Following existing work on fair machine learning models (Lahoti et al., 2020), we focus on the binary classification problem, i.e., $y_i \in \{0, 1\}$. Given $\mathbf{X}$ and $\mathbf{y}$, we aim to train a fair classifier with good classification performance.

Extensive studies (Julia Angwin and Kirchner, 2016; Lahoti et al., 2020) have revealed that historical data may include patterns of previous discrimination and societal bias on sensitive attributes such as age, gender, skin color, and region. Even when the sensitive attribute $s$ is not used as a feature, i.e., $s \notin \{f_1, \dots, f_d\}$, a subset of non-sensitive features $\mathcal{S}_R$ can be highly correlated with $s$, making machine learning models trained on such data inherit the bias. For example, in a dataset containing US criminal records (Julia Angwin and Kirchner, 2016), racial information is taken as sensitive. Although it is unseen, the trained model could still be unfair, as the distribution of racial group populations may be leaked from the distribution of ages (Vogel and Porter, 2016).

In many real-world applications, the sensitive attributes of data samples are unavailable due to various reasons such as difficulty in data collection, or security or privacy issues. This challenges existing fair machine learning approaches that require the sensitive attributes of data samples to learn fair models. Though the sensitive attribute of each data sample is unknown, since the bias is caused by the subset of features $\mathcal{S}_R$ that are highly correlated with $s$, $\mathcal{S}_R$ can provide an alternative supervision for achieving fair models. For example, one could simply remove $\mathcal{S}_R$ from $\mathbf{X}$ to train a fairer model. However, $\mathcal{S}_R$ may also contain important information for classification, and directly removing it could significantly degrade the classification performance. Therefore, we aim to explore the utilization of $\mathcal{S}_R$ to help learn a fairer model while maintaining high classification performance. The problem is formally defined as:

Problem Definition. Given the data matrix $\mathbf{X}$ with the corresponding labels $\mathbf{y}$, and a predefined feature subset $\mathcal{S}_R = \{\mathbf{x}^{r_1}, \dots, \mathbf{x}^{r_K}\}$, where each $\mathbf{x}^{r_k}$ is highly correlated with the unobserved protected attribute $s$, e.g., race or gender, learn a classifier $f$ that maintains high accuracy and is fair with respect to $s$.

Note that we assume $\mathcal{S}_R$ is given. In practice, it can be obtained from domain knowledge or experts. In addition, we will experimentally show that $\mathcal{S}_R$ doesn't need to be complete and can contain noisy features, as our model is able to weight each $\mathbf{x}^{r_k} \in \mathcal{S}_R$.

4. Preliminary Theoretical Analysis

Since $\mathcal{S}_R$ is highly correlated with the sensitive attribute $s$, our basic idea is to treat each related feature $\mathbf{x}^{r_k} \in \mathcal{S}_R$ as a pseudo sensitive attribute and to minimize the correlation between the model's prediction $\hat{y}$ and $\mathbf{x}^{r_k}$, which helps to achieve fairness on $s$. Next, we provide a theoretical proof showing that minimizing the correlation between $\hat{y}$ and $\mathcal{S}_R$ can help achieve fairness of the model on $s$, which serves as the foundation of the proposed framework FairRF.

In this paper, we adopt the Pearson correlation coefficient to measure the correlation between two variables, defined as below:

Definition 1 (Pearson Correlation Coefficient). The Pearson correlation coefficient measures the linear correlation between two random variables $X$ and $Y$ as:

$\rho(X, Y) = \dfrac{\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y},$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$, respectively (and similarly for $Y$).
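As a quick sanity check of Definition 1, the coefficient can be computed directly as the mean product of z-scored samples; below is a minimal NumPy sketch (the function name and the synthetic data are our own, for illustration only):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation as the mean product of z-scored samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)  # strongly, but not perfectly, correlated
print(pearson(x, y))                 # matches np.corrcoef(x, y)[0, 1]
```

The value agrees with NumPy's built-in `np.corrcoef`, and perfectly correlated inputs give exactly 1.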

Next, we show a theorem on the propagation property of the Pearson correlation coefficient. This theorem justifies our idea of regularizing the correlation between related features and model predictions when the sensitive attributes are absent. Below, we first present a rule depicting the relation of three included angles in space, which is the basis of our proof.

Lemma 2. Given a unit sphere centered at the origin $O$, let $A$, $B$ and $C$ be three points on the surface of the sphere. Assume that the angle $\angle AOB = \theta_1$ and the angle $\angle BOC = \theta_2$. Then the cosine value of the angle $\angle AOC$ is within $[\cos(\theta_1 + \theta_2), \cos(\theta_1 - \theta_2)]$.

Proof. From the spherical law of cosines (Gellert et al., 2012), we know that:

$\cos \angle AOC = \cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 \cos c,$

where $c$ corresponds to the angle opposite arc $AC$ in the spherical triangle $ABC$. As all angles are in the range $[0, \pi]$, so that $\cos c \in [-1, 1]$, we can directly induce:

$\cos(\theta_1 + \theta_2) \le \cos \angle AOC \le \cos(\theta_1 - \theta_2),$

which completes the proof. ∎

In the next step, we can show the relationship between Pearson correlation coefficient and cosine similarity of two variables.

Lemma 3. Given two random variables $X$ and $Y$, the Pearson correlation coefficient between them can be calculated as the cosine similarity between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$, where $\tilde{\mathbf{x}}$ is an infinite-length vector constructed by sampling z-score values of $X$, i.e., $\tilde{x}_i = (x_i - \mu_X)/\sigma_X$ with $x_i$ the $i$-th sample. Similarly, $\tilde{y}_i = (y_i - \mu_Y)/\sigma_Y$.

Proof. This can be easily proven by re-writing the form of the Pearson correlation coefficient:

$\rho(X, Y) = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i = \lim_{n \to \infty} \frac{\langle \tilde{\mathbf{x}}, \tilde{\mathbf{y}} \rangle}{\|\tilde{\mathbf{x}}\|\,\|\tilde{\mathbf{y}}\|},$

where the second equality holds because z-scored samples have zero mean and unit variance, so $\|\tilde{\mathbf{x}}\|, \|\tilde{\mathbf{y}}\| \to \sqrt{n}$. This completes the proof. ∎

With these preparations, we can now turn to our main theorem:

Theorem 4. Given three random variables $X$, $Y$ and $Z$ with correlation coefficients $\rho(X, Y) = \cos\theta_1$ and $\rho(Y, Z) = \cos\theta_2$, where $\theta_1, \theta_2 \in [0, \pi]$, then $\rho(X, Z)$ is within $[\cos(\theta_1 + \theta_2), \cos(\theta_1 - \theta_2)]$.

Proof. The proof can be developed via the following steps:

  1. The cosine similarity between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$ is the cosine of the included angle between them. Hence, based on Lemma 3, we learn from the given correlation coefficients that the cosine of the angle between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$ is $\cos\theta_1$ and that of the angle between $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{z}}$ is $\cos\theta_2$.

  2. $\tilde{\mathbf{x}}$, $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{z}}$ can be taken as the directions $OA$, $OB$ and $OC$ in Lemma 2, respectively. Hence, utilizing Lemma 2, we can induce that the cosine value of the angle between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{z}}$ falls within $[\cos(\theta_1 + \theta_2), \cos(\theta_1 - \theta_2)]$.

  3. Finally, based on Lemma 3, we can map the cosine value of this angle back to the correlation coefficient between $X$ and $Z$.

After these steps, we obtain that $\rho(X, Z) \in [\cos(\theta_1 + \theta_2), \cos(\theta_1 - \theta_2)]$, which finishes the proof. ∎
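The bound can also be checked empirically: the sample correlation is exactly the cosine of the angle between centered sample vectors, and such angles obey the spherical triangle inequality. Below is a small NumPy experiment; the data-generating coefficients are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# For r1 = rho(X, Y), r2 = rho(Y, Z) and theta_i = arccos(r_i),
# Theorem 4 predicts rho(X, Z) in [cos(theta1 + theta2), cos(theta1 - theta2)].
for _ in range(200):
    y = rng.normal(size=500)
    x = 0.8 * y + rng.normal(size=500)
    z = 0.7 * y + rng.normal(size=500)
    t1, t2 = np.arccos(corr(x, y)), np.arccos(corr(y, z))
    lo, hi = np.cos(min(t1 + t2, np.pi)), np.cos(t1 - t2)
    assert lo - 1e-9 <= corr(x, z) <= hi + 1e-9
print("Theorem 4 bound held in all trials")
```

Note the lower end is capped at $\cos\pi$ since an included angle cannot exceed $\pi$.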

From Theorem 4, we know that the correlation coefficient has a propagation property, which motivates us to use related features as substitutes for the real sensitive ones. Below, Theorem 5 shows how a constraint on the correlation scale can be propagated from related features to the sensitive attribute, which theoretically justifies our idea.

Theorem 5. Let $x$ and $s$ represent one input feature and the sensitive attribute, respectively. Let $\hat{y}$ denote the variable of the model's prediction. Assume that $x$ is highly correlated with $s$, i.e., $|\rho(x, s)| \ge \cos\theta_s$ for some small angle $\theta_s$. If the model is trained to make $|\rho(\hat{y}, x)|$ near $0$, i.e., the angle between $\hat{y}$ and $x$ lies within $[\pi/2 - \theta_x, \pi/2 + \theta_x]$ for a small $\theta_x$, then $|\rho(\hat{y}, s)|$ would be within $[0, \sin(\theta_x + \theta_s)]$, in which $\sin(\theta_x + \theta_s)$ is close to $0$.

Theorem 5 can be easily proved based on Theorem 4. It can be seen that in the ideal case, when $\rho(\hat{y}, x)$ approximates $0$ and $|\rho(x, s)|$ approximates $1$, $\rho(\hat{y}, s)$ would also approximate $0$. In this way, the prediction becomes insensitive towards $s$, achieving fairness w.r.t. the sensitive attribute.

We can extend Theorem 5 to the case of utilizing multiple related features simultaneously. For a set of related features $\{x^{r_1}, \dots, x^{r_K}\}$, assume we have their correlation coefficients with $s$ as $\{\cos\theta_1^s, \dots, \cos\theta_K^s\}$, and with $\hat{y}$ as $\{\cos\theta_1^{\hat{y}}, \dots, \cos\theta_K^{\hat{y}}\}$. Then $\rho(\hat{y}, s)$ would fall upon the intersection of the resulting value ranges, which can be written as:

$\rho(\hat{y}, s) \in \bigcap_{k=1}^{K} \big[\cos(\theta_k^{\hat{y}} + \theta_k^s),\, \cos(\theta_k^{\hat{y}} - \theta_k^s)\big],$

whose upper end is the smallest value in $\{\cos(\theta_k^{\hat{y}} - \theta_k^s)\}_{k=1}^{K}$. Note that this range is usually not tight, and high divergence within $\mathcal{S}_R$ would often restrict the range of $\rho(\hat{y}, s)$ more.

Figure 1. An illustration of the proposed framework FairRF. In the Fairness Constraint block, $\lambda_k$ controls the importance of the regularization on the $k$-th feature of $\mathcal{S}_R$. $\lambda$ is dynamically updated, reducing the prior domain knowledge required.

5. Methodology

In this section, we present the details of the proposed framework FairRF, which achieves model fairness with the sensitive attributes unknown. The basic idea of the proposed framework is to use the regularization on correlated features as a surrogate fairness objective. With the motivation for this surrogate theoretically justified in Sec. 4, an illustration of the proposed framework is shown in Figure 1. It is composed of three parts: (i) a base classifier which takes each data sample $\mathbf{x}_i$ to predict the label $\hat{y}_i$; (ii) a covariance regularizer which constrains the correlation between the selected related features and the prediction results to achieve fairness; and (iii) a related-feature importance learning module which learns the importance score of each related feature to trade off prediction accuracy and fairness. Next, we introduce each component in detail.

5.1. Base Classifier

The proposed framework is flexible to use various base classifiers as the backbone, such as neural networks, logistic regression and SVM. Without loss of generality, we use $f_\theta$ to denote the base classifier with parameters $\theta$. Following existing work on fairness (Lahoti et al., 2020), we consider binary classification. We leave the extension to multi-class classification as future work. For a data sample $\mathbf{x}_i$, the predicted probability of $\mathbf{x}_i$ having label $1$ is given as

$\hat{y}_i = f_\theta(\mathbf{x}_i).$

Then the binary cross-entropy loss for training the classifier can be written as

$\mathcal{L}_{cls} = -\frac{1}{n}\sum_{i=1}^{n}\big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big],$

where $y_i$ is the label of $\mathbf{x}_i$.
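For concreteness, a minimal logistic-regression instance of the base classifier and its cross-entropy loss can be sketched as follows; the framework itself is backbone-agnostic, so this particular choice (and the function names) is ours for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """f_theta(x): predicted probability of label 1 for each row of X."""
    w, b = theta[:-1], theta[-1]
    return sigmoid(X @ w + b)

def bce_loss(theta, X, y):
    """Binary cross-entropy averaged over the n samples."""
    p = np.clip(predict(theta, X), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

With all-zero parameters the model predicts 0.5 everywhere and the loss equals log 2, a useful sanity check.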

Generally, the well-trained model is good at classification. However, as shown in previous studies (Zhang et al., 2017; Beutel et al., 2017), the obtained model could make unfair predictions, because spurious correlations may exist in the training data between sensitive attributes and labels due to societal bias. Though various efforts have been made to mitigate the bias (Dwork et al., 2012; Hardt et al., 2016; Zafar et al., 2015), most of them require knowing the sensitive attributes. With the sensitive attributes unknown, to learn fair models we propose to regularize the predictions using the related features that are highly correlated with $s$, which will be introduced next.

5.2. Exploring Related Features for Fairness

If the sensitive attribute of each data sample were known, we could achieve fairness of the classification model by making the prediction independent of the sensitive attribute (Dwork et al., 2012; Zafar et al., 2015). Let $\mathbf{s}$ be the sensitive attribute vector with the $i$-th element of $\mathbf{s}$, i.e., $s_i$, as the sensitive attribute of $\mathbf{x}_i$. Similarly, let $\hat{\mathbf{y}}$ be the predictions with the $i$-th element $\hat{y}_i$ being the prediction for $\mathbf{x}_i$. Following the design in (Zafar et al., 2015; Dai and Wang, 2021), the pursuit of non-dependence between prediction and sensitive attribute can be achieved by minimizing the correlation score between them, which can be mathematically written as:

$\mathcal{R}(\hat{\mathbf{y}}, \mathbf{s}) = \Big|\sum_{i=1}^{n}(s_i - \bar{s})(\hat{y}_i - \bar{\hat{y}})\Big|,$

where $\bar{s}$ and $\bar{\hat{y}}$ are the means of $\mathbf{s}$ and $\hat{\mathbf{y}}$, respectively. Note that we set constraints directly on this correlation score instead of the correlation coefficient, but it can be seen from Eq. (1) that the two only differ by a constant multiplier $n\sigma_s\sigma_{\hat{y}}$. By constraining the scale of this regularization term, $\hat{\mathbf{y}}$ and $\mathbf{s}$ are encouraged to have no statistical correlation with each other.

However, as the sensitive attribute is unavailable in our problem, directly adopting the above regularization is impossible. Fortunately, from Theorem 5 we can see that if we have a set of non-sensitive features $\mathcal{S}_R$, where each feature $\mathbf{x}^{r_k} \in \mathcal{S}_R$ has a high correlation with $\mathbf{s}$, reducing the correlation between $\hat{\mathbf{y}}$ and $\mathbf{x}^{r_k}$ can indirectly reduce the correlation between $\hat{\mathbf{y}}$ and $\mathbf{s}$, which helps to achieve fairness even though $\mathbf{s}$ is unknown. Hence, in FairRF, we apply the correlation regularization on each feature $\mathbf{x}^{r_k} \in \mathcal{S}_R$, with the purpose of making the trained model fair towards $\mathbf{s}$. Without loss of generality, let the set of features in $\mathcal{S}_R$ be $\{\mathbf{x}^{r_1}, \dots, \mathbf{x}^{r_K}\}$, where $K = |\mathcal{S}_R|$. The regularization term is written as

$\mathcal{R}_{\mathcal{S}_R} = \sum_{k=1}^{K}\lambda_k\,\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k}),$

where $\lambda_k$ is the weight for regularizing the correlation between $\hat{\mathbf{y}}$ and $\mathbf{x}^{r_k}$, and $\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$ is given as

$\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k}) = \Big|\sum_{i=1}^{n}(x_i^{r_k} - \bar{x}^{r_k})(\hat{y}_i - \bar{\hat{y}})\Big|,$

where $\bar{x}^{r_k}$ is the mean of $\mathbf{x}^{r_k}$.

Generally, if the correlation between $\mathbf{x}^{r_k}$ and $\mathbf{s}$ is large, we would prefer a large $\lambda_k$ to enforce $\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$ to be close to $0$, which better reduces the correlation between $\hat{\mathbf{y}}$ and $\mathbf{s}$ and results in a fairer classifier. If the correlation between $\mathbf{x}^{r_k}$ and $\mathbf{s}$ is not that large, a small $\lambda_k$ is preferred, because in that case forcing $\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$ close to $0$ doesn't help much in making $\hat{\mathbf{y}}$ and $\mathbf{s}$ independent, but may introduce large noise into the label prediction. Domain knowledge would be helpful in setting $\lambda_k$.
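The weighted regularizer reduces to a sum of absolute covariances between the predictions and each related feature column. A NumPy sketch (the function names and the placeholder inputs are ours):

```python
import numpy as np

def corr_score(y_hat, v):
    """Correlation score |sum_i (v_i - mean(v)) * (y_hat_i - mean(y_hat))|."""
    y_hat, v = np.asarray(y_hat, float), np.asarray(v, float)
    return float(abs(np.sum((v - v.mean()) * (y_hat - y_hat.mean()))))

def related_feature_reg(y_hat, X_related, lam):
    """Weighted regularizer sum_k lam_k * corr_score(y_hat, x^{r_k});
    X_related holds the related features as columns."""
    return sum(l * corr_score(y_hat, X_related[:, k])
               for k, l in enumerate(lam))
```

Predictions that are statistically independent of every related feature drive the regularizer toward zero, which, by Theorem 5, also suppresses the correlation with the unobserved sensitive attribute.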

5.3. Learning the Importance of Related Features

One limitation of the approach described so far is the requirement of a pre-defined $\lambda$. This information provides prior knowledge on the correlation between the target sensitive attribute and other attributes, and is important for the success of the proposed proxy regularization. However, in some real-world applications, it is difficult to get accurate values of $\lambda$. In addition, $\lambda_k$ is also important in balancing the contribution of $\mathbf{x}^{r_k}$ to model prediction and fairness. A larger $\lambda_k$ will push $\hat{\mathbf{y}}$ towards independence from $\mathbf{x}^{r_k}$, i.e., $\mathbf{x}^{r_k}$ will contribute little to the model prediction. Hence, in this section, we propose to learn $\lambda$, allowing the model to automatically balance classification accuracy and fairness.

Specifically, before learning, each related weight $\lambda_k$ is initialized to a pre-defined value $\lambda_k^0$, which serves as a (possibly inaccurate) estimation of its importance. Then, during training, the value of $\lambda_k$ is optimized along with the model parameters iteratively. As no other information is available, we update $\lambda$ by minimizing the total regularization loss, based on the intuition that an ideal surrogate correlation regularization should be achieved without causing a significant performance drop. The range of each $\lambda_k$ is limited to $[0, 1]$, and the full optimization objective can be written as follows:

$\min_{\theta, \lambda}\ \mathcal{L}_{cls} + \alpha\sum_{k=1}^{K}\lambda_k\,\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k}), \quad \text{s.t.}\ \lambda_k \ge 0,\ \sum_{k=1}^{K}\lambda_k = 1,$

where $\alpha$ sets the weight of the regularization term, and $\theta$ is the set of parameters of the classifier.

Eq. (11) could lead to a trivial solution: to minimize the cost function, it tends to set the $\lambda_k$ corresponding to the smallest $\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$ to $1$ and all the others to $0$. To alleviate this issue, we add $\beta\|\lambda\|_2^2$ to penalize $\lambda_k$ being close to $1$. Thus, the final objective function of FairRF is

$\min_{\theta, \lambda}\ \mathcal{L}_{cls} + \alpha\sum_{k=1}^{K}\lambda_k\,\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k}) + \beta\|\lambda\|_2^2, \quad \text{s.t.}\ \lambda_k \ge 0,\ \sum_{k=1}^{K}\lambda_k = 1,$

where $\beta$ is used to control the contribution of $\|\lambda\|_2^2$.

6. Optimization Algorithm

The objective function in Eq. (12) is a constrained optimization problem, which is difficult to optimize directly. We take the alternating direction optimization (Goldstein et al., 2014) strategy to update $\theta$ and $\lambda$ iteratively. The basic idea is to update one variable with the other fixed at each step, which eases the optimization process. Next, we give the details.

6.1. Updating Rules

UPDATE $\theta$. To optimize $\theta$, we fix $\lambda$ and remove the terms that are irrelevant to $\theta$, which arrives at

$\min_{\theta}\ \mathcal{L}_{cls} + \alpha\sum_{k=1}^{K}\lambda_k\,\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k}).$

This is an unconstrained cost function, and we can directly apply gradient descent to learn $\theta$.

UPDATE $\lambda$. Then, given $\theta$ at the current step, $\lambda$ can be obtained by solving the following problem:

$\min_{\lambda}\ \alpha\sum_{k=1}^{K}\lambda_k R_k + \beta\|\lambda\|_2^2, \quad \text{s.t.}\ \lambda_k \ge 0,\ \sum_{k=1}^{K}\lambda_k = 1.$

It is a convex primal problem, and strong duality holds as it satisfies Slater's condition. For simplicity of notation, we use $R_k$ to represent $\mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$. Then, we can solve this problem using the Karush-Kuhn-Tucker (KKT) (Mangasarian, 1994) conditions on the Lagrangian:

$L(\lambda, \eta, \mu) = \alpha\sum_{k}\lambda_k R_k + \beta\|\lambda\|_2^2 - \sum_{k}\eta_k\lambda_k + \mu\Big(\sum_{k}\lambda_k - 1\Big), \quad \eta_k \ge 0,\ \eta_k\lambda_k = 0.$

In the above equation, $\eta_k$ and $\mu$ are Lagrange multipliers. From the stationarity condition, we get:

$\alpha R_k + 2\beta\lambda_k - \eta_k + \mu = 0.$

Eliminating $\eta_k$ using complementary slackness, we have:

$\lambda_k = \max\Big(0,\ -\frac{\alpha R_k + \mu}{2\beta}\Big).$

From this condition, we know that $\lambda_k > 0$ only when $\alpha R_k + \mu < 0$. Since $\sum_k \lambda_k = 1$, $\mu$ can be computed by solving the following equation:

$\sum_{k=1}^{K}\max\Big(0,\ -\frac{\alpha R_k + \mu}{2\beta}\Big) = 1.$


Solving the above equation can be done as follows: we first rank $\{R_k\}_{k=1}^{K}$ in ascending order as $R_{(1)} \le R_{(2)} \le \dots \le R_{(K)}$. Assume that $-\mu/\alpha$ is within $[R_{(m)}, R_{(m+1)})$; then the above equation reduces to

$\sum_{k=1}^{m} -\frac{\alpha R_{(k)} + \mu}{2\beta} = 1.$

Then, we have

$\mu = -\frac{2\beta + \alpha\sum_{k=1}^{m} R_{(k)}}{m}.$

If $-\mu/\alpha$ indeed falls within $[R_{(m)}, R_{(m+1)})$, it is a valid solution; otherwise, it is invalid. We do this for every interval and find the valid $\mu$. With $\mu$ learned, we can calculate each $\lambda_k$ as:

$\lambda_k = \max\Big(0,\ -\frac{\alpha R_k + \mu}{2\beta}\Big).$
6.2. Training Algorithm

With the above updating rules, the full pipeline of the training algorithm for FairRF is summarized in Algorithm 1. Before adding the regularization, we first pre-train the model to converge at a good starting point (lines 2 to 4) in order to prevent the correlation constraint from providing noisy signals. Then, from line 5 to line 13, we fine-tune the model to be fair w.r.t. the related features. If the related weights are not refined, $\lambda$ will stay fixed. Otherwise, it will be updated iteratively with the parameters $\theta$, as shown in lines 9 to 12.

1:  Randomly initialize $\theta$; Initialize all entries in $\lambda$ to the pre-defined values $\lambda^0$;
2:  for batch in $\mathbf{X}$ do
3:     Update $\theta$ based on the classification loss of the current batch;
4:  end for
5:  while Not Converged do
6:     for step in MODEL_TRAIN_STEP do
7:        Update $\theta$ based on Equation 11;
8:     end for
9:     if Require learning weight then
10:        Obtain $R_k = \mathcal{R}(\hat{\mathbf{y}}, \mathbf{x}^{r_k})$ for each related feature $\mathbf{x}^{r_k}$;
11:        Calculate $\mu$ and $\lambda$ based on Eq.(18) to Eq.(21);
12:     end if
13:  end while
14:  return  Trained classifier $f_\theta$.
Algorithm 1 Full Training Algorithm
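Algorithm 1 can be exercised end-to-end with a toy logistic model and fixed, uniform weights (the weight-refinement branch is skipped for brevity; the data, alpha, learning rate and related-feature indices below are all illustrative choices of ours, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
related = [1, 2]                       # assumed related-feature indices
lam = np.full(len(related), 1.0 / len(related))
alpha, lr = 0.1, 0.5

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, b = np.zeros(d), 0.0

# Pre-train on the classification loss only (lines 2-4 of Algorithm 1).
for _ in range(100):
    p = sigmoid(X @ w + b)
    w -= lr * X.T @ (p - y) / n
    b -= lr * np.mean(p - y)

# Fine-tune with the correlation regularizer sum_k lam_k * |x~_k . p|,
# where x~_k is the centered related feature (lines 5-13).
for _ in range(300):
    p = sigmoid(X @ w + b)
    gw, gb = X.T @ (p - y) / n, np.mean(p - y)
    for k, j in enumerate(related):
        xt = X[:, j] - X[:, j].mean()
        sgn = np.sign(xt @ p)
        gw += alpha * lam[k] * sgn * X.T @ (xt * p * (1 - p)) / n
        gb += alpha * lam[k] * sgn * np.mean(xt * p * (1 - p))
    w -= lr * gw
    b -= lr * gb

p = sigmoid(X @ w + b)
scores = [abs((X[:, j] - X[:, j].mean()) @ p) / n for j in related]
print("per-feature correlation scores:", scores)
```

The regularizer gradient uses the fact that, since the centered feature sums to zero, the score reduces to |x~_k . p|, whose gradient w.r.t. the logistic parameters follows by the chain rule through p_i(1 - p_i).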

7. Experiment

In this section, we conduct experiments to evaluate the effectiveness of the proposed FairRF in terms of both fairness and classification performance when sensitive attributes are unavailable. In particular, we aim to answer the following research questions:

  • RQ1 Can the proposed FairRF achieve fairness without sensitive attributes while maintaining high accuracy?

  • RQ2 Is the proposed framework FairRF flexible to facilitate various classifiers?

  • RQ3 How would different selections of related features influence its performance?

We begin by presenting the datasets and implementation details, followed by baselines and experiments configuration. We then conduct experiments and ablation studies on real-world datasets to answer these three questions.

7.1. Datasets

Table 1. Statistics of datasets used in this work (train, validation, and test sizes).

We conduct experiments on three publicly available benchmark datasets, including Adult (Asuncion and Newman, 2007), COMPAS (Julia Angwin and Kirchner, 2016) and LSAC (Wightman, 1998).

  • ADULT: This dataset contains records of personal yearly income, with a binary label indicating whether the salary is over or under a given threshold per year. Twelve features such as age, education and occupation are provided, and the gender of each subject is considered as the sensitive attribute. Furthermore, age, relation and marital status are used as the related features.

  • COMPAS: This dataset is for predicting the risk of recidivism, i.e., the likelihood that a criminal will re-offend within a certain period. It contains criminal records collected in the United States. Each data sample has features such as previous years in prison, gender and age. The race of each defendant is the sensitive attribute. In constructing the related feature set, score, decile text and sex are selected.

  • LSAC: This is the Law School Admissions Council (LSAC) dataset, which contains admissions data from law schools in the United States over the 2005, 2006, and in some cases 2007 admission cycles. Its binary labels indicate whether each candidate successfully passed the bar exam, and gender is considered as the sensitive attribute. For this dataset, we use race, year and residence as the related features.

Statistics of these three datasets are summarized in Table 1. Note that for all three datasets, the related features are selected following existing analysis or prior domain knowledge. For example, in COMPAS, biases towards race have been found to exist in score and decile text (Julia Angwin and Kirchner, 2016). The correlation between race and gender is also reported by the U.S. Bureau of Justice Statistics (BJS). Since race is the sensitive attribute of this dataset, we include score, decile text and gender in the related feature set.

7.2. Experimental Settings

7.2.1. Baselines

To evaluate the effectiveness of FairRF, we first compare with a vanilla model and a sensitive-attribute-aware model, which can be seen as the lower and upper bounds of our model’s performance.

  • Vanilla model: It directly uses the base classifier without any regularization terms. It is used to show the performance when no fairness-assuring technique is applied.

  • ConstrainS: In this baseline, we assume that the sensitive attribute of each data sample is known, and we add the correlation regularization between the sensitive attribute vector and the model output. It is used to show the accuracy and fairness we can achieve when sensitive attributes are known, which sets a reference point for the performance of the proposed framework. Note that for all the other baselines and our model, the sensitive attribute is unknown.

Existing works on fair classifiers without sensitive attributes are rather limited (Lahoti et al., 2020; Yan et al., 2020). We include representative, state-of-the-art baselines for achieving fairness without sensitive attributes:

  • KSMOTE (Yan et al., 2020): As ground-truth protected groups are not available, KMeans SMOTE (KSMOTE) performs clustering to obtain pseudo groups and uses them as a substitute. The model is regularized to be fair with respect to those generated groups.

  • RemoveR: This method directly removes all candidate related features. We design this baseline in order to validate the benefit of regularizing related features instead of simply removing them.

  • ARL (Lahoti et al., 2020): Unlike traditional group fairness approaches, ARL follows the Rawlsian principle of Max-Min welfare for distributive justice. It optimizes the model’s performance by re-weighting under-represented regions detected by an adversarial model.

Note that the fairness formulation of ARL is different from the group fairness we focus on: it aims at improving the worst-case AUC, while traditional group fairness focuses on equality among groups. Thus, ARL (Lahoti et al., 2020) is by design inefficient at obtaining demographic fairness, which is also verified by our experiments. Although it does not target the same fairness definition, we still include it as a baseline for completeness of the experiments.

7.2.2. Configurations

For KSMOTE, we directly use the code provided by (Yan et al., 2020) and report its results. For all other approaches, we implement a multi-layer perceptron (MLP) network with three layers as the backbone classifier, trained with the Adam optimizer. Each dataset is split into training, validation and test sets with fixed proportions. All experiments are conducted on a 64-bit machine with an Nvidia GPU (Tesla V100, 1246MHz, 16GB memory).

7.2.3. Evaluation Metrics

To measure fairness, following existing work on fair models (Verma and Rubin, 2018; Yan et al., 2020), we adopt two widely used evaluation metrics, i.e., equal opportunity and demographic parity, which are defined as follows:

Equal Opportunity. Equal opportunity requires that positive instances from different protected groups have an equal probability of being assigned a positive outcome:

$P(\hat{y}=1 \mid y=1, s=0) = P(\hat{y}=1 \mid y=1, s=1),$

where $\hat{y}$ is the output of the model, representing the probability of being predicted as positive, and $s$ denotes the sensitive attribute. In experiments, we report the difference in equal opportunity ($\Delta_{EO}$):

$\Delta_{EO} = \left| P(\hat{y}=1 \mid y=1, s=0) - P(\hat{y}=1 \mid y=1, s=1) \right|.$
Demographic Parity. Demographic parity requires the prediction model to behave fairly across different sensitive groups. Concretely, it requires that the positive rate is equal across sensitive groups:

$P(\hat{y}=1 \mid s=0) = P(\hat{y}=1 \mid s=1).$

Similarly, in the experiments, we report the difference in demographic parity ($\Delta_{DP}$):

$\Delta_{DP} = \left| P(\hat{y}=1 \mid s=0) - P(\hat{y}=1 \mid s=1) \right|.$
Equal opportunity and demographic parity measure fairness from different perspectives: equal opportunity requires similar true positive rates across protected groups, while demographic parity focuses on equal positive rates regardless of the label. The smaller $\Delta_{EO}$ and $\Delta_{DP}$ are, the fairer a model is. Furthermore, to measure classification performance, we use accuracy (ACC) as the evaluation metric, since this is not an imbalanced learning setting.
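Both gaps can be computed directly from hard predictions with a few lines of numpy. A minimal sketch, where the array names are placeholders and the sensitive attribute `s` is used only for evaluation:

```python
import numpy as np

def delta_eo(y_true, y_pred, s):
    """Difference in equal opportunity:
    |P(yhat=1 | y=1, s=0) - P(yhat=1 | y=1, s=1)|."""
    r0 = y_pred[(y_true == 1) & (s == 0)].mean()
    r1 = y_pred[(y_true == 1) & (s == 1)].mean()
    return abs(r0 - r1)

def delta_dp(y_pred, s):
    """Difference in demographic parity:
    |P(yhat=1 | s=0) - P(yhat=1 | s=1)|."""
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

# Tiny illustrative example with binary predictions.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0])
s      = np.array([0, 0, 1, 1, 0, 0, 1, 1])
gap_eo = delta_eo(y_true, y_pred, s)  # -> 0.5
gap_dp = delta_dp(y_pred, s)          # -> 0.5
```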

7.3. Fair Classification Performance Comparison

To answer RQ1, we fix the base classifier as MLP and conduct classification on all three datasets. For all the baselines, the hyperparameters are tuned via grid search on the validation dataset; the regularization weights of FairRF are set per dataset in the same way. More details on hyperparameter sensitivity will be discussed in Sec. 7.6. Each experiment is conducted multiple times, and the average performance in terms of accuracy, $\Delta_{EO}$ and $\Delta_{DP}$, with standard deviations, is reported in Table 2, Table 3 and Table 4. From the tables, we make the following observations:

  • Constraining related features helps the model behave more fairly toward sensitive groups. For example, compared with the vanilla approach, in which no fairness-oriented techniques are applied, FairRF shows a clear improvement w.r.t. equal opportunity and demographic parity across all three datasets.

  • FairRF improves fairness without causing a significant performance drop, and works stably. It requires no pre-computed clusters and does not involve training an adversarial model; hence, FairRF obtains results with smaller deviation than ARL and KSMOTE.

  • Compared with previous approaches in this unobserved-sensitive-attribute scenario, FairRF is effective on both fairness metrics. For example, ARL is able to improve equal opportunity, but its performance drops w.r.t. demographic parity. We attribute this to the capacity of FairRF in extracting knowledge from related features.

Table 2. Comparison of different approaches on ADULT.
Table 3. Comparison of different approaches on COMPAS.
Table 4. Comparison of different approaches on LSAC.

7.4. Flexibility of FairRF for Various Base Classifiers

In the above experiment, we fixed the base classifier as MLP. In this subsection, we conduct experiments to verify whether FairRF can help various machine learning models achieve fairness while maintaining high accuracy when the sensitive attributes are unknown, which addresses RQ2. Specifically, in addition to MLP, we adopt two other widely-used classifiers as the base classifiers of FairRF, i.e., Linear Regression (LR) and Support Vector Machine (SVM). We implement both of them in a gradient-based manner, so that the regularization on related features can be optimized easily with back-propagation. We tune the hyperparameters on the validation set for LR and SVM, respectively. Each experiment is also conducted multiple times, and average results on the ADULT dataset are reported in Table 5. From the table, we observe that:

  • Compared with the three base classifiers alone, integrating FairRF makes the accuracy drop slightly, which is consistent with observations in other work on fair models (Yan et al., 2020), as adding a fairness regularizer generally costs some performance. However, the accuracy drop is not significant. For example, for LR, the accuracy only drops by 2%, which shows that we are still able to maintain high accuracy;

  • Though the accuracy drops slightly, the fairness in terms of $\Delta_{EO}$ and $\Delta_{DP}$ improves significantly on all three models, even though the sensitive attributes are not observed. For instance, for LR with the FairRF framework, the fairness gap drops by 58.5% while the accuracy only drops by 2%. In other words, we sacrifice a little accuracy while significantly improving fairness.

These observations show that FairRF can help various machine learning models achieve fairness while maintaining high accuracy when the sensitive attributes are unknown.

Table 5. Effectiveness of FairRF with different base classifiers on ADULT.
Figure 2. Parameter sensitivity on ADULT.

7.5. Impact of the Quality of Related Features on FairRF

In this section, we conduct experiments to investigate the impact of the quality of the related feature set on the performance of FairRF, to answer RQ3. In particular, we consider the following variants of FairRF:

  • Top-1: It uses only the single most effective related feature. We test all candidates, select the one that achieves the highest performance when used as the related feature, and report its performance.

  • Fix-: The same weight is adopted for all related features, and its value is not automatically updated during training.

  • ConstrainAll: All features are taken as related features. This setting shows the performance of FairRF in the extreme case where no prior knowledge is available. All other settings are the same as FairRF.

For all these baselines, hyper-parameters are found via grid search, and experiments are conducted multiple times with random initializations. The average results on ADULT are reported in Table 6. From the table, we can draw the following observations:

  • FairRF benefits from dynamically updating the feature weights. Compared with Fix-, FairRF shows much stronger fairness in terms of equal opportunity, and achieves better accuracy at the same time;

  • FairRF shows a moderate improvement over Top-1. However, Top-1 requires careful selection of the single most effective related feature, while FairRF can achieve better performance with less prior domain knowledge;

  • In the extreme case where no prior knowledge is available, FairRF can still achieve a small improvement on the fairness metrics compared with the vanilla model. This again shows that FairRF can cope with scenarios of little domain knowledge. Note that ADULT has only twelve features in total; in applications with hundreds of features, the selection of related features may still be very important.

Table 6. Comparison of different strategies for selecting related features on ADULT.

7.6. Parameter Sensitivity Analysis

In this subsection, we analyze the sensitivity of FairRF to its two hyperparameters, to provide insights into setting them. One controls the importance of the coefficient regularization term, and the other adjusts the distribution of the learned weights. We vary both over a range of values; other settings are the same as FairRF. For the model backbone, we adopt MLP as in Sec. 7.3. This experiment is performed on the ADULT dataset, and the result is shown in Figure 2. From the figure, we make the following observations:

  • A larger regularization weight achieves fairer predictions, but may also cause a severe drop in accuracy once it exceeds some threshold;

  • Generally, a smaller distribution-adjusting parameter requires a larger regularization weight to achieve fairness. A small value allows the learned weights to be sparse; as a result, a large portion of the coefficient regularization term could be enforced on attributes that are both less discriminative and less related to the sensitive attribute;

  • A larger distribution-adjusting parameter encourages the learned weights to be uniform, resulting in a faster drop in accuracy as the regularization weight grows.

These observations could help to find suitable hyper-parameter choices in other applications.

8. Conclusion

In this paper, we study a novel and challenging problem of exploring related features for learning fair and accurate classifiers without knowing the sensitive attribute of each data sample. We propose a new framework, FairRF, which utilizes the related features as pseudo sensitive attributes to regularize the model prediction. Our theoretical analysis shows that if the related features are highly correlated with the sensitive attribute, minimizing the correlation between the related features and the model’s prediction yields a classifier that is fair with respect to the sensitive attribute. Since we lack prior knowledge of the importance of each related feature, we design a mechanism for the model to automatically learn the importance weight of each feature, trading off their contributions to classification accuracy and fairness. Experiments on real-world datasets show that the proposed approach achieves fairer predictions than existing approaches while maintaining high classification accuracy.


  • A. Asuncion and D. Newman (2007) UCI machine learning repository. Irvine, CA, USA. Cited by: §7.1.
  • M. Bakator and D. Radosav (2018) Deep learning and medical diagnosis: a review of literature. Multimodal Technologies and Interaction 2 (3), pp. 47. Cited by: §1.
  • A. Beutel, J. Chen, Z. Zhao, and E. H. Chi (2017) Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075. Cited by: §1, §2, §2, §5.1.
  • D. D. Celentano, M. S. Linet, and W. F. Stewart (1990) Gender differences in the experience of headache. Social science & medicine 30 (12), pp. 1289–1295. Cited by: §1.
  • A. Coston, K. N. Ramamurthy, D. Wei, K. R. Varshney, S. Speakman, Z. Mustahsan, and S. Chakraborty (2019) Fair transfer learning with missing protected attributes. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 91–98. Cited by: §1, §1, §1.
  • E. Creager, D. Madras, J. Jacobsen, M. A. Weis, K. Swersky, T. Pitassi, and R. Zemel (2019) Flexibly fair representation learning by disentanglement. arXiv preprint arXiv:1906.02589. Cited by: §2.
  • E. Dai and S. Wang (2021) Say no to the discrimination: learning fair graph neural networks with limited sensitive attribute information. WSDM. Cited by: §2, §5.2.
  • X. Dastile, T. Celik, and M. Potsane (2020) Statistical and machine learning models in credit scoring: a systematic literature survey. Applied Soft Computing 91, pp. 106263. Cited by: §1.
  • C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel (2012) Fairness through awareness. In ITCS, pp. 214–226. Cited by: §1, §1, §2, §2, §5.1, §5.2.
  • H. Edwards and A. Storkey (2015) Censoring representations with an adversary. arXiv preprint arXiv:1511.05897. Cited by: §2.
  • M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) Certifying and removing disparate impact. In SIGKDD, pp. 259–268. Cited by: §1, §2.
  • W. Gellert, M. Hellwich, H. Kästner, and H. Küstner (2012) The vnr concise encyclopedia of mathematics. Springer Science & Business Media. Cited by: §4.
  • M. A. Gianfrancesco, S. Tamang, J. Yazdany, and G. Schmajuk (2018) Potential biases in machine learning algorithms using electronic health record data. JAMA internal medicine 178 (11), pp. 1544–1547. Cited by: §1.
  • T. Goldstein, B. O’Donoghue, S. Setzer, and R. Baraniuk (2014) Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences 7 (3), pp. 1588–1623. Cited by: §6.
  • M. Hardt, E. Price, and N. Srebro (2016) Equality of opportunity in supervised learning. In NeurIPS, pp. 3315–3323. Cited by: §1, §1, §2, §2, §5.1.
  • T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang (2018) Fairness without demographics in repeated loss minimization. In International Conference on Machine Learning, pp. 1929–1938. Cited by: §2, §2.
  • S. M. Julia Angwin and L. Kirchner (2016) Machine bias: there’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica. Cited by: §1, §1, §3, §7.1, §7.1.
  • F. Kamiran and T. Calders (2009) Classifying without discriminating. In ICCC, pp. 1–6. Cited by: §1, §2.
  • F. Kamiran and T. Calders (2012) Data preprocessing techniques for classification without discrimination. KAIS 33 (1), pp. 1–33. Cited by: §2.
  • J. Kang, J. He, R. Maciejewski, and H. Tong (2020) InFoRM: individual fairness on graph mining. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 379–389. Cited by: §2.
  • P. Lahoti, A. Beutel, J. Chen, K. Lee, F. Prost, N. Thain, X. Wang, and E. H. Chi (2020) Fairness without demographics through adversarially reweighted learning. arXiv preprint arXiv:2006.13114. Cited by: §1, §1, §2, §2, §2, §2, §3, §3, §5.1, 3rd item, §7.2.1, §7.2.1.
  • P. Lahoti, K. P. Gummadi, and G. Weikum (2019) Operationalizing individual fairness with pairwise fair representations. arXiv preprint arXiv:1907.01439. Cited by: §2.
  • F. Locatello, G. Abbati, T. Rainforth, S. Bauer, B. Schölkopf, and O. Bachem (2019) On the fairness of disentangled representations. In NeurIPS, pp. 14584–14597. Cited by: §2.
  • C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel (2015) The variational fair autoencoder. arXiv preprint arXiv:1511.00830. Cited by: §2.
  • O. L. Mangasarian (1994) Nonlinear programming. SIAM. Cited by: §6.1.
  • N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan (2019) A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635. Cited by: §1.
  • G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger (2017) On fairness and calibration. In NeurIPS, pp. 5680–5689. Cited by: §1, §2.
  • P. Sattigeri, S. C. Hoffman, V. Chenthamarakshan, and K. R. Varshney (2019) Fairness GAN: generating datasets with fairness properties using a generative adversarial network. IBM Journal of Research and Development 63 (4/5), pp. 3–1. Cited by: §1, §2.
  • N. A. Saxena, K. Huang, E. DeFilippis, G. Radanovic, D. C. Parkes, and Y. Liu (2019) How do fairness definitions fare? examining public attitudes towards algorithmic definitions of fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 99–106. Cited by: §1.
  • S. Verma and J. Rubin (2018) Fairness definitions explained. In 2018 ieee/acm international workshop on software fairness (fairware), pp. 1–7. Cited by: §1, §7.2.3.
  • M. Vogel and L. C. Porter (2016) Toward a demographic understanding of incarceration disparities: race, ethnicity, and age structure. Journal of quantitative criminology 32 (4), pp. 515–530. Cited by: §1, §3.
  • L. F. Wightman (1998) LSAC national longitudinal bar passage study. lsac research report series.. Cited by: §7.1.
  • D. Xu, S. Yuan, L. Zhang, and X. Wu (2018) Fairgan: fairness-aware generative adversarial networks. In Big Data, pp. 570–575. Cited by: §2.
  • S. Yan, H. Kao, and E. Ferrara (2020) Fair class balancing: enhancing model fairness without observing sensitive attributes. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1715–1724. Cited by: §1, §2, §2, 1st item, 1st item, §7.2.1, §7.2.2, §7.2.3.
  • A. Yapo and J. Weiss (2018) Ethical implications of bias in machine learning. In Proceedings of the 51st Hawaii International Conference on System Sciences, Cited by: §1.
  • M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi (2015) Fairness constraints: mechanisms for fair classification. arXiv preprint arXiv:1507.05259. Cited by: §1, §2, §5.1, §5.2.
  • R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In ICML, pp. 325–333. Cited by: §2, §2.
  • B. H. Zhang, B. Lemoine, and M. Mitchell (2018) Mitigating unwanted biases with adversarial learning. In AIES, pp. 335–340. Cited by: §2.
  • C. Zhang and J. A. Shah (2014) Fairness in multi-agent sequential decision-making. Cited by: §2.
  • L. Zhang, Y. Wu, and X. Wu (2017) Achieving non-discrimination in data release. In SIGKDD, pp. 1335–1344. Cited by: §1, §2, §2, §5.1.