Fairness-aware Summarization for Justified Decision-Making

07/13/2021 · Moniba Keymanesh et al. · The Ohio State University

In many applications such as recidivism prediction, facility inspection, and benefit assignment, it is important for individuals to know the decision-relevant information behind the model's prediction. In addition, the model's predictions should be fairly justified. Essentially, decision-relevant features should provide sufficient information for the predicted outcome and should be independent of the membership of individuals in protected groups such as race and gender. In this work, we focus on the problem of (un)fairness in the justification of text-based neural models. We tie the explanatory power of the model to fairness in the outcome and propose a fairness-aware summarization mechanism to detect and counteract the bias in such models. Given a potentially biased natural language explanation for a decision, we use a multi-task neural model and an attribution mechanism based on integrated gradients to extract high-utility and discrimination-free justifications in the form of a summary. The extracted summary is then used to train a model that makes decisions for individuals. Results on several real-world datasets suggest that our method (i) assists users in understanding what information is used for the model's decision and (ii) enhances the fairness in outcomes while significantly reducing the demographic leakage.


1 Introduction and Related Work

AI systems are increasingly adopted to assist or replace humans in several highly consequential domains including recidivism assessment Barry-jester et al. (2015), policing Rudin (2013); Keymanesh et al. (2020b), credit card offering Steel and Angwin (2010), lending Koren (2016), and prioritizing resources for inspection (https://chicago.github.io/food-inspections-evaluation/). To maximize utility, such models are trained to minimize the error on historical data, which may be biased toward certain individuals or groups. Bias in the data can be due to historical bias, representation bias, or measurement bias Mehrabi et al. (2019); Suresh and Guttag (2019); Olteanu et al. (2019). Training models on biased data without fairness considerations has resulted in several cases of discrimination against certain groups of individuals Kleinberg et al. (2016); Osoba and Welser IV (2017); Angwin et al. (2016); Bolukbasi et al. (2016). Discrimination in this context is defined as the unjustified distinction between individuals based on their membership in a protected group (protected groups currently recognized by law include race, gender, and color Act (1964); a full list of protected classes is available at https://en.wikipedia.org/wiki/Protected_group) Tushev et al. (2020). The concerns and observations regarding the unfairness of AI algorithms have led to a growing interest in defining, measuring, and mitigating algorithmic unfairness Pessach and Shmueli (2020); Berk et al. (2018); Chouldechova and Roth (2018); Friedler et al. (2019); Holstein et al. (2019).

In recent years, several techniques have been proposed to enhance fairness in machine learning algorithms. Such methods can be broadly categorized into pre-processing, in-processing, and post-processing methods Pessach and Shmueli (2020). Pre-processing mechanisms use re-weighting, relabeling, or other transformations of the input data to remove dependencies between the class label and the sensitive attributes before feeding it to the machine learning algorithm Ghassami et al. (2018); Calmon et al. (2017); Zemel et al. (2013); Feldman et al. (2015); Del Barrio et al. (2018); Edwards and Storkey (2015); Xu et al. (2018). This approach is closely related to the field of privacy Ebrahimi et al. (2020), since both fairness and privacy can be enhanced by obfuscating sensitive information in the input data under the adversarial goal of minimal data perturbation Kazemi et al. (2018); Jaiswal and Provost (2020). Our proposed approach of using fairness-aware text summarization to remove bias from the input explanations also belongs to this category. In-processing methods modify the optimization procedure of the classifier to integrate fairness criteria in the objective function Kamishima et al. (2012); Aghaei et al. (2019); Calders and Verwer (2010). This is often done by using a regularization term Donini et al. (2018); Zafar et al. (2017a, b, c); Goel et al. (2018); Bechavod and Ligett (2017); Berk et al. (2017); Rahmattalabi et al. (2020); Kamiran et al. (2010), meta-learning algorithms Celis et al. (2019), reduction-based methods Agarwal et al. (2018); Cotter et al. (2019), or adversarial training Madras et al. (2018); Zhang et al. (2018); Celis and Keswani (2019); Wadsworth et al. (2018). Post-processing methods adjust the output of the AI algorithm to enhance fairness in decisions Fish et al. (2016), for example by flipping some of the decisions of the classifier Hardt et al. (2016), learning a different classifier Dwork et al. (2018), or learning a separate threshold for each group Menon and Williamson (2018).

A majority of these methods mitigate bias in decision-making by minimizing the difference in treatment and outcome among different protected groups. However, in some cases the differences can be justified and explained using some features and therefore are not considered illegal Mehrabi et al. (2019). For example, Kamiran and Žliobaitė Kamiran and Žliobaitė (2013) state that the difference in income level between females and males in the UCI adult income dataset (https://archive.ics.uci.edu/ml/datasets/adult), a well-studied dataset in algorithmic fairness research, can be attributed to the difference in working hours. They argue that methods that do not take into account the explainability aspect of discrimination result in reverse discrimination, which is equally harmful. This brings us to the focus of our work: fairly-justified decision-making. A fairly-justified decision should not be based on information about membership in protected groups. In addition, the justification should include enough information to explain the outcome. In other words, given the justifications, human subjects should be able to understand why a decision has been made for them and interpret what the main factors were that led to the received outcome Carvalho et al. (2019).

In this work, we propose a fairness-aware summarization mechanism as a pre-processing step to reduce potential biases in textual justifications. Automatic summarization Sarkhel et al. (2020) has previously been used to extract text based on human-interpretable syntactic or semantic properties. We propose methods to first identify and measure bias in textual explanations and then mitigate this bias using a filtering-based approach. We measure bias 1) by using metrics such as demographic parity Calders et al. (2009), equalized odds Hardt et al. (2016), and calibration Kleinberg et al. (2016) and 2) by measuring an adversary's ability to identify membership in protected groups given the textual explanations. To counteract the bias, our proposed summarization model extracts explanations that maximize utility for the final decision while removing unfair justifications that correlate with membership in protected groups. Next, the extracted fairly-justified summaries are used to train a final model. This framework facilitates learning fairly-justified models by removing biases from input explanations. Additionally, it assists users with understanding why a decision has been made for them by presenting the most predictive justifications. To summarize, in this study we make the following contributions: C1. We propose the use of a multi-task model and an attribution mechanism to attribute the decision of the model, as well as potential biases in the justification for the decision, to certain parts of the input. C2. We propose a fairness-aware summarization model to condense the input explanations by extracting the most predictive justifications while removing the unfair ones. C3. We show that this pre-processing step does not hurt the utility of the model but significantly limits the leakage of information about protected attributes of individuals in the input justifications. C4. We show that pre-processing the input justifications to make them fair using our proposed approach also moderately enhances the fairness in the outcome. Next, we formally define our problem and explain our proposed solution.

2 Problem Formulation

Given a dataset $D = \{(X_i, Y_i, S_i)\}_{i=1}^{N}$, where $X_i$ denotes a textual explanation written by the decision-maker to provide evidence for or justify an outcome $Y_i$ and $S_i$ indicates one or more protected variables such as gender or race, we aim to extract a fairly-justified summary $X'_i$ such that i) $X'_i$ provides sufficient information to predict and justify $Y_i$ and ii) $X'_i$ is independent of the protected variable $S_i$. We explain how we measure and attribute these qualities to sentences in the justification in Section 3. For instance, $Y_i$ could represent a court decision for individual $i$, who is a member of the demographic group $S_i$ and has received a textual argument $X_i$ regarding this decision. Potentially, $X_i$ can be biased toward certain demographic groups. Our goal is to transform a given dataset $D$ into a new dataset $D' = \{(X'_i, Y_i, S_i)\}_{i=1}^{N}$ that is decontaminated from unfair arguments. To achieve this goal, we use a fairness-aware extractive summarization model as a data pre-processing step.

Figure 1: A graphical model of the proposed approach. $S$ represents the protected attribute. $X$ indicates the input explanation, while $X'$ indicates the fairly-justified summary of $X$, which is used to train the final model to predict outcome $Y$.

3 Proposed Methodology

In this section, we explain our proposed methodology to extract a fairly-justified summary $X'$ such that i) the summary provides sufficient information to predict and justify $Y$ and ii) the extracted summary is independent of the protected variable $S$. A graphical model of the proposed approach is shown in Figure 1. Given an input explanation $X$ consisting of sentences $a_1, \dots, a_n$, the goal of our proposed fairly-justified extractive summarization model is to select a subset of these sentences subject to a utility and a fairness constraint. Next, we explain how we measure and attribute utility and discrimination to the input sentences.

Utility Control: To ensure that the extracted summary $X'$ includes sufficient decision-relevant information from $X$, we measure the salience of each sentence in $X$ for predicting outcome $Y$. We train a neural classification model on $X$ using the ground-truth decision $Y$ as supervision. Next, we use this model to derive the contribution of each sentence in $X$ to predicting outcome $Y$. This process is explained in Section 3.2. We hypothesize that the dataset is sufficiently large and that the model can learn which factors are associated with which outcomes. This assumption especially holds for scenarios in which a decision-maker (e.g., an inspector or judge) is required to go through a standard set of criteria (e.g., a standard form or set of guidelines) and thus, the same arguments may repeatedly be articulated in different ways to justify a certain outcome. Sentences with the highest attribution scores can be considered candidates for inclusion in the summary. However, some of these high-utility arguments may be unfairly used to explain decisions for certain protected groups. We explain how we mitigate this problem next.

Discrimination Control: To ensure that sentences in the input explanation $X$ that are biased toward certain protected groups are excluded from the summary $X'$, we attribute a discrimination score to each sentence in $X$. Discrimination is defined as the utility of an argument for identifying the membership of an individual in the protected group $S$. We use the justification $X$ to predict the protected attribute $S$. Next, we use the trained model to derive the contribution of each sentence to the membership identification task. Sentences with the highest discrimination scores should not be selected for inclusion in the summary $X'$. We train a multi-task model for the decision classification and membership identification tasks. Next, we explain our model architecture.

3.1 Model Architecture

Prior research has exploited word embeddings and Convolutional Neural Networks (CNNs) for sentence classification Collobert et al. (2011); Kalchbrenner et al. (2014); Heidari and Rafatirad (2020b); Jafariakinabad and Hua (2019); Zhang and Wallace (2015); Heidari and Rafatirad (2020a). Kim Kim (2014) achieved strong empirical performance using static vectors and little hyper-parameter tuning over a range of benchmarks. Variations of this architecture have achieved good performance for extractive summarization of privacy policies Keymanesh et al. (2020a) and court cases Zhong et al. (2019). CNNs are fast to train and can easily be combined with methods such as Integrated Gradients Sundararajan et al. (2017) for attributing predictions to sentences in the explanations. These considerations led to our decision to use a slight variant of the sentence-ngram CNN model of Zhong et al. (2019) for the decision outcome prediction and membership identification tasks. Next, we explain the architecture of this model.

Given an explanation $X$ consisting of $n$ sentences/arguments $a_1, \dots, a_n$ to justify decision $Y$ for an individual, we use the Universal Sentence Encoder Cer et al. (2018) to encode each sentence $a_j$ into a 512-dimensional embedding vector $v_j$. We build the justification matrix $M$ by concatenating the sentence vectors $v_1$ to $v_n$:

$M = v_1 \oplus v_2 \oplus \dots \oplus v_n$

The Sentence Encoder is pre-trained using a variety of data sources and tasks Cer et al. (2018) using the Transformer Vaswani et al. (2017) architecture and is obtained from TensorFlow Hub (https://tfhub.dev/google/universal-sentence-encoder/4). Following Collobert et al. (2011), we apply convolution filters to windows of sentences in explanation $X$ to capture compounded and higher-order features. We use multiple filter sizes to capture various features from sentence n-grams. We use filters of size $h \times d$, where $h$ is the height or region size of the filter and indicates the number of sentences that are considered jointly when applying the convolution filter, and $d$ is the dimensionality of the sentence vectors, equal to 512. The feature map $c$ of the convolution operation is then obtained by repeatedly applying the convolution filter $W$ to windows of $h$ sentences. Each element $c_j$ of the feature map $c$ is obtained from:

$c_j = f(W \cdot M_{j:j+h-1} + b)$

where $M_{j:j+h-1}$ is the sub-matrix of $M$ from row $j$ to $j+h-1$, corresponding to the window of sentences $a_j$ to $a_{j+h-1}$, "$\cdot$" represents the dot product between the filter and the sub-matrix, $b$ represents the bias term, and $f$ is an activation function such as a rectified linear unit. We use window sizes 2, 3, and 4 and train 100 filters for each window size. The dimensionality of the feature map generated by each convolution filter differs for explanations of various lengths and filters with different heights. We apply an average-max pooling operation over the feature maps of each window size to downsample them by taking the average value over the window defined by a pool size. Next, we concatenate the output vectors. Eventually, the concatenated vector runs through a dense layer with 64 units followed by an activation function (for classification tasks we use softmax for multi-class outputs or sigmoid for binary outputs; for scalar outputs we use a rectified linear unit).

This is a multi-task model with a decision learner and a membership identifier module. The decision learner is trained using the decision outcome $Y$ as supervision and the membership identifier is trained using the protected attribute $S$. For instance, $Y$ could represent an inspection outcome (e.g., fail, pass, or conditional pass) for an establishment owned by an individual who is a member of a demographic group $S$ (e.g., a racial group). In our setup, the loss at each epoch is computed as a weighted sum of the decision prediction and membership identification losses. Training details are explained in Section 4.2. Next, we explain the method we use for attributing the predictions $\hat{Y}$ and $\hat{S}$ of the model to the arguments proposed in the justification $X$.
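To make the architecture concrete, below is a minimal Keras sketch of such a multi-task sentence-ngram CNN. It assumes sentences are pre-encoded with the Universal Sentence Encoder into a fixed-size matrix per justification; the function name build_multitask_cnn, the binary membership head, and the pooling details are our simplifications rather than the exact implementation.

```python
# A minimal Keras sketch of the multi-task sentence-ngram CNN described above.
# Assumptions: each justification is pre-encoded into a (max_sentences x 512)
# matrix of Universal Sentence Encoder embeddings; names are ours, and
# padding/pooling details are simplified.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multitask_cnn(max_sentences=18, emb_dim=512,
                        window_sizes=(2, 3, 4), n_filters=100,
                        n_decision_classes=3):
    inputs = layers.Input(shape=(max_sentences, emb_dim), name="justification")
    pooled = []
    for h in window_sizes:
        # Convolve over windows of h consecutive sentence embeddings.
        conv = layers.Conv1D(filters=n_filters, kernel_size=h)(inputs)
        conv = layers.BatchNormalization()(conv)
        conv = layers.Activation("relu")(conv)
        # Downsample each feature map to a fixed-size vector.
        pooled.append(layers.GlobalAveragePooling1D()(conv))
    shared = layers.Concatenate()(pooled)
    shared = layers.Dense(64, activation="relu")(shared)

    # Task 1: decision learner (e.g. pass / conditional pass / fail).
    decision = layers.Dense(n_decision_classes, activation="softmax",
                            name="decision")(shared)
    # Task 2: membership identifier for the protected attribute
    # (binary here for simplicity).
    membership = layers.Dense(1, activation="sigmoid",
                              name="membership")(shared)
    return Model(inputs=inputs, outputs=[decision, membership])

model = build_multitask_cnn()
model.summary()
```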

3.2 Attribution

Sundararajan et al. Sundararajan et al. (2017) proposed a method called Integrated Gradients to attribute the predictions of a deep neural network to its input features. This method is independent of the specific neural architecture and can provide a measure of relevance for each feature by quantifying its impact on the prediction. Zhong et al. Zhong et al. (2019) adopted this method for identifying the most decision-relevant aspects of legal cases. We also utilize this method to measure the impact of each input sentence on the decision prediction and membership identification tasks. Essentially, we take a straight-line path from an input to its baseline (conceptually, baselines represent data points that do not contain any useful information for the model; they are used as a benchmark by the integrated gradients method, and Sundararajan et al. (2017) suggest using an all-zero input embedding vector for text-based networks) and observe how the model prediction changes along this path by integrating the gradients along the path. To approximate the integral, we simply sum up the gradients at points occurring at small intervals along the straight-line path from the baseline to the input. The resulting scalar per feature attributes the prediction to the input features. The integrated gradient along the $i$-th dimension for an input $X$ and baseline $B$ is approximated as follows:

$IG_i(X) = (X_i - B_i) \times \sum_{k=1}^{m} \frac{\partial F\big(B + \frac{k}{m}(X - B)\big)}{\partial X_i} \times \frac{1}{m}$

Here, $F$ represents the neural model, $\frac{\partial F(X)}{\partial X_i}$ is the gradient of $F(X)$ along the $i$-th dimension, $X$ represents the input at hand, $B$ represents the baseline input (an all-zero vector), and $m$ is the number of steps in the approximation of the integral. To obtain utility attributions for sentences in the input justification $X$, we calculate the integrated gradients attributions of the decision learner using the predicted decision outcome $\hat{Y}$. Note that each input feature is one dimension of a sentence embedding. To obtain a salience score for each sentence, we sum up the attribution scores over its embedding dimensions. Next, we run the sentence scores through a softmax function to get a utility distribution $u$ over the sentences. Similarly, we obtain discrimination attributions for the sentences by calculating the integrated gradients attributions of the membership identifier using the predicted protected attribute $\hat{S}$, and run them through a softmax function to get a discrimination distribution $d$ over the sentences.

Figure 2: An overview of the architecture: the decision learner and membership identifier are trained using the decision $Y$ and the protected attribute $S$ as supervision, respectively. The attributions of each module are normalized and subtracted to obtain the inclusion scores.

We include high-utility and discrimination-free sentences in the fairly-justified summary of the explanations. To do so, we subtract the discrimination attribution $d_j$ from the utility attribution $u_j$ to get the final inclusion score $I_j$ for each sentence $a_j$. Essentially, the inclusion score is computed using the following equation:

$I_j = u_j - \lambda \, d_j$

In the equation above, $\lambda$ is a hyper-parameter that controls the utility-discrimination trade-off. Higher values of $\lambda$ correspond to removing more information about protected attributes from the input justifications. Figure 2 shows the attribution process. If an argument is used unfairly to justify an outcome for individuals in a certain protected group, it will get a high utility attribution $u_j$ and a high discrimination attribution $d_j$. The subtraction operation ensures that it gets a small inclusion score $I_j$.

Extracting Fairly-Justified Summaries: Given sentences $a_1, \dots, a_n$ and the corresponding inclusion scores $I_1, \dots, I_n$, we select the sentences with positive scores to be included in the fairly-justified summary. As explained in Section 3.2, sentences with a positive score have high utility for decision prediction but do not reveal the protected attribute of the individual. In our experiments, we test whether training a decision classifier on the fairly-justified summaries enhances fairness in the outcome on real-world and synthetic datasets.
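A minimal sketch of this extraction step follows, assuming the utility and discrimination distributions u and d from the attribution sketch above; the function name and the toy sentences are ours, and lam corresponds to the trade-off hyper-parameter $\lambda$ (set to 1 in our main experiments).

```python
# Sketch: combine utility (u) and discrimination (d) distributions into
# inclusion scores and keep sentences with positive scores.
import numpy as np

def extract_fair_summary(sentences, u, d, lam=1.0):
    """Keep sentences whose inclusion score u - lam * d is positive."""
    scores = np.asarray(u) - lam * np.asarray(d)
    keep = [s for s, score in zip(sentences, scores) if score > 0]
    return keep, scores

# Toy example: the second sentence is predictive of the area's demographics,
# so its discrimination attribution outweighs its utility and it is dropped.
sents = ["Rodent droppings observed near the prep area.",
         "The restaurant is on the city's south side.",
         "No soap at the hand-washing sink."]
u = [0.5, 0.2, 0.3]   # utility distribution over sentences
d = [0.1, 0.7, 0.2]   # discrimination distribution over sentences
summary, scores = extract_fair_summary(sents, u, d, lam=1.0)
print(summary)  # the first and third sentences are kept
```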

4 Experiments and Results

In this section, we introduce the datasets we use for training and testing our model, followed by training details, the experimental setup, and the metrics under consideration.

4.1 Datasets

Inspection Reports of food establishments in Chicago (D1): The City of Chicago has published reports of food inspections conducted since 2010. We extracted the information on food inspections conducted from January 2010 to December 2014 from the City of Chicago's GitHub repository (https://github.com/Chicago/food-inspections-evaluation). This dataset contains the outcome of the inspection, which can be pass, fail, or conditional pass, as well as the notes that the sanitarian left in the inspection form about the observed violations in order to justify the outcome and explain what needs to be fixed before the next inspection (there can be other outcomes, e.g., when the sanitarian could not access the establishment; these cases are excluded from our study, and we also excluded inspection reports with a comments section shorter than 12 sentences). This dataset does not include the demographic information of the food establishment owners. As a proxy, we use the ethnicity of the majority of the population in the census block group in which the food establishment is located (the demographic information of neighborhoods was extracted from https://www.census.gov/). The resulting dataset includes 17,212 inspection reports. The inspector comments are on average 18.2 sentences long with a standard deviation of 7.2. The breakdown of the inspection outcome for each demographic group is shown in Table 4 of the supplementary material (Section A.1). We train the decision classifier explained in Section 3.1 on the inspector notes using the inspection outcome and the ethnicity as supervision for the decision classifier and the membership identifier, respectively.

Rate My Professor (D2-D4): Students can leave an anonymous review and rating on a scale of 1-5 in several categories for their previous instructors on the Rate My Professor (RMP) website (https://www.ratemyprofessors.com/). Several previous studies have identified various types of bias in students' evaluations Legg and Wilson (2012); Reid (2010); Clayson (2014); Bleske-Rechek and Michels (2010); Rosen (2018); Theyson (2015). In our study, we aim to detect and remove potential biases in the justifications provided by students to explain their ratings using our proposed methodology. We rely on the dataset collected in He (2020). We combine all the reviews written for each instructor and use the average rating as the supervision for the decision classifier. We use the gender of the instructor as the supervision for the membership identifier model. In our experiments, we exclude instructors that have fewer than 5 reviews. We also remove the pronouns and instructors' names from the reviews (this pre-processing step ensures that the membership identifier does not rely on blatant signals in the text and instead extracts more latent patterns in the justifications). The resulting dataset includes reviews written for 1,344 instructors, which are on average 45.6 sentences long. We denote this dataset by D2. We create two additional datasets, D3 and D4, by splitting the RMP dataset based on the gender gap of the students in each discipline. D3 includes student evaluations for professors in fields that are female-dominant such as nursing, psychology, and education, while D4 includes student evaluations for male-dominant majors such as engineering, computer science, and philosophy (fields with less than a 20% gender gap are excluded; the statistics about the bachelor's degrees earned by field and gender are obtained from Perry (2017)). The breakdown of ratings for each gender group for D2-D4 is shown in Appendix A.1.

4.2 Hyper-parameters and Training Details

Training Details: To train the model introduced in Section 3.1 on D1, we employ window sizes of 2, 3, and 4, and train 100 filters for each window size. For the smaller datasets D2-D4, we use window sizes 2 and 3 and train 50 filters for each window size. We initialize each convolution layer using the initialization method proposed in He et al. (2015). We use the rectified linear unit as the activation function of the convolution layer. After performing the convolution operation, we apply batch normalization Ioffe and Szegedy (2015) followed by a global average-pooling operation over the feature map of each window size. Next, we concatenate the output vectors. Eventually, we run the concatenated vector through a dense layer with 64 units followed by an activation function. For decision classification and membership identification on D1, we use the softmax operation to obtain class probabilities. For D2-D4, we use a rectified linear unit to obtain the output rating and a sigmoid to obtain gender class probabilities. We implement the decision classifier and membership identifier networks using the Keras library (https://keras.io). We use a weighted cross-entropy loss function for classification tasks and a mean squared loss for regression tasks, and learn the model parameters using the Adam optimizer Kingma and Ba (2014) with a learning rate of 0.001.

For D1, we use 90% of the inspections conducted from January 2010 to October 2013 (75% of all records in our dataset) as our training set and the remaining 10% as our validation set. The inspections conducted from November 2013 to December 2014 are used as our test set. We set the maximum length of the arguments to the 70th percentile of explanation lengths in our training set (18 sentences). Textual explanations that are longer than this are truncated while shorter ones are padded. For D2-D4, we randomly split the dataset into a 70-15-15 split to build our train, validation, and test sets. We set the maximum length of the arguments to the 70th percentile of the review lengths in our training set (64 sentences). Reviews that are longer than this are truncated while shorter ones are padded. We set the loss weights for the decision prediction task and the membership identification task to 1. We train our multi-task network for a maximum of 25 epochs and stop the training if the decision classification loss on the validation set does not improve for 3 consecutive epochs. In the end, we revert the network's weights to those that achieved the lowest validation loss. We repeat each experiment 5 times and report the average result. We used Nvidia Tesla K80 GPUs for our experiments.
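A minimal sketch of this training configuration is shown below, reusing the model from the Section 3.1 sketch; the data placeholders (x_train, y_train, s_train, ...) are ours, and the class weighting for the weighted cross-entropy loss is omitted for brevity.

```python
# Sketch of the training setup described above: weighted multi-task loss,
# Adam with lr=0.001, early stopping (patience 3) on the decision loss, and
# restoring the best weights. Data variables are placeholders.
import tensorflow as tf

model = build_multitask_cnn()  # from the earlier sketch

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss={"decision": "categorical_crossentropy",   # classification head
          "membership": "binary_crossentropy"},     # protected-attribute head
    loss_weights={"decision": 1.0, "membership": 1.0},
)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_decision_loss",   # stop on the decision-classification loss
    patience=3,
    restore_best_weights=True,
)

# model.fit(x_train, {"decision": y_train, "membership": s_train},
#           validation_data=(x_val, {"decision": y_val, "membership": s_val}),
#           epochs=25, callbacks=[early_stop])
```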

Parameters of the Attribution Model: For computing the integrated gradients for attribution, we set the number of steps in the path integral approximation from the baseline to the input instance to 50 and use the Gauss–Legendre quadrature method for integral approximation. We compute the attributions of the decision classifier and the membership identification networks with respect to the input layer.

4.3 Evaluation Metrics

In our experiments, we seek to answer the following questions: (i) How does fairness-aware summarization of the input justification affect the utility of the model? (ii) Does this pre-processing step help in defending against membership identification attacks? (iii) Does enhancing the fairness in the justification using our proposed approach also enhance the fairness in the outcome? To answer the first question, we report the utility of the decision learner. For categorical outcomes (e.g., in D1 and D1*) we report the Micro-F1 and Macro-F1, and for scalar outcomes (D2-D4) we report the Mean Absolute Error (MAE). To answer the second question, we report the demographic leakage. Leakage is defined as the ability of the membership identifier network to correctly predict the protected attribute of individuals given the justification. We report the Micro-F1 and Macro-F1 of our membership identification model. Lower demographic leakage is desirable. Lastly, to answer the question regarding the fairness of the model, for categorical outcomes we report demographic parity, equality of odds, and calibration (for a discussion of fairness measures and their trade-offs see Kleinberg et al. (2016) and Hardt et al. (2016)). We additionally report the False Pass Rate Gap (FPRG) and False Fail Rate Gap (FFRG) across demographic groups. FPRG and FFRG capture the equality of the distribution of model errors across demographic groups. Similar metrics were used in Wadsworth et al. (2018). To measure fairness for scalar outcomes, we report the MAE gap. Next, we define the fairness measures in the context of food inspection.

Parity: a decision classifier satisfies demographic parity if the proportion of food establishments predicted to fail the inspection is the same for each demographic group. We report the gap between the most- and least-favored groups. For the sake of consistency with previous work, we denote the protected attribute by S.

Equality of odds: for those establishments that actually failed the inspection, the proportion of fail predictions should be the same across demographic groups. We report the gap between the most- and least-favored groups. Ideally, the gap should be very close to zero.

Calibration: for those establishments that received a fail prediction, the probability of actually failing the inspection should be the same across demographic groups. We report the gap between the most- and least-favored groups. Ideally, the gap should be very close to zero.

False Pass Rate Gap (FPRG): food establishments that did not pass the inspection should have the same probability of falsely receiving a pass prediction, regardless of demographic group. We report the gap between the most- and least-favored groups, which ideally should be close to 0.

False Fail Rate Gap (FFRG): establishments of different demographic groups that did not fail the inspection should have the same probability of falsely receiving a fail prediction. We report the gap between the most- and least-favored groups, which ideally should be close to 0. A sketch of computing these gap metrics is shown below.
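A minimal sketch of the gap-style metrics for a binary fail/pass prediction follows; the function name fairness_gaps and the encoding (1 = fail, 0 = pass) are ours, and each group is assumed to contain both actual and predicted outcomes of each class.

```python
# Sketch: per-group rates and most-vs-least-favored gaps for the fairness
# metrics defined above, for a binary fail (1) / pass (0) prediction task.
import numpy as np

def fairness_gaps(y_true, y_pred, groups):
    per_group = {"parity": {}, "eq_odds": {}, "calib": {}, "fprg": {}, "ffrg": {}}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        per_group["parity"][g] = yp.mean()                # P(predicted fail)
        per_group["eq_odds"][g] = yp[yt == 1].mean()      # P(pred fail | actual fail)
        per_group["calib"][g] = yt[yp == 1].mean()        # P(actual fail | pred fail)
        per_group["fprg"][g] = (yp[yt == 1] == 0).mean()  # false pass rate
        per_group["ffrg"][g] = (yp[yt == 0] == 1).mean()  # false fail rate
    # Gap between the most- and least-favored group for each metric.
    return {name: max(vals.values()) - min(vals.values())
            for name, vals in per_group.items()}

# Usage with arrays y_true, y_pred, groups:
#   gaps = fairness_gaps(y_true, y_pred, groups)
#   print(gaps["parity"], gaps["fprg"], gaps["ffrg"])
```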

4.4 Results and Discussion

Dataset | Utility (Micro-F1)     | Utility (Macro-F1)     | Dem. Leakage (Micro-F1) | Dem. Leakage (Macro-F1)
        | Empty  Full  FairSum   | Empty  Full  FairSum   | Empty  Full  FairSum    | Empty  Full  FairSum
D1      | 0.48   0.83  0.83      | 0.22   0.83  0.82      | 0.56   0.58  0.52       | 0.18   0.38  0.33
D1*     | 0.48   0.80  0.80      | 0.26   0.79  0.79      | 0.78   0.66  0.56       | 0.66   0.48  0.36
Table 1: Results on datasets D1 and D1* (D1* is built by adding unfairness attacks to D1). In the "Empty" setting, justifications are removed (for D1*, the "Empty" setting includes the address in the justification). "Full" refers to the model trained on all justifications, while "FairSum" refers to the model trained on fair summaries obtained by our proposed pre-processing approach.
Dataset | Parity        | Equality of Odds | Calibration   | FPRG          | FFRG
        | Full  FairSum | Full  FairSum    | Full  FairSum | Full  FairSum | Full  FairSum
D1      | 0.15  0.14    | 0.08  0.10       | 0.05  0.06    | 0.05  0.05    | 0.11  0.11
D1*     | 0.12  0.12    | 0.06  0.07       | 0.07  0.05    | 0.03  0.04    | 0.09  0.07
Table 2: Fairness metrics for datasets D1 and D1*.
Dataset | MAE                  | Dem. Leakage (Micro-F1) | Dem. Leakage (Macro-F1) | MAE Gap
        | Empty  Full  FairSum | Empty  Full  FairSum    | Empty  Full  FairSum    | Empty  Full  FairSum
D2      | 0.72   0.47  0.49    | 0.59   0.71  0.61       | 0.37   0.69  0.58       | 0.07   0.06  0.00
D3      | 0.76   0.52  0.53    | 0.50   0.66  0.61       | 0.33   0.66  0.59       | 0.19   0.03  0.06
D4      | 0.66   0.54  0.53    | 0.45   0.82  0.74       | 0.30   0.71  0.49       | 0.04   0.02  0.00
Table 3: Results on the RMP datasets (D2-D4).

In our experiments, we compare the utility, demographic leakage, and fairness of models that are identical in terms of architecture but are trained on different versions of the training data. The model architecture is discussed in Section 3.1. In the "Empty" setting, justifications are empty. In the "Full" setting, the model is trained and tested on the original data, while in the "FairSum" setting it is trained and tested on the fairly-justified summaries. We use the empty setting to indicate the lower bound of the demographic leakage. We use the full setting to measure the bias in the justifications of the input dataset; this setting also acts as our baseline. The FairSum setting represents the effectiveness of our proposed approach in mitigating the bias in the justification. To extract the fairly-justified summaries, we train the multi-task CNN model on the justifications using the decision outcome as supervision for one task and the demographic information as supervision for the other. We measure the importance of each sentence in the justifications using the attribution mechanism explained in Section 3.2. Next, we include sentences that have a positive inclusion score in the fairly-justified summaries. These sentences have a high utility for decision outcome prediction and a low utility for membership identification. We apply this preprocessing step to both the train and test sets. The results are presented in Tables 1, 2, and 3. For these experiments, the parameter $\lambda$, which controls the trade-off between the utility and the demographic leakage, is set to 1.

As can be seen in Table 1, FairSum reduces the demographic leakage on dataset D1 (by 0.06 in Micro-F1 and 0.05 in Macro-F1) while achieving the same level of accuracy on the decision classification task as the full setting. As indicated in Table 2, our approach decreases parity by 0.01 while achieving similar results in terms of FFRG and FPRG. We also test the impact of unfairness attacks on this dataset, which is discussed in the next section. As indicated in Table 3, on dataset D2, FairSum decreases the demographic leakage from 0.71 to 0.61 Micro-F1 and from 0.69 to 0.58 Macro-F1 while increasing the MAE by 0.02 on a 5-point scale. FairSum's outcomes are also fairer on D2: in the full setting, predictions have a 0.06 higher average MAE for females than for males, while FairSum achieves similar error rates for both gender groups (a 0 MAE gap).

For D3 and D4, FairSum reduces the demographic leakage (from 0.66 to 0.59 and from 0.71 to 0.49 Macro-F1, respectively). We conclude that our proposed approach is very effective in reducing the demographic leakage in the input justifications (an 11.3% and 19.1% decrease in Micro-F1 and Macro-F1, respectively, on average over datasets D1-D4 and D1*) while not reducing the utility of the model. Removing gender-coded language from D3 justifications comes at the cost of a 0.06 higher MAE for females than for males (this was 0.03 for the full setting). On D2 and D4, however, FairSum completely closes the MAE gap between the gender groups. An example of applying FairSum to a teaching evaluation for a female professor is shown in Figure 3. As can be seen in the figure, arguments about the looks of the instructor (more frequent for female instructors) are assigned a low inclusion score (indicated with a red shade) and are therefore excluded from the summary. The preserved sentences are indicated with a green shade and have a high inclusion score. Our model moderately increases the fairness in the outcome for D1, D2, and D4, but when the ground-truth rating is correlated with the protected attribute (e.g., in the more biased dataset D3, with a correlation of 0.07), its effectiveness is limited. Therefore, in addition to perturbing the justifications using a filtering-based approach, further intervention is required to enhance the fairness in the outcome. This can take the form of adding justifications rather than removing them, or imposing fairness constraints on the model itself rather than on the data.

Figure 3: An example of using FairSum on teaching evaluations for a female professor. The colors indicate the inclusion score for each sentence. Sentences with a positive score (represented with a green shade) are preserved in the summary while the rest are excluded.

Utility-Fairness Trade-Off: Figure 7 shows the utility, demographic leakage, and fairness metrics as a function of $\lambda$ on D1 and D2. Very low values of $\lambda$ prioritize utility, selecting even relatively biased sentences, and yield scores close to the full setting (see Figures 7a and 7c). On D1, increasing $\lambda$ generally decreases the demographic parity while increasing the FPRG (see Figure 7b); it does not have a consistent or noticeable impact on the other fairness metrics. On D2, with $\lambda$ near 1, the MAE gap shrinks to 0 (see Figure 7c). Very high values of $\lambda$ remove too many sentences, leading to a high error rate: many summaries become empty, and thus the resulting decisions are unjustified (justifications are not informative about the outcomes) and unfair (the lack of justification is not uniformly distributed over genders), so the gaps emerge once again. For error bars and more details on the impact of $\lambda$ on summary length, see Sections A.3 and A.2 in the Appendix.

Figure 7: Impact of $\lambda$ on utility and fairness on datasets D1 (panels a and b) and D2 (panel c).

Unfairness Attacks: Next, we test the effectiveness of our proposed method in a scenario where unfair arguments are injected into the justifications. Chicago is one of the most segregated cities in the US Comen (2019). Thus, the address of a food establishment includes "proxy" information about the demography of its neighborhood. In this experiment, we add the address of the food establishment to the input justification of dataset D1 (positions are randomized). This dataset is denoted D1* in Table 1. In the empty setting for D1*, the input to the model includes only the address and no other justification for the decision. The full setting includes all the justifications as well as the address. It can be seen that the address indeed includes proxy information about race, as the empty setting has a high demographic leakage (0.78 Micro-F1 and 0.66 Macro-F1). We also observe that FairSum effectively decreases the demographic leakage on D1* (-0.10 in Micro-F1 and -0.12 in Macro-F1 in comparison to the full setting) while achieving the same level of accuracy. FairSum shows moderate improvements in terms of fairness over the full setting, decreasing the calibration gap and the FFRG by 0.02 points.

5 Conclusion and Future Work

In this work, we propose a train-attribute-mask pipeline for detecting and mitigating bias in the justifications of text-based neural models. Our objective in extracting fairly-justified summaries is to maximize the utility of the output summary for the decision prediction task while minimizing the inclusion of proxy information that can reveal sensitive attributes of individuals. Our approach is not primarily intended to enhance fairness in the outcome but rather to enhance fairness in the model's justification. We achieve this by training a multi-task model for decision classification and membership identification. We attribute the predictions of these models back to the textual input using an attribution mechanism called integrated gradients. Next, we collect the high-utility and bias-free sentences in the form of a summary. Eventually, we retrain the decision classifier on the fairly-justified summaries. Our experiments on real and synthetic datasets indicate that our pipeline effectively limits the demographic leakage from the input data. In addition, it moderately enhances the fairness in the outcome. We plan to test the impact of our proposed pipeline on removing bias from the data using more datasets and several synthetic scenarios in which a subgroup of the population is treated differently.

This material is based upon work supported by the National Science Foundation (NSF) under award number 1939743. Any opinions, findings, and conclusions in this material are those of the authors and may not reflect the views of the respective funding agency.

References

  • [1] C. R. Act (1964) Civil rights act of 1964. Title VII, Equal Employment Opportunities. Cited by: footnote 2.
  • [2] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach (2018) A reductions approach to fair classification. In ICML, Cited by: §1.
  • [3] S. Aghaei, M. J. Azizi, and P. Vayanos (2019) Learning optimal and fair decision trees for non-discriminative decision-making. In AAAI, Cited by: §1.
  • [4] J. Angwin, J. Larson, S. Mattu, and L. Kirchner (2016) Machine bias. ProPublica, May. Cited by: §1.
  • [5] A. M. Barry-jester, B. Casselman, and D. Goldstein (2015-08) The new science of sentencing. The Marshall Project. External Links: Link Cited by: §1.
  • [6] Y. Bechavod and K. Ligett (2017) Penalizing unfairness in binary classification. arXiv preprint arXiv:1707.00044. Cited by: §1.
  • [7] R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth (2017) A convex framework for fair regression. arXiv preprint arXiv:1706.02409. Cited by: §1.
  • [8] R. Berk, H. Heidari, S. Jabbari, M. Kearns, and A. Roth (2018) Fairness in criminal justice risk assessments: the state of the art. Sociological Methods & Research. Cited by: §1.
  • [9] A. Bleske-Rechek and K. Michels (2010) RateMyProfessors com: testing assumptions about student use and misuse. Practical Assessment, Research, and Evaluation 15 (1), pp. 5. Cited by: §4.1.
  • [10] T. Bolukbasi, K. Chang, J. Zou, V. Saligrama, and A. Kalai (2016) Man is to computer programmer as woman is to homemaker? debiasing word embeddings. arXiv preprint arXiv:1607.06520. Cited by: §1.
  • [11] T. Calders, F. Kamiran, and M. Pechenizkiy (2009) Building classifiers with independency constraints. In 2009 IEEE ICDM Workshops, Cited by: §1.
  • [12] T. Calders and S. Verwer (2010) Three naive bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery. Cited by: §1.
  • [13] F. P. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney (2017) Optimized pre-processing for discrimination prevention. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Cited by: §1.
  • [14] D. V. Carvalho, E. M. Pereira, and J. S. Cardoso (2019) Machine learning interpretability: a survey on methods and metrics. Electronics, pp. 832. Cited by: §1.
  • [15] L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi (2019) Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proceedings of the conference on fairness, accountability, and transparency, Cited by: §1.
  • [16] L. E. Celis and V. Keswani (2019) Improved adversarial learning for fair classification. arXiv preprint arXiv:1901.10443. Cited by: §1.
  • [17] D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Céspedes, S. Yuan, C. Tar, et al. (2018) Universal sentence encoder. arXiv preprint arXiv:1803.11175. Cited by: §3.1.
  • [18] A. Chouldechova and A. Roth (2018) The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810. Cited by: §1.
  • [19] D. E. Clayson (2014) What does ratemyprofessors. com actually rate?. Assessment & Evaluation in Higher Education 39 (6), pp. 678–698. Cited by: §4.1.
  • [20] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011) Natural language processing (almost) from scratch. JMLR. Cited by: §3.1, §3.1.
  • [21] E. Comen (2019-07) Detroit, chicago, memphis the 25 most segregated cities in america. External Links: Link Cited by: §4.4.
  • [22] A. Cotter, M. Gupta, H. Jiang, N. Srebro, K. Sridharan, S. Wang, B. Woodworth, and S. You (2019) Training well-generalizing classifiers for fairness metrics and other data-dependent constraints. In ICML, Cited by: §1.
  • [23] E. Del Barrio, F. Gamboa, P. Gordaliza, and J. Loubes (2018) Obtaining fairness using optimal transport theory. arXiv preprint arXiv:1806.03195. Cited by: §1.
  • [24] M. Donini, L. Oneto, S. Ben-David, J. Shawe-Taylor, and M. Pontil (2018) Empirical risk minimization under fairness constraints. arXiv preprint arXiv:1802.08626. Cited by: §1.
  • [25] C. Dwork, N. Immorlica, A. T. Kalai, and M. Leiserson (2018) Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, Cited by: §1.
  • [26] F. Ebrahimi, M. Tushev, and A. Mahmoud (2020) Mobile app privacy in software engineering research: a systematic mapping study. Information and Software Technology. Cited by: §1.
  • [27] H. Edwards and A. Storkey (2015) Censoring representations with an adversary. arXiv preprint arXiv:1511.05897. Cited by: §1.
  • [28] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) Certifying and removing disparate impact. In ACM SIGKDD, Cited by: §1.
  • [29] B. Fish, J. Kun, and Á. D. Lelkes (2016) A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, Cited by: §1.
  • [30] S. A. Friedler, C. Scheidegger, S. Venkatasubramanian, S. Choudhary, E. P. Hamilton, and D. Roth (2019) A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, Cited by: §1.
  • [31] A. Ghassami, S. Khodadadian, and N. Kiyavash (2018) Fairness in supervised learning: an information theoretic approach. In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 176–180. Cited by: §1.
  • [32] N. Goel, M. Yaghini, and B. Faltings (2018) Non-discriminatory machine learning through convex fairness criteria. In AAAI, Cited by: §1.
  • [33] M. Hardt, E. Price, and N. Srebro (2016) Equality of opportunity in supervised learning. arXiv preprint arXiv:1610.02413. Cited by: §1, §1, footnote 14.
  • [34] J. He (2020) Big data set from ratemyprofessor.com for professors’ teaching evaluation. Mendeley. External Links: Document, Link Cited by: §4.1.
  • [35] K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, Cited by: §4.2.
  • [36] M. Heidari and S. Rafatirad (2020) Semantic convolutional neural network model for safe business investment by using bert. In International Conference on Social Networks Analysis, Management and Security (SNAMS), Cited by: §3.1.
  • [37] M. Heidari and S. Rafatirad (2020) Using transfer learning approach to implement convolutional neural network model to recommend airline tickets by using online reviews. In International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Cited by: §3.1.
  • [38] K. Holstein, J. Wortman Vaughan, H. Daumé III, M. Dudik, and H. Wallach (2019) Improving fairness in machine learning systems: what do industry practitioners need?. In Proceedings of the 2019 CHI conference on human factors in computing systems, Cited by: §1.
  • [39] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, Cited by: §4.2.
  • [40] F. Jafariakinabad and K. A. Hua (2019) Style-aware neural model with application in authorship attribution. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Cited by: §3.1.
  • [41] M. Jaiswal and E. M. Provost (2020) Privacy enhanced multimodal neural representations for emotion recognition. In AAAI, Cited by: §1.
  • [42] N. Kalchbrenner, E. Grefenstette, and P. Blunsom (2014) A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188. Cited by: §3.1.
  • [43] F. Kamiran, T. Calders, and M. Pechenizkiy (2010) Discrimination aware decision tree learning. In 2010 IEEE ICDM, Cited by: §1.
  • [44] F. Kamiran and I. Žliobaitė (2013) Explainable and non-explainable discrimination in classification. In Discrimination and Privacy in the Information Society, Cited by: §1.
  • [45] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma (2012) Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Cited by: §1.
  • [46] E. Kazemi, M. Zadimoghaddam, and A. Karbasi (2018) Scalable deletion-robust submodular maximization: data summarization with privacy and fairness constraints. In ICML, Cited by: §1.
  • [47] M. Keymanesh, M. Elsner, and S. Parthasarathy (2020) Toward domain-guided controllable summarization of privacy policies. Natural Legal Language Processing Workshop at KDD. Cited by: §3.1.
  • [48] M. Keymanesh, S. Gurukar, B. Boettner, C. Browning, C. Calder, and S. Parthasarathy (2020) Twitter watch: leveraging social media to monitor and predict collective-efficacy of neighborhoods. Complex Networks XI. Springer Proceedings in Complexity. Cited by: §1.
  • [49] Y. Kim (2014) Convolutional neural networks for sentence classification. arXiv:1408.5882. Cited by: §3.1.
  • [50] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  • [51] J. Kleinberg, S. Mullainathan, and M. Raghavan (2016) Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807. Cited by: §1, §1, footnote 14.
  • [52] J. Koren (2016-09) What does that web search say about your credit?. Los Angeles Times. External Links: Link Cited by: §1.
  • [53] A. M. Legg and J. H. Wilson (2012) RateMyProfessors. com offers biased evaluations. Assessment & Evaluation in Higher Education 37 (1), pp. 89–97. Cited by: §4.1.
  • [54] D. Madras, E. Creager, T. Pitassi, and R. Zemel (2018) Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pp. 3384–3393. Cited by: §1.
  • [55] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan (2019) A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635. Cited by: §1, §1.
  • [56] A. K. Menon and R. C. Williamson (2018) The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pp. 107–118. Cited by: §1.
  • [57] A. Olteanu, C. Castillo, F. Diaz, and E. Kıcıman (2019) Social data: biases, methodological pitfalls, and ethical boundaries. Frontiers in Big Data 2, pp. 13. Cited by: §1.
  • [58] O. A. Osoba and W. Welser IV (2017) An intelligence in our image: the risks of bias and errors in artificial intelligence. Rand Corporation. Cited by: §1.
  • [59] M. J. Perry (2017-08) Bachelors degrees by field and gender for the class of 2015. External Links: Link Cited by: footnote 12.
  • [60] D. Pessach and E. Shmueli (2020) Algorithmic fairness. arXiv preprint arXiv:2001.09784. Cited by: §1, §1.
  • [61] A. Rahmattalabi, S. Jabbari, H. Lakkaraju, P. Vayanos, E. Rice, and M. Tambe (2020) Fair influence maximization: a welfare optimization approach. arXiv preprint arXiv:2006.07906. Cited by: §1.
  • [62] L. D. Reid (2010) The role of perceived race and gender in the evaluation of college teaching on ratemyprofessors. com.. Journal of Diversity in higher Education 3 (3), pp. 137. Cited by: §4.1.
  • [63] A. S. Rosen (2018) Correlations, trends and potential biases among publicly accessible web-based student evaluations of teaching: a large-scale study of ratemyprofessors. com data. Assessment & Evaluation in Higher Education 43 (1), pp. 31–44. Cited by: §4.1.
  • [64] C. Rudin (2013) Predictive policing using machine learning to detect patterns of crime. Wired Magazine, August. Cited by: §1.
  • [65] R. Sarkhel, M. Keymanesh, A. Nandi, and S. Parthasarathy (2020) Interpretable multi-headed attention for abstractive summarization at controllable lengths. In Proceedings of the 28th International Conference on Computational Linguistics, Cited by: §1.
  • [66] E. Steel and J. Angwin (2010-08) On the web’s cutting edge, anonymity in name only. The Wall Street Journal. External Links: Link Cited by: §1.
  • [67] M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. Cited by: §3.1, §3.2, footnote 6.
  • [68] H. Suresh and J. V. Guttag (2019) A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002. Cited by: §1.
  • [69] K. C. Theyson (2015) Hot or not: the role of instructor quality and gender on the formation of positive illusions among students using ratemyprofessors. com. Practical Assessment, Research, and Evaluation 20 (1), pp. 4. Cited by: §4.1.
  • [70] M. Tushev, F. Ebrahimi, and A. Mahmoud (2020) Digital discrimination in sharing economy a requirements engineering perspective. In 2020 IEEE 28th International Requirements Engineering Conference (RE), Cited by: §1.
  • [71] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In NIPS, Cited by: §3.1.
  • [72] C. Wadsworth, F. Vera, and C. Piech (2018) Achieving fairness through adversarial learning: an application to recidivism prediction. arXiv preprint arXiv:1807.00199. Cited by: §1, §4.3.
  • [73] D. Xu, S. Yuan, L. Zhang, and X. Wu (2018) Fairgan: fairness-aware generative adversarial networks. In 2018 IEEE Big Data, Cited by: §1.
  • [74] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi (2017) Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In International conference on world wide web, Cited by: §1.
  • [75] M. B. Zafar, I. Valera, M. G. Rodriguez, K. P. Gummadi, and A. Weller (2017) From parity to preference-based notions of fairness in classification. arXiv preprint arXiv:1707.00010. Cited by: §1.
  • [76] M. B. Zafar, I. Valera, M. G. Rogriguez, and K. P. Gummadi (2017) Fairness constraints: mechanisms for fair classification. In Artificial Intelligence and Statistics, pp. 962–970. Cited by: §1.
  • [77] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In International conference on machine learning, pp. 325–333. Cited by: §1.
  • [78] B. H. Zhang, B. Lemoine, and M. Mitchell (2018) Mitigating unwanted biases with adversarial learning. In AAAI Conference on AI, Ethics, and Society, Cited by: §1.
  • [79] Y. Zhang and B. Wallace (2015) A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv:1510.03820. Cited by: §3.1.
  • [80] L. Zhong, Z. Zhong, Z. Zhao, S. Wang, K. D. Ashley, and M. Grabmair (2019) Automatic summarization of legal decisions using iterative masking of predictive sentences. In International Conference on Artificial Intelligence and Law, Cited by: §3.1, §3.2.

Appendix A Appendix

A.1 Dataset Statistics

Inspection reports of the city of Chicago (D1): The breakdown of the inspection results for each demographic group is shown in Table 4. Note that for food establishments with more violations, the inspection reports tend to be longer. In our summarization experiments, we focused on longer inspection reports, which often correspond to establishments with a higher number of violations.

Race     | Pass | Conditional pass | Fail | Total inspection count
White    | 27.5 | 25.6             | 46.8 | 8339
Black    | 28.9 | 15.6             | 55.4 | 4444
Hispanic | 33.8 | 19.2             | 46.8 | 4010
Asian    | 29.3 | 17.4             | 53.2 | 419
Table 4: The percentage of inspections for each ethnic group that received a pass, conditional pass, or fail outcome.

Rate My Professor (D2-D4): The Rate My Professor dataset only includes professor names and reviews. To infer the gender of the professors, we search for pronouns and titles commonly used for each gender (for the sake of simplicity, we assume binary gender classes). If no pronouns or titles are found in the reviews, the professor's name is used to detect their gender (we use https://pypi.org/project/gender-detector/ for mapping professors' names to their gender). The breakdown of reviews written for each gender category is shown in Tables 5, 6, and 7; a sketch of this gender-inference heuristic is shown after the tables.

       | [1,2] | (2,3] | (3,4] | (4,5] | Total count
Female | 5.6   | 21.0  | 35.3  | 37.9  | 551
Male   | 3.7   | 21.0  | 35.6  | 39.5  | 783
Table 5: The percentage of instructors of each gender group in each rating class for dataset D2.

       | [1,2] | (2,3] | (3,4] | (4,5] | Total count
Female | 4.3   | 22.2  | 31.5  | 41.9  | 279
Male   | 1.7   | 18.0  | 32.6  | 47.5  | 288
Table 6: The percentage of instructors of each gender group in each rating class for dataset D3.

       | [1,2] | (2,3] | (3,4] | (4,5] | Total count
Female | 5.5   | 24.4  | 39.3  | 30.7  | 127
Male   | 6.3   | 24.6  | 37.9  | 31.03 | 345
Table 7: The percentage of instructors of each gender group in each rating class for dataset D4.
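A minimal sketch of the gender-inference heuristic described above follows; the cue-word lists and function names are ours, and the name-based fallback is a stub standing in for the gender-detector package.

```python
# Sketch of the gender-inference heuristic from A.1: count gendered
# pronouns/titles in an instructor's reviews and fall back to a name-based
# guess when none are found. Word lists and names are ours; the fallback is
# a placeholder for https://pypi.org/project/gender-detector/.
import re

FEMALE_CUES = {"she", "her", "hers", "mrs", "ms", "miss"}
MALE_CUES = {"he", "him", "his", "mr"}

def guess_from_name(name):
    # Stub: in the paper this step uses the gender-detector package.
    return "unknown"

def infer_gender(reviews, first_name=None):
    tokens = re.findall(r"[a-z]+", " ".join(reviews).lower())
    f = sum(t in FEMALE_CUES for t in tokens)
    m = sum(t in MALE_CUES for t in tokens)
    if f > m:
        return "female"
    if m > f:
        return "male"
    # No gendered cues found: fall back to the name-based lookup.
    return guess_from_name(first_name) if first_name else "unknown"

print(infer_gender(["She explains concepts clearly.", "Her exams are fair."]))
```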

A.2 Impact of $\lambda$ on Summary Length

Figure 8: Impact of $\lambda$ on summary length on datasets D1-D4.

Figure 8 shows the average summary length (sentence count) for datasets D1-D4 as a function of $\lambda$. The food inspection reports in D1 are on average much shorter than the teaching evaluations in dataset D2 (18.2 vs. 45.6 sentences). Very low values of $\lambda$ prioritize utility by preserving even relatively biased sentences. For all datasets, the summaries start shrinking around $\lambda$ equal to 0.85. However, for D2-D4 the compression rate is higher. Around $\lambda$ equal to 1.25, 39.9% of the input justifications for dataset D1 are empty; this number is 77.8%, 95.7%, and 100% for D2, D3, and D4, respectively. We conjecture that the existence of more implicit bias in D2-D4 causes the summaries to shrink faster as $\lambda$ increases. At this point ($\lambda$ of 1.25 and higher) the resulting decisions are unjustified (justifications are not informative about the outcomes). Therefore, in Figure 7 we only show the impact of changing $\lambda$ from 0.8 to 1.2.

A.3 Results (Error Bars)

Figures 11 and 14 show the errors in utility and membership prediction over 5 runs for datasets D1 and D2. For the FairSum setting, the parameter $\lambda$ that controls the utility-discrimination trade-off is set to 1.

Figure 11: Error bars for utility (a) and demographic leakage (b) for dataset D1. $\lambda$ for the FairSum setting is set to 1.

Figure 14: Error bars for MAE (a) and demographic leakage (b) for dataset D2. $\lambda$ for the FairSum setting is set to 1.