The Impact of Data Preparation on the Fairness of Software Systems

10/05/2019
by Inês Valentim, et al.

Machine learning models are widely adopted in scenarios that directly affect people. The development of software systems based on these models raises societal and legal concerns, as their decisions may lead to the unfair treatment of individuals based on attributes like race or gender. Data preparation is key in any machine learning pipeline, but its effect on fairness has yet to be studied in detail. In this paper, we evaluate how the fairness and effectiveness of the learned models are affected by the removal of the sensitive attribute, the encoding of categorical attributes, and instance selection methods (including cross-validators and random undersampling). We used the Adult Income and the German Credit Data datasets, which are widely studied and known to have fairness concerns. We applied each data preparation technique individually to analyse the difference in predictive performance and fairness, using statistical parity difference, disparate impact, and the normalised prejudice index. The results show that fairness is affected by transformations made to the training data, particularly in imbalanced datasets. Removing the sensitive attribute is insufficient to eliminate all the unfairness in the predictions, as expected, but it is key to achieving fairer models. Additionally, standard random undersampling with respect to the true labels is sometimes more prejudicial than performing no undersampling at all.
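
As a reading aid, the sketch below (not the authors' code) shows how the three fairness metrics named in the abstract can be computed for binary predictions and a binary sensitive attribute; the mutual-information-based form of the normalised prejudice index is an assumption here, following its usual definition.

```python
# Minimal sketch: group-fairness metrics for binary predictions y_pred and a
# binary sensitive attribute a (1 = privileged group, 0 = unprivileged group).
# The normalisation of the prejudice index is assumed, not taken from the paper.
import numpy as np

def statistical_parity_difference(y_pred, a):
    """P(y_pred = 1 | unprivileged) - P(y_pred = 1 | privileged)."""
    return y_pred[a == 0].mean() - y_pred[a == 1].mean()

def disparate_impact(y_pred, a):
    """P(y_pred = 1 | unprivileged) / P(y_pred = 1 | privileged)."""
    return y_pred[a == 0].mean() / y_pred[a == 1].mean()

def normalised_prejudice_index(y_pred, a):
    """I(y_pred; a) / sqrt(H(y_pred) * H(a)), with entropies in bits."""
    def entropy(x):
        p = np.bincount(x, minlength=2) / len(x)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # Joint distribution over (prediction, sensitive attribute).
    joint = np.zeros((2, 2))
    for yp, av in zip(y_pred, a):
        joint[yp, av] += 1
    joint /= joint.sum()
    marg = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    mutual_info = (joint[nonzero] * np.log2(joint[nonzero] / marg[nonzero])).sum()
    return mutual_info / np.sqrt(entropy(y_pred) * entropy(a))

# Toy example: positive predictions concentrated in the privileged group.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 1, 0])
a      = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(statistical_parity_difference(y_pred, a))  # -0.6
print(disparate_impact(y_pred, a))               # 0.25
print(normalised_prejudice_index(y_pred, a))     # ~0.28
```

A statistical parity difference and prejudice index near 0, and a disparate impact near 1, indicate that the predictions depend little on the sensitive attribute.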

Related research

07/25/2022  Estimating and Controlling for Fairness via Sensitive Attribute Predictors
02/02/2023  Hyper-parameter Tuning for Fair Classification without Sensitive Attribute Access
02/04/2022  Dikaios: Privacy Auditing of Algorithmic Fairness via Attribute Inference Attacks
05/16/2019  Fairness in Machine Learning with Tractable Models
12/12/2019  Awareness in Practice: Tensions in Access to Sensitive Attribute Data for Antidiscrimination
11/11/2019  Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness
06/30/2022  Discrimination in machine learning algorithms
