Oversampling Higher-Performing Minorities During Machine Learning Model Training Reduces Adverse Impact Slightly but Also Reduces Model Accuracy

04/27/2023
by Louis Hickman, et al.

Organizations are increasingly adopting machine learning (ML) for personnel assessment. However, concerns exist about fairness in designing and implementing ML assessments. Supervised ML models are trained to model patterns in data, meaning ML models tend to yield predictions that reflect subgroup differences in applicant attributes in the training data, regardless of the underlying cause of subgroup differences. In this study, we systematically under- and oversampled minority (Black and Hispanic) applicants to manipulate adverse impact ratios in training data and investigated how training data adverse impact ratios affect ML model adverse impact and accuracy. We used self-reports and interview transcripts from job applicants (N = 2,501) to train 9,702 ML models to predict screening decisions. We found that training data adverse impact related linearly to ML model adverse impact. However, removing adverse impact from training data only slightly reduced ML model adverse impact and tended to negatively affect ML model accuracy. We observed consistent effects across self-reports and interview transcripts, whether oversampling real (i.e., bootstrapping) or synthetic observations. As our study relied on limited predictor sets from one organization, the observed effects on adverse impact may be attenuated among more accurate ML models.
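To make the manipulation described in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the conventional definition of the adverse impact ratio (the minority selection rate divided by the majority selection rate) and shows only the bootstrapping variant: selected minority records are resampled with replacement until the training data reach a target ratio. The group labels, function names, and toy counts are hypothetical and are not taken from the study, which also undersampled applicants and generated synthetic observations.

```python
import random


def adverse_impact_ratio(records, minority="minority", majority="majority"):
    """Minority selection rate divided by majority selection rate.

    `records` is a list of (group, selected) pairs, with selected in {0, 1}.
    Assumes both groups are present and the majority selection rate is > 0.
    """
    def selection_rate(group):
        outcomes = [selected for g, selected in records if g == group]
        return sum(outcomes) / len(outcomes)

    return selection_rate(minority) / selection_rate(majority)


def oversample_selected_minority(records, target_ratio=1.0,
                                 minority="minority", majority="majority",
                                 seed=0):
    """Bootstrap selected minority records until the ratio reaches the target."""
    majority_outcomes = [s for g, s in records if g == majority]
    majority_rate = sum(majority_outcomes) / len(majority_outcomes)
    # The minority selection rate can approach 1 but never reach it while any
    # non-selected minority record remains, so reject unreachable targets.
    if target_ratio * majority_rate >= 1.0:
        raise ValueError("target ratio is not reachable by oversampling")

    pool = [r for r in records if r[0] == minority and r[1] == 1]
    if not pool:
        raise ValueError("no selected minority records to resample")

    rng = random.Random(seed)
    resampled = list(records)
    while adverse_impact_ratio(resampled, minority, majority) < target_ratio:
        resampled.append(rng.choice(pool))  # sampling with replacement
    return resampled


# Hypothetical toy data: 50 of 100 majority and 15 of 50 minority selected.
toy = ([("majority", 1)] * 50 + [("majority", 0)] * 50
       + [("minority", 1)] * 15 + [("minority", 0)] * 35)

balanced = oversample_selected_minority(toy, target_ratio=1.0)
print(round(adverse_impact_ratio(toy), 2))       # ratio before oversampling
print(round(adverse_impact_ratio(balanced), 2))  # ratio after oversampling
```

A model would then be trained on `balanced` rather than `toy`. According to the abstract, removing adverse impact from the training data in this way only slightly reduced model adverse impact and tended to lower accuracy.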

research
04/28/2021

Algorithmic Factors Influencing Bias in Machine Learning

It is fair to say that many of the prominent examples of bias in Machine...
research
12/25/2021

Explainable Artificial Intelligence for Pharmacovigilance: What Features Are Important When Predicting Adverse Outcomes?

Explainable Artificial Intelligence (XAI) has been identified as a viabl...
research
06/08/2021

Supervised Machine Learning with Plausible Deniability

We study the question of how well machine learning (ML) models trained o...
research
06/02/2020

Local Interpretability of Calibrated Prediction Models: A Case of Type 2 Diabetes Mellitus Screening Test

Machine Learning (ML) models are often complex and difficult to interpre...
research
08/18/2023

Attesting Distributional Properties of Training Data for Machine Learning

The success of machine learning (ML) has been accompanied by increased c...
research
04/20/2019

CleanML: A Benchmark for Joint Data Cleaning and Machine Learning [Experiments and Analysis]

It is widely recognized that the data quality affects machine learning (...
research
04/07/2023

AI Model Disgorgement: Methods and Choices

Responsible use of data is an indispensable part of any machine learning...