Fairness implications of encoding protected categorical attributes

01/27/2022
by Carlos Mougan, et al.

Protected attributes are often presented as categorical features that need to be encoded before they are fed into a machine learning algorithm. How these attributes are encoded is paramount, as it determines the way the algorithm will learn from the data. Categorical feature encoding has a direct impact on model performance and fairness. In this work, we compare the accuracy and fairness implications of the two most well-known encoders: one-hot encoding and target encoding. We distinguish between two types of induced bias that can arise while using these encodings and can lead to unfair models. The first type, irreducible bias, is due to direct group category discrimination; the second type, reducible bias, is due to large variance in less statistically represented groups. We take a deeper look into how regularization methods for target encoding can mitigate the bias induced while encoding categorical features. Furthermore, we tackle the problem of intersectional fairness that arises when two protected categorical features are combined, leading to higher cardinality. This practice is a powerful feature engineering technique used for boosting model performance. We study its implications on fairness, as it can increase both types of induced bias.
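
To make the two encoders concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the toy data, and the `smoothing` parameter are assumptions introduced for illustration. The regularization shown is the common additive (m-estimate) smoothing toward the global target mean; the abstract does not say which regularization methods the paper actually evaluates.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Return one binary indicator column per category, plus the column order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]
    return rows, categories


def target_encode(values, targets, smoothing=0.0):
    """Map each category to a smoothed mean of the target.

    Assumed formula (additive / m-estimate smoothing):
        enc(c) = (sum of targets in c + smoothing * global_mean) / (n_c + smoothing)
    With smoothing=0 this is the plain per-category target mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return {c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
            for c in counts}


# Hypothetical toy data: group "B" is much smaller than group "A".
groups = ["A"] * 8 + ["B"] * 2
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

raw = target_encode(groups, labels)                     # per-group means
smoothed = target_encode(groups, labels, smoothing=8.0)  # shrunk toward global mean
```

In this toy example, the unsmoothed encoding of the small group "B" is its raw mean of 1.0, estimated from only two rows. With `smoothing=8.0` it is pulled to about 0.68, much closer to the global mean of 0.6, while the larger group "A" moves only from 0.5 to about 0.55. This illustrates how shrinkage mainly affects poorly represented categories, which is the setting the abstract associates with reducible bias.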

research
10/03/2021

xFAIR: Better Fairness via Model-based Rebalancing of Protected Attributes

Machine learning software can generate models that inappropriately discr...
research
07/25/2023

An Empirical Study on Fairness Improvement with Multiple Protected Attributes

Existing research mostly improves the fairness of Machine Learning (ML) ...
research
12/22/2021

Evaluating categorical encoding methods on a real credit card fraud detection database

Correctly dealing with categorical data in a supervised learning context...
research
10/25/2022

Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings

In this paper, we introduce the Vehicle Claims dataset, consisting of fr...
research
06/01/2020

Sampling Techniques in Bayesian Target Encoding

Target encoding is an effective encoding technique of categorical variab...
research
04/30/2019

Encoding Categorical Variables with Conjugate Bayesian Models for WeWork Lead Scoring Engine

Applied Data Scientists throughout various industries are commonly faced...
research
05/29/2020

Quasi-orthonormal Encoding for Machine Learning Applications

Most machine learning models, especially artificial neural networks, req...
