Optimal Pre-Processing to Achieve Fairness and Its Relationship with Total Variation Barycenter

01/18/2021
by   Farhad Farokhi, et al.

We use disparate impact, i.e., the extent to which the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distributions of the inputs given the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show that utility degradation, i.e., the extent to which the success of a forecasting model changes when the data is pre-processed, is upper bounded by the total variation distance between the distributions of the data before and after pre-processing. Hence, the problem of finding the optimal pre-processing regimen for enforcing fairness can be cast as minimizing the total variation distance between the distributions of the data before and after pre-processing, subject to a constraint on the total variation distance between the distributions of the inputs given the protected attributes. This problem is a linear program that can be solved efficiently. We show that this problem is intimately related to finding the barycenter (i.e., center of mass) of two distributions when distances in the probability space are measured by total variation distance. We also investigate the effect of differential privacy on fairness using the proposed total variation distances. We demonstrate the results using numerical experimentation with a practice dataset.
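To make the first bound concrete, the following is a minimal sketch, not the paper's implementation. It assumes a discrete input with three values, two protected groups, and deterministic 0/1 decision rules; the distributions `p_group0` and `p_group1` and the helper names are hypothetical. It measures the outcome gap as the absolute difference in positive-decision rates between groups, which may differ from the exact disparate impact definition used in the paper, and checks that no decision rule produces a gap larger than the total variation distance between the two conditional input distributions.

```python
from itertools import product


def total_variation(p, q):
    """Total variation distance between two discrete distributions
    given as equal-length lists of probabilities."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def outcome_gap(p, q, decisions):
    """Absolute difference in positive-decision rates between two groups
    for a deterministic rule; decisions[i] is the 0/1 output for input i."""
    rate_p = sum(pi for pi, d in zip(p, decisions) if d == 1)
    rate_q = sum(qi for qi, d in zip(q, decisions) if d == 1)
    return abs(rate_p - rate_q)


# Hypothetical conditional input distributions for the two protected groups.
p_group0 = [0.5, 0.3, 0.2]
p_group1 = [0.2, 0.3, 0.5]

tv = total_variation(p_group0, p_group1)

# Enumerate every deterministic 0/1 rule on the three input values
# and record the largest gap any of them can produce.
worst_gap = max(
    outcome_gap(p_group0, p_group1, rule)
    for rule in product([0, 1], repeat=len(p_group0))
)

print("TV distance between groups:", tv)
print("Largest outcome gap over all rules:", worst_gap)
```

In this toy setting the largest gap is attained by the rule that accepts exactly the inputs that are more likely under one group than the other, which is why shrinking the total variation distance between the group-conditional distributions limits the gap for every downstream rule at once.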


