DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

10/25/2021
by Boris van Breugel, et al.

Machine learning models have been criticized for reflecting unfair biases in the training data. Instead of addressing this by introducing fair learning algorithms directly, we focus on generating fair synthetic data, such that any downstream learner is fair. Generating fair synthetic data from unfair data, while remaining truthful to the underlying data-generating process (DGP), is non-trivial. In this paper, we introduce DECAF: a GAN-based fair synthetic data generator for tabular data. With DECAF, we embed the DGP explicitly as a structural causal model in the input layers of the generator, allowing each variable to be reconstructed conditioned on its causal parents. This procedure enables inference-time debiasing, where biased edges can be strategically removed to satisfy user-defined fairness requirements. The DECAF framework is versatile and compatible with several popular definitions of fairness. In our experiments, we show that DECAF successfully removes undesired bias and, in contrast to existing methods, is capable of generating high-quality synthetic data. Furthermore, we provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
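
The idea of reconstructing each variable from its causal parents, and then dropping biased edges at inference time, can be illustrated with a minimal sketch. This is not the authors' implementation: the variable names (`gender`, `education`, `income`), the graph, and the edge weights are all hypothetical, and simple linear functions stand in for the learned sub-networks of a GAN generator.

```python
import random

# Hypothetical causal graph, listed in topological order: variable -> causal parents.
PARENTS = {
    "gender": [],
    "education": ["gender"],
    "income": ["gender", "education"],
}

# Hypothetical edge weights; in a GAN these would be learned sub-networks.
WEIGHTS = {
    ("gender", "education"): 0.5,
    ("gender", "income"): 1.0,
    ("education", "income"): 2.0,
}


def generate_row(noise, removed_edges=frozenset()):
    """Generate one row by visiting variables in topological order.

    Each variable is computed from its already-generated parents plus its own
    noise term. Edges in `removed_edges` are skipped, which mimics removing
    biased edges at inference time.
    """
    row = {}
    for var, parents in PARENTS.items():
        value = noise[var]
        for parent in parents:
            if (parent, var) not in removed_edges:
                value += WEIGHTS[(parent, var)] * row[parent]
        row[var] = value
    return row


def sample_noise(rng):
    """Draw independent standard-normal noise for every variable."""
    return {var: rng.gauss(0.0, 1.0) for var in PARENTS}


rng = random.Random(0)
noise = sample_noise(rng)
biased = generate_row(noise)
debiased = generate_row(noise, removed_edges={("gender", "income")})
print(biased)
print(debiased)
```

Because both calls reuse the same noise, the two rows differ only in `income`: removing the hypothetical direct edge from `gender` to `income` cuts that path, while the indirect path through `education` is left intact. Which edges count as biased depends on the fairness definition the user chooses.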

research
04/07/2021

Representative Fair Synthetic Data

Algorithms learn rules and associations based on the training data that ...
research
09/13/2022

Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation

Machine learning applications are becoming increasingly pervasive in our...
research
12/20/2022

PreFair: Privately Generating Justifiably Fair Synthetic Data

When a database is protected by Differential Privacy (DP), its usability...
research
06/30/2023

FFPDG: Fast, Fair and Private Data Generation

Generative modeling has been used frequently in synthetic data generatio...
research
07/07/2023

Programmable Synthetic Tabular Data Generation

Large amounts of tabular data remain underutilized due to privacy, data ...
research
02/13/2023

Provable Detection of Propagating Sampling Bias in Prediction Models

With an increased focus on incorporating fairness in machine learning mo...
research
03/16/2021

RAWLSNET: Altering Bayesian Networks to Encode Rawlsian Fair Equality of Opportunity

We present RAWLSNET, a system for altering Bayesian Network (BN) models ...
