Spending Privacy Budget Fairly and Wisely

04/27/2022
by Lucas Rosenblatt, et al.

Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for the predictive quality of synthetic data. Further, the loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize the predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how the privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated into the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.
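To make the idea of importance-guided budget allocation concrete, here is a minimal sketch, not the authors' ensemble method. It assumes that each feature receives its own share of a total budget epsilon and that, under basic sequential composition for pure epsilon-DP, the per-feature shares must sum to the total. The function names `normalize` and `allocate_budget`, the `fairness_weight` parameter, and the rule of blending overall importance with the maximum per-group importance (so that features that matter to any group, including a minority group, are not starved of budget) are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores.values())
    if total <= 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}


def allocate_budget(
    total_epsilon: float,
    importance: Dict[str, float],
    group_importance: Optional[Dict[str, Dict[str, float]]] = None,
    fairness_weight: float = 0.0,
) -> Dict[str, float]:
    """Hypothetical helper: split total_epsilon across features.

    With fairness_weight=0 the split is proportional to overall feature
    importance ("wise"). With fairness_weight>0 it is blended with the
    per-feature maximum of per-group importances ("fair"), an assumed rule.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not importance:
        raise ValueError("importance must list at least one feature")
    if not 0.0 <= fairness_weight <= 1.0:
        raise ValueError("fairness_weight must be in [0, 1]")
    if any(v < 0 for v in importance.values()):
        raise ValueError("importance scores must be non-negative")

    overall = normalize(importance)

    if group_importance and fairness_weight > 0:
        # Normalize each group's scores over the same feature set.
        per_group = [
            normalize({f: scores.get(f, 0.0) for f in importance})
            for scores in group_importance.values()
        ]
        # A feature's "worst-case" weight is its highest importance to any group.
        worst_case = normalize({f: max(p[f] for p in per_group) for f in importance})
        weights = {
            f: (1 - fairness_weight) * overall[f] + fairness_weight * worst_case[f]
            for f in importance
        }
    else:
        weights = overall

    # Weights sum to 1, so the per-feature budgets sum to total_epsilon,
    # matching basic sequential composition.
    return {f: total_epsilon * w for f, w in weights.items()}


if __name__ == "__main__":
    # Illustrative (made-up) scores: "zip" matters little overall
    # but a lot to the minority group.
    importance = {"age": 0.2, "income": 0.7, "zip": 0.1}
    groups = {
        "majority": {"age": 0.2, "income": 0.7, "zip": 0.1},
        "minority": {"age": 0.2, "income": 0.1, "zip": 0.7},
    }
    print(allocate_budget(1.0, importance))
    print(allocate_budget(1.0, importance, groups, fairness_weight=0.5))
```

With `fairness_weight=0` the sketch reduces to a purely importance-weighted ("wise") split; raising it moves budget toward features that some group relies on heavily, which is one simple way to express the "fair" trade-off the abstract describes. Setting every importance score equal recovers the even, per-feature split that the abstract identifies as the common default.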

research
06/15/2021

An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises

Differentially private (DP) synthetic datasets are a powerful approach fo...
research
05/09/2022

Evaluating the Fairness Impact of Differentially Private Synthetic Data

Differentially private (DP) synthetic data is a promising approach to ma...
research
05/18/2023

Understanding how Differentially Private Generative Models Spend their Privacy Budget

Generative models trained with Differential Privacy (DP) are increasingl...
research
05/28/2022

Noise-Aware Statistical Inference with Differentially Private Synthetic Data

While generation of synthetic data under differential privacy (DP) has r...
research
02/26/2020

Differentially Private Mean Embeddings with Random Features (DP-MERF) for Simple Practical Synthetic Data Generation

We present a differentially private data generation paradigm using rando...
research
05/10/2021

Transitioning from Real to Synthetic data: Quantifying the bias in model

With the advent of generative modeling techniques, synthetic data and it...