Differentially Private Synthetic Heavy-tailed Data

09/05/2023
by   Tran Tran, et al.
0

The U.S. Census Longitudinal Business Database (LBD) product contains employment and payroll information of all U.S. establishments and firms dating back to 1976 and is an invaluable resource for economic research. However, the sensitive information in LBD requires confidentiality measures that the U.S. Census in part addressed by releasing a synthetic version (SynLBD) of the data to protect firms' privacy while ensuring its usability for research activities, but without provable privacy guarantees. In this paper, we propose using the framework of differential privacy (DP) that offers strong provable privacy protection against arbitrary adversaries to generate synthetic heavy-tailed data with a formal privacy guarantee while preserving high levels of utility. We propose using the K-Norm Gradient Mechanism (KNG) with quantile regression for DP synthetic data generation. The proposed methodology offers the flexibility of the well-known exponential mechanism while adding less noise. We propose implementing KNG in a stepwise and sandwich order, such that new quantile estimation relies on previously sampled quantiles, to more efficiently use the privacy-loss budget. Generating synthetic heavy-tailed data with a formal privacy guarantee while preserving high levels of utility is a challenging problem for data curators and researchers. However, we show that the proposed methods can achieve better data utility relative to the original KNG at the same privacy-loss budget through a simulation study and an application to the Synthetic Longitudinal Business Database.

READ FULL TEXT

page 18

page 19

page 27

page 32

research
05/23/2018

pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity

We propose a method for the release of differentially private synthetic ...
research
09/19/2023

DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance

Recent years have witnessed the adoption of differential privacy (DP) in...
research
03/02/2023

GlucoSynth: Generating Differentially-Private Synthetic Glucose Traces

In this paper we focus on the problem of generating high-quality, privat...
research
08/26/2017

Plausible Deniability for Privacy-Preserving Data Synthesis

Releasing full data records is one of the most challenging problems in d...
research
09/12/2023

Chained-DP: Can We Recycle Privacy Budget?

Privacy-preserving vector mean estimation is a crucial primitive in fede...
research
02/19/2021

Obfuscation of Images via Differential Privacy: From Facial Images to General Images

Due to the pervasiveness of image capturing devices in every-day life, i...

Please sign up or login with your details

Forgot password? Click here to reset