STatistical Election to Partition Sequentially (STEPS) and Its Application in Differentially Private Release and Analysis of Youth Voter Registration Data

03/18/2018
by Claire McKay Bowen, et al.

Voter data is important in political science research and in applications such as improving youth voter turnout. Privacy protection is imperative for voter data because it often contains sensitive individual information. Differential privacy (DP) formalizes privacy in probabilistic terms and provides a robust concept for privacy protection. DIfferentially Private Data Synthesis (DIPS) techniques produce synthetic data in the DP setting. However, the statistical efficiency of synthetic data generated via DIPS can be low because of the potentially large amount of noise injected to satisfy DP, especially in high-dimensional data. We propose a new DIPS approach, STatistical Election to Partition Sequentially (STEPS), that partitions data sequentially by attribute, ordering the attributes by how well each differentiates the variability in the data. Additionally, we propose a metric, SPECKS, that effectively assesses the similarity of synthetic data to the actual data. Applying the STEPS procedure to the 2000-2012 Current Population Survey youth voter data suggests that STEPS is easy to implement and preserves the original information better than some other DIPS approaches, including the Laplace mechanism applied to the full cross-tabulation of the data and hierarchical histograms generated via random partitioning.
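For orientation, the Laplace-mechanism baseline named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy attribute levels, the sensitivity of 1 (which assumes neighboring datasets differ by adding or removing one record), and the choice to treat the synthetic sample size as public are all assumptions made for the example.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_sanitized_crosstab(records, levels, epsilon, rng, sensitivity=1.0):
    # Count every cell of the full cross-tabulation, including empty cells,
    # then perturb each count with Laplace noise of scale sensitivity/epsilon.
    # ASSUMPTION: sensitivity=1 (add/remove-one-record neighbors).
    counts = Counter(records)
    scale = sensitivity / epsilon
    noisy = {}
    for cell in itertools.product(*levels):
        # Clamping at zero is post-processing and does not affect the DP guarantee.
        noisy[cell] = max(0.0, counts.get(cell, 0) + laplace_noise(scale, rng))
    return noisy


def synthesize(noisy_counts, n, rng):
    # Draw n synthetic records with probability proportional to the noisy counts.
    # ASSUMPTION: n is treated as public information.
    cells = list(noisy_counts)
    weights = [noisy_counts[c] for c in cells]
    if sum(weights) <= 0:
        weights = [1.0] * len(cells)  # fall back to uniform if all cells are zero
    return rng.choices(cells, weights=weights, k=n)


# Hypothetical toy data: (age group, registration status).
levels = [("18-24", "25-29"), ("registered", "not registered")]
records = [("18-24", "registered")] * 6 + [("25-29", "not registered")] * 4
rng = random.Random(0)
noisy = laplace_sanitized_crosstab(records, levels, epsilon=1.0, rng=rng)
synthetic = synthesize(noisy, n=len(records), rng=rng)
```

Because every cell of the cross-tabulation receives its own noise draw, the number of perturbed cells grows multiplicatively with the number of attributes, which is one way to see why the abstract describes this baseline as inefficient in high dimensions.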


research
05/23/2018

pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity

We propose a method for the release of differentially private synthetic ...
research
08/11/2021

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

We propose a general approach for differentially private synthetic data ...
research
01/21/2023

Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms

Marginal-based methods achieve promising performance in the synthetic da...
research
07/29/2021

HTF: Homogeneous Tree Framework for Differentially-Private Release of Location Data

Mobile apps that use location data are pervasive, spanning domains such ...
research
05/28/2022

Noise-Aware Statistical Inference with Differentially Private Synthetic Data

While generation of synthetic data under differential privacy (DP) has r...
research
09/05/2023

Differentially Private Synthetic Heavy-tailed Data

The U.S. Census Longitudinal Business Database (LBD) product contains em...
research
04/27/2022

Spending Privacy Budget Fairly and Wisely

Differentially private (DP) synthetic data generation is a practical met...
