Private Tabular Survey Data Products through Synthetic Microdata Generation

01/15/2021
by Jingchen Hu, et al.

We propose three synthetic microdata approaches for generating private tabular survey data products for public release. We adapt a disclosure-risk-based, weighted pseudo posterior mechanism to survey data, with a focus on producing tabular products under a formal privacy guarantee. Two of our approaches jointly synthesize the observed sample distribution of the outcome and the survey weights, such that both quantities together possess a probabilistic differential privacy guarantee. The privacy-protected outcome and sampling weights are then used to construct tabular cell estimates, with associated standard errors, that correct for survey sampling bias. The third approach synthesizes the population distribution from the observed sample under a pseudo posterior construction that treats the survey sampling weights as fixed and uses them to correct the sample likelihood so that it approximates the population likelihood. Each by-record sampling weight in the pseudo posterior is, in turn, multiplied by the associated risk-based privacy weight for that record, creating a composite pseudo posterior mechanism that both corrects for survey bias and provides a privacy guarantee for the observed sample. Through a simulation study and a real data application to the Survey of Doctorate Recipients public use file, we demonstrate that our three microdata synthesis approaches to constructing tabular products preserve utility better than the additive-noise approach of the Laplace Mechanism. Moreover, all of our approaches allow the release of microdata to the public, enabling additional analyses at no extra privacy cost.
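
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the normalization of sampling weights to sum to the sample size, the assumption that privacy weights lie in [0, 1], and the count sensitivity of 1 are all illustrative assumptions that the abstract does not state.

```python
import random


def composite_log_pseudo_likelihood(log_liks, sampling_weights, privacy_weights):
    """Sum by-record log-likelihoods, each scaled by (sampling weight x privacy weight).

    Assumptions (not from the abstract): sampling weights are normalized to sum
    to the sample size, and privacy weights lie in [0, 1], with smaller values
    downweighting higher-risk records.
    """
    n = len(log_liks)
    total = sum(sampling_weights)
    normalized = [n * w / total for w in sampling_weights]
    return sum(
        alpha * w * ll
        for alpha, w, ll in zip(privacy_weights, normalized, log_liks)
    )


def laplace_noisy_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Additive-noise baseline: add Laplace(0, sensitivity / epsilon) noise to each cell.

    The difference of two independent exponential draws with the same rate
    follows a Laplace distribution centered at zero.
    """
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # rate = 1 / scale
    return [c + rng.expovariate(rate) - rng.expovariate(rate) for c in counts]


if __name__ == "__main__":
    # Hypothetical toy inputs, for illustration only.
    log_liks = [-1.0, -2.0]
    sampling_weights = [1.0, 3.0]
    privacy_weights = [1.0, 0.5]
    print(composite_log_pseudo_likelihood(log_liks, sampling_weights, privacy_weights))
    print(laplace_noisy_counts([10, 25, 7], epsilon=1.0, rng=random.Random(0)))
```

In the composite construction, a record contributes less to the pseudo posterior when its privacy weight is small, and more when its sampling weight is large. The Laplace baseline instead perturbs each published table cell directly, so no microdata are released.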

research
09/25/2019

Bayesian Pseudo Posterior Mechanism under Differential Privacy

We propose a Bayesian pseudo posterior mechanism to generate record-leve...
research
01/19/2019

Bayesian Pseudo Posterior Synthesis for Data Privacy Protection

Statistical agencies utilize models to synthesize respondent-level data ...
research
05/10/2022

Mechanisms for Global Differential Privacy under Bayesian Data Synthesis

This paper introduces a new method that embeds any Bayesian model used t...
research
09/09/2022

Impacts of Census Differential Privacy for Small-Area Disease Mapping to Monitor Health Inequities

US Census Bureau (USCB) has implemented a new privacy-preserving disclos...
research
09/17/2023

Fully Synthetic Data for Complex Surveys

When seeking to release public use files for confidential data, statisti...
research
06/01/2020

Re-weighting of Vector-weighted Mechanisms for Utility Maximization under Differential Privacy

We implement a pseudo posterior synthesizer for microdata dissemination ...
research
05/24/2022

Releasing survey microdata with exact cluster locations and additional privacy safeguards

Household survey programs around the world publish fine-granular georefe...
