Bayesian Pseudo Posterior Synthesis for Data Privacy Protection

01/19/2019
by   Jingchen Hu, et al.
0

Statistical agencies utilize models to synthesize respondent-level data for release to the general public as an alternative to the actual data records. A Bayesian model synthesizer encodes privacy protection by employing a hierarchical prior construction that induces smoothing of the real data distribution. Synthetic respondent-level data records are often preferred to summary data tables due to the many possible uses by researchers and data analysts. Agencies balance a trade-off between utility of the synthetic data versus disclosure risks and hold a specific target threshold for disclosure risk before releasing synthetic datasets. We introduce a pseudo posterior likelihood that exponentiates each contribution by an observation record-indexed weight in (0, 1), defined to be inversely proportional to the disclosure risk for that record in the synthetic data. Our use of a vector of weights allows more precise downweighting of high risk records in a fashion that better preserves utility as compared with using a scalar weight. We illustrate our method with a simulation study and an application to the Consumer Expenditure Survey of the U.S. Bureau of Labor Statistics. We demonstrate how the frequentist consistency and uncertainty quantification are affected by the inverse risk-weighting.

READ FULL TEXT
research
08/20/2019

Risk-Efficient Bayesian Data Synthesis for Privacy Protection

High-utility and low-risks synthetic data facilitates microdata dissemin...
research
09/25/2019

Bayesian Pseudo Posterior Mechanism under Differential Privacy

We propose a Bayesian pseudo posterior mechanism to generate record-leve...
research
01/15/2021

Private Tabular Survey Data Products through Synthetic Microdata Generation

We propose three synthetic microdata approaches to generate private tabu...
research
05/22/2022

Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

The large number of publicly available survey datasets of wide variety, ...
research
09/26/2018

Bayesian Data Synthesis and Disclosure Risk Quantification: An Application to the Consumer Expenditure Surveys

The release of synthetic data generated from a model estimated on the da...
research
06/01/2020

Re-weighting of Vector-weighted Mechanisms for Utility Maximization under Differential Privacy

We implement a pseudo posterior synthesizer for microdata dissemination ...
research
06/07/2019

Increasing Transparent and Accountable Use of Data by Quantifying the Actual Privacy Risk in Interactive Record Linkage

Record linkage refers to the task of integrating data from two or more d...

Please sign up or login with your details

Forgot password? Click here to reset