CIPHER: Construction of dIfferentially Private microdata from low-dimensional Histograms via solving linear Equations with Tikhonov Regularization

by Evercita C. Eugenio, et al.

When government agencies, research institutes, and industries release individual-level data for research and public use, the data is often perturbed in certain ways to provide some level of privacy protection. The recently developed differentially private data synthesis (DIPS) methods are built upon the concept of differential privacy and provide a strong mathematical privacy guarantee while aiming to maintain the statistical utility of the released sanitized data. We introduce a new DIPS algorithm, CIPHER, which generates differentially private individual-level data from a set of low-dimensional histograms by solving a set of linear equations with Tikhonov (l_2) regularization. CIPHER is conceptually very simple: it requires nothing more than decomposing joint probabilities via basic probability rules to construct the equation set and then solving the linear equations. CIPHER can also automatically "correct" for the inconsistency that the differentially private sanitization introduces among histograms that share at least one common attribute. We compare CIPHER with MWEM (multiplicative weighting via exponential mechanism) and with the full-dimensional histogram (FDH) sanitization through simulation and case studies. CIPHER and MWEM both aim to generate differentially private synthetic individual-level data from a set of low-dimensional histograms, and the results demonstrate that CIPHER made significant improvements over MWEM in statistical inference. CIPHER also delivered performance similar to that of the FDH sanitization for most of the examined privacy budget range.
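The general idea of recovering a joint distribution from sanitized low-dimensional histograms through a regularized linear system can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact construction: it uses plain marginalization constraints rather than the paper's probability decomposition, assumes three binary attributes, adds Laplace noise with sensitivity 1 and an equal split of the privacy budget across histograms, treats the sample size as public, and picks the regularization weight `lam` arbitrarily. All function names are hypothetical.

```python
import itertools
import numpy as np

def marginal_matrix(n_attrs, subsets):
    """Build A so that A @ p gives every cell of every low-dimensional
    histogram, where p is the full joint distribution over binary attributes."""
    cells = list(itertools.product([0, 1], repeat=n_attrs))
    rows = []
    for subset in subsets:
        for values in itertools.product([0, 1], repeat=len(subset)):
            rows.append([float(all(c[a] == v for a, v in zip(subset, values)))
                         for c in cells])
    return np.array(rows), cells

def sanitize_histograms(data, subsets, epsilon, rng):
    """Laplace-perturbed histogram proportions (assumed: sensitivity 1,
    budget split equally across histograms, sample size treated as public)."""
    per_hist_eps = epsilon / len(subsets)
    noisy = []
    for subset in subsets:
        for values in itertools.product([0, 1], repeat=len(subset)):
            count = np.sum(np.all(data[:, list(subset)] == values, axis=1))
            noisy.append(count + rng.laplace(scale=1.0 / per_hist_eps))
    return np.array(noisy) / len(data)

def tikhonov_solve(A, b, lam):
    """Minimize ||A p - b||^2 + lam * ||p||^2 via the normal equations."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ b)

def to_distribution(p):
    """Clip negative entries and renormalize so the result is a valid pmf."""
    p = np.clip(p, 0.0, None)
    total = p.sum()
    return p / total if total > 0 else np.full(len(p), 1.0 / len(p))

# Toy demonstration on simulated data (all settings are illustrative).
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(500, 3))
subsets = [(0, 1), (1, 2), (0, 2)]           # three 2-way histograms

A, cells = marginal_matrix(3, subsets)
b = sanitize_histograms(data, subsets, epsilon=1.0, rng=rng)
p_hat = to_distribution(tikhonov_solve(A, b, lam=0.01))

# Draw synthetic individual-level records from the estimated joint.
idx = rng.choice(len(cells), size=len(data), p=p_hat)
synthetic = np.array(cells)[idx]
```

Because the noisy histograms generally disagree on their shared attributes, the system has no exact solution; the regularized least-squares fit returns a single joint distribution that reconciles them, which is one way to read the "correction" of inconsistency described above.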




