Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

by Lucas Rosenblatt, et al.

Differential privacy mechanisms are increasingly used to enable public release of sensitive datasets, relying on strong theoretical guarantees for privacy coupled with empirical evidence of utility. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, multivariate correlations, or classification accuracy. In this paper, we propose an alternative evaluation methodology for measuring the utility of differentially private synthetic data in scientific research, a measure we term "epistemic parity." Our methodology consists of reproducing the empirical conclusions of peer-reviewed papers that use publicly available datasets, and comparing these conclusions to those drawn from differentially private versions of the same datasets. We instantiate our methodology over a benchmark of recent peer-reviewed papers that analyze public datasets in the ICPSR social science repository. We reproduce visualizations (qualitative results) and statistical measures (quantitative results) from each paper. We then generate differentially private synthetic datasets using state-of-the-art mechanisms and assess whether the conclusions stated in each paper still hold. We find that, across reasonable epsilon values, epistemic parity only partially holds for each synthesizer we evaluated. Therefore, we advocate both for improving existing synthesizers and for creating new data release mechanisms that offer strong guarantees for epistemic parity while achieving risk-aware, best-effort protection from privacy attacks.
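
The comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation or benchmark: the synthesizer is a simple Laplace-noised histogram standing in for the state-of-the-art mechanisms the paper evaluates, the "finding" is a hypothetical conclusion (the outcome rate is higher in group 1 than in group 0), and all function names are invented for illustration. The sketch also assumes that the record domain and the dataset size are public.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def synthesize(records, domain, epsilon, rng):
    # Toy stand-in synthesizer (an assumption, not one from the paper):
    # add Laplace(1/epsilon) noise to each histogram cell, clamp negatives
    # to zero, then sample synthetic records from the noisy histogram.
    counts = Counter(records)
    noisy = [max(0.0, counts[cell] + laplace_noise(1 / epsilon, rng))
             for cell in domain]
    if sum(noisy) == 0:
        noisy = [1.0] * len(domain)  # fall back to a uniform histogram
    # The dataset size is treated as public here (an assumption).
    return rng.choices(domain, weights=noisy, k=len(records))


def finding_holds(records):
    # Hypothetical conclusion: outcome rate in group 1 exceeds group 0.
    def rate(group):
        outcomes = [y for g, y in records if g == group]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0
    return rate(1) > rate(0)


def epistemic_parity(records, domain, epsilon, trials, seed=0):
    # Fraction of synthetic datasets on which the finding agrees
    # with the finding on the real data.
    rng = random.Random(seed)
    real = finding_holds(records)
    agree = sum(
        finding_holds(synthesize(records, domain, epsilon, rng)) == real
        for _ in range(trials)
    )
    return agree / trials


# Toy data: records are (group, outcome) pairs.
domain = [(0, 0), (0, 1), (1, 0), (1, 1)]
records = ([(1, 1)] * 80 + [(1, 0)] * 20
           + [(0, 1)] * 30 + [(0, 0)] * 70)

for eps in (0.01, 0.1, 1.0):
    print(eps, epistemic_parity(records, domain, eps, trials=200))
```

Sweeping epsilon as in the loop above mirrors, in miniature, the idea of checking whether a conclusion survives across a range of privacy budgets. A real evaluation would replace `finding_holds` with the reproduced analysis from a paper and `synthesize` with an actual differentially private synthesizer.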

