Confidence-Ranked Reconstruction of Census Microdata from Published Statistics

by Travis Dick, et al.

A reconstruction attack on a private dataset D takes as input some publicly accessible information about the dataset and produces a list of candidate elements of D. We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of D from aggregate query statistics Q(D)∈ℝ^m, but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used to prioritize reconstructed rows for further actions such as identity theft or hate crimes. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset D was sampled, demonstrating that they exploit information in the aggregate statistics Q(D), and not simply the overall structure of the distribution. In other words, the queries Q(D) permit reconstruction of elements of this particular dataset, not merely of the distribution from which D was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and on Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.
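To make the idea of confidence-ranked reconstruction concrete, the following is a minimal toy sketch and not the authors' attack, whose details the abstract does not give. It assumes binary attributes, takes Q(D) to be all 2-way marginal counts, and swaps in plain random-restart bit-flip local search as the randomized non-convex optimizer. All function names (`marginals`, `error`, `local_search`, `rank_rows`) and parameters are hypothetical. Each restart fits a candidate dataset to the published counts, and a row's score is the fraction of restarts in which it appears.

```python
import itertools
import random
from collections import Counter


def marginals(rows, d):
    """Toy stand-in for Q(D): all 2-way marginal counts over d binary columns."""
    counts = Counter()
    for row in rows:
        for i, j in itertools.combinations(range(d), 2):
            counts[(i, j, row[i], row[j])] += 1
    return counts


def error(rows, target, d):
    """Squared distance between the candidate's query answers and the target."""
    got = marginals(rows, d)
    keys = set(got) | set(target)
    return sum((got[k] - target[k]) ** 2 for k in keys)


def local_search(target, n, d, rng, steps=2000):
    """One randomized run: start from random rows, keep bit flips that do not hurt."""
    rows = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    best = error(rows, target, d)
    for _ in range(steps):
        if best == 0:
            break
        r, c = rng.randrange(n), rng.randrange(d)
        rows[r][c] ^= 1  # propose flipping one bit
        new = error(rows, target, d)
        if new <= best:
            best = new
        else:
            rows[r][c] ^= 1  # revert a flip that increased the error
    return [tuple(row) for row in rows], best


def rank_rows(target, n, d, restarts=20, seed=0):
    """Rank candidate rows by the fraction of restarts that contain them."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(restarts):
        rows, _ = local_search(target, n, d, rng)
        votes.update(set(rows))  # each restart votes at most once per distinct row
    return [(row, votes[row] / restarts) for row, _ in votes.most_common()]


if __name__ == "__main__":
    secret = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 1, 1)]
    published = marginals(secret, 4)
    for row, score in rank_rows(published, n=4, d=4)[:5]:
        print(row, round(score, 2), row in secret)
```

In this sketch, rows that recur across many independent restarts are the ones the published counts constrain most tightly, which is the intuition behind using agreement across runs as a confidence signal. Whether the top-ranked rows actually match the secret data depends on the queries and the dataset, so the script prints the membership check rather than assuming it.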




