Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

07/13/2021
by March Boedihardjo, et al.

The protection of private information is of vital importance in data-driven research, business, and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are k-anonymity (often implemented via microaggregation) and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper we focus on the NP-hard challenge of developing a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question of how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy, and synthetic data.
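
As a rough illustration of how microaggregation and covariance loss are linked, the Python/NumPy sketch below treats microaggregation as a conditional expectation: each record is replaced by the mean of its group, which is E[X | partition] under the uniform distribution on the records. It then measures the covariance loss, taken here as E[XX^T] - E[YY^T] with Y = E[X | partition]. The grouping rule (sort by the first coordinate and cut into blocks of at least k records), the unit-sphere normalization of the toy data, and the function names `microaggregate` and `covariance_loss` are all hypothetical choices for illustration. This is not the paper's constructive partition or its privacy mechanism.

```python
import numpy as np


def microaggregate(X, k):
    # Hypothetical grouping rule (not the paper's construction): sort the rows
    # by their first coordinate and cut them into n // k consecutive blocks,
    # each containing at least k rows.
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of rows")
    order = np.argsort(X[:, 0], kind="stable")
    Y = np.empty(X.shape, dtype=float)
    for block in np.array_split(order, n // k):
        # Replacing each row by its block mean is the conditional expectation
        # of X given the partition, under the uniform distribution on rows.
        Y[block] = X[block].mean(axis=0)
    return Y


def covariance_loss(X, Y):
    # Empirical version of E[X X^T] - E[Y Y^T], averaging over the rows.
    return (X.T @ X - Y.T @ Y) / X.shape[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # put every record on the unit sphere
Y = microaggregate(X, k=50)                    # 20 groups of 50 records each
L = covariance_loss(X, Y)
print("spectral norm of covariance loss:", np.linalg.norm(L, 2))
print("smallest eigenvalue:", np.linalg.eigvalsh(L).min())
```

Within each block the deviations x_i - y_i sum to zero. As a result, the loss matrix equals the average of (x_i - y_i)(x_i - y_i)^T and is positive semidefinite. Merging blocks can only increase this loss, which mirrors the tension between grouping records for privacy and preserving second-moment information for utility.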
