Secure Multiparty Computation for Synthetic Data Generation from Distributed Data

10/13/2022
by Mayana Pereira, et al.

Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation. This assumption severely limits their applicability, as much of the valuable data in the world is locked up in silos, controlled by entities who cannot show their data to each other or to a central aggregator without raising privacy concerns. To overcome this roadblock, we propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation. Data holders send shares of their data to servers, which run Secure Multiparty Computation (MPC) protocols while the original data stays encrypted. We instantiate this idea in an MPC protocol for the Multiplicative Weights with Exponential Mechanism (MWEM) algorithm, which generates synthetic data from real data originating from many data holders without relying on a single point of failure.
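
To make the two ingredients named above concrete, the following is a minimal Python sketch, not the protocol from the paper. It assumes additive secret sharing over a 64-bit ring as the sharing scheme, and it shows the textbook MWEM loop in plaintext only; the paper's contribution is to run such steps on shares, which this sketch does not attempt. All function names, the choice of modulus, the use of histograms with 0/1 counting queries, and the even split of the privacy budget are illustrative assumptions. The sketch also returns the final MWEM iterate rather than an average over rounds, and it uses Python's non-cryptographic `random` module for the noise.

```python
import math
import random
import secrets

MODULUS = 2**64  # ring size for additive shares (an assumption of this sketch)


def share(value, n_servers):
    """Split a non-negative integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares; every share is required."""
    return sum(shares) % MODULUS


def aggregate_shared_histograms(holder_histograms, n_servers):
    """Each holder shares its local counts; each server adds the shares it gets."""
    n_bins = len(holder_histograms[0])
    server_state = [[0] * n_bins for _ in range(n_servers)]
    for hist in holder_histograms:
        for b, count in enumerate(hist):
            for s, sh in enumerate(share(count, n_servers)):
                server_state[s][b] = (server_state[s][b] + sh) % MODULUS
    return server_state


def answer(query, hist):
    """Evaluate a linear counting query (0/1 vector over bins) on a histogram."""
    return sum(q * h for q, h in zip(query, hist))


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def mwem(hist, queries, epsilon, rounds, rng):
    """Plaintext MWEM on a non-empty histogram (illustration only)."""
    n = sum(hist)
    synth = [n / len(hist)] * len(hist)  # start from the uniform histogram
    eps_step = epsilon / (2 * rounds)    # budget per mechanism call
    for _ in range(rounds):
        # Exponential mechanism: prefer queries the synthetic data answers badly.
        errors = [abs(answer(q, hist) - answer(q, synth)) for q in queries]
        top = max(errors)
        weights = [math.exp(eps_step * (e - top) / 2) for e in errors]
        q = rng.choices(queries, weights=weights, k=1)[0]
        # Laplace mechanism: noisy answer of the selected query on the real data.
        measured = answer(q, hist) + laplace(1 / eps_step, rng)
        # Multiplicative weights: move the synthetic data toward the measurement.
        gap = measured - answer(q, synth)
        synth = [s * math.exp(qi * gap / (2 * n)) for s, qi in zip(synth, q)]
        total = sum(synth)
        synth = [s * n / total for s in synth]
    return synth
```

In this sketch no single server's state reveals a holder's counts, since each share on its own is uniformly random; only the sum of all servers' shares recovers the aggregate histogram. Calling `mwem` on a reconstructed histogram, as one could do here, would reintroduce a trusted party, which is exactly what an MPC instantiation is meant to avoid.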


