Differentially Private Mean Embeddings with Random Features (DP-MERF) for Simple Practical Synthetic Data Generation

02/26/2020
by Frederik Harder, et al.

We present a differentially private data generation paradigm that uses random feature representations of kernel mean embeddings to compare the distribution of true data with that of synthetic data. The random feature representations bring two important benefits. First, training deep generative models incurs a very low privacy cost. Unlike kernel-based distance metrics, which require computing the kernel matrix on all pairs of true and synthetic data points, random features let us separate the data-dependent term from the term that depends solely on synthetic data. Hence, we need to perturb the data-dependent term only once and can then reuse it until the end of generator training. Second, we obtain an analytic sensitivity of the kernel mean embedding, because the random features are norm-bounded by construction. This removes the need for a hyperparameter search over a clipping norm to handle the unknown sensitivity of an encoder network when dealing with high-dimensional data. We provide several variants of our algorithm, differentially private mean embeddings with random features (DP-MERF), to generate (a) heterogeneous tabular data, (b) input features and corresponding labels jointly, and (c) high-dimensional data. Our algorithm achieves better privacy-utility trade-offs than existing methods when tested on several datasets.
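The two benefits described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; it rests on several assumptions that do not come from the abstract: random Fourier features for a unit-bandwidth Gaussian kernel, replace-one neighbouring datasets (which gives sensitivity 2/m when every feature vector has unit norm), and the classical Gaussian mechanism calibration, which is valid for epsilon below 1. All function and variable names are hypothetical.

```python
import numpy as np

def random_fourier_features(x, freqs):
    """Map rows of x to features whose Euclidean norm is exactly 1."""
    proj = x @ freqs                      # shape (n, D/2)
    n_half = freqs.shape[1]
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
    return feats / np.sqrt(n_half)        # cos^2 + sin^2 = 1 per frequency

def private_mean_embedding(data, freqs, epsilon, delta, rng):
    """Release the mean embedding of the private data once, with Gaussian noise."""
    m = data.shape[0]
    emb = random_fourier_features(data, freqs).mean(axis=0)
    # Assumption: replacing one record moves the mean by at most 2/m in L2 norm.
    sensitivity = 2.0 / m
    # Assumption: classical Gaussian mechanism calibration (requires epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return emb + rng.normal(0.0, sigma, size=emb.shape)

def mmd_loss(noisy_emb, synthetic, freqs):
    """Squared distance between the released embedding and the synthetic one.

    Only the synthetic term is recomputed; the private data is never touched again.
    """
    syn_emb = random_fourier_features(synthetic, freqs).mean(axis=0)
    return float(np.sum((noisy_emb - syn_emb) ** 2))

rng = np.random.default_rng(0)
dim, n_half = 2, 200
freqs = rng.normal(size=(dim, n_half))    # frequencies for a unit-bandwidth Gaussian kernel
data = rng.normal(size=(5000, dim))       # stand-in for the private dataset

noisy = private_mean_embedding(data, freqs, epsilon=0.5, delta=1e-5, rng=rng)

# Candidate synthetic sets: one close to the data distribution, one shifted away.
loss_same = mmd_loss(noisy, rng.normal(size=(5000, dim)), freqs)
loss_shift = mmd_loss(noisy, rng.normal(loc=3.0, size=(5000, dim)), freqs)
print(loss_same, loss_shift)
```

In this sketch a generator would be trained by minimising `mmd_loss` with respect to its parameters. Because `noisy` is computed once and everything afterwards is post-processing, further training steps add no privacy cost, and because the features have unit norm, no clipping threshold has to be tuned.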

research
06/19/2023

Differentially Private Synthetic Data Using KD-Trees

Creation of a synthetic dataset that faithfully represents the data dist...
research
06/09/2021

Polynomial magic! Hermite polynomials for private data generation

Kernel mean embedding is a useful tool to compare probability measures. ...
research
05/25/2022

Differentially Private Data Generation Needs Better Features

Training even moderately-sized generative models with differentially-pri...
research
07/04/2023

Fast Private Kernel Density Estimation via Locality Sensitive Quantization

We study efficient mechanisms for differentially private kernel density ...
research
03/03/2023

Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

Maximum mean discrepancy (MMD) is a particularly useful distance metric ...
research
11/07/2022

Private Set Generation with Discriminative Information

Differentially private data generation techniques have become a promisin...
research
04/27/2022

Spending Privacy Budget Fairly and Wisely

Differentially private (DP) synthetic data generation is a practical met...
