Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

03/03/2023
by   Yilin Yang, et al.

Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once, and we can then reuse that privatized summary throughout generator training without further privacy loss. An important question in this framework is, then, which features are useful for distinguishing between real and synthetic data distributions, and whether those features enable us to generate quality synthetic data. This work considers using the features of neural tangent kernels (NTKs), more precisely empirical NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of perceptual features taken from networks pre-trained on public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods, without relying on any public data, as demonstrated on several tabular and image benchmark datasets.
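The "privatize once, reuse many times" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in random Fourier features as a stand-in for e-NTK features, and it assumes unit-norm features, replace-one neighbouring datasets, and the classical Gaussian mechanism (which requires epsilon < 1). All function names (`feature_map`, `mean_embedding`, `privatize`, `mmd_squared`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def feature_map(x, weights):
    """Hypothetical unit-norm random Fourier feature map (stand-in for e-NTK features)."""
    proj = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    scale = 1.0 / math.sqrt(len(weights))
    # cos^2 + sin^2 = 1 per frequency, so the full feature vector has L2 norm 1.
    return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]


def mean_embedding(data, weights):
    """Average the finite-dimensional features over a dataset."""
    feats = [feature_map(x, weights) for x in data]
    return [sum(col) / len(feats) for col in zip(*feats)]


def privatize(embedding, n, epsilon, delta, rng):
    """Add Gaussian noise once to the mean embedding of the real data.

    Assumption: with unit-norm features, replacing one record changes the
    mean by at most 2/n in L2 norm. The noise scale below is the classical
    Gaussian-mechanism calibration, valid for epsilon < 1.
    """
    sensitivity = 2.0 / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [m + rng.gauss(0.0, sigma) for m in embedding]


def mmd_squared(private_embedding, synthetic, weights):
    """Squared MMD estimate between the privatized summary and synthetic data.

    Only the privatized embedding is used, so evaluating this any number of
    times touches the real data no further.
    """
    synth = mean_embedding(synthetic, weights)
    return sum((a - b) ** 2 for a, b in zip(private_embedding, synth))


rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(50)]
real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]

# The real data is accessed exactly once, here.
private_mu = privatize(mean_embedding(real, weights), len(real),
                       epsilon=0.5, delta=1e-5, rng=rng)

# Two candidate "generators": one matching the real distribution, one shifted.
close = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
far = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(2000)]
loss_close = mmd_squared(private_mu, close, weights)
loss_far = mmd_squared(private_mu, far, weights)
```

In a training loop, `mmd_squared` would serve as the generator loss and be evaluated at every step against the same `private_mu`; because the noise is added to a fixed-length vector rather than to per-step gradients, the privacy cost is paid only in the single call to `privatize`.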


research
04/16/2020

Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data

Recent advances in generating synthetic data that allow to add principle...
research
11/28/2019

Comparative Study of Differentially Private Synthetic Data Algorithms and Evaluation Standards

Differentially private synthetic data generation is becoming a popular s...
research
06/09/2021

Polynomial magic! Hermite polynomials for private data generation

Kernel mean embedding is a useful tool to compare probability measures. ...
research
02/26/2020

Differentially Private Mean Embeddings with Random Features (DP-MERF) for Simple Practical Synthetic Data Generation

We present a differentially private data generation paradigm using rando...
research
09/03/2021

Privacy of synthetic data: a statistical framework

Privacy-preserving data analysis is emerging as a challenging problem wi...
research
12/09/2021

Differentially Private Ensemble Classifiers for Data Streams

Learning from continuous data streams via classification/regression is p...
research
07/12/2022

dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation

We propose a general, flexible, and scalable framework dpart, an open so...
