Privacy for Free: How does Dataset Condensation Help Privacy?

06/01/2022
by Tian Dong, et al.

To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training costs or poor generalization performance. We therefore ask whether training efficiency and privacy can be achieved simultaneously. In this work, we identify for the first time that dataset condensation (DC), originally designed to improve training efficiency, is also a better replacement for traditional data generators in private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove for linear feature extractors (later extended to non-linear feature extractors) that the existence of one sample has limited impact (O(m/n)) on the parameter distribution of networks trained on m samples synthesized by DC from n (n ≫ m) raw samples. We also empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both loss-based and state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.
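The central object here, dataset condensation, can be illustrated with a short gradient-matching sketch in the style of the DC line of work the abstract refers to. This is a minimal toy version under assumed names and data, not the authors' implementation: the random tensors, the linear model, and identifiers such as `x_syn` and `grads_of` are illustrative.

```python
# Minimal gradient-matching sketch of dataset condensation (DC), assuming
# random toy data and a single linear model; real DC re-samples model
# initializations and interleaves inner training steps.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, m, d, c = 1000, 10, 32, 2                   # n raw samples condensed into m synthetic ones (n >> m)
x_real = torch.randn(n, d)
y_real = torch.randint(0, c, (n,))
x_syn = torch.randn(m, d, requires_grad=True)  # the learnable synthetic dataset
y_syn = torch.arange(m) % c                    # fixed labels, balanced across classes

model = nn.Linear(d, c)                        # stand-in for a linear feature extractor + head
opt_syn = torch.optim.SGD([x_syn], lr=0.1)

def grads_of(x, y, create_graph=False):
    """Gradient of the training loss w.r.t. the model parameters."""
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, model.parameters(), create_graph=create_graph)

for step in range(200):
    g_real = grads_of(x_real, y_real)                  # gradients induced by the raw data
    g_syn = grads_of(x_syn, y_syn, create_graph=True)  # gradients induced by the synthetic data
    # Update the synthetic samples so they induce (nearly) the same
    # parameter updates as the full raw dataset.
    match = sum(F.mse_loss(gs, gr) for gs, gr in zip(g_syn, g_real))
    opt_syn.zero_grad()
    match.backward()
    opt_syn.step()
```

Training a downstream model on (x_syn, y_syn) then approximates training on the raw data, which is where the O(m/n) bound above enters: any single raw sample contributes only a small fraction of the signal behind the m synthesized points.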
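The membership privacy evaluation can likewise be sketched. A loss-based membership inference attack simply thresholds the per-sample loss, since training members tend to have lower loss than non-members; the function below is a generic sketch of that idea, not the paper's exact attack, and the threshold would in practice be calibrated on held-out data (the likelihood-based variant instead compares each sample's loss to its distribution under shadow models).

```python
# Generic loss-based membership inference sketch: predict "member" when the
# model's loss on a sample falls below a calibrated threshold.
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_based_mia(model, x, y, threshold):
    """Return 1 for predicted training-set members, 0 otherwise."""
    losses = F.cross_entropy(model(x), y, reduction="none")
    return (losses < threshold).long()

# Example (names assumed from the DC sketch above):
# guesses = loss_based_mia(model, x_real, y_real, threshold=0.5)
```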
