SoK: Privacy-Preserving Data Synthesis

07/05/2023
by Yuzheng Hu, et al.

As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms for privacy-preserving data analysis. However, these approaches are task-specific, and designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of private information. This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field. Specifically, we put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods. Under the master recipe, we further dissect the statistical methods into choices of modeling and representation, and investigate the DL-based methods by their generative modeling principles. To consolidate our findings, we provide comprehensive reference tables, distill key takeaways, and identify open problems in the existing literature. In doing so, we aim to answer the following questions: What are the design principles behind different PPDS methods? How can we categorize these methods, and what are the advantages and disadvantages associated with each category? Can we provide guidelines for method selection in different real-world scenarios? We then benchmark several prominent DL-based methods on the task of private image synthesis and conclude that DP-MERF is an all-purpose approach. Finally, having systematized the work of the past decade, we identify future directions and call for action from researchers.


