Privacy-Preserving Data Synthetisation for Secure Information Sharing

12/01/2022
by Tânia Carvalho, et al.

User data privacy can be protected through many approaches, such as statistical transformation or generative models. However, each has critical drawbacks. On the one hand, creating a transformed data set with conventional techniques is highly time-consuming. On the other hand, recent deep learning-based solutions require significant computational resources in addition to long training phases. In this paper, we propose PrivateSMOTE, a technique designed to be competitive in protecting the cases at maximum risk of re-identification while requiring far less time and fewer computational resources. It generates synthetic data via interpolation to obfuscate high-risk cases while minimizing the loss of data utility relative to the original data. Compared with multiple conventional and state-of-the-art privacy-preservation methods on 20 data sets, PrivateSMOTE demonstrates competitive results in re-identification risk. It also achieves similar or higher predictive performance than the baselines, including generative adversarial networks and variational autoencoders, while reducing their energy consumption and time requirements by a minimum factor of 9 and 12, respectively.
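The interpolation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' PrivateSMOTE implementation: it is a generic SMOTE-style interpolation under stated assumptions. The function name `interpolate_high_risk`, the parameters `k` and `seed`, the use of Euclidean distance on purely numeric features, and the choice to pick neighbours from all rows are all hypothetical; the abstract does not specify how high-risk cases are identified, so the sketch simply takes a boolean mask as input.

```python
import numpy as np


def interpolate_high_risk(X, high_risk, k=5, seed=0):
    """Replace each flagged row with a point interpolated toward a near neighbour.

    Hypothetical helper, not the paper's method. Assumes numeric features
    and at least two rows in X.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    high_risk = np.asarray(high_risk, dtype=bool)
    if len(X) < 2:
        raise ValueError("need at least two rows to interpolate")

    out = X.copy()
    n_neighbours = min(k, len(X) - 1)
    for i in np.flatnonzero(high_risk):
        # Euclidean distance from row i to every row; exclude the row itself.
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf
        neighbours = np.argsort(dist)[:n_neighbours]
        j = rng.choice(neighbours)
        # Random point on the segment between the row and the chosen neighbour.
        gap = rng.uniform(0.0, 1.0)
        out[i] = X[i] + gap * (X[j] - X[i])
    return out


# Usage: only the last row is flagged, so only that row is replaced.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
mask = [False, False, False, True]
protected = interpolate_high_risk(X, mask, k=2, seed=0)
```

Rows that are not flagged are returned unchanged, which reflects the abstract's goal of limiting utility loss by altering only the high-risk cases. Categorical attributes and the risk measure itself would need separate handling that this sketch does not attempt.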

research
01/13/2022

Towards a Data Privacy-Predictive Performance Trade-off

Machine learning is increasingly used in the most diverse applications a...
research
01/20/2022

Survey on Privacy-Preserving Techniques for Data Publishing

The exponential growth of collected, processed, and shared microdata has...
research
02/18/2021

Composable Generative Models

Generative modeling has recently seen many exciting developments with th...
research
05/11/2023

Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques

To address increasing societal concerns regarding privacy and climate, t...
research
04/22/2023

Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

Synthetic data has been hailed as the silver bullet for privacy preservi...
research
06/03/2016

Using Neural Generative Models to Release Synthetic Twitter Corpora with Reduced Stylometric Identifiability of Users

We present a method for generating synthetic versions of Twitter data us...
research
10/28/2021

Vulnerability Characterization and Privacy Quantification for Cyber-Physical Systems

Cyber-physical systems (CPS) data privacy protection during sharing, agg...
