PrivGen: Preserving Privacy of Sequences Through Data Generation

02/23/2020
by Sigal Shaked, et al.
Sequential data is everywhere, and it can serve as a basis for research that leads to improved processes. For example, road infrastructure can be improved by identifying bottlenecks in GPS data, and early diagnosis can be improved by analyzing patterns of disease progression in medical data. The main obstacle is that access to and use of such data are usually limited or prohibited altogether because of justified concerns about violating user privacy. Anonymizing sequence data is not a simple task, since a user creates an almost unique signature over time. Existing anonymization methods reduce the quality of the information in order to maintain the required level of anonymity. This loss of quality may disrupt patterns that appear in the original data and impair the preservation of its characteristics. Since in many cases the researcher does not need the data as is and is instead interested only in the patterns that exist in the data, we propose PrivGen, an innovative method for generating data that maintains the patterns and characteristics of the source data. We demonstrate that the data generation mechanism significantly limits the risk of privacy infringement. An evaluation of our method on real-world datasets shows that the generated data preserves many characteristics of the source data, including the sequential model trained on it. This suggests that the data generated by our method could be used in place of actual data for various types of analysis, maintaining user privacy and the data's integrity at the same time.
