Generating private data with user customization

12/02/2020
by Xiao Chen, et al.

Personal devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents its release. We want to reduce the correlation between user-specific private information and the data while retaining the useful information. Rather than training a large model to achieve privatization end to end, we first decouple the creation of a latent representation from the privatization step, which allows user-specific privatization to occur in a setting with limited computation and minimal disturbance to the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data that remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on user-specified preferences regarding the private and utility information. The small filter is trained via a GAN-type robust optimization that can take place on a distributed device such as a phone or tablet. Under special conditions on our linear filter, we establish connections between our generative approach and Rényi differential privacy. We conduct experiments on multiple datasets, including MNIST, UCI-Adult, and CelebA, and give a thorough evaluation, including visualizing the geometry of the latent embeddings and estimating the empirical mutual information, to show the effectiveness of our approach.
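The pipeline above (a fixed, shared encoder followed by a small per-user filter on the latent code) can be illustrated with a minimal sketch. This is not the paper's implementation: the encoder here is a hypothetical linear map standing in for a pretrained VAE encoder, and the "filter" is the simplest linear case, additive Gaussian noise on the latent code, which is the setting where a Gaussian-mechanism-style Rényi differential privacy guarantee applies. The names `encode`, `linear_filter`, and `W_enc` are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_enc):
    # Stand-in for a pretrained VAE encoder (hypothetical linear map).
    # In the paper's setup, the encoder is trained once and then kept
    # fixed for all devices and all possible private labels.
    return x @ W_enc

def linear_filter(z, sigma):
    # Simplest linear privatization filter: perturb the latent code with
    # additive Gaussian noise. For a release map with L2-sensitivity s,
    # the Gaussian mechanism with scale sigma satisfies
    # (alpha, alpha * s**2 / (2 * sigma**2))-Renyi differential privacy.
    return z + rng.normal(scale=sigma, size=z.shape)

# Toy data: 4-dimensional inputs mapped to a 2-dimensional latent space.
W_enc = rng.normal(size=(4, 2))
x = rng.normal(size=(8, 4))
z = encode(x, W_enc)

# A stronger privacy preference corresponds to a larger sigma,
# i.e. a larger perturbation of the shared latent representation.
z_weak = linear_filter(z, sigma=0.1)
z_strong = linear_filter(z, sigma=1.0)
print(np.linalg.norm(z_weak - z), np.linalg.norm(z_strong - z))
```

In the full method the filter is not fixed noise but a small generative network trained by a GAN-type robust optimization against an adversary that tries to recover the private label; only this small filter needs to be trained on the user's device, which is what keeps the on-device computation light.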


