Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning

by Héber H. Arcolezi, et al.

Private and public organizations regularly collect and analyze digitized data about their associates, volunteers, clients, etc. However, because most personal data are sensitive, designing privacy-preserving systems is a key challenge. To address privacy concerns, research communities have proposed various methods of preserving privacy, with differential privacy (DP) standing out as a formal definition that allows the privacy-utility trade-off to be quantified. Moreover, under the local DP (LDP) model, users can sanitize their data locally before transmitting them to a server. The objective of this thesis is thus two-fold: O_1) to improve the utility and privacy of multiple frequency estimates under LDP guarantees, which is fundamental to statistical learning; and O_2) to assess the privacy-utility trade-off of machine learning (ML) models trained over differentially private data. For O_1, we first tackled the problem from two "multiple" perspectives, i.e., multiple attributes and multiple collections over time, while focusing on utility. Second, we restricted our attention to the multiple-attributes aspect, for which we proposed a solution focusing on privacy while preserving utility. In both cases, we demonstrate through analytical and experimental validation the advantages of our proposed solutions over state-of-the-art LDP protocols. For O_2, we empirically evaluated ML-based solutions designed to solve real-world problems while ensuring DP guarantees. We mainly used the input data perturbation setting from the privacy-preserving ML literature, i.e., the situation in which the whole dataset is sanitized independently and, thus, we implemented LDP algorithms from the perspective of the centralized data owner. In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
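To make the LDP frequency-estimation setting above concrete, here is a minimal sketch of k-ary Generalized Randomized Response (GRR), a standard LDP protocol of the kind the thesis compares against. Each user reports their true categorical value with probability p = e^ε/(e^ε + k − 1) and a uniformly random other value otherwise; the server then debiases the observed counts. The function names and the experiment below are illustrative, not taken from the thesis.

```python
import math
import random
from collections import Counter

def grr_perturb(value, domain, epsilon):
    """Client side: k-ary randomized response on a single categorical value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return value  # report the true value with probability p
    # otherwise report one of the k-1 other values uniformly at random
    return random.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimates from the perturbed reports."""
    k = len(reports and domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)  # probability of reporting any specific wrong value
    counts = Counter(reports)
    # invert the expected report frequency  f_obs = f_true * p + (1 - f_true) * q
    return {v: (counts[v] / n - q) / (p - q) for v in domain}

# Illustrative experiment: 1000 users, 3 categories, epsilon = 2.
random.seed(0)
domain = ["A", "B", "C"]
data = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
reports = [grr_perturb(v, domain, 2.0) for v in data]
estimates = grr_estimate(reports, domain, 2.0)
```

The estimates sum to 1 by construction, and with a moderate ε they recover the true frequencies (0.6, 0.3, 0.1) up to sampling noise; lowering ε increases that noise, which is the privacy-utility trade-off the abstract refers to.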


