Towards a Data Privacy-Predictive Performance Trade-off

01/13/2022
by   Tânia Carvalho, et al.

Machine learning is increasingly used across diverse applications and domains, from predicting pathologies in healthcare to detecting fraud in the financial sector. One of the linchpins of efficiency and accuracy in machine learning is data utility. However, when data contain personal information, full access may be restricted by laws and regulations that aim to protect individuals' privacy. Data owners must therefore ensure that any data they share preserves such privacy. Removing or transforming private information (de-identification) is among the most common techniques for doing so. Intuitively, one would expect that reducing detail or distorting information lowers the predictive performance of models. However, previous work on classification tasks using de-identified data generally reports that predictive performance can be preserved in specific applications. In this paper, we evaluate whether a trade-off exists between data privacy and predictive performance in classification tasks. We leverage a large set of privacy-preserving techniques and learning algorithms to assess both re-identification ability and the impact of the transformed data variants on predictive performance. In contrast to previous literature, we find that the higher the level of privacy (that is, the lower the re-identification risk), the greater the impact on predictive performance, which points to clear evidence of a trade-off.
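
To make the notion of re-identification risk concrete, the following is a minimal sketch and not the paper's method. It assumes that risk is approximated by the share of records that are unique on a set of quasi-identifiers, and that de-identification is a simple generalization of age into 10-year bins. The toy records, the column names (`age`, `zip`, `label`), and the helper functions `reidentification_risk` and `generalize_age` are all hypothetical and introduced only for illustration.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Fraction of records that are unique on the given quasi-identifiers.

    Assumption: uniqueness is used here as a simple proxy for risk.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


def generalize_age(record, width=10):
    """Return a copy of the record with age replaced by the lower bound of its bin."""
    out = dict(record)
    out["age"] = (record["age"] // width) * width
    return out


# Hypothetical toy data; not taken from the paper.
records = [
    {"age": 23, "zip": "4000", "label": 0},
    {"age": 27, "zip": "4000", "label": 0},
    {"age": 31, "zip": "4100", "label": 1},
    {"age": 38, "zip": "4100", "label": 1},
    {"age": 45, "zip": "4200", "label": 1},
]
quasi_identifiers = ["age", "zip"]

original_risk = reidentification_risk(records, quasi_identifiers)
generalized = [generalize_age(r) for r in records]
generalized_risk = reidentification_risk(generalized, quasi_identifiers)

print(f"risk before generalization: {original_risk:.2f}")
print(f"risk after generalization:  {generalized_risk:.2f}")
```

In this toy case, generalization lowers the share of unique records, but the coarser `age` values also carry less detail for a classifier. The sketch covers only the privacy side; measuring the effect on predictive performance would require training a model on each variant and comparing the results.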


