Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques

05/11/2023
by Pepijn de Reus, et al.

To address increasing societal concerns regarding privacy and climate, the EU adopted the General Data Protection Regulation (GDPR) and committed to the Green Deal. Considerable research has studied the energy efficiency of software and the accuracy of machine learning models trained on anonymised data sets. Recent work has begun exploring the impact of privacy-enhancing techniques (PETs) on both the energy consumption and the accuracy of machine learning models, focusing on k-anonymity. As synthetic data is becoming an increasingly popular PET, this paper analyses energy consumption and accuracy across two phases: (a) applying privacy-enhancing techniques to the data set, and (b) training models on the resulting privacy-enhanced data set. We use two privacy-enhancing techniques, k-anonymisation (using generalisation and suppression) and synthetic data, and three machine learning models. Each model is trained on each privacy-enhanced data set. Our results show that models trained on k-anonymised data consume less energy than models trained on the original data, with similar accuracy. Models trained on synthetic data have similar energy consumption and similar or lower accuracy compared to models trained on the original data.
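
For readers unfamiliar with k-anonymisation by generalisation and suppression, the idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`generalise_age`, `k_anonymise`), the choice of quasi-identifiers, the 10-year age buckets and the toy records are all assumptions made for illustration. Generalisation coarsens quasi-identifier values, and suppression then drops any record whose generalised quasi-identifier combination is shared by fewer than k records.

```python
from collections import Counter


def generalise_age(age, width=10):
    """Generalisation: replace an exact age with a coarse range label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymise(records, quasi_identifiers, k, generalisers):
    """Generalise quasi-identifiers, then suppress rows in groups smaller than k."""
    generalised = []
    for row in records:
        new_row = dict(row)  # copy so the original records stay untouched
        for col in quasi_identifiers:
            if col in generalisers:
                new_row[col] = generalisers[col](row[col])
        generalised.append(new_row)

    def key(row):
        # The combination of quasi-identifier values defines an equivalence class.
        return tuple(row[col] for col in quasi_identifiers)

    counts = Counter(key(row) for row in generalised)
    # Suppression: keep only rows whose equivalence class has at least k members.
    return [row for row in generalised if counts[key(row)] >= k]


# Hypothetical toy data; "age" and "zip" are assumed to be the quasi-identifiers.
records = [
    {"age": 23, "zip": "1011", "label": 0},
    {"age": 27, "zip": "1011", "label": 1},
    {"age": 34, "zip": "1012", "label": 0},
    {"age": 38, "zip": "1012", "label": 1},
    {"age": 61, "zip": "1013", "label": 0},
]

anonymised = k_anonymise(
    records, ["age", "zip"], k=2, generalisers={"age": generalise_age}
)
for row in anonymised:
    print(row)
```

In this toy run the record with age 61 is the only member of its group, so it is suppressed, leaving fewer rows than the original data set. Fewer rows to train on is one plausible contributor to lower training energy, though the sketch itself measures nothing.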

