Real-world K-Anonymity Applications: the KGen approach and its evaluation in Fraudulent Transactions

04/01/2022
by Daniel De Pascale, et al.

K-Anonymity is a property for the measurement, management, and governance of data anonymization. Many implementations of k-anonymity have been described in the state of the art, but most of them cannot handle a large number of attributes in a "Big" dataset, i.e., a dataset drawn from Big Data. To address this significant shortcoming, we introduce and evaluate KGen, an approach to k-anonymity based on Genetic Algorithms. KGen adopts this meta-heuristic approach because it can solve the problem by finding a pseudo-optimal solution in reasonable time over a large input. KGen allows the data manager to guarantee a high level of anonymity while preserving the usability of the data and preventing loss of information entropy. Unlike other approaches, which provide globally optimal solutions suited to small datasets, KGen also works properly on Big datasets while still providing a good-enough solution. Evaluation results show that our approach works efficiently on a real-world dataset provided by the Dutch Tax Authority, with 47 attributes (i.e., the columns of the dataset to be anonymized) and more than 1.5K observations (i.e., the rows of that dataset), as well as on a dataset with 97 attributes and over 3942 observations.
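To make the two ideas in the abstract concrete, here is a minimal sketch, not KGen's actual algorithm (the abstract does not describe its encoding, fitness function, or operators). A table is k-anonymous when every combination of quasi-identifier values appears in at least k rows. The sketch searches per-attribute generalization levels with a plain genetic algorithm: tournament selection, uniform crossover, random mutation, and elitism. All names (`mask`, `cost`, `kgen_like_search`) and the "mask trailing characters with `*`" generalization are hypothetical, and the sketch assumes that each column holds fixed-width string values.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def is_k_anonymous(rows: Sequence[Row], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k rows."""
    return all(count >= k for count in Counter(rows).values())


def mask(value: str, level: int) -> str:
    """Hypothetical generalization: replace the last `level` characters with '*'."""
    level = max(0, min(level, len(value)))
    return value[: len(value) - level] + "*" * level


def generalize(rows: Sequence[Row], genome: List[int]) -> List[Row]:
    """Apply one generalization level per attribute (column) to every row."""
    return [tuple(mask(v, lvl) for v, lvl in zip(row, genome)) for row in rows]


def cost(rows: Sequence[Row], genome: List[int], max_levels: List[int], k: int) -> float:
    """Normalized information loss; non-k-anonymous genomes get a penalty."""
    loss = sum(lvl / m for lvl, m in zip(genome, max_levels) if m > 0)
    if not is_k_anonymous(generalize(rows, genome), k):
        loss += len(genome) + 1  # any k-anonymous genome beats any non-anonymous one
    return loss


def kgen_like_search(rows: Sequence[Row], k: int, pop_size: int = 20,
                     generations: int = 50, mutation_rate: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[Row]]:
    """Hypothetical GA over generalization levels (assumes pop_size >= 2)."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    max_levels = [max(len(row[i]) for row in rows) for i in range(n_attrs)]

    def fitness(genome: List[int]) -> float:
        return cost(rows, genome, max_levels, k)

    # Seed with the fully generalized genome, which is k-anonymous when
    # len(rows) >= k and columns are fixed-width; elitism then keeps a feasible best.
    population = [list(max_levels)]
    population += [[rng.randint(0, m) for m in max_levels] for _ in range(pop_size - 1)]

    def tournament() -> List[int]:
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        children = [list(min(population, key=fitness))]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            for i, m in enumerate(max_levels):
                if rng.random() < mutation_rate:  # mutation: resample one level
                    child[i] = rng.randint(0, m)
            children.append(child)
        population = children

    best = min(population, key=fitness)
    return best, generalize(rows, best)


if __name__ == "__main__":
    # Toy quasi-identifiers (ZIP code, age); purely illustrative data.
    data = [("47677", "29"), ("47602", "22"), ("47678", "27"),
            ("47905", "43"), ("47909", "52"), ("47906", "47")]
    levels, table = kgen_like_search(data, k=3)
    print(levels, table)
```

The penalty in `cost` places every non-anonymous genome below every anonymous one, so the search trades off only information loss among valid generalizations. This mirrors the abstract's goal of high anonymity with limited information loss. A real system would use proper value hierarchies instead of character masking.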

research
02/11/2020

Big Data and model-based survey sampling

Big Data are huge amounts of digital information that are automatically ...
research
08/17/2016

Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data

Datasets with a mixture of numerical and categorical attributes are rout...
research
06/14/2021

z-anonymity: Zero-Delay Anonymization for Data Streams

With the advent of big data and the birth of the data markets that sell ...
research
04/18/2019

Ontology-based Design of Experiments on Big Data Solutions

Big data solutions are designed to cope with data of huge Volume and wid...
research
11/30/2022

Learning Agile Paths from Optimal Control

Efficient motion planning algorithms are of central importance for deplo...
research
02/20/2020

Meta-learning for mixed linear regression

In modern supervised learning, there are a large number of tasks, but ma...
research
03/27/2020

Sorting Big Data by Revealed Preference with Application to College Ranking

When ranking big data observations such as colleges in the United States...