Real-world K-Anonymity Applications: the KGen approach and its evaluation in Fraudulent Transactions
K-Anonymity is a property for the measurement, management, and governance of the data anonymization. Many implementations of k-anonymity have been described in state of the art, but most of them are not able to work with a large number of attributes in a "Big" dataset, i.e., a dataset drawn from Big Data. To address this significant shortcoming, we introduce and evaluate KGen an approach to K-anonymity featuring Genetic Algorithms. KGen promotes such a meta-heuristic approach since it can solve the problem by finding a pseudo-optimal solution in a reasonable time over a considerable load of input. KGen allows the data manager to guarantee a high anonymity level while preserving the usability and preventing loss of information entropy over the data. Differently from other approaches that provide optimal global solutions catered for small datasets, KGen works properly also over Big datasets while still providing a good-enough solution. Evaluation results show how our approach can still work efficiently on a real world dataset, provided by Dutch Tax Authority, with 47 attributes (i.e., the columns of the dataset to be anonymized) and over 1.5K+ observations (i.e., the rows of that dataset), as well as on a dataset with 97 attributes and over 3942 observations.
READ FULL TEXT