Diversifying Anonymized Data with Diversity Constraints

07/17/2020
by Mostafa Milani, et al.

Recently introduced privacy legislation aims to restrict and control the amount of personal data that companies publish and share with third parties. Much of this real data is not only sensitive, requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications, ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in their published results, which inadvertently propagates underlying bias into subsequent data analysis. We study the problem of finding a diverse anonymized data instance, where diversity is measured via a set of diversity constraints. We formalize diversity constraints and study their foundations, such as implication and satisfiability. We show that determining the existence of a diverse, anonymized instance can be done in PTIME, and we present a clustering-based algorithm. We conduct extensive experiments using real and synthetic data, showing the effectiveness of our techniques and their improvement over existing baselines. Our work aligns with recent trends towards responsible data science by coupling diversity with privacy-preserving data publishing.


research
05/03/2019

In Defense of Synthetic Data

Synthetic datasets have long been thought of as second-rate, to be used ...
research
08/14/2020

Towards Querying in Decentralized Environments with Privacy-Preserving Aggregation

The Web is a ubiquitous economic, educational, and collaborative space. ...
research
08/13/2018

Review of Different Privacy Preserving Techniques in PPDP

Big data is a term used for a very large data sets that have many diffic...
research
11/20/2020

Survey and Open Problems in Privacy Preserving Knowledge Graph: Merging, Query, Representation, Completion and Applications

Knowledge Graph (KG) has attracted more and more companies' attention fo...
research
08/02/2018

Diversification on Big Data in Query Processing

Recently, in the area of big data, some popular applications such as web...
research
06/04/2019

Balanced Ranking with Diversity Constraints

Many set selection and ranking algorithms have recently been enhanced wi...
research
01/25/2022

Niching-based Evolutionary Diversity Optimization for the Traveling Salesperson Problem

In this work, we consider the problem of finding a set of tours to a tra...
