A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel

01/20/2019
by Ni Ding, et al.

For the privacy funnel (PF) problem, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). For a data curator who wants to share data X that is correlated with sensitive information S, the PF problem is to generate sanitized data X̂ that maintains a specified utility/fidelity threshold on I(X; X̂) while minimizing the privacy leakage I(S; X̂). Our IAC-MDSF algorithm starts with the original alphabet X̂ := X and, in each iteration, merges the subset of elements in the current alphabet X̂ that minimizes the Lagrangian function I(S; X̂) - λ I(X; X̂). We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of X̂ by existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the value of the Lagrangian multiplier λ, we obtain experimental results on a heart disease data set in terms of the Pareto frontier: I(S; X̂) vs. -I(X; X̂). We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
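The agglomerative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the efficient MDSF search is replaced by a brute-force scan over all subsets of the current alphabet (feasible only for very small alphabets), merges are assumed to be deterministic so that I(X; X̂) = H(X̂), and all function names (`lagrangian`, `best_merge`, `agglomerate`) are hypothetical. Information is measured in nats.

```python
import itertools
import math


def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as rows joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (pa[a] * pb[b]))
    return mi


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def lagrangian(joint_sx, clusters, lam):
    """I(S;X_hat) - lam * I(X;X_hat) for a partition `clusters` of X's alphabet.

    Assumption: X_hat is a deterministic function of X, so I(X;X_hat) = H(X_hat).
    """
    joint_sxh = [[sum(row[x] for x in c) for c in clusters] for row in joint_sx]
    p_xh = [sum(col) for col in zip(*joint_sxh)]
    return mutual_information(joint_sxh) - lam * entropy(p_xh)


def best_merge(joint_sx, clusters, lam):
    """Brute-force stand-in for the MDSF search: try merging every subset
    of two or more current clusters and keep the lowest Lagrangian."""
    best_value, best_clusters = lagrangian(joint_sx, clusters, lam), clusters
    n = len(clusters)
    for k in range(2, n + 1):
        for subset in itertools.combinations(range(n), k):
            merged = tuple(x for i in subset for x in clusters[i])
            rest = [clusters[i] for i in range(n) if i not in subset]
            candidate = rest + [merged]
            value = lagrangian(joint_sx, candidate, lam)
            if value < best_value - 1e-12:
                best_value, best_clusters = value, candidate
    return best_value, best_clusters


def agglomerate(joint_sx, lam):
    """Start from X_hat := X and merge until no merge lowers the Lagrangian."""
    clusters = [(x,) for x in range(len(joint_sx[0]))]
    value = lagrangian(joint_sx, clusters, lam)
    while len(clusters) > 1:
        new_value, new_clusters = best_merge(joint_sx, clusters, lam)
        if len(new_clusters) == len(clusters):
            break  # no strictly improving merge was found
        value, clusters = new_value, new_clusters
    return value, clusters
```

For example, with S = X uniform on two symbols (`joint_sx = [[0.5, 0.0], [0.0, 0.5]]`), λ = 0 rewards only privacy and the sketch merges everything into one symbol, while λ = 2 weights utility heavily enough that no merge is accepted. Sweeping λ in this way is how a trade-off curve between I(S; X̂) and -I(X; X̂) would be traced out.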

research
11/12/2019

Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering

We consider a non-stochastic privacy-preserving problem in which an adve...
research
09/13/2017

An efficient clustering algorithm from the measure of local Gaussian distribution

In this paper, I will introduce a fast and novel clustering algorithm ba...
research
07/06/2023

DPM: Clustering Sensitive Data through Separation

Privacy-preserving clustering groups data points in an unsupervised mann...
research
01/07/2022

Probabilistic spatial clustering based on the Self Discipline Learning (SDL) model of autonomous learning

Unsupervised clustering algorithm can effectively reduce the dimension o...
research
02/08/2019

Achieving Data Utility-Privacy Tradeoff in Internet of Medical Things: A Machine Learning Approach

The emergence and rapid development of the Internet of Medical Things (I...
research
10/19/2020

On Properties and Optimization of Information-theoretic Privacy Watchdog

We study the problem of privacy preservation in data sharing, where S is...
research
06/29/2023

A Formal Perspective on Byte-Pair Encoding

Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data...