A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel

by Ni Ding, et al.

For the privacy funnel (PF) problem, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). For a data curator who wants to share data X that is correlated with sensitive information S, the PF problem is to generate sanitized data X̂ that maintains a specified utility/fidelity threshold on I(X; X̂) while minimizing the privacy leakage I(S; X̂). Our IAC-MDSF algorithm starts with the original alphabet X̂ := X and, in each iteration, merges the subset of elements of the current alphabet X̂ whose merger minimizes the Lagrangian function I(S; X̂) - λ I(X; X̂). We prove that the best merge in each iteration of IAC-MDSF can be found efficiently, searching over all subsets of X̂, by existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the Lagrangian multiplier λ, we obtain experimental results on a heart disease data set, presented as the Pareto frontier of I(S; X̂) versus -I(X; X̂). We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
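The merge objective described above can be illustrated with a minimal sketch. It assumes the joint distribution of S and X is given as a NumPy array `p_sx` of shape (|S|, |X|) and that a sanitization is a partition of the indices of X, so that X̂ is a deterministic function of X and I(X; X̂) = H(X̂). The function names are hypothetical, and the exhaustive subset search is only a stand-in for small alphabets: it is not the MDSF routine the paper relies on, which avoids enumerating all subsets.

```python
from itertools import combinations

import numpy as np


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    pb = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip zero-probability cells
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())


def lagrangian(p_sx, partition, lam):
    """I(S;Xhat) - lam * I(X;Xhat) for a partition of the X indices."""
    p_sx = np.asarray(p_sx, dtype=float)
    # Column k of p_sxhat holds the joint mass of S with cluster k.
    p_sxhat = np.stack([p_sx[:, c].sum(axis=1) for c in partition], axis=1)
    p_xhat = p_sxhat.sum(axis=0)
    nz = p_xhat > 0
    # Xhat is a deterministic function of X, so I(X;Xhat) = H(Xhat).
    h_xhat = float(-(p_xhat[nz] * np.log2(p_xhat[nz])).sum())
    return mutual_information(p_sxhat) - lam * h_xhat


def best_merge_bruteforce(p_sx, partition, lam):
    """Try merging every subset of >= 2 clusters; keep the lowest Lagrangian.

    Exponential in the number of clusters: an illustrative stand-in only.
    """
    best_val, best_part = lagrangian(p_sx, partition, lam), partition
    for r in range(2, len(partition) + 1):
        for group in combinations(range(len(partition)), r):
            merged = [i for g in group for i in partition[g]]
            rest = [partition[i] for i in range(len(partition)) if i not in group]
            candidate = rest + [merged]
            val = lagrangian(p_sx, candidate, lam)
            if val < best_val:
                best_val, best_part = val, candidate
    return best_val, best_part


# Toy joint pmf (made-up numbers): S is binary, X has three symbols.
p_sx = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.15, 0.35]])
singletons = [[0], [1], [2]]
value, merged_partition = best_merge_bruteforce(p_sx, singletons, lam=0.5)
```

Repeating `best_merge_bruteforce` on its own output until the partition stops changing gives one agglomerative pass for a fixed λ; sweeping λ then traces out points on the privacy-utility trade-off.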




