DPM: Clustering Sensitive Data through Separation

07/06/2023
by Yara Schütt, et al.

Privacy-preserving clustering groups data points in an unsupervised manner while ensuring that sensitive information remains protected. Previous privacy-preserving clustering approaches focused on identifying concentrations of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce DPM, a novel differentially private clustering algorithm that searches for accurate data point separators. DPM addresses two key challenges in finding accurate separators: identifying separators that are large gaps between clusters rather than small gaps within a cluster, and, to spend the privacy budget efficiently, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: for a data set D, if there is a wide low-density separator in the central 60% quantile, DPM finds that separator with probability 1 - exp(-√(|D|)). Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. Taking the inertia of the non-private KMeans++ as a baseline, for ε = 1 and δ = 10^-5, DPM reduces the difference to the baseline by up to 50% on a synthetic data set and by up to 62% on a real-world data set, compared to a state-of-the-art clustering algorithm by Chang and Kamath.
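To make the role of the Exponential Mechanism more concrete, the following is a minimal one-dimensional sketch of choosing a split point privately. It is not DPM's actual scoring function, which the abstract does not specify. The names `exponential_mechanism` and `gap_utility`, the candidate grid, and the window `half_width` are all hypothetical and chosen for illustration. The sketch assumes a utility equal to the negative number of points near a candidate split, which changes by at most 1 when a single point is added or removed (sensitivity 1).

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    # Subtracting the maximum score leaves the distribution unchanged
    # and avoids overflow in exp().
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity))
        for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def gap_utility(points, half_width):
    """Hypothetical utility: fewer points near the split means higher utility.

    Adding or removing one data point changes the count by at most 1,
    so the sensitivity of this utility is 1.
    """
    def utility(split):
        return -sum(1 for x in points if abs(x - split) <= half_width)
    return utility


# Toy data (illustrative only): two groups separated by a wide gap.
points = [0.1 * i for i in range(20)] + [5.0 + 0.1 * i for i in range(20)]
candidates = [0.5 * i for i in range(14)]  # candidate split positions 0.0 .. 6.5

split = exponential_mechanism(
    candidates,
    gap_utility(points, half_width=0.5),
    epsilon=1.0,
    sensitivity=1.0,
)
print("chosen split:", split)
```

With a small ε the choice is noisy and may land inside a cluster; as ε grows, the probability mass concentrates on candidates inside the low-density gap. This is the trade-off the privacy budget controls.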


