Utility-efficient Differentially Private K-means Clustering based on Cluster Merging

10/03/2020
by   Tianjiao Ni, et al.
Differential privacy is widely used in data analysis. State-of-the-art differentially private k-means clustering algorithms typically add an equal amount of noise to the centroids in every iteration. In this paper, we propose a novel differentially private k-means clustering algorithm, DP-KCCM, that significantly improves clustering utility by adding adaptive noise and merging clusters. Specifically, to obtain k clusters with differential privacy, the algorithm first generates n × k initial centroids, adds adaptive noise in each iteration to obtain n × k clusters, and finally merges these clusters into k clusters. We theoretically prove that the proposed algorithm satisfies differential privacy. Surprisingly, extensive experimental results show that: 1) cluster merging with equal amounts of noise improves the utility somewhat; 2) although adding adaptive noise alone does not improve the utility, combining cluster merging with adaptive noise improves the utility significantly.
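The three stages named in the abstract (over-cluster into n × k centroids, perturb each iteration with a varying amount of noise, merge down to k) can be illustrated with a minimal sketch. This is not the authors' DP-KCCM implementation: the abstract does not specify the noise schedule or the merging rule, so everything below is an assumption made for illustration. In particular, the doubling budget schedule in `budget_schedule`, the Laplace mechanism on per-cluster sums and counts, the requirement that data lie in [0, 1] per coordinate, and the closest-pair merge weighted by noisy counts are all hypothetical choices, as are the function names.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def budget_schedule(epsilon, iters):
    """Hypothetical adaptive schedule: per-iteration budgets double each round,
    so later iterations receive less noise. The budgets sum to epsilon."""
    weights = [2.0 ** t for t in range(iters)]
    total = sum(weights)
    return [epsilon * w / total for w in weights]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dp_kmeans_merge(points, k, n=2, iters=5, epsilon=1.0, seed=0):
    """Sketch: noisy k-means on n*k centroids, then merge down to k.
    Assumes every coordinate of every point lies in [0, 1]."""
    rng = random.Random(seed)
    dim = len(points[0])
    m = n * k
    # Data-independent initialisation consumes no privacy budget.
    centroids = [[rng.random() for _ in range(dim)] for _ in range(m)]
    counts = [1.0] * m
    for eps_t in budget_schedule(epsilon, iters):
        sums = [[0.0] * dim for _ in range(m)]
        raw = [0] * m
        for p in points:
            j = min(range(m), key=lambda c: sq_dist(p, centroids[c]))
            raw[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Adding or removing one point changes one count by 1 and one sum
        # vector by at most dim in L1 norm, so the L1 sensitivity is dim + 1.
        scale = (dim + 1) / eps_t
        for j in range(m):
            counts[j] = max(raw[j] + laplace(scale, rng), 1.0)
            centroids[j] = [
                min(max((sums[j][d] + laplace(scale, rng)) / counts[j], 0.0), 1.0)
                for d in range(dim)
            ]
    # Merging reads only the noisy centroids and counts (post-processing).
    while len(centroids) > k:
        i, j = min(
            ((a, b) for a in range(len(centroids))
             for b in range(a + 1, len(centroids))),
            key=lambda ab: sq_dist(centroids[ab[0]], centroids[ab[1]]),
        )
        w = counts[i] + counts[j]
        centroids[i] = [
            (counts[i] * x + counts[j] * y) / w
            for x, y in zip(centroids[i], centroids[j])
        ]
        counts[i] = w
        del centroids[j], counts[j]
    return centroids


# Example call on synthetic data in the unit square.
_rng = random.Random(1)
data = [[_rng.random(), _rng.random()] for _ in range(200)]
centers = dp_kmeans_merge(data, k=3, n=2, epsilon=2.0)
```

Under these assumptions, each iteration releases only Laplace-perturbed sums and counts, and the per-iteration budgets add up to `epsilon` by sequential composition. Replacing `budget_schedule` with a uniform split would give the equal-noise baseline the abstract contrasts against.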

research
01/07/2023

k-Means SubClustering: A Differentially Private Algorithm with Improved Clustering Quality

In today's data-driven world, the sensitivity of information has been a ...
research
09/15/2021

DPGen: Automated Program Synthesis for Differential Privacy

Differential privacy has become a de facto standard for releasing data i...
research
03/10/2018

Graph-based Clustering under Differential Privacy

In this paper, we present the first differentially private clustering me...
research
02/08/2019

Achieving Data Utility-Privacy Tradeoff in Internet of Medical Things: A Machine Learning Approach

The emergence and rapid development of the Internet of Medical Things (I...
research
09/18/2019

VideoDP: A Universal Platform for Video Analytics with Differential Privacy

Massive amounts of video data are ubiquitously generated in personal dev...
research
02/03/2020

Differentially Private k-Means Clustering with Guaranteed Convergence

Iterative clustering algorithms help us to learn the insights behind the...
research
08/24/2022

DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization

A large amount of high-dimensional and heterogeneous data appear in prac...
