Scalable Differentially Private Clustering via Hierarchically Separated Trees

06/17/2022
by Vincent Cohen-Addad, et al.

We study the private k-median and k-means clustering problems in d-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods. We prove that our method computes a solution with cost at most O(d^{3/2} log n) · OPT + O(k d^2 log^2 n / ϵ^2), where ϵ is the privacy parameter. (The dimension term, d, can be replaced with O(log k) using standard dimension reduction techniques.) Although this worst-case guarantee is weaker than that of state-of-the-art private clustering methods, the algorithm we propose is practical, runs in near-linear Õ(nkd) time, and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular, we show that our private algorithms can be implemented in a logarithmic number of MPC rounds in the sublinear memory regime. Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other private clustering baselines.
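
To make the tree-embedding idea concrete, the following is a minimal sketch and not the authors' algorithm: it builds the levels of a randomly shifted quadtree (one common hierarchically separated tree) over points assumed to lie in [0, 1)^d and releases Laplace-noised cell counts. The function name `noisy_quadtree_counts`, the `depth` parameter, the unit-cube assumption, and the add/remove-one-point neighboring relation are all assumptions made for illustration. The dense grid is only feasible for small d.

```python
import numpy as np


def noisy_quadtree_counts(points, depth, epsilon, rng):
    """Laplace-noised cell counts for a randomly shifted quadtree.

    Assumes points lie in [0, 1)^d (hypothetical preprocessing step).
    Returns one dense d-dimensional array of noisy counts per level.
    """
    n, d = points.shape
    # The random shift is drawn independently of the data.
    shift = rng.uniform(0.0, 1.0, size=d)
    shifted = points + shift  # now in [0, 2)^d

    # Adding or removing one point changes exactly one count per level by 1,
    # so the L1 sensitivity of all counts together is `depth`.
    scale = depth / epsilon

    levels = []
    for level in range(1, depth + 1):
        cells_per_axis = 2 ** (level + 1)  # cells of side 2^-level cover [0, 2)
        idx = np.floor(shifted * 2 ** level).astype(np.int64)
        idx = np.clip(idx, 0, cells_per_axis - 1)
        flat = np.ravel_multi_index(tuple(idx.T), (cells_per_axis,) * d)
        counts = np.bincount(flat, minlength=cells_per_axis ** d).astype(float)
        # Noise is added to every cell, including empty ones, so that the
        # set of occupied cells is not revealed.
        noisy = counts + rng.laplace(0.0, scale, size=counts.shape)
        levels.append(noisy.reshape((cells_per_axis,) * d))
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 1.0, size=(1000, 2))
    for level, grid in enumerate(noisy_quadtree_counts(pts, 4, 1.0, rng), 1):
        print(level, grid.shape)
```

A clustering step would then pick centers using only these noisy counts. How the paper selects centers from the tree is not described in the abstract, so that step is left out of this sketch.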
