Differentially Private Clustering in Data Streams

07/14/2023
by Alessandro Epasto, et al.

The streaming model is an abstraction of computing over massive data streams and a popular way of handling large-scale data analysis. In this model, data points arrive one after another. A streaming algorithm is allowed only one pass over the stream, and the goal is to perform some analysis during the stream while using as little space as possible. Clustering problems such as k-means and k-median are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been studied extensively. However, as data privacy has become a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for k-means and k-median clustering of d-dimensional Euclidean data points over a stream of length at most T. The algorithms use poly(k,d,log(T)) space and achieve a constant multiplicative error and a poly(k,d,log(T)) additive error. In particular, we present a differentially private streaming clustering framework that requires only an offline DP coreset algorithm as a black box. By plugging in existing DP coreset results from Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve either (1) a (1+γ)-multiplicative approximation with Õ_γ(poly(k,d,log(T))) space for any γ>0 and poly(k,d,log(T)) additive error, or (2) an O(1)-multiplicative approximation with Õ(k · poly(d,log(T))) space and poly(k,d,log(T)) additive error. In addition, our algorithmic framework is differentially private under the continual release setting, i.e., the union of the outputs of our algorithms at every timestamp is always differentially private.
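To make the error notion in the abstract concrete, the following minimal Python sketch computes the k-means and k-median costs of a set of centers and checks a guarantee of the form cost ≤ α · OPT + β, where α is the multiplicative error and β the additive error. It is not the authors' algorithm and involves no privacy mechanism; the function names `clustering_cost` and `satisfies_guarantee`, the toy points, and the values of α and β are all assumptions chosen for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], power: int) -> float:
    """Sum over points of (Euclidean distance to nearest center) ** power.

    power=2 gives the k-means cost, power=1 gives the k-median cost.
    (Hypothetical helper, not taken from the paper.)
    """
    total = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        total += nearest ** power
    return total


def satisfies_guarantee(cost: float, opt_cost: float, alpha: float, beta: float) -> bool:
    """Check cost <= alpha * opt_cost + beta.

    alpha is the multiplicative error, beta the additive error.
    opt_cost must be supplied by the caller; this sketch does not compute it.
    """
    return cost <= alpha * opt_cost + beta


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centers = [(0.0, 1.0), (10.0, 1.0)]   # midpoints of each pair
rough_centers = [(0.0, 0.0), (10.0, 0.0)]  # one data point from each pair

reference_cost = clustering_cost(points, good_centers, power=2)
rough_cost = clustering_cost(points, rough_centers, power=2)
ok = satisfies_guarantee(rough_cost, reference_cost, alpha=1.5, beta=2.0)
```

Here the midpoint centers serve as the reference solution, and the rougher centers are accepted only because the additive slack β absorbs the part of the gap that the multiplicative factor α does not cover.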


