Online Coresets for Clustering with Bregman Divergences

by Rachit Chhaya, et al.

We present algorithms that create coresets in an online setting for clustering problems according to a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the lightweight coresets of Bachem et al. (2018), and take update time O(d) for every incoming point, where d is the dimension of the point. Our first algorithm gives online coresets of size Õ(poly(k,d,ϵ,μ)) for k-clusterings according to any μ-similar Bregman divergence. We further extend this algorithm to show the existence of non-parametric coresets, where the coreset size is independent of k, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets are larger by a factor of O(log n), where n is the number of points, and have a similar (small) additive guarantee. At the same time, our coresets also function as lightweight coresets for non-parametric versions of Bregman clustering such as DP-Means. While these coresets provide additive error guarantees, they are also significantly smaller (scaling with O(log n) as opposed to O(d^d) for points in ℝ^d) than the relative-error coresets obtained in Bachem et al. (2015) for DP-Means. While our non-parametric coresets are existential, we give an algorithmic version under certain assumptions.
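For readers unfamiliar with the distance measure the abstract refers to, the following is a minimal sketch of the standard definition of a Bregman divergence, d_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩, for a convex generator φ. It is not the paper's coreset algorithm; the function names (`bregman_divergence`, `sq_norm`, `neg_entropy`) and the sample points are illustrative assumptions only.

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """Standard definition: d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(v * v for v in x)

def sq_norm_grad(x):
    return [2.0 * v for v in x]

# Generator phi(x) = sum_i x_i log x_i (positive entries) yields the
# generalized KL divergence: sum_i x_i log(x_i / y_i) - sum_i x_i + sum_i y_i.
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]

# Hypothetical sample points (both are probability vectors).
x = [0.2, 0.8]
y = [0.5, 0.5]

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, x, y)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)
```

Each evaluation touches every coordinate once, so computing a divergence between two d-dimensional points costs O(d) arithmetic operations. Note that a Bregman divergence is in general not symmetric, as the KL case shows.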




Non-parametric Archimedean generator estimation with implications for multiple testing

In multiple testing, the family-wise error rate can be bounded under som...

Online Spectral Approximation in Random Order Streams

This paper studies spectral approximation for a positive semidefinite ma...

Reusing Preconditioners in Projection based Model Order Reduction Algorithms

Dynamical systems are pervasive in almost all engineering and scientific...

Locally Private k-Means Clustering with Constant Multiplicative Approximation and Near-Optimal Additive Error

Given a data set of size n in d'-dimensional Euclidean space, the k-mean...

Non-parametric sparse additive auto-regressive network models

Consider a multi-variate time series (X_t)_t=0^T where X_t ∈R^d which ma...

Higher-order interactions in statistical physics and machine learning: A non-parametric solution to the inverse problem

We propose a model-independent definition of n-point interaction within ...

Coresets for Clustering with Missing Values

We provide the first coreset for clustering points in ℝ^d that have mult...
