Sensitivity Sampling Over Dynamic Geometric Data Streams with Applications to k-Clustering

02/01/2018
by Zhao Song, et al.

Sensitivity-based sampling is crucial for constructing nearly optimal coresets for k-means / k-median clustering. In this paper, we provide a novel data structure that enables sensitivity sampling over a dynamic data stream, where points from a high-dimensional discrete Euclidean space can be either inserted or deleted. Based on this data structure, we provide a one-pass coreset construction for k-means using space O(k poly(d)) over d-dimensional geometric dynamic data streams. The previous best known result is only for k-median [Braverman, Frahling, Lang, Sohler, Yang '17], and it cannot be directly generalized to k-means to obtain algorithms with space nearly linear in k. To the best of our knowledge, our algorithm is the first dynamic geometric data stream algorithm for k-means that uses space polynomial in the dimension and nearly optimal in k. We further show that our data structure for maintaining a coreset can be extended into a unified approach for a more general class of k-clustering problems, including k-median, M-estimator clustering, and clusterings with a more general set of cost functions over distances. For all these tasks, the space and time of our algorithm are similar to those for k-means, differing only by a poly(d) factor.
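
For readers unfamiliar with the sampling step the abstract refers to, the following is a minimal sketch of sensitivity sampling in the static, offline setting. It is not the paper's dynamic-stream data structure. It assumes a rough set of centers is already available, and it uses a simplified sensitivity score (a point's share of the clustering cost plus the inverse of its cluster size), which is an illustrative stand-in rather than the bound used by the authors. The names `sensitivity_coreset` and `kmeans_cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(weighted_points, centers):
    """Weighted k-means cost: sum of w * (squared distance to nearest center)."""
    return sum(w * min(sq_dist(p, c) for c in centers) for p, w in weighted_points)


def sensitivity_coreset(points, centers, m, seed=0):
    """Draw m weighted samples with probability proportional to a
    simplified sensitivity score (assumption: illustrative, not the paper's)."""
    rng = random.Random(seed)
    n = len(points)

    # Assign every point to its nearest rough center and record its cost.
    assign, costs = [], []
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)
        assign.append(j)
        costs.append(dists[j])

    total = sum(costs)
    sizes = [0] * len(centers)
    for j in assign:
        sizes[j] += 1

    # Simplified score: share of total cost + 1 / (size of the point's cluster).
    scores = [
        (costs[i] / total if total > 0 else 0.0) + 1.0 / sizes[assign[i]]
        for i in range(n)
    ]
    z = sum(scores)
    probs = [s / z for s in scores]

    # Sample with replacement; weight 1 / (m * p_i) keeps cost estimates unbiased.
    idx = rng.choices(range(n), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]
```

Evaluating `kmeans_cost` on the returned weighted sample gives an unbiased estimate of the cost on the full point set for any fixed set of centers. The sample size needed for a coreset guarantee depends on the analysis and is not derived here.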


research
10/30/2018

Coresets for k-Means and k-Median Clustering and their Applications

In this paper, we show the existence of small coresets for the problem...
research
10/15/2022

A Nearly Optimal Size Coreset Algorithm with Nearly Linear Time

A coreset is a point set containing information about geometric properti...
research
04/05/2022

Streaming Facility Location in High Dimension via New Geometric Hashing

In Euclidean Uniform Facility Location, the input is a set of clients in...
research
04/14/2020

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Given a collection of n points in ℝ^d, the goal of the (k,z)-clustering ...
research
11/08/2018

Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Consider an instance of Euclidean k-means or k-medians clustering. We sh...
research
10/02/2018

A Unified Framework for Clustering Constrained Data without Locality Property

In this paper, we consider a class of constrained clustering problems of...
research
11/09/2020

Streaming Algorithms for Geometric Steiner Forest

We consider a natural generalization of the Steiner tree problem, the St...
