A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order

02/08/2021
by Tom Hess, et al.

We study k-median clustering under the sequential no-substitution setting. In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers. However, a point can be selected as a center only immediately after it is observed, before observing the next point. In addition, a selected center cannot be substituted later. We give a new algorithm for this setting that obtains a constant approximation factor on the optimal risk under a random arrival order. This is the first such algorithm that holds without any assumptions on the input data and selects a non-trivial number of centers. The number of selected centers is quasi-linear in k. Our algorithm and analysis are based on a careful risk estimation that avoids outliers, a new concept of a linear bin division, and repeated calculations using an offline clustering algorithm.
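
The setting described above can be made concrete with a minimal sketch. It illustrates only the protocol (random arrival order, an immediate and irrevocable decision per point) and the k-median risk; it is not the algorithm from the paper. The names `kmedian_risk`, `run_no_substitution`, and `far_point_rule`, the use of Euclidean distance, and the threshold-based selection rule are all assumptions made for illustration.

```python
import math
import random


def kmedian_risk(points, centers):
    """k-median objective: sum of distances from each point to its nearest center."""
    if not centers:
        return math.inf
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def run_no_substitution(points, select, seed=0):
    """Present the points in a uniformly random order to a selection rule.

    `select(p, i, centers)` is a hypothetical callback that returns True
    if point p (the i-th point observed) should become a center.
    """
    order = list(points)
    random.Random(seed).shuffle(order)  # random arrival order
    centers = []
    for i, p in enumerate(order):
        # The decision about p is made here, before the next point is seen.
        if select(p, i, centers):
            # Selected centers are only appended, never removed or replaced.
            centers.append(p)
    return centers


def far_point_rule(threshold):
    """Placeholder rule (an assumption, not the paper's method):
    select a point if it is farther than `threshold` from every current center."""
    def select(p, i, centers):
        return all(math.dist(p, c) > threshold for c in centers)
    return select


# Two well-separated groups of points in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = run_no_substitution(points, far_point_rule(5.0))
risk = kmedian_risk(points, centers)
```

On this toy input the placeholder rule picks one center per group regardless of the arrival order, because points within a group are closer than the threshold and points in different groups are farther apart than it. A rule with guarantees on arbitrary data, such as the one the abstract describes, needs the more careful risk estimation the authors mention.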

05/30/2019

Sequential no-Substitution k-Median-Clustering

We study the sample-based k-median clustering objective under a sequenti...
11/13/2020

Consistent k-Clustering for General Metrics

Given a stream of points in a metric space, is it possible to maintain a...
04/01/2020

k-Median clustering under discrete Fréchet and Hausdorff distances

We give the first near-linear time (1+ε)-approximation algorithm for k-me...
12/28/2020

No-substitution k-means Clustering with Adversarial Order

We investigate k-means clustering in the online no-substitution setting ...
08/09/2019

Unexpected Effects of Online K-means Clustering

In this paper we study k-means clustering in the online setting. In the ...
09/23/2018

Improved constant approximation factor algorithms for k-center problem for uncertain data

In real applications, database systems should be able to manage and proc...
03/11/2019

Coresets for Ordered Weighted Clustering

We design coresets for Ordered k-Median, a generalization of classical c...