k-Center Clustering with Outliers in the Sliding-Window Model

09/24/2021
by Mark de Berg, et al.

The k-center problem for a point set P asks for a collection of k congruent balls (that is, balls of equal radius) that together cover all the points in P and whose radius is minimized. The k-center problem with outliers is defined similarly, except that z of the points in P need not be covered, for a given parameter z. We study the k-center problem with outliers in data streams in the sliding-window model. In this model, we are given a possibly infinite stream P = ⟨p_1, p_2, p_3, …⟩ of points and a time window of length W, and we want to maintain a small sketch of the set P(t) of points currently in the window such that, using the sketch, we can approximately solve the problem on P(t). We present the first algorithm for the k-center problem with outliers in the sliding-window model. The algorithm works for the case where the points come from a space of bounded doubling dimension, and it maintains a set S(t) such that an optimal solution on S(t) gives a (1+ε)-approximate solution on P(t). The algorithm is deterministic and uses O((kz/ε^d) log σ) storage, where d is the doubling dimension of the underlying space and σ is the spread of the points in the stream. Algorithms providing a (1+ε)-approximation were previously not known even in the setting without outliers or in the insertion-only setting with outliers. We also present a lower bound showing that any algorithm that provides a (1+ε)-approximation must use Ω((kz/ε) log σ) storage.
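To make the objective and the window concrete, here is a minimal brute-force sketch in Python. It is a naive reference, not the paper's algorithm: it stores every point in the window rather than an O((kz/ε^d) log σ) sketch. Several assumptions are made for illustration only: points live in Euclidean space (one example of a space with bounded doubling dimension); the i-th point arrives at time i and P(t) holds the arrivals with timestamp greater than t − W; centers are restricted to input points (the problem allows arbitrary centers, so this discrete variant can be off by at most a factor of 2 by the triangle inequality); and spread is taken as the ratio of the largest to the smallest nonzero pairwise distance. The names NaiveWindow, radius_with_outliers, discrete_k_center_outliers, and spread are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def radius_with_outliers(points, centers, z):
    """Smallest r such that balls of radius r around `centers`
    cover all but at most z of `points`."""
    if len(points) <= z:
        return 0.0  # every point may be declared an outlier
    # Distance from each point to its nearest center, ascending.
    d = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # The z farthest points are left uncovered.
    return d[len(d) - 1 - z]


def discrete_k_center_outliers(points, k, z):
    """Exhaustive search over k-subsets of the input used as centers.
    Exponential in k; only meant for tiny inputs (assumes k >= 1)."""
    pts = list(points)
    best = (math.inf, ())
    for cs in combinations(pts, min(k, len(pts))):
        r = radius_with_outliers(pts, cs, z)
        if r < best[0]:
            best = (r, cs)
    return best


def spread(points):
    """Ratio of the largest to the smallest nonzero pairwise distance."""
    ds = [math.dist(p, q) for p, q in combinations(points, 2)]
    ds = [x for x in ds if x > 0]
    return max(ds) / min(ds) if ds else 1.0


class NaiveWindow:
    """Keeps every point whose arrival time lies in (t - W, t].
    Uses space linear in |P(t)|, unlike a small sketch."""

    def __init__(self, W):
        self.W = W
        self.buf = deque()  # (arrival_time, point), oldest first

    def insert(self, t, p):
        self.buf.append((t, p))
        # Expire points that have slid out of the window.
        while self.buf and self.buf[0][0] <= t - self.W:
            self.buf.popleft()

    def points(self):
        return [p for _, p in self.buf]


# Usage: p_i arrives at time i; window length W = 4.
stream = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0),
          (0.0, 1.0), (50.0, 50.0), (11.0, 0.0)]
w = NaiveWindow(W=4)
for t, p in enumerate(stream, start=1):
    w.insert(t, p)
P_t = w.points()  # arrivals at t = 3..6
r, centers = discrete_k_center_outliers(P_t, k=2, z=1)
```

With k = 2 and z = 1, the far point (50, 50) is left uncovered as the outlier and the remaining points of P(6) are covered with radius 1. The exhaustive search is only practical for a handful of points; the point of the paper is to replace the full window with a much smaller set S(t) on which such a solver can be run.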



research
02/24/2023

k-Center Clustering with Outliers in the MPC and Streaming Model

Given a point set P ⊆ X of size n in a metric space (X,dist) of doubling...
research
01/07/2022

k-Center Clustering with Outliers in Sliding Windows

Metric k-center clustering is a fundamental unsupervised learning primit...
research
05/10/2020

Approaching Optimal Duplicate Detection in a Sliding Window

Duplicate detection is the problem of identifying whether a given item h...
research
10/06/2019

Fast Detection of Outliers in Data Streams with the Q_n Estimator

We present FQN (Fast Q_n), a novel algorithm for fast detection of outli...
research
01/04/2020

Computing Euclidean k-Center over Sliding Windows

In the Euclidean k-center problem in sliding window model, input points ...
research
11/07/2017

SWOOP: Top-k Similarity Joins over Set Streams

We provide efficient support for applications that aim to continuously f...
research
07/03/2023

A numerical algorithm for attaining the Chebyshev bound in optimal learning

Given a compact subset of a Banach space, the Chebyshev center problem c...
