Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

01/24/2019
by Hu Ding, et al.

We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality-guaranteed algorithms with low complexity for this problem. Our idea is inspired by Gonzalez's algorithm, the greedy method for solving ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy can in fact handle k-center clustering with outliers efficiently, in terms of both clustering quality and time complexity. We further show that the greedy approach yields a small coreset for the problem in doubling metrics, which reduces the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near-optimal solutions and yield lower running times compared with existing methods.
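
For readers unfamiliar with the greedy method mentioned in the abstract, the following is a minimal sketch of the classical Gonzalez farthest-first traversal for ordinary k-center clustering (no outliers). It is not the outlier-aware algorithm or the coreset construction of this paper, whose details the abstract does not give. The function name `gonzalez_k_center`, the `seed` parameter, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
import random


def gonzalez_k_center(points, k, seed=0):
    """Classical farthest-first traversal for ordinary k-center.

    Illustrative sketch only: assumes Euclidean points given as tuples
    and does not handle outliers.
    Returns (indices of the chosen centers, clustering radius).
    """
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # arbitrary starting center
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Greedy step: the point farthest from all current centers
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers, max(dist)
```

A call such as `gonzalez_k_center([(0, 0), (0, 1), (10, 0), (10, 1), (20, 0)], k=3)` returns three center indices and the largest distance from any point to its nearest center. Each round scans all points once, so the sketch performs O(nk) distance computations.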

