Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers

01/07/2023
by Hu Ding, et al.

In this paper, we study the problem of k-center clustering with outliers. The problem has many important applications in the real world, but the presence of outliers can significantly increase the computational complexity. Though a number of methods have been developed in the past decades, it remains quite challenging to design a quality-guaranteed algorithm with low complexity for this problem. Our idea is inspired by Gonzalez's algorithm, the greedy method developed for solving the ordinary k-center clustering problem. Based on some novel observations, we show that a simple randomized version of this greedy strategy can actually handle outliers efficiently. We further show that this randomized greedy approach also yields a small coreset for the problem in doubling metrics (even if the doubling dimension is not given), which can greatly reduce the computational complexity. Moreover, together with the partial clustering framework proposed in arXiv:1703.01539, we prove that our coreset method can be applied to distributed data with low communication complexity. The experimental results suggest that our algorithms can achieve near-optimal solutions and yield lower complexities compared with existing methods.
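To make the greedy idea concrete, here is a minimal Python sketch. The first function is the classical Gonzalez farthest-first traversal for ordinary k-center. The second is only one plausible reading of "a simple randomized version": it samples the next center uniformly from a pool of the farthest points instead of always taking the single farthest one. The abstract does not specify the sampling rule, so the pool size `ceil((1 + eps) * z)`, the parameter `eps`, and all function names are assumptions made for illustration, not the authors' stated method.

```python
import math
import random


def gonzalez(points, k, first=0):
    """Classical farthest-first traversal for ordinary k-center (no outliers)."""
    centers = [points[first]]
    # d[j] = distance from points[j] to its nearest chosen center
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=d.__getitem__)  # farthest point
        centers.append(points[i])
        d = [min(dj, math.dist(p, points[i])) for dj, p in zip(d, points)]
    return centers


def randomized_greedy(points, k, z, eps=1.0, seed=0):
    """Hypothetical randomized variant: sample among the farthest candidates.

    The pool size ceil((1 + eps) * z) is an assumption for illustration;
    the abstract does not state the sampling rule.
    """
    rng = random.Random(seed)
    m = max(1, math.ceil((1 + eps) * z))  # candidate-pool size (assumption)
    centers = [rng.choice(points)]
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Indices of the m points currently farthest from the chosen centers
        pool = sorted(range(len(points)), key=d.__getitem__, reverse=True)[:m]
        i = rng.choice(pool)
        centers.append(points[i])
        d = [min(dj, math.dist(p, points[i])) for dj, p in zip(d, points)]
    return centers


def radius_with_outliers(points, centers, z):
    """Clustering radius after discarding the z farthest points (needs z < n)."""
    d = sorted(min(math.dist(p, c) for c in centers) for p in points)
    return d[len(points) - z - 1]


# Two small groups plus one far-away point acting as an outlier
points = [(0, 0), (0, 1), (10, 0), (10, 1), (100, 100)]
print(gonzalez(points, k=2))
print(radius_with_outliers(points, gonzalez(points, k=2), z=1))
print(randomized_greedy(points, k=2, z=1))
```

On this toy input, plain Gonzalez starting from (0, 0) picks the outlier (100, 100) as its second center, so the radius after discarding one point is still 10. Sampling from a pool larger than z gives the greedy step a chance to land on an inlier instead, which is the intuition this sketch is meant to illustrate.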


research
01/24/2019

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

We study the problem of k-center clustering with outliers in arbitrary m...
research
01/24/2019

Greedy Strategy Works for Clustering with Outliers and Coresets Construction

We study the problems of clustering with outliers in high dimension. Tho...
research
05/24/2019

A Practical Framework for Solving Center-Based Clustering with Outliers

Clustering has many important applications in computer science, but real...
research
02/28/2021

Is Simple Uniform Sampling Efficient for Center-Based Clustering With Outliers: When and Why?

Clustering has many important applications in computer science, but real...
research
04/25/2018

Bi-criteria Approximation Algorithms for Minimum Enclosing Ball and k-Center Clustering with Outliers

Motivated by the arising realistic issues in big data, the problem of Mi...
research
09/04/2018

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attentio...
research
06/03/2013

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two po...
