Distributed k-Clustering for Data with Heavy Noise

10/18/2018
by Shi Li, et al.

In this paper, we consider the k-center/median/means clustering problems with outliers (also called the (k, z)-center/median/means problems) in the distributed setting. Most previous distributed algorithms have communication costs that depend linearly on z, the number of outliers. Recently, Guha et al. overcame this dependence by considering bi-criteria approximation algorithms that output solutions with 2z outliers. When z is large, however, discarding z extra points may be too wasteful, considering that the data gathering process might be costly. In this paper, we improve the number of outliers to the best possible (1+ϵ)z, while maintaining the O(1)-approximation ratio and keeping the communication cost independent of z. The problems we consider include the (k, z)-center problem and the (k, z)-median/means problems in Euclidean metrics. An implementation of our algorithm for (k, z)-center shows that it outperforms many previous algorithms, both in terms of communication cost and quality of the output solution.
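
To make the objective concrete, the following is a minimal sketch of how a candidate (k, z)-center solution could be scored in Euclidean space. It is not the paper's algorithm and not its distributed protocol; it only evaluates the standard objective, under which the allowed number of farthest points is discarded and the cost is the largest remaining distance to a nearest center. The function names `kz_center_cost` and `outlier_budget`, and the choice to round (1+ϵ)z down, are assumptions made for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def kz_center_cost(points: Sequence[Point],
                   centers: Sequence[Point],
                   num_outliers: int) -> float:
    """Cost of a (k, z)-center solution: drop the `num_outliers` points
    farthest from their nearest center, then return the largest
    remaining nearest-center distance (hypothetical helper)."""
    # Distance from every point to its nearest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Keep everything except the farthest `num_outliers` points.
    kept = dists[:max(len(dists) - num_outliers, 0)]
    return kept[-1] if kept else 0.0


def outlier_budget(z: int, eps: float) -> int:
    """Number of points a bi-criteria solution may discard, (1 + eps) * z.
    Rounding down is an assumption made for this sketch."""
    return math.floor((1 + eps) * z)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
    centers = [(0.0, 0.0)]
    # Exact budget z versus the relaxed budget (1 + eps) * z.
    print(kz_center_cost(points, centers, 1))
    print(kz_center_cost(points, centers, outlier_budget(2, 0.5)))
```

A bi-criteria algorithm is judged against the optimal cost with exactly z outliers while being allowed the larger budget itself, which is why the two budgets are kept as separate values here.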


research
10/19/2022

Near-optimal Coresets for Robust Clustering

We consider robust clustering problems in ℝ^d, specifically k-clustering...
research
07/02/2020

Adapting k-means algorithms for outliers

This paper shows how to adapt several simple and classical sampling-base...
research
06/03/2013

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two po...
research
05/24/2018

A Practical Algorithm for Distributed Clustering and Outlier Detection

We study the classic k-means/median clustering, which are fundamental pr...
research
02/15/2023

Fully dynamic clustering and diversity maximization in doubling metrics

We present approximation algorithms for some variants of center-based cl...
research
09/02/2020

Structural Iterative Rounding for Generalized k-Median Problems

This paper considers approximation algorithms for generalized k-median p...
research
04/25/2018

Bi-criteria Approximation Algorithms for Minimum Enclosing Ball and k-Center Clustering with Outliers

Motivated by the arising realistic issues in big data, the problem of Mi...
