Faster Balanced Clusterings in High Dimension

09/04/2018
by Hu Ding, et al.

Constrained clustering has attracted significant attention over the past decades. In this paper, we study the balanced k-center, k-median, and k-means clustering problems, in which the size of each cluster is constrained by given lower and upper bounds. These problems are motivated by applications in processing large-scale, high-dimensional data. Existing methods often need to compute complicated matchings (or min-cost flows) to satisfy the balance constraint, and thus suffer from high complexity, especially in high dimensions. To address this issue, we develop an effective framework for the three balanced clustering problems, based on a novel geometric spatial partition. For balanced k-center clustering, we provide a 4-approximation algorithm, improving the existing approximation factor of 7; for balanced k-median and k-means clustering, our algorithms yield constant and (1+ϵ)-approximation factors for any ϵ>0. More importantly, our algorithms achieve linear or nearly linear running times when k is a constant, significantly improving on existing ones. Our results extend easily to metric balanced clustering, where the running times are sublinear in the complexity of an n-point metric.
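To make the problem statement concrete, the following minimal sketch checks the balance constraint for a given assignment and evaluates the three objectives named in the abstract. It is not the paper's algorithm: it only scores a clustering that is already given. The function name `balanced_costs` and its parameters are hypothetical, and Euclidean distance is assumed.

```python
import math

def balanced_costs(points, centers, assignment, lower, upper):
    """Score a clustering under a balance constraint (illustrative only).

    points:     list of coordinate tuples
    centers:    list of k coordinate tuples
    assignment: assignment[i] is the index of the center serving points[i]
    lower, upper: bounds that every cluster size must satisfy

    Returns (k_center_cost, k_median_cost, k_means_cost).
    Raises ValueError if any cluster size is outside [lower, upper].
    """
    # Count how many points each center receives.
    sizes = [0] * len(centers)
    for j in assignment:
        sizes[j] += 1

    # Enforce the balance constraint on every cluster.
    for j, size in enumerate(sizes):
        if not lower <= size <= upper:
            raise ValueError(
                f"cluster {j} has size {size}, outside [{lower}, {upper}]"
            )

    # Euclidean distance from each point to its assigned center.
    dists = [math.dist(p, centers[j]) for p, j in zip(points, assignment)]

    k_center = max(dists)                # largest point-to-center distance
    k_median = sum(dists)                # sum of distances
    k_means = sum(d * d for d in dists)  # sum of squared distances
    return k_center, k_median, k_means


points = [(0, 0), (0, 1), (4, 0), (4, 3)]
centers = [(0, 0), (4, 0)]
print(balanced_costs(points, centers, [0, 0, 1, 1], lower=2, upper=2))
```

Note that assigning each point to its nearest center may violate the size bounds, which is why existing methods resort to matchings or min-cost flows to find a feasible assignment.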

research
10/30/2018

Coresets for k-Means and k-Median Clustering and their Applications

In this paper, we show the existence of small coresets for the problem...
research
10/02/2019

Streaming Balanced Clustering

Clustering of data points in metric space is among the most fundamental ...
research
10/27/2021

Tight FPT Approximation for Constrained k-Center and k-Supplier

In this work, we study a range of constrained versions of the k-supplier...
research
05/12/2023

Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces

We consider the well-studied Robust (k, z)-Clustering problem, which gen...
research
12/18/2017

Don't Rock the Boat: Algorithms for Balanced Dynamic Loading and Unloading

We consider dynamic loading and unloading problems for heavy geometric o...
research
11/03/2022

Connected k-Center and k-Diameter Clustering

Motivated by an application from geodesy, we introduce a novel clusterin...
research
01/07/2023

Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers

In this paper, we study the problem of k-center clustering with outliers...
