A Unified Framework for Clustering Constrained Data without Locality Property

10/02/2018
by Hu Ding, et al.

In this paper, we consider a class of constrained clustering problems of points in R^d, where d could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present a unified framework, called Peeling-and-Enclosing (PnE), to iteratively solve two variants of the constrained clustering problems: constrained k-means clustering (k-CMeans) and constrained k-median clustering (k-CMedian). Our framework is based on two standalone geometric techniques, called the Simplex Lemma and the Weaker Simplex Lemma, for k-CMeans and k-CMedian, respectively. The Simplex Lemma (or Weaker Simplex Lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small grid, whose size is independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high-dimensional data. If k and 1/ϵ are fixed numbers, our framework generates, in nearly linear time (i.e., O(n (log n)^{k+1} d)), O((log n)^k) k-tuple candidates for the k mean or median points, and one of them induces a (1+ϵ)-approximation for k-CMeans or k-CMedian, where n is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best k-tuple candidate), we obtain a (1+ϵ)-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.
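
The idea of approximating an unknown mean by searching a grid inside a simplex can be illustrated with a small sketch. This is not the paper's grid construction or its guarantee: it assumes a uniform barycentric grid (weights that are multiples of 1/m), it computes the subset means directly rather than approximating them, and the names `simplex_grid`, `mean`, `sizes`, `centers`, and `m` are hypothetical. It only shows that the mean of a set is a convex combination of the means of its subsets, and that the number of grid points does not depend on the dimension d.

```python
import itertools
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[t] for p in points) / n for t in range(len(points[0])))


def simplex_grid(vertices, m):
    """Yield every convex combination of `vertices` whose weights are multiples of 1/m."""
    dim = len(vertices[0])
    for counts in itertools.product(range(m + 1), repeat=len(vertices)):
        if sum(counts) == m:
            yield tuple(
                sum(c * v[t] for c, v in zip(counts, vertices)) / m
                for t in range(dim)
            )


random.seed(0)
d = 50                      # ambient dimension (illustrative choice)
sizes = [37, 46, 17]        # sizes of three subsets (illustrative choice)
centers = [0.0, 5.0, -3.0]  # per-coordinate Gaussian centers (illustrative choice)
subsets = [
    [tuple(random.gauss(c, 1.0) for _ in range(d)) for _ in range(s)]
    for c, s in zip(centers, sizes)
]

# The subset means span a simplex that contains the mean of the whole set.
vertices = [mean(s) for s in subsets]
true_mean = mean([p for s in subsets for p in s])

m = 10  # grid resolution: weights are multiples of 1/m
grid = list(simplex_grid(vertices, m))
best = min(grid, key=lambda g: math.dist(g, true_mean))
error = math.dist(best, true_mean)
diameter = max(math.dist(u, v) for u in vertices for v in vertices)
print(f"grid points: {len(grid)}, error: {error:.4f}, simplex diameter: {diameter:.4f}")
```

For this uniform grid with j vertices, the grid has C(m + j - 1, j - 1) points regardless of d, and rounding the true weights to multiples of 1/m leaves some grid point within (j/m) times the simplex diameter of the true mean. The Simplex Lemma in the paper states its own, different guarantee, which this sketch does not attempt to reproduce.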
